Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net-next] ip_gre: Make none-tun-dst gre tunnel keep tunnel info
@ 2019-11-19  7:08 wenxu
  2019-11-20  0:39 ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-19  7:08 UTC (permalink / raw)
  To: pablo, davem; +Cc: netdev

From: wenxu <wenxu@ucloud.cn>

Currently, only a collect_md gre tunnel keeps the tunnel info on
receive. But a non-tun-dst gre tunnel (one configured without a remote
address) can already send packets through lwtunnel.

A non-tun-dst gre tunnel should also keep the tunnel info, so that the
ARP response can be sent back successfully through the tunnel_info in
iptunnel_metadata_reply.

The following is the test script:

ip netns add cl
ip l add dev vethc type veth peer name eth0 netns cl

ifconfig vethc 172.168.0.7/24 up
ip l add dev tun1000 type gretap key 1000

ip link add user1000 type vrf table 1
ip l set user1000 up
ip l set dev tun1000 master user1000
ifconfig tun1000 10.0.1.1/24 up

ip netns exec cl ifconfig eth0 172.168.0.17/24 up
ip netns exec cl ip l add dev tun type gretap local 172.168.0.17 remote 172.168.0.7 key 1000
ip netns exec cl ifconfig tun 10.0.1.7/24 up
ip r r 10.0.1.7 encap ip id 1000 dst 172.168.0.17 key dev tun1000 table 1

With this patch,
ip netns exec cl ping 10.0.1.1 succeeds.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/ipv4/ip_gre.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 10636fb..572b630 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -340,6 +340,8 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
+		const struct iphdr *tnl_params;
+
 		if (__iptunnel_pull_header(skb, hdr_len, tpi->proto,
 					   raw_proto, false) < 0)
 			goto drop;
@@ -348,7 +350,9 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 			skb_pop_mac_header(skb);
 		else
 			skb_reset_mac_header(skb);
-		if (tunnel->collect_md) {
+
+		tnl_params = &tunnel->parms.iph;
+		if (tunnel->collect_md || tnl_params->daddr == 0) {
 			__be16 flags;
 			__be64 tun_id;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next] ip_gre: Make none-tun-dst gre tunnel keep tunnel info
  2019-11-19  7:08 [PATCH net-next] ip_gre: Make none-tun-dst gre tunnel keep tunnel info wenxu
@ 2019-11-20  0:39 ` David Miller
  2019-11-21  7:30   ` Question about flow table offload in mlx5e wenxu
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2019-11-20  0:39 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev

From: wenxu@ucloud.cn
Date: Tue, 19 Nov 2019 15:08:51 +0800

> From: wenxu <wenxu@ucloud.cn>
> 
> Currently only collect_md gre tunnel keep tunnel info.
> But the nono-tun-dst gre tunnel already can send packte through
> lwtunnel.
> 
> For non-tun-dst gre tunnel should keep the tunnel info to make
> the arp response can send success through the tunnel_info in
> iptunnel_metadata_reply.

I know that English is not your native language, but there are many
typos in here and I do not understand from your description how all
of this works and what needs to be fixed.

Please try explain things more clearly, showing how collect_md works
for these tunnel types, exactly, compared to non-tun-dst tunnels.

Thank you.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Question about flow table offload in mlx5e
  2019-11-20  0:39 ` David Miller
@ 2019-11-21  7:30   ` wenxu
  2019-11-21  7:42     ` Paul Blakey
  0 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-21  7:30 UTC (permalink / raw)
  To: paulb; +Cc: pablo, netdev, markb

Hi Paul,

Flow table offload in mlx5e is based on TC_SETUP_FT.

It is almost the same as TC_SETUP_BLOCK: it just sets the
MLX5_TC_FLAG(FT_OFFLOAD) flag and changes
cls_flower.common.chain_index to FDB_FT_CHAIN,

in the following code at lines 1380 and 1392:

1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
1369                                  void *cb_priv)
1370 {
1371         struct flow_cls_offload *f = type_data;
1372         struct flow_cls_offload cls_flower;
1373         struct mlx5e_priv *priv = cb_priv;
1374         struct mlx5_eswitch *esw;
1375         unsigned long flags;
1376         int err;
1377
1378         flags = MLX5_TC_FLAG(INGRESS) |
1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
1380                 MLX5_TC_FLAG(FT_OFFLOAD);
1381         esw = priv->mdev->priv.eswitch;
1382
1383         switch (type) {
1384         case TC_SETUP_CLSFLOWER:
1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
1386                         return -EOPNOTSUPP;
1387
1388                 /* Re-use tc offload path by moving the ft flow to the
1389                  * reserved ft chain.
1390                  */
1391                 memcpy(&cls_flower, f, sizeof(*f));
1392                cls_flower.common.chain_index = FDB_FT_CHAIN;
1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower, flags);
1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));


I want to add tunnel offload support to the flow table, so I added some patches to nf_flow_table_offload.

I also added indirect block setup support in the mlx5 driver, and flow table offload with decap now works.


But I hit a problem with encap. The encap rule is installed in hardware successfully, but traffic is not actually offloaded.

I believe the rule I add is correct: if I comment out line 1392, the rule is also installed successfully, and traffic is then offloaded.

So is there some limitation on the encap operation for FT_OFFLOAD in FDB_FT_CHAIN?


BR

wenxu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Question about flow table offload in mlx5e
  2019-11-21  7:30   ` Question about flow table offload in mlx5e wenxu
@ 2019-11-21  7:42     ` Paul Blakey
  2019-11-21  8:28       ` wenxu
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Blakey @ 2019-11-21  7:42 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch

Hi,

The original design was for the block setup to use the TC_SETUP_FT type, and the tc event type to be TC_SETUP_CLSFLOWER.
We will post a patch to change that. I would advise waiting until we fix that 😊
I'm not sure how you got to this function, mlx5e_rep_setup_ft_cb(), if the nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK and not TC_SETUP_FT.

In our driver en_rep.c we have:
	switch (type) {
	case TC_SETUP_BLOCK:
		return flow_block_cb_setup_simple(type_data,
						  &mlx5e_rep_block_tc_cb_list,
						  mlx5e_rep_setup_tc_cb,
						  priv, priv, true);
	case TC_SETUP_FT:
		return flow_block_cb_setup_simple(type_data,
						  &mlx5e_rep_block_ft_cb_list,
						  mlx5e_rep_setup_ft_cb,
						  priv, priv, true);
	default:
		return -EOPNOTSUPP;
	}

In nf_flow_table_offload.c:
	bo.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
	bo.extack	= &extack;
	INIT_LIST_HEAD(&bo.cb_list);

	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
	if (err < 0)
		return err;

	return nf_flow_table_block_setup(flowtable, &bo, cmd);
}
EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);


So unless you changed that as well, you should have gotten to mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_ft_cb.

Regarding the encap action, there should be no difference based on which chain the rule is in.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-21  7:42     ` Paul Blakey
@ 2019-11-21  8:28       ` wenxu
  2019-11-21 11:39         ` Paul Blakey
  0 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-21  8:28 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch


On 11/21/2019 3:42 PM, Paul Blakey wrote:
> Hi,
>
> The original design was the block setup to use TC_SETUP_FT type, and the tc event type to be case TC_SETUP_CLSFLOWER.
> We will post a patch to change that. I would advise to wait till we fix that 😊
> I'm not sure how you get to this function mlx5e_rep_setup_ft_cb() if it the nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK, and not TC_SETUP_FT.


Yes, I changed TC_SETUP_BLOCK to TC_SETUP_FT in nf_flow_table_offload_setup.

Two fix patches were provided:

http://patchwork.ozlabs.org/patch/1197818/

http://patchwork.ozlabs.org/patch/1197876/

So is this change of mine not correct currently?

>
> In our driver en_rep.c we have:
>> 	switch (type) {
>> 	case TC_SETUP_BLOCK:
>> 		return flow_block_cb_setup_simple(type_data,
>> 						  &mlx5e_rep_block_tc_cb_list,
>> 						  mlx5e_rep_setup_tc_cb,
>> 						  priv, priv, true);
>> 	case TC_SETUP_FT:
>> 		return flow_block_cb_setup_simple(type_data,
>> 						  &mlx5e_rep_block_ft_cb_list,
>> 						  mlx5e_rep_setup_ft_cb,
>> 						  priv, priv, true);
>> 	default:
>> 		return -EOPNOTSUPP;
>> 	}
> In nf_flow_table_offload.c:
>> 	bo.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
>> 	bo.extack	= &extack;
>> 	INIT_LIST_HEAD(&bo.cb_list);
>> 	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>> 	if (err < 0)
>> 		return err;
>> 	return nf_flow_table_block_setup(flowtable, &bo, cmd);
> }
> EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
>
>
> So unless you changed that as well, you should have gotten to mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_tc_ft.
>
> Regarding the encap action, there should be no difference on which chain the rule is on.


But the same encap rule can be really offloaded when set up through TC_SETUP_BLOCK, while through TC_SETUP_FT it can't.

So is it a problem with TC_SETUP_FT in mlx5e_rep_setup_ft_cb?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Question about flow table offload in mlx5e
  2019-11-21  8:28       ` wenxu
@ 2019-11-21 11:39         ` Paul Blakey
  2019-11-21 11:40           ` Paul Blakey
  2019-11-21 12:35           ` wenxu
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Blakey @ 2019-11-21 11:39 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch

They are good fixes, exactly what we had when we tested this, thanks.

Regarding encap, I don't know what changes you made. How does the encap rule look? Is it a forward to a vxlan device? If not, it should be, as our driver expects that.

I tried it on my setup via tc, by changing the callback of tc (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
and testing a vxlan encap rule:
sudo tc qdisc add dev ens1f0_0 ingress
sudo ifconfig ens1f0 7.7.7.7/24 up
sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport 4789 external
sudo ifconfig vxlan0 up
sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw  action tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress redirect dev vxlan0

then tc show:
filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw in_hw in_hw_count 1
        tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789 csum pipe
        Stats: used 119 sec      0 pkt
        mirred (Egress Redirect to device vxlan0)
        Stats: used 119 sec      0 pkt




^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Question about flow table offload in mlx5e
  2019-11-21 11:39         ` Paul Blakey
@ 2019-11-21 11:40           ` Paul Blakey
  2019-11-21 12:35           ` wenxu
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Blakey @ 2019-11-21 11:40 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch

Also, can you CC me on those patches? Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-21 11:39         ` Paul Blakey
  2019-11-21 11:40           ` Paul Blakey
@ 2019-11-21 12:35           ` wenxu
  2019-11-21 13:05             ` Paul Blakey
  1 sibling, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-21 12:35 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch


On 2019/11/21 19:39, Paul Blakey wrote:
> They are good fixes, exactly what we had when we tested this, thanks.
>
> Regarding encap, I don't know what changes you did, how does the encap rule look? Is it a FWD to vxlan device? If not it should be, as our driver expects that.
It is a forward to a gretap device.
>
> I tried it on my setup via tc, by changing the callback of tc (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
> and testing a vxlan encap rule:
> sudo tc qdisc add dev ens1f0_0 ingress
> sudo ifconfig ens1f0 7.7.7.7/24 up
> sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport 4789 external
> sudo ifconfig vxlan0 up
> sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw  action tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress redirect dev vxlan
>
> then tc show:
> filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw in_hw in_hw_count 1
>         tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789 csum pipe
>         Stats: used 119 sec      0 pkt
>         mirred (Egress Redirect to device vxlan0)
>         Stats: used 119 sec      0 pkt

Can you send a packet that matches this offloaded flow, to check that it is really offloaded?

With my flowtable offload patches, both TC_SETUP_BLOCK and TC_SETUP_FT install the rule successfully,

but in the TC_SETUP_FT case the packets are not actually offloaded.


I will test like you did.

>
>
>
>> -----Original Message-----
>> From: wenxu <wenxu@ucloud.cn>
>> Sent: Thursday, November 21, 2019 10:29 AM
>> To: Paul Blakey <paulb@mellanox.com>
>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>> <markb@mellanox.com>
>> Subject: Re: Question about flow table offload in mlx5e
>>
>>
>> On 11/21/2019 3:42 PM, Paul Blakey wrote:
>>> Hi,
>>>
>>> The original design was the block setup to use TC_SETUP_FT type, and the
>> tc event type to be case TC_SETUP_CLSFLOWER.
>>> We will post a patch to change that. I would advise to wait till we fix that
>> 😊
>>> I'm not sure how you get to this function mlx5e_rep_setup_ft_cb() if it the
>> nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK, and not
>> TC_SETUP_FT.
>>
>>
>> Yes I change the TC_SETUP_BLOCK to TC_SETUP_FT in the
>> nf_flow_table_offload_setup.
>>
>> Two fixes patch provide:
>>
>> http://patchwork.ozlabs.org/patch/1197818/
>>
>> http://patchwork.ozlabs.org/patch/1197876/
>>
>> So this change made by me is not correct currently?
>>
>>> In our driver en_rep.c we have:
>>>>         switch (type) {
>>>>         case TC_SETUP_BLOCK:
>>>>                 return flow_block_cb_setup_simple(type_data,
>>>>                                                   &mlx5e_rep_block_tc_cb_list,
>>>>                                                   mlx5e_rep_setup_tc_cb,
>>>>                                                   priv, priv, true);
>>>>         case TC_SETUP_FT:
>>>>                 return flow_block_cb_setup_simple(type_data,
>>>>                                                   &mlx5e_rep_block_ft_cb_list,
>>>>                                                   mlx5e_rep_setup_ft_cb,
>>>>                                                   priv, priv, true);
>>>>         default:
>>>>                 return -EOPNOTSUPP;
>>>>         }
>>> In nf_flow_table_offload.c:
>>>>         bo.binder_type  = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
>>>>         bo.extack       = &extack;
>>>>         INIT_LIST_HEAD(&bo.cb_list);
>>>>         err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>>>>         if (err < 0)
>>>>                 return err;
>>>>         return nf_flow_table_block_setup(flowtable, &bo, cmd);
>>> }
>>> EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
>>>
>>>
>>> So unless you changed that as well, you should have gotten to
>> mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_tc_ft.
>>> Regarding the encap action, there should be no difference on which chain
>> the rule is on.
>>
>>
>> But for the same encap rule can be real offloaded when setup through
>> through TC_SETUP_BLOCK. But TC_SETUP_FT can't.
>>
>> So it is the problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb ?
>>
>>>
>>>> -----Original Message-----
>>>> From: wenxu <wenxu@ucloud.cn>
>>>> Sent: Thursday, November 21, 2019 9:30 AM
>>>> To: Paul Blakey <paulb@mellanox.com>
>>>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>>>> <markb@mellanox.com>
>>>> Subject: Question about flow table offload in mlx5e
>>>>
>>>> Hi  paul,
>>>>
>>>> The flow table offload in the mlx5e is based on TC_SETUP_FT.
>>>>
>>>>
>>>> It is almost the same as TC_SETUP_BLOCK.
>>>>
>>>> It just set MLX5_TC_FLAG(FT_OFFLOAD) flags and change
>>>> cls_flower.common.chain_index = FDB_FT_CHAIN;
>>>>
>>>> In following codes line 1380 and 1392
>>>>
>>>> 1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void
>>>> *type_data,
>>>> 1369                                  void *cb_priv)
>>>> 1370 {
>>>> 1371         struct flow_cls_offload *f = type_data;
>>>> 1372         struct flow_cls_offload cls_flower;
>>>> 1373         struct mlx5e_priv *priv = cb_priv;
>>>> 1374         struct mlx5_eswitch *esw;
>>>> 1375         unsigned long flags;
>>>> 1376         int err;
>>>> 1377
>>>> 1378         flags = MLX5_TC_FLAG(INGRESS) |
>>>> 1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
>>>> 1380                 MLX5_TC_FLAG(FT_OFFLOAD);
>>>> 1381         esw = priv->mdev->priv.eswitch;
>>>> 1382
>>>> 1383         switch (type) {
>>>> 1384         case TC_SETUP_CLSFLOWER:
>>>> 1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
>>>> 1386                         return -EOPNOTSUPP;
>>>> 1387
>>>> 1388                 /* Re-use tc offload path by moving the ft flow to the
>>>> 1389                  * reserved ft chain.
>>>> 1390                  */
>>>> 1391                 memcpy(&cls_flower, f, sizeof(*f));
>>>> 1392                cls_flower.common.chain_index = FDB_FT_CHAIN;
>>>> 1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower,
>> flags);
>>>> 1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
>>>>
>>>>
>>>> I want to add tunnel offload support in the flow table, I  add some patches
>> in
>>>> nf_flow_table_offload.
>>>>
>>>> Also add the indr setup support in the mlx driver. And Now I can  flow
>> table
>>>> offload with decap.
>>>>
>>>>
>>>> But I meet a problem with the encap.  The encap rule can be added in
>>>> hardware  successfully But it can't be offloaded.
>>>>
>>>> But I think the rule I added is correct.  If I mask the line 1392. The rule also
>> can
>>>> be add success and can be offloaded.
>>>>
>>>> So there are some limit for encap operation for FT_OFFLOAD in
>>>> FDB_FT_CHAIN?
>>>>
>>>>
>>>> BR
>>>>
>>>> wenxu
>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Question about flow table offload in mlx5e
  2019-11-21 12:35           ` wenxu
@ 2019-11-21 13:05             ` Paul Blakey
  2019-11-21 13:39               ` wenxu
                                 ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Paul Blakey @ 2019-11-21 13:05 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch

I see, I will test that, and how about normal FWD rules?

Paul.


> -----Original Message-----
> From: wenxu <wenxu@ucloud.cn>
> Sent: Thursday, November 21, 2019 2:35 PM
> To: Paul Blakey <paulb@mellanox.com>
> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
> <markb@mellanox.com>
> Subject: Re: Question about flow table offload in mlx5e
> 
> 
> On 2019/11/21 19:39, Paul Blakey wrote:
> > They are good fixes, exactly what we had when we tested this, thanks.
> >
> > Regarding encap, I don't know what changes you did, how does the encap
> rule look? Is it a FWD to vxlan device? If not it should be, as our driver
> expects that.
> It is fwd to a gretap devices
> >
> > I tried it on my setup via tc, by changing the callback of tc
> (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
> > and testing a vxlan encap rule:
> > sudo tc qdisc add dev ens1f0_0 ingress
> > sudo ifconfig ens1f0 7.7.7.7/24 up
> > sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport
> 4789 external
> > sudo ifconfig vxlan0 up
> > sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower
> dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw  action tunnel_key set
> src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress
> redirect dev vxlan
> >
> > then tc show:
> > filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff
> ip_proto udp skip_sw in_hw in_hw_count 1
> >         tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789
> csum pipe
> >         Stats: used 119 sec      0 pkt
> >         mirred (Egress Redirect to device vxlan0)
> >         Stats: used 119 sec      0 pkt
> 
> Can you send packet that match this offloaded flow to check it is real
> offloaded?
> 
> In the flowtable offload with my patches both TC_SETUP_BLOCK and
> TC_SETUP_FT can offload the rule success
> 
> But in the TC_SETUP_FT case the packet is not real offloaded.
> 
> 
> I  will test like u did.
> 
> >
> >
> >
> >> -----Original Message-----
> >> From: wenxu <wenxu@ucloud.cn>
> >> Sent: Thursday, November 21, 2019 10:29 AM
> >> To: Paul Blakey <paulb@mellanox.com>
> >> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
> >> <markb@mellanox.com>
> >> Subject: Re: Question about flow table offload in mlx5e
> >>
> >>
> >> On 11/21/2019 3:42 PM, Paul Blakey wrote:
> >>> Hi,
> >>>
> >>> The original design was the block setup to use TC_SETUP_FT type, and
> the
> >> tc event type to be case TC_SETUP_CLSFLOWER.
> >>> We will post a patch to change that. I would advise to wait till we fix that
> >> 😊
> >>> I'm not sure how you get to this function mlx5e_rep_setup_ft_cb() if it
> the
> >> nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK, and
> not
> >> TC_SETUP_FT.
> >>
> >>
> >> Yes I change the TC_SETUP_BLOCK to TC_SETUP_FT in the
> >> nf_flow_table_offload_setup.
> >>
> >> Two fixes patch provide:
> >>
> >> http://patchwork.ozlabs.org/patch/1197818/
> >>
> >> http://patchwork.ozlabs.org/patch/1197876/
> >>
> >> So this change made by me is not correct currently?
> >>
> >>> In our driver en_rep.c we have:
> >>>>         switch (type) {
> >>>>         case TC_SETUP_BLOCK:
> >>>>                 return flow_block_cb_setup_simple(type_data,
> >>>>                                                   &mlx5e_rep_block_tc_cb_list,
> >>>>                                                   mlx5e_rep_setup_tc_cb,
> >>>>                                                   priv, priv, true);
> >>>>         case TC_SETUP_FT:
> >>>>                 return flow_block_cb_setup_simple(type_data,
> >>>>                                                   &mlx5e_rep_block_ft_cb_list,
> >>>>                                                   mlx5e_rep_setup_ft_cb,
> >>>>                                                   priv, priv, true);
> >>>>         default:
> >>>>                 return -EOPNOTSUPP;
> >>>>         }
> >>> In nf_flow_table_offload.c:
> >>>>         bo.binder_type  = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
> >>>>         bo.extack       = &extack;
> >>>>         INIT_LIST_HEAD(&bo.cb_list);
> >>>>         err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
> >>>>         if (err < 0)
> >>>>                 return err;
> >>>>         return nf_flow_table_block_setup(flowtable, &bo, cmd);
> >>> }
> >>> EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
> >>>
> >>>
> >>> So unless you changed that as well, you should have gotten to
> >> mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_tc_ft.
> >>> Regarding the encap action, there should be no difference on which
> chain
> >> the rule is on.
> >>
> >>
> >> But for the same encap rule can be real offloaded when setup through
> >> through TC_SETUP_BLOCK. But TC_SETUP_FT can't.
> >>
> >> So it is the problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb ?
> >>
> >>>
> >>>> -----Original Message-----
> >>>> From: wenxu <wenxu@ucloud.cn>
> >>>> Sent: Thursday, November 21, 2019 9:30 AM
> >>>> To: Paul Blakey <paulb@mellanox.com>
> >>>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
> >>>> <markb@mellanox.com>
> >>>> Subject: Question about flow table offload in mlx5e
> >>>>
> >>>> Hi  paul,
> >>>>
> >>>> The flow table offload in the mlx5e is based on TC_SETUP_FT.
> >>>>
> >>>>
> >>>> It is almost the same as TC_SETUP_BLOCK.
> >>>>
> >>>> It just set MLX5_TC_FLAG(FT_OFFLOAD) flags and change
> >>>> cls_flower.common.chain_index = FDB_FT_CHAIN;
> >>>>
> >>>> In following codes line 1380 and 1392
> >>>>
> >>>> 1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void
> >>>> *type_data,
> >>>> 1369                                  void *cb_priv)
> >>>> 1370 {
> >>>> 1371         struct flow_cls_offload *f = type_data;
> >>>> 1372         struct flow_cls_offload cls_flower;
> >>>> 1373         struct mlx5e_priv *priv = cb_priv;
> >>>> 1374         struct mlx5_eswitch *esw;
> >>>> 1375         unsigned long flags;
> >>>> 1376         int err;
> >>>> 1377
> >>>> 1378         flags = MLX5_TC_FLAG(INGRESS) |
> >>>> 1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
> >>>> 1380                 MLX5_TC_FLAG(FT_OFFLOAD);
> >>>> 1381         esw = priv->mdev->priv.eswitch;
> >>>> 1382
> >>>> 1383         switch (type) {
> >>>> 1384         case TC_SETUP_CLSFLOWER:
> >>>> 1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
> >>>> 1386                         return -EOPNOTSUPP;
> >>>> 1387
> >>>> 1388                 /* Re-use tc offload path by moving the ft flow to the
> >>>> 1389                  * reserved ft chain.
> >>>> 1390                  */
> >>>> 1391                 memcpy(&cls_flower, f, sizeof(*f));
> >>>> 1392                cls_flower.common.chain_index = FDB_FT_CHAIN;
> >>>> 1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower,
> >> flags);
> >>>> 1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
> >>>>
> >>>>
> >>>> I want to add tunnel offload support in the flow table, I  add some
> patches
> >> in
> >>>> nf_flow_table_offload.
> >>>>
> >>>> Also add the indr setup support in the mlx driver. And Now I can  flow
> >> table
> >>>> offload with decap.
> >>>>
> >>>>
> >>>> But I meet a problem with the encap.  The encap rule can be added in
> >>>> hardware  successfully But it can't be offloaded.
> >>>>
> >>>> But I think the rule I added is correct.  If I mask the line 1392. The rule
> also
> >> can
> >>>> be add success and can be offloaded.
> >>>>
> >>>> So there are some limit for encap operation for FT_OFFLOAD in
> >>>> FDB_FT_CHAIN?
> >>>>
> >>>>
> >>>> BR
> >>>>
> >>>> wenxu
> >>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-21 13:05             ` Paul Blakey
@ 2019-11-21 13:39               ` wenxu
  2019-11-22  6:12               ` wenxu
       [not found]               ` <64285654-bc9a-c76e-5875-dc6e434dc4d4@ucloud.cn>
  2 siblings, 0 replies; 25+ messages in thread
From: wenxu @ 2019-11-21 13:39 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch

Normal FWD rules that forward to a pf-rep or vf-rep can be really offloaded.

On 2019/11/21 21:05, Paul Blakey wrote:
> I see, I will test that, and how about normal FWD rules?
>
> Paul.
>
>
>> -----Original Message-----
>> From: wenxu <wenxu@ucloud.cn>
>> Sent: Thursday, November 21, 2019 2:35 PM
>> To: Paul Blakey <paulb@mellanox.com>
>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>> <markb@mellanox.com>
>> Subject: Re: Question about flow table offload in mlx5e
>>
>>
>> On 2019/11/21 19:39, Paul Blakey wrote:
>>> They are good fixes, exactly what we had when we tested this, thanks.
>>>
>>> Regarding encap, I don't know what changes you did, how does the encap
>> rule look? Is it a FWD to vxlan device? If not it should be, as our driver
>> expects that.
>> It is fwd to a gretap devices
>>> I tried it on my setup via tc, by changing the callback of tc
>> (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
>>> and testing a vxlan encap rule:
>>> sudo tc qdisc add dev ens1f0_0 ingress
>>> sudo ifconfig ens1f0 7.7.7.7/24 up
>>> sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport
>> 4789 external
>>> sudo ifconfig vxlan0 up
>>> sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower
>> dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw  action tunnel_key set
>> src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress
>> redirect dev vxlan
>>> then tc show:
>>> filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff
>> ip_proto udp skip_sw in_hw in_hw_count 1
>>>         tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789
>> csum pipe
>>>         Stats: used 119 sec      0 pkt
>>>         mirred (Egress Redirect to device vxlan0)
>>>         Stats: used 119 sec      0 pkt
>> Can you send packet that match this offloaded flow to check it is real
>> offloaded?
>>
>> In the flowtable offload with my patches both TC_SETUP_BLOCK and
>> TC_SETUP_FT can offload the rule success
>>
>> But in the TC_SETUP_FT case the packet is not real offloaded.
>>
>>
>> I  will test like u did.
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: wenxu <wenxu@ucloud.cn>
>>>> Sent: Thursday, November 21, 2019 10:29 AM
>>>> To: Paul Blakey <paulb@mellanox.com>
>>>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>>>> <markb@mellanox.com>
>>>> Subject: Re: Question about flow table offload in mlx5e
>>>>
>>>>
>>>> On 11/21/2019 3:42 PM, Paul Blakey wrote:
>>>>> Hi,
>>>>>
>>>>> The original design was the block setup to use TC_SETUP_FT type, and
>> the
>>>> tc event type to be case TC_SETUP_CLSFLOWER.
>>>>> We will post a patch to change that. I would advise to wait till we fix that
>>>> 😊
>>>>> I'm not sure how you get to this function mlx5e_rep_setup_ft_cb() if it
>> the
>>>> nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK, and
>> not
>>>> TC_SETUP_FT.
>>>>
>>>>
>>>> Yes I change the TC_SETUP_BLOCK to TC_SETUP_FT in the
>>>> nf_flow_table_offload_setup.
>>>>
>>>> Two fixes patch provide:
>>>>
>>>> http://patchwork.ozlabs.org/patch/1197818/
>>>>
>>>> http://patchwork.ozlabs.org/patch/1197876/
>>>>
>>>> So this change made by me is not correct currently?
>>>>
>>>>> In our driver en_rep.c we have:
>>>>>>         switch (type) {
>>>>>>         case TC_SETUP_BLOCK:
>>>>>>                 return flow_block_cb_setup_simple(type_data,
>>>>>>                                                   &mlx5e_rep_block_tc_cb_list,
>>>>>>                                                   mlx5e_rep_setup_tc_cb,
>>>>>>                                                   priv, priv, true);
>>>>>>         case TC_SETUP_FT:
>>>>>>                 return flow_block_cb_setup_simple(type_data,
>>>>>>                                                   &mlx5e_rep_block_ft_cb_list,
>>>>>>                                                   mlx5e_rep_setup_ft_cb,
>>>>>>                                                   priv, priv, true);
>>>>>>         default:
>>>>>>                 return -EOPNOTSUPP;
>>>>>>         }
>>>>> In nf_flow_table_offload.c:
>>>>>>         bo.binder_type  = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
>>>>>>         bo.extack       = &extack;
>>>>>>         INIT_LIST_HEAD(&bo.cb_list);
>>>>>>         err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>>>>>>         if (err < 0)
>>>>>>                 return err;
>>>>>>         return nf_flow_table_block_setup(flowtable, &bo, cmd);
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
>>>>>
>>>>>
>>>>> So unless you changed that as well, you should have gotten to
>>>> mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_tc_ft.
>>>>> Regarding the encap action, there should be no difference on which
>> chain
>>>> the rule is on.
>>>>
>>>>
>>>> But for the same encap rule can be real offloaded when setup through
>>>> through TC_SETUP_BLOCK. But TC_SETUP_FT can't.
>>>>
>>>> So it is the problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb ?
>>>>
>>>>>> -----Original Message-----
>>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>> Sent: Thursday, November 21, 2019 9:30 AM
>>>>>> To: Paul Blakey <paulb@mellanox.com>
>>>>>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>>>>>> <markb@mellanox.com>
>>>>>> Subject: Question about flow table offload in mlx5e
>>>>>>
>>>>>> Hi  paul,
>>>>>>
>>>>>> The flow table offload in the mlx5e is based on TC_SETUP_FT.
>>>>>>
>>>>>>
>>>>>> It is almost the same as TC_SETUP_BLOCK.
>>>>>>
>>>>>> It just set MLX5_TC_FLAG(FT_OFFLOAD) flags and change
>>>>>> cls_flower.common.chain_index = FDB_FT_CHAIN;
>>>>>>
>>>>>> In following codes line 1380 and 1392
>>>>>>
>>>>>> 1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void
>>>>>> *type_data,
>>>>>> 1369                                  void *cb_priv)
>>>>>> 1370 {
>>>>>> 1371         struct flow_cls_offload *f = type_data;
>>>>>> 1372         struct flow_cls_offload cls_flower;
>>>>>> 1373         struct mlx5e_priv *priv = cb_priv;
>>>>>> 1374         struct mlx5_eswitch *esw;
>>>>>> 1375         unsigned long flags;
>>>>>> 1376         int err;
>>>>>> 1377
>>>>>> 1378         flags = MLX5_TC_FLAG(INGRESS) |
>>>>>> 1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
>>>>>> 1380                 MLX5_TC_FLAG(FT_OFFLOAD);
>>>>>> 1381         esw = priv->mdev->priv.eswitch;
>>>>>> 1382
>>>>>> 1383         switch (type) {
>>>>>> 1384         case TC_SETUP_CLSFLOWER:
>>>>>> 1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
>>>>>> 1386                         return -EOPNOTSUPP;
>>>>>> 1387
>>>>>> 1388                 /* Re-use tc offload path by moving the ft flow to the
>>>>>> 1389                  * reserved ft chain.
>>>>>> 1390                  */
>>>>>> 1391                 memcpy(&cls_flower, f, sizeof(*f));
>>>>>> 1392                cls_flower.common.chain_index = FDB_FT_CHAIN;
>>>>>> 1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower,
>>>> flags);
>>>>>> 1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
>>>>>>
>>>>>>
>>>>>> I want to add tunnel offload support in the flow table, I  add some
>> patches
>>>> in
>>>>>> nf_flow_table_offload.
>>>>>>
>>>>>> Also add the indr setup support in the mlx driver. And Now I can  flow
>>>> table
>>>>>> offload with decap.
>>>>>>
>>>>>>
>>>>>> But I meet a problem with the encap.  The encap rule can be added in
>>>>>> hardware  successfully But it can't be offloaded.
>>>>>>
>>>>>> But I think the rule I added is correct.  If I mask the line 1392. The rule
>> also
>>>> can
>>>>>> be add success and can be offloaded.
>>>>>>
>>>>>> So there are some limit for encap operation for FT_OFFLOAD in
>>>>>> FDB_FT_CHAIN?
>>>>>>
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> wenxu
>>>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-21 13:05             ` Paul Blakey
  2019-11-21 13:39               ` wenxu
@ 2019-11-22  6:12               ` wenxu
       [not found]               ` <64285654-bc9a-c76e-5875-dc6e434dc4d4@ucloud.cn>
  2 siblings, 0 replies; 25+ messages in thread
From: wenxu @ 2019-11-22  6:12 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch

Hi Paul,


There are some updates.


ifconfig mlx_p0 172.168.152.75/24 up

ip l add dev tun1 type gretap external
tc qdisc add dev tun1 ingress
tc qdisc add dev mlx_pf0vf0 ingress

tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001 tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000 nocsum pipe action mirred egress redirect dev tun1

tc filter add dev tun1 pref 2 ingress  protocol ip flower ip_proto tcp src_ip 10.0.1.241 dst_ip 10.0.0.75 src_port 5001 dst_port 5002 tcp_flags 0/0x5 enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe action mirred egress redirect dev mlx_pf0vf0


If you run this script on the host, and in the virtual machine run "iperf -c 10.0.1.241  -i 2  -B 10.0.0.75:5002  -t 1000",

the TCP SYN packets will not be offloaded.


But if you only run the script without the last filter, as follows, the TCP SYN packets will be offloaded.

ifconfig mlx_p0 172.168.152.75/24 up

ip l add dev tun1 type gretap external
tc qdisc add dev tun1 ingress
tc qdisc add dev mlx_pf0vf0 ingress

tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001 tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000 nocsum pipe action mirred egress redirect dev tun1
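As a side note, the `tcp_flags 0/0x5` match in the filters above is a value/mask pair: a packet matches when `flags & 0x5 == 0`, i.e. neither FIN (0x01) nor RST (0x04) is set, so SYN and plain ACK segments remain eligible. A small check of that value/mask logic (the helper function is only illustrative):

```shell
# value/mask match as flower applies it: (pkt_flags & mask) == value,
# here value = 0 and mask = 0x5 (FIN | RST).
matches() {
    [ $(( $1 & 0x5 )) -eq 0 ]
}

matches 0x02 && echo "SYN matches"         # SYN only: 0x02 & 0x5 == 0
matches 0x12 && echo "SYN|ACK matches"     # 0x12 & 0x5 == 0
matches 0x04 || echo "RST filtered out"    # RST sets the 0x4 bit
```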





On 11/21/2019 9:05 PM, Paul Blakey wrote:
> I see, I will test that, and how about normal FWD rules?
>
> Paul.
>
>
>> -----Original Message-----
>> From: wenxu <wenxu@ucloud.cn>
>> Sent: Thursday, November 21, 2019 2:35 PM
>> To: Paul Blakey <paulb@mellanox.com>
>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>> <markb@mellanox.com>
>> Subject: Re: Question about flow table offload in mlx5e
>>
>>
>> On 2019/11/21 19:39, Paul Blakey wrote:
>>> They are good fixes, exactly what we had when we tested this, thanks.
>>>
>>> Regarding encap, I don't know what changes you did, how does the encap
>> rule look? Is it a FWD to vxlan device? If not it should be, as our driver
>> expects that.
>> It is fwd to a gretap devices
>>> I tried it on my setup via tc, by changing the callback of tc
>> (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
>>> and testing a vxlan encap rule:
>>> sudo tc qdisc add dev ens1f0_0 ingress
>>> sudo ifconfig ens1f0 7.7.7.7/24 up
>>> sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport
>> 4789 external
>>> sudo ifconfig vxlan0 up
>>> sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower
>> dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw  action tunnel_key set
>> src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress
>> redirect dev vxlan
>>> then tc show:
>>> filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff
>> ip_proto udp skip_sw in_hw in_hw_count 1
>>>         tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789
>> csum pipe
>>>         Stats: used 119 sec      0 pkt
>>>         mirred (Egress Redirect to device vxlan0)
>>>         Stats: used 119 sec      0 pkt
>> Can you send packet that match this offloaded flow to check it is real
>> offloaded?
>>
>> In the flowtable offload with my patches both TC_SETUP_BLOCK and
>> TC_SETUP_FT can offload the rule success
>>
>> But in the TC_SETUP_FT case the packet is not real offloaded.
>>
>>
>> I  will test like u did.
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: wenxu <wenxu@ucloud.cn>
>>>> Sent: Thursday, November 21, 2019 10:29 AM
>>>> To: Paul Blakey <paulb@mellanox.com>
>>>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>>>> <markb@mellanox.com>
>>>> Subject: Re: Question about flow table offload in mlx5e
>>>>
>>>>
>>>> On 11/21/2019 3:42 PM, Paul Blakey wrote:
>>>>> Hi,
>>>>>
>>>>> The original design was the block setup to use TC_SETUP_FT type, and
>> the
>>>> tc event type to be case TC_SETUP_CLSFLOWER.
>>>>> We will post a patch to change that. I would advise to wait till we fix that
>>>> 😊
>>>>> I'm not sure how you get to this function mlx5e_rep_setup_ft_cb() if it
>> the
>>>> nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK, and
>> not
>>>> TC_SETUP_FT.
>>>>
>>>>
>>>> Yes I change the TC_SETUP_BLOCK to TC_SETUP_FT in the
>>>> nf_flow_table_offload_setup.
>>>>
>>>> Two fixes patch provide:
>>>>
>>>> http://patchwork.ozlabs.org/patch/1197818/
>>>>
>>>> http://patchwork.ozlabs.org/patch/1197876/
>>>>
>>>> So this change made by me is not correct currently?
>>>>
>>>>> In our driver en_rep.c we have:
>>>>>>         switch (type) {
>>>>>>         case TC_SETUP_BLOCK:
>>>>>>                 return flow_block_cb_setup_simple(type_data,
>>>>>>                                                   &mlx5e_rep_block_tc_cb_list,
>>>>>>                                                   mlx5e_rep_setup_tc_cb,
>>>>>>                                                   priv, priv, true);
>>>>>>         case TC_SETUP_FT:
>>>>>>                 return flow_block_cb_setup_simple(type_data,
>>>>>>                                                   &mlx5e_rep_block_ft_cb_list,
>>>>>>                                                   mlx5e_rep_setup_ft_cb,
>>>>>>                                                   priv, priv, true);
>>>>>>         default:
>>>>>>                 return -EOPNOTSUPP;
>>>>>>         }
>>>>> In nf_flow_table_offload.c:
>>>>>>         bo.binder_type  = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
>>>>>>         bo.extack       = &extack;
>>>>>>         INIT_LIST_HEAD(&bo.cb_list);
>>>>>>         err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>>>>>>         if (err < 0)
>>>>>>                 return err;
>>>>>>         return nf_flow_table_block_setup(flowtable, &bo, cmd);
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
>>>>>
>>>>>
>>>>> So unless you changed that as well, you should have gotten to
>>>> mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_tc_ft.
>>>>> Regarding the encap action, there should be no difference on which
>> chain
>>>> the rule is on.
>>>>
>>>>
>>>> But for the same encap rule can be real offloaded when setup through
>>>> through TC_SETUP_BLOCK. But TC_SETUP_FT can't.
>>>>
>>>> So it is the problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb ?
>>>>
>>>>>> -----Original Message-----
>>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>> Sent: Thursday, November 21, 2019 9:30 AM
>>>>>> To: Paul Blakey <paulb@mellanox.com>
>>>>>> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>>>>>> <markb@mellanox.com>
>>>>>> Subject: Question about flow table offload in mlx5e
>>>>>>
>>>>>> Hi  paul,
>>>>>>
>>>>>> The flow table offload in the mlx5e is based on TC_SETUP_FT.
>>>>>>
>>>>>>
>>>>>> It is almost the same as TC_SETUP_BLOCK.
>>>>>>
>>>>>> It just set MLX5_TC_FLAG(FT_OFFLOAD) flags and change
>>>>>> cls_flower.common.chain_index = FDB_FT_CHAIN;
>>>>>>
>>>>>> In following codes line 1380 and 1392
>>>>>>
>>>>>> 1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void
>>>>>> *type_data,
>>>>>> 1369                                  void *cb_priv)
>>>>>> 1370 {
>>>>>> 1371         struct flow_cls_offload *f = type_data;
>>>>>> 1372         struct flow_cls_offload cls_flower;
>>>>>> 1373         struct mlx5e_priv *priv = cb_priv;
>>>>>> 1374         struct mlx5_eswitch *esw;
>>>>>> 1375         unsigned long flags;
>>>>>> 1376         int err;
>>>>>> 1377
>>>>>> 1378         flags = MLX5_TC_FLAG(INGRESS) |
>>>>>> 1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
>>>>>> 1380                 MLX5_TC_FLAG(FT_OFFLOAD);
>>>>>> 1381         esw = priv->mdev->priv.eswitch;
>>>>>> 1382
>>>>>> 1383         switch (type) {
>>>>>> 1384         case TC_SETUP_CLSFLOWER:
>>>>>> 1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
>>>>>> 1386                         return -EOPNOTSUPP;
>>>>>> 1387
>>>>>> 1388                 /* Re-use tc offload path by moving the ft flow to the
>>>>>> 1389                  * reserved ft chain.
>>>>>> 1390                  */
>>>>>> 1391                 memcpy(&cls_flower, f, sizeof(*f));
>>>>>> 1392                cls_flower.common.chain_index = FDB_FT_CHAIN;
>>>>>> 1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower,
>>>> flags);
>>>>>> 1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
>>>>>>
>>>>>>
>>>>>> I want to add tunnel offload support in the flow table, I  add some
>> patches
>>>> in
>>>>>> nf_flow_table_offload.
>>>>>>
>>>>>> Also add the indr setup support in the mlx driver. And Now I can  flow
>>>> table
>>>>>> offload with decap.
>>>>>>
>>>>>>
>>>>>> But I meet a problem with the encap.  The encap rule can be added in
>>>>>> hardware  successfully But it can't be offloaded.
>>>>>>
>>>>>> But I think the rule I added is correct.  If I mask the line 1392. The rule
>> also
>>>> can
>>>>>> be add success and can be offloaded.
>>>>>>
>>>>>> So there are some limit for encap operation for FT_OFFLOAD in
>>>>>> FDB_FT_CHAIN?
>>>>>>
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> wenxu
>>>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Question about flow table offload in mlx5e
       [not found]               ` <64285654-bc9a-c76e-5875-dc6e434dc4d4@ucloud.cn>
@ 2019-11-24  8:46                 ` Paul Blakey
  2019-11-24 11:14                   ` wenxu
  2019-11-26  8:18                   ` wenxu
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Blakey @ 2019-11-24  8:46 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch

Hi,

The syn packet might not actually be offloaded because there isn't a neighbor entry to resolve the destination mac (next-hop mac) for the tunnel destination ip.
Try setting the neighbor via "ip neigh replace dev mlx5_p0 172.168.152.241 lladdr <next hop mac>",
or run a ping to 172.168.152.241 before adding the rule (or in the background) to resolve the mac and keep it available.
I'll test it on my end.
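For reference, a command sketch of both suggestions above (the device name `mlx_p0` follows the earlier scripts, and the lladdr value is a placeholder to be replaced with the real next-hop mac):

```shell
# Option 1: pin a static neighbor entry so the tunnel next-hop mac
# is always resolvable (placeholder mac address).
ip neigh replace dev mlx_p0 172.168.152.241 lladdr aa:bb:cc:dd:ee:01

# Option 2: keep the neighbor entry fresh by pinging the tunnel
# destination in the background while the encap rule is installed.
ping -q 172.168.152.241 > /dev/null 2>&1 &
```

These are privileged network-configuration commands, so they are shown as a fragment rather than a runnable test.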


Thanks,
Paul.


> -----Original Message-----
> From: wenxu <wenxu@ucloud.cn>
> Sent: Friday, November 22, 2019 8:26 AM
> To: Paul Blakey <paulb@mellanox.com>
> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
> <markb@mellanox.com>
> Subject: Re: Question about flow table offload in mlx5e
> 
> Hi Paul,
> 
> 
> There is an update: I also tested it by replacing mlx5e_rep_setup_tc_cb
> with mlx5e_rep_setup_ft_cb.
> 
> 
> ifconfig mlx_p0 172.168.152.75/24 up
> 
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
> 
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
> ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
> tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
> nocsum pipe action mirred egress redirect dev tun1
> 
> tc filter add dev tun1 pref 2 ingress  protocol ip flower ip_proto tcp src_ip
> 10.0.1.241 dst_ip 10.0.0.75 src_port 5001 dst_port 5002 tcp_flags 0/0x5
> enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe
> action mirred egress redirect dev mlx_pf0vf0
> 
> 
> If you run this script on the host and, in the virtual machine, run "iperf -c
> 10.0.1.241  -i 2  -B 10.0.0.75:5002  -t 1000",
> 
> the tcp syn packets will not be offloaded.
> 
> 
> But if you run the script without the last filter, as follows, the tcp syn
> packets will be offloaded:
> 
> ifconfig mlx_p0 172.168.152.75/24 up
> 
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
> 
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
> ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
> tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
> nocsum pipe action mirred egress redirect dev tun1
> 
> I think there is some problem in mlx5e_rep_setup_ft_cb.
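One way to tell "rule installed" apart from "traffic really offloaded" is to watch the per-rule counters tc reports (a diagnostic sketch; the device names follow the script above, and this assumes the real hardware setup):

```shell
# "in_hw" only means the driver accepted the rule; the per-rule Stats
# lines show whether matching packets actually hit it.
tc -s filter show dev mlx_pf0vf0 ingress
tc -s filter show dev tun1 ingress
```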
> 
> On 11/21/2019 9:05 PM, Paul Blakey wrote:
> 
> 
> 	I see, I will test that, and how about normal FWD rules?
> 
> 	Paul.
> 
> 
> 
> 		-----Original Message-----
> 		From: wenxu <wenxu@ucloud.cn>
> <mailto:wenxu@ucloud.cn>
> 		Sent: Thursday, November 21, 2019 2:35 PM
> 		To: Paul Blakey <paulb@mellanox.com>
> <mailto:paulb@mellanox.com>
> 		Cc: pablo@netfilter.org <mailto:pablo@netfilter.org> ;
> netdev@vger.kernel.org <mailto:netdev@vger.kernel.org> ; Mark Bloch
> 		<markb@mellanox.com> <mailto:markb@mellanox.com>
> 		Subject: Re: Question about flow table offload in mlx5e
> 
> 
> 		On 2019/11/21 19:39, Paul Blakey wrote:
> 
> 			They are good fixes, exactly what we had when we tested this, thanks.
> 
> 			Regarding encap, I don't know what changes you did; how does the encap
> 		rule look? Is it a FWD to a vxlan device? If not, it should be, as our driver
> 		expects that.
> 		It is a FWD to a gretap device.
> 
> 
> 			I tried it on my setup via tc, by changing the callback of tc
> 		(mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
> 		and testing a vxlan encap rule:
> 
> 			sudo tc qdisc add dev ens1f0_0 ingress
> 			sudo ifconfig ens1f0 7.7.7.7/24 up
> 			sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport 4789 external
> 			sudo ifconfig vxlan0 up
> 			sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw action tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress redirect dev vxlan0
> 
> 
> 			then tc show:
> 			filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw in_hw in_hw_count 1
> 			        tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789 csum pipe
> 			        Stats: used 119 sec      0 pkt
> 			        mirred (Egress Redirect to device vxlan0)
> 			        Stats: used 119 sec      0 pkt
> 
> 
> 		Can you send packets that match this offloaded flow, to check that it is
> 		really offloaded?
> 
> 		In the flowtable offload with my patches, both TC_SETUP_BLOCK and
> 		TC_SETUP_FT can install the rule successfully.
> 
> 		But in the TC_SETUP_FT case the packets are not really offloaded.
> 
> 
> 		I will test like you did.
> 
> 
> 
> 
> 
> 
> 				-----Original Message-----
> 				From: wenxu <wenxu@ucloud.cn>
> <mailto:wenxu@ucloud.cn>
> 				Sent: Thursday, November 21, 2019 10:29 AM
> 				To: Paul Blakey <paulb@mellanox.com>
> <mailto:paulb@mellanox.com>
> 				Cc: pablo@netfilter.org
> <mailto:pablo@netfilter.org> ; netdev@vger.kernel.org
> <mailto:netdev@vger.kernel.org> ; Mark Bloch
> 				<markb@mellanox.com>
> <mailto:markb@mellanox.com>
> 				Subject: Re: Question about flow table
> offload in mlx5e
> 
> 
> 				On 11/21/2019 3:42 PM, Paul Blakey wrote:
> 
> 					Hi,
> 
> 					The original design was the block
> setup to use TC_SETUP_FT type, and
> 
> 		the
> 
> 				tc event type to be case
> TC_SETUP_CLSFLOWER.
> 
> 					We will post a patch to change that. I
> would advise to wait till we fix that
> 
> 				😊
> 
> 					I'm not sure how you get to this
> function mlx5e_rep_setup_ft_cb() if it
> 
> 		the
> 
> 				nf_flow_table_offload ndo_setup_tc event
> was TC_SETUP_BLOCK, and
> 
> 		not
> 
> 				TC_SETUP_FT.
> 
> 
> 				Yes, I changed TC_SETUP_BLOCK to TC_SETUP_FT in
> 				nf_flow_table_offload_setup.
> 
> 				Two fix patches provided:
> 
> 				http://patchwork.ozlabs.org/patch/1197818/
> 
> 				http://patchwork.ozlabs.org/patch/1197876/
> 
> 				So is this change of mine currently not correct?
> 
> 
> 					In our driver en_rep.c we have:
> 
> 					switch (type) {
> 					case TC_SETUP_BLOCK:
> 						return flow_block_cb_setup_simple(type_data,
> 										  &mlx5e_rep_block_tc_cb_list,
> 										  mlx5e_rep_setup_tc_cb,
> 										  priv, priv, true);
> 					case TC_SETUP_FT:
> 						return flow_block_cb_setup_simple(type_data,
> 										  &mlx5e_rep_block_ft_cb_list,
> 										  mlx5e_rep_setup_ft_cb,
> 										  priv, priv, true);
> 					default:
> 						return -EOPNOTSUPP;
> 					}
> 
> 					In nf_flow_table_offload.c:
> 
> 					bo.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
> 					bo.extack	= &extack;
> 					INIT_LIST_HEAD(&bo.cb_list);
> 					err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
> 					if (err < 0)
> 						return err;
> 					return nf_flow_table_block_setup(flowtable, &bo, cmd);
> 					}
> 
> 	EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
> 
> 
> 					So unless you changed that as well, you should have
> 				gotten to mlx5e_rep_setup_tc_cb and not
> 				mlx5e_rep_setup_ft_cb.
> 
> 					Regarding the encap action, there should be no
> 				difference in which chain the rule is on.
> 
> 
> 				But the same encap rule can be really offloaded when set up
> 				through TC_SETUP_BLOCK, while through TC_SETUP_FT it can't.
> 
> 				So is it a problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb?
> 
> 
> 
> 
> 					-----Original Message-----
> 					From: wenxu <wenxu@ucloud.cn>
> <mailto:wenxu@ucloud.cn>
> 					Sent: Thursday, November 21, 2019
> 9:30 AM
> 					To: Paul Blakey
> <paulb@mellanox.com> <mailto:paulb@mellanox.com>
> 					Cc: pablo@netfilter.org
> <mailto:pablo@netfilter.org> ; netdev@vger.kernel.org
> <mailto:netdev@vger.kernel.org> ; Mark Bloch
> 					<markb@mellanox.com>
> <mailto:markb@mellanox.com>
> 					Subject: Question about flow table
> offload in mlx5e
> 
> 					Hi Paul,
> 
> 					The flow table offload in mlx5e is based on TC_SETUP_FT.
> 
> 					It is almost the same as TC_SETUP_BLOCK.
> 
> 					It just sets the MLX5_TC_FLAG(FT_OFFLOAD) flag and changes
> 					cls_flower.common.chain_index = FDB_FT_CHAIN;
> 					in the following code, at lines 1380 and 1392:
> 
> 					1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
> 					1369                                  void *cb_priv)
> 					1370 {
> 					1371         struct flow_cls_offload *f = type_data;
> 					1372         struct flow_cls_offload cls_flower;
> 					1373         struct mlx5e_priv *priv = cb_priv;
> 					1374         struct mlx5_eswitch *esw;
> 					1375         unsigned long flags;
> 					1376         int err;
> 					1377
> 					1378         flags = MLX5_TC_FLAG(INGRESS) |
> 					1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
> 					1380                 MLX5_TC_FLAG(FT_OFFLOAD);
> 					1381         esw = priv->mdev->priv.eswitch;
> 					1382
> 					1383         switch (type) {
> 					1384         case TC_SETUP_CLSFLOWER:
> 					1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
> 					1386                         return -EOPNOTSUPP;
> 					1387
> 					1388                 /* Re-use tc offload path by moving the ft flow to the
> 					1389                  * reserved ft chain.
> 					1390                  */
> 					1391                 memcpy(&cls_flower, f, sizeof(*f));
> 					1392                 cls_flower.common.chain_index = FDB_FT_CHAIN;
> 					1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower, flags);
> 					1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
> 
> 
> 					I want to add tunnel offload support in the flow table;
> 					I added some patches in nf_flow_table_offload.
> 
> 					I also added the indr setup support in the mlx driver,
> 					and now flow table offload works with decap.
> 
> 
> 					But I hit a problem with encap: the encap rule can be
> 					added to hardware successfully, but traffic is not
> 					actually offloaded.
> 
> 					I think the rule I added is correct. If I comment out
> 					line 1392, the rule can still be added and traffic is
> 					offloaded.
> 
> 					So is there some limit on encap operations for
> 					FT_OFFLOAD in FDB_FT_CHAIN?
> 
> 
> 					BR
> 
> 					wenxu
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-24  8:46                 ` Paul Blakey
@ 2019-11-24 11:14                   ` wenxu
  2019-11-26  8:18                   ` wenxu
  1 sibling, 0 replies; 25+ messages in thread
From: wenxu @ 2019-11-24 11:14 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch

Sorry, I missed some information: there is a real remote, and the mac is always available. It is not just one packet that can't be offloaded; all the repeated syn packets can't be offloaded. But without the last filter rule, all syn packets can be offloaded.

On 2019/11/24 16:46, Paul Blakey wrote:
> Hi,
>
> The syn packet might not be actually offloaded because there isn't a neighbor to resolve the destination mac for the tunnel destination ip (next hop mac).
> Try setting the neighbor via "ip neigh replace dev mlx5_p0 172.168.152.241 lladdr <next hop mac>"
> Or running ping to 172.168.152.241 before adding (or in background) the rule to resolve the mac and make available.
> I'll test it on my end.
>
>
> Thanks,
> Paul.
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-24  8:46                 ` Paul Blakey
  2019-11-24 11:14                   ` wenxu
@ 2019-11-26  8:18                   ` wenxu
       [not found]                     ` <84874b42-c525-2149-539d-e7510d15f6a6@mellanox.com>
  1 sibling, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-26  8:18 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch

Hi Paul,

Did you test this case? Is there any update on the problems I reported?


BR

wenxu 

On 11/24/2019 4:46 PM, Paul Blakey wrote:
> Hi,
>
> The syn packet might not be actually offloaded because there isn't a neighbor to resolve the destination mac for the tunnel destination ip (next hop mac).
> Try setting the neighbor via "ip neigh replace dev mlx5_p0 172.168.152.241 lladdr <next hop mac>"
> Or running ping to 172.168.152.241 before adding (or in background) the rule to resolve the mac and make available.
> I'll test it on my end.
>
>
> Thanks,
> Paul.
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
       [not found]                     ` <84874b42-c525-2149-539d-e7510d15f6a6@mellanox.com>
@ 2019-11-27 12:16                       ` wenxu
  2019-11-27 12:16                       ` wenxu
  2019-11-28  5:03                       ` Bad performance for VF outgoing in offloaded mode wenxu
  2 siblings, 0 replies; 25+ messages in thread
From: wenxu @ 2019-11-27 12:16 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch

Sorry, maybe something confused you; please ignore my patches.


I also ran the test like you did, routing the tc rules to the ft callback.


Please also run the following test (mlx_p0 is the PF and mlx_pf0vf0 is the VF):

ifconfig mlx_p0 172.168.152.75/24 up
ip n replace 172.168.152.241 dev mlx_p0 lladdr aa:bb:cc:dd:ee:ff

ip l add dev tun1 type gretap external
tc qdisc add dev tun1 ingress
tc qdisc add dev mlx_pf0vf0 ingress

tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
nocsum pipe action mirred egress redirect dev tun1

In the virtual machine:
ifconfig eth0 10.0.0.75/24
ip r a default via 10.0.0.1
ip n replace 10.0.0.1 dev eth0 lladdr aa:bb:cc:dd:ee:01

iperf -c 10.0.1.241  -i 2  -B 10.0.0.75:5002  -t 10


With the script above, the SYN packets are offloaded: they can't be captured on mlx_pf0vf0.

I think the rule is OK. The problem is that if I add another rule on device tun1, as follows,
the SYN packets can no longer be offloaded:

tc filter add dev tun1 pref 2 ingress  protocol ip flower ip_proto tcp src_ip
10.0.1.241 dst_ip 10.0.0.75 src_port 5001 dst_port 5002 tcp_flags 0/0x5
enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe
action mirred egress redirect dev mlx_pf0vf0
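(For reference, offload can be confirmed from the tc statistics as well as from tcpdump. Below is a minimal, hypothetical shell sketch, not part of the original setup, that checks the in_hw flag and the hardware packet counter. The sample `tc -s filter show` output is embedded so the check runs without an mlx5 NIC; on a real host you would pipe in `tc -s filter show dev mlx_pf0vf0 ingress` instead.)

```shell
# Sample `tc -s filter show` output, embedded so the check runs anywhere.
sample='filter protocol ip pref 2 flower chain 0 handle 0x1
  skip_sw
  in_hw
        Sent 232 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)'

# A " in_hw" flag (leading space rules out "not_in_hw") means the driver
# accepted the rule for hardware offload.
printf '%s\n' "$sample" | grep -q ' in_hw' && echo "rule is in hardware"

# A growing packet counter in the action statistics confirms that packets
# actually hit the offloaded rule.
printf '%s\n' "$sample" | sed -n 's/.*Sent [0-9]* bytes \([0-9]*\) pkt.*/\1/p'
```

If the output shows not_in_hw, or the counter stays at 0 while traffic flows, the packets are taking the software path.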






On 2019/11/27 19:51, Paul Blakey wrote:
> Sorry, I didn't have time to apply your patches.
>
> I did test it again, routing tc rules to the ft callback; here's the diff:
>
> @@ -1291,7 +1304,7 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>         case TC_SETUP_BLOCK:
>                 return flow_block_cb_setup_simple(type_data,
>                                                   &mlx5e_rep_block_tc_cb_list,
> -                                                 mlx5e_rep_setup_tc_cb,
> +                                                 mlx5e_rep_setup_ft_cb,
>                                                   priv, priv, true);
>         case TC_SETUP_FT:
>                 return flow_block_cb_setup_simple(type_data,
>
>
> I ran this script after creating a VF (ens1f2) and entering switchdev mode (creating representor ens1f0_0):
>
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> ifconfig tun1 up
>
> ifconfig ens1f0_0 0 up
> tc qdisc add dev ens1f0_0 ingress
>
> ifconfig ens1f0 172.168.152.75/24 up
> ip n replace 172.168.162.241 dev ens1f0 lladdr aa:bb:cc:dd:ee:01
>
> tc filter del dev ens1f0_0 ingress
>
> tc filter add dev ens1f0_0 pref 2 ingress proto ip flower \
>      skip_sw \
>      ip_proto tcp dst_ip 5.5.5.6 src_ip 5.5.5.5 tcp_flags 0/0x5 \
>      action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000 nocsum pipe \
>      action mirred egress redirect dev tun1
>
> tc filter add dev ens1f0_0 pref 1 ingress proto ip flower \
>      skip_hw \
>      action drop
>
> ifconfig ens1f2 5.5.5.5/24 up
> ip n replace 5.5.5.6 dev ens1f2 lladdr aa:bb:cc:dd:ee:ff
>
> timeout 3 iperf -c 5.5.5.6
>
> tc -s filter show dev ens1f0_0 ingress
>
>
>
> Notice I run the iperf client without an iperf server, so I get only SYN packets.
>
> Here is the tcpdump on the VF (ens1f2):
>
> Executing: sudo tcpdump -nnep  -i ens1f2
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on ens1f2, link-type EN10MB (Ethernet), capture size 262144 bytes
> 13:49:10.610376 24:8a:07:a5:28:01 > aa:bb:cc:dd:ee:ff, ethertype IPv4 (0x0800), length 74: 5.5.5.5.49846 > 5.5.5.6.5001: Flags [S], seq 1395738427, win 64240, options [mss 1460,sackOK,TS val 2249857484 ecr 0,nop,wscale 7], length 0
> 13:49:11.616262 24:8a:07:a5:28:01 > aa:bb:cc:dd:ee:ff, ethertype IPv4 (0x0800), length 74: 5.5.5.5.49846 > 5.5.5.6.5001: Flags [S], seq 1395738427, win 64240, options [mss 1460,sackOK,TS val 2249858489 ecr 0,nop,wscale 7], length 0
> 13:49:13.664261 24:8a:07:a5:28:01 > aa:bb:cc:dd:ee:ff, ethertype IPv4 (0x0800), length 74: 5.5.5.5.49846 > 5.5.5.6.5001: Flags [S], seq 1395738427, win 64240, options [mss 1460,sackOK,TS val 2249860537 ecr 0,nop,wscale 7], length 0
> 13:49:17.696261 24:8a:07:a5:28:01 > aa:bb:cc:dd:e
>
> I get:
>
> filter protocol ip pref 1 flower chain 0
> filter protocol ip pref 1 flower chain 0 handle 0x1
>   eth_type ipv4
>   skip_hw
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 1 bind 1 installed 3 sec used 3 sec
>         Action statistics:
>         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   eth_type ipv4
>   ip_proto tcp
>   dst_ip 5.5.5.6
>   src_ip 5.5.5.5
>   tcp_flags 0/5
>   skip_sw
>   in_hw
>         action order 1: tunnel_key set
>         src_ip 0.0.0.0
>         dst_ip 172.168.152.241
>         key_id 1000
>         nocsum pipe
>         index 1 ref 1 bind 1 installed 3 sec used 3 sec
>         Action statistics:
>         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
>         action order 2: mirred (Egress Redirect to device tun1) stolen
>         index 1 ref 1 bind 1 installed 3 sec used 1 sec
>         Action statistics:
>         Sent 232 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
>
> And it counts the 2 syn packets in hardware, the packets are leaving the VF (ens1f2) and not arriving at the mlx5 representor device (ens1f0_0),
>
> which means hardware got them. It's also counted in the above encap rule. And the software-only (skip_hw, prio 1) rule didn't catch any packets.
>
>
> Thanks,
>
> Paul.
>
>
>
> On 11/26/2019 10:18 AM, wenxu wrote:
>
> Hi Paul,
>
> Did you run the test for this case? Did you hit the problems that I reported?
>
>
> BR
>
> wenxu
>
> On 11/24/2019 4:46 PM, Paul Blakey wrote:
>
>
> Hi,
>
> The SYN packet might not be actually offloaded because there isn't a neighbor to resolve the destination mac for the tunnel destination ip (next hop mac).
> Try setting the neighbor via "ip neigh replace dev mlx5_p0 172.168.152.241 lladdr <next hop mac>",
> or running ping to 172.168.152.241 before adding the rule (or in the background) to resolve the mac and make it available.
> I'll test it on my end.
>
>
> Thanks,
> Paul.
>
>
>
>
> -----Original Message-----
> From: wenxu <wenxu@ucloud.cn>
> Sent: Friday, November 22, 2019 8:26 AM
> To: Paul Blakey <paulb@mellanox.com>
> Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
> <markb@mellanox.com>
> Subject: Re: Question about flow table offload in mlx5e
>
> Hi Paul,
>
>
> There is an update. I also tested it by replacing mlx5e_rep_setup_tc_cb
> with mlx5e_rep_setup_ft_cb.
>
>
> ifconfig mlx_p0 172.168.152.75/24 up
>
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
>
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
> ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
> tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
> nocsum pipe action mirred egress redirect dev tun1
>
> tc filter add dev tun1 pref 2 ingress  protocol ip flower ip_proto tcp src_ip
> 10.0.1.241 dst_ip 10.0.0.75 src_port 5001 dst_port 5002 tcp_flags 0/0x5
> enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe
> action mirred egress redirect dev mlx_pf0vf0
>
>
> If you run this script on the host, and in the virtual machine run "iperf -c
> 10.0.1.241  -i 2  -B 10.0.0.75:5002  -t 1000",
>
> the TCP SYN packets will not be offloaded.
>
>
> But if you run the script without the last filter, as follows, the TCP SYN
> packets will be offloaded.
>
> ifconfig mlx_p0 172.168.152.75/24 up
>
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
>
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
> ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
> tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
> nocsum pipe action mirred egress redirect dev tun1
>
> I think there is some problem in mlx5e_rep_setup_ft_cb.
>
> On 11/21/2019 9:05 PM, Paul Blakey wrote:
>
>
>         I see, I will test that, and how about normal FWD rules?
>
>         Paul.
>
>
>
>                 -----Original Message-----
>                 From: wenxu <wenxu@ucloud.cn>
>                 Sent: Thursday, November 21, 2019 2:35 PM
>                 To: Paul Blakey <paulb@mellanox.com>
>                 Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>                 <markb@mellanox.com>
>                 Subject: Re: Question about flow table offload in mlx5e
>
>
>                 On 2019/11/21 19:39, Paul Blakey wrote:
>
>                         They are good fixes, exactly what we had when we tested this, thanks.
>
>                         Regarding encap, I don't know what changes you did; how does the encap
>                 rule look? Is it a FWD to a vxlan device? If not, it should be, as our
>                 driver expects that.
>
>                 It is a fwd to a gretap device.
>
>
>                         I tried it on my setup via tc, by changing the tc callback
>                 (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb),
>                 and testing a vxlan encap rule:
>
>                         sudo tc qdisc add dev ens1f0_0 ingress
>                         sudo ifconfig ens1f0 7.7.7.7/24 up
>                         sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport 4789 external
>                         sudo ifconfig vxlan0 up
>                         sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw action tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress redirect dev vxlan0
>
>
>                         then tc show:
>
>                         filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw in_hw in_hw_count 1
>                                 tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789 csum pipe
>                                 Stats: used 119 sec      0 pkt
>                                 mirred (Egress Redirect to device vxlan0)
>                                 Stats: used 119 sec      0 pkt
>
>
>                 Can you send packets that match this offloaded flow to check that it
>                 is really offloaded?
>
>                 In the flowtable offload with my patches, both TC_SETUP_BLOCK and
>                 TC_SETUP_FT can offload the rule successfully.
>
>                 But in the TC_SETUP_FT case the packet is not really offloaded.
>
>
>                 I will test like you did.
>
>
>
>
>
>
>                                 -----Original Message-----
>                                 From: wenxu <wenxu@ucloud.cn>
>                                 Sent: Thursday, November 21, 2019 10:29 AM
>                                 To: Paul Blakey <paulb@mellanox.com>
>                                 Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>                                 <markb@mellanox.com>
>                                 Subject: Re: Question about flow table offload in mlx5e
>
>
>                                 On 11/21/2019 3:42 PM, Paul Blakey wrote:
>
>                                         Hi,
>
>                                         The original design was for the block setup to use the TC_SETUP_FT
>                                 type, and the tc event type to be TC_SETUP_CLSFLOWER.
>
>                                         We will post a patch to change that. I would advise waiting till we
>                                 fix that 😊
>
>                                         I'm not sure how you got to the function mlx5e_rep_setup_ft_cb() if
>                                 the nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK and
>                                 not TC_SETUP_FT.
>
>
>                                 Yes, I changed TC_SETUP_BLOCK to TC_SETUP_FT in
>                                 nf_flow_table_offload_setup.
>
>                                 Two fix patches are provided:
>
>                                 http://patchwork.ozlabs.org/patch/1197818/
>
>                                 http://patchwork.ozlabs.org/patch/1197876/
>
>                                 So is this change of mine not correct currently?
>
>
>                                         In our driver en_rep.c we have:
>
>                                         switch (type) {
>                                         case TC_SETUP_BLOCK:
>                                                 return flow_block_cb_setup_simple(type_data,
>                                                                                   &mlx5e_rep_block_tc_cb_list,
>                                                                                   mlx5e_rep_setup_tc_cb,
>                                                                                   priv, priv, true);
>                                         case TC_SETUP_FT:
>                                                 return flow_block_cb_setup_simple(type_data,
>                                                                                   &mlx5e_rep_block_ft_cb_list,
>                                                                                   mlx5e_rep_setup_ft_cb,
>                                                                                   priv, priv, true);
>                                         default:
>                                                 return -EOPNOTSUPP;
>                                         }
>
>                                         In nf_flow_table_offload.c:
>
>                                         bo.binder_type  = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
>                                         bo.extack       = &extack;
>                                         INIT_LIST_HEAD(&bo.cb_list);
>                                         err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>                                         if (err < 0)
>                                                 return err;
>                                         return nf_flow_table_block_setup(flowtable, &bo, cmd);
>                                         }
>
>                 EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
>
>
>                                         So unless you changed that as well, you should have gotten to
>                                 mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_ft_cb.
>
>                                         Regarding the encap action, there should be no difference on which
>                                 chain the rule is on.
>
>
>                                 But the same encap rule can be really offloaded when set up through
>                                 TC_SETUP_BLOCK, while TC_SETUP_FT can't.
>
>                                 So is it a problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb?
>
>
>
>
>                                         -----Original Message-----
>                                         From: wenxu <wenxu@ucloud.cn>
>                                         Sent: Thursday, November 21, 2019 9:30 AM
>                                         To: Paul Blakey <paulb@mellanox.com>
>                                         Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch
>                                         <markb@mellanox.com>
>                                         Subject: Question about flow table offload in mlx5e
>
>                                         Hi Paul,
>
>                                         The flow table offload in mlx5e is based on TC_SETUP_FT.
>
>
>                                         It is almost the same as TC_SETUP_BLOCK.
>
>                                         It just sets the MLX5_TC_FLAG(FT_OFFLOAD) flag and changes
>                                         cls_flower.common.chain_index = FDB_FT_CHAIN;
>
>                                         See lines 1380 and 1392 in the following code:
>
>                                         1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
>                                         1369                                  void *cb_priv)
>                                         1370 {
>                                         1371         struct flow_cls_offload *f = type_data;
>                                         1372         struct flow_cls_offload cls_flower;
>                                         1373         struct mlx5e_priv *priv = cb_priv;
>                                         1374         struct mlx5_eswitch *esw;
>                                         1375         unsigned long flags;
>                                         1376         int err;
>                                         1377
>                                         1378         flags = MLX5_TC_FLAG(INGRESS) |
>                                         1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
>                                         1380                 MLX5_TC_FLAG(FT_OFFLOAD);
>                                         1381         esw = priv->mdev->priv.eswitch;
>                                         1382
>                                         1383         switch (type) {
>                                         1384         case TC_SETUP_CLSFLOWER:
>                                         1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
>                                         1386                         return -EOPNOTSUPP;
>                                         1387
>                                         1388                 /* Re-use tc offload path by moving the ft flow to the
>                                         1389                  * reserved ft chain.
>                                         1390                  */
>                                         1391                 memcpy(&cls_flower, f, sizeof(*f));
>                                         1392                 cls_flower.common.chain_index = FDB_FT_CHAIN;
>                                         1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower, flags);
>                                         1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
>
>
>                                         I want to add tunnel offload support in the flow table; I added some
>                                         patches in nf_flow_table_offload.
>
>                                         I also added the indirect setup support in the mlx driver, and now I
>                                         can do flow table offload with decap.
>
>
>                                         But I hit a problem with encap. The encap rule can be added to
>                                         hardware successfully, but it can't be offloaded.
>
>                                         I think the rule I added is correct: if I mask out line 1392, the rule
>                                         can also be added successfully and can be offloaded.
>
>                                         So is there some limit on the encap operation for FT_OFFLOAD in
>                                         FDB_FT_CHAIN?
>
>
>                                         BR
>
>                                         wenxu
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
       [not found]                     ` <84874b42-c525-2149-539d-e7510d15f6a6@mellanox.com>
  2019-11-27 12:16                       ` wenxu
@ 2019-11-27 12:16                       ` wenxu
  2019-11-27 13:11                         ` Paul Blakey
  2019-11-28  5:03                       ` Bad performance for VF outgoing in offloaded mode wenxu
  2 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-27 12:16 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch

Sorry maybe something mess you,  Ignore with my patches.


I also did the test like you with route tc rules to ft callback.


please also did the following test:  mlx_p0 is the pf and mlx_pf0vf0 is the vf .

ifconfig mlx_p0 172.168.152.75/24 up
ip n replace 172.168.152.241 dev mlx_p0 lladdr aa:bb:cc:dd:ee:ff

ip l add dev tun1 type gretap external
tc qdisc add dev tun1 ingress
tc qdisc add dev mlx_pf0vf0 ingress

tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
nocsum pipe action mirred egress redirect dev tun1

In the virtual machine:
ifconfig eth0 10.0.0.75/24
ip r a default via 10.0.0.1
ip n replace 10.0.0.1 dev eth0 lladdr aa:bb:cc:dd:ee:01

iperf -c 10.0.1.241  -i 2  -B 10.0.0.75:5002  -t 10


The script above can offload the syn packets, The packet can't be captured on mlx_pf0vf0.

I think the rule is ok.  The problem is that if I add another rule in device tun1 as following
It will lead the syn packet can't be offloaded

tc filter add dev tun1 pref 2 ingress  protocol ip flower ip_proto tcp src_ip
10.0.1.241 dst_ip 10.0.0.75 src_port 5001 dst_port 5002 tcp_flags 0/0x5
enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe
action mirred egress redirect dev mlx_pf0vf0






在 2019/11/27 19:51, Paul Blakey 写道:
> Sorry I didn't have time apply your patches.
>
> I did test it again with route tc rules to ft callback, here's the diff:
>
> @@ -1291,7 +1304,7 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>         case TC_SETUP_BLOCK:
>                 return flow_block_cb_setup_simple(type_data,
>                                                   &mlx5e_rep_block_tc_cb_list,
> -                                                 mlx5e_rep_setup_tc_cb,
> +                                                 mlx5e_rep_setup_ft_cb,
>                                                   priv, priv, true);
>         case TC_SETUP_FT:
>                 return flow_block_cb_setup_simple(type_data,
>
>
> I ran this script after creating a VF (ens1f2) and entering switchdev mode (creating represntor ens1f0_0):
>
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> ifconfig tun1 up
>
> ifconfig ens1f0_0 0 up
> tc qdisc add dev ens1f0_0 ingress
>
> ifconfig ens1f0 172.168.152.75/24 up
> ip n replace 172.168.162.241 dev ens1f0 lladdr aa:bb:cc:dd:ee:01
>
> tc filter del dev ens1f0_0 ingress
>
> tc filter add dev ens1f0_0 pref 2 ingress proto ip flower \
>      skip_sw \
>      ip_proto tcp dst_ip 5.5.5.6 src_ip 5.5.5.5 tcp_flags 0/0x5 \
>      action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000 nocsum pipe \
>      action mirred egress redirect dev tun1
>
> tc filter add dev ens1f0_0 pref 1 ingress proto ip flower \
>      skip_hw \
>      action drop
>
> ifconfig ens1f2 5.5.5.5/24 up
> ip n replace 5.5.5.6 dev ens1f2 lladdr aa:bb:cc:dd:ee:ff
>
> timeout 3 iperf -c 5.5.5.6
>
> tc -s filter show dev ens1f0_0 ingress
>
>
>
> Notice I run iperf client without Iperf server, so I get only syn packets.
>
> Here is the tcpdump on the VF (ens1f2):
>
> Executing: sudo tcpdump -nnep  -i ens1f2
>
> Executing: sudo tcpdump -nnep  -i ens1f2
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on ens1f2, link-type EN10MB (Ethernet), capture size 262144 bytes
> 13:49:10.610376 24:8a:07:a5:28:01 > aa:bb:cc:dd:ee:ff, ethertype IPv4 (0x0800), length 74: 5.5.5.5.49846 > 5.5.5.6.5001: Flags [S], seq 1395738427, win 64240, options [mss 1460,sackOK,TS val 2249857484 ecr 0,nop,wscale 7], length 0
> 13:49:11.616262 24:8a:07:a5:28:01 > aa:bb:cc:dd:ee:ff, ethertype IPv4 (0x0800), length 74: 5.5.5.5.49846 > 5.5.5.6.5001: Flags [S], seq 1395738427, win 64240, options [mss 1460,sackOK,TS val 2249858489 ecr 0,nop,wscale 7], length 0
> 13:49:13.664261 24:8a:07:a5:28:01 > aa:bb:cc:dd:ee:ff, ethertype IPv4 (0x0800), length 74: 5.5.5.5.49846 > 5.5.5.6.5001: Flags [S], seq 1395738427, win 64240, options [mss 1460,sackOK,TS val 2249860537 ecr 0,nop,wscale 7], length 0
> 13:49:17.696261 24:8a:07:a5:28:01 > aa:bb:cc:dd:e
>
> I get:
>
> filter protocol ip pref 1 flower chain 0
> filter protocol ip pref 1 flower chain 0 handle 0x1
>   eth_type ipv4
>   skip_hw
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 1 bind 1 installed 3 sec used 3 sec
>         Action statistics:
>         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   eth_type ipv4
>   ip_proto tcp
>   dst_ip 5.5.5.6
>   src_ip 5.5.5.5
>   tcp_flags 0/5
>   skip_sw
>   in_hw
>         action order 1: tunnel_key set
>         src_ip 0.0.0.0
>         dst_ip 172.168.152.241
>         key_id 1000
>         nocsum pipe
>         index 1 ref 1 bind 1 installed 3 sec used 3 sec
>         Action statistics:
>         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
>         action order 2: mirred (Egress Redirect to device tun1) stolen
>         index 1 ref 1 bind 1 installed 3 sec used 1 sec
>         Action statistics:
>         Sent 232 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
>
>
> And it counts the 2 syn packets in hardware, the packets are leaving the VF (ens1f2) and not arriving at the mlx5 representor device (ens1f0_0),
>
> which means hardware got them. It's also couned in the above encap rule. And the software only (skip_hw, prio 1) rule didn't catch any packets.
>
>
> Thanks,
>
> Paul.
>
>
>
> On 11/26/2019 10:18 AM, wenxu wrote:
>
> Hi Paul,
>
> Did your test for this case? There are some problem that I reported?
>
>
> BR
>
> wenxu
>
> On 11/24/2019 4:46 PM, Paul Blakey wrote:
>
>
> Hi,
>
> The syn packet might not be actually offloaded because there isn't a neighbor to resolve the destination mac for the tunnel destination ip (next hop mac).
> Try setting the neighbor via "ip neigh replace dev mlx5_p0 172.168.152.241 lladdr <next hop mac>"
> Or running ping to 172.168.152.241 before adding (or in background) the rule to resolve the mac and make available.
> I'll test it on my end.
>
>
> Thanks,
> Paul.
>
>
>
>
> -----Original Message-----
> From: wenxu <wenxu@ucloud.cn><mailto:wenxu@ucloud.cn>
> Sent: Friday, November 22, 2019 8:26 AM
> To: Paul Blakey <paulb@mellanox.com><mailto:paulb@mellanox.com>
> Cc: pablo@netfilter.org<mailto:pablo@netfilter.org>; netdev@vger.kernel.org<mailto:netdev@vger.kernel.org>; Mark Bloch
> <markb@mellanox.com><mailto:markb@mellanox.com>
> Subject: Re: Question about flow table offload in mlx5e
>
> Hi Paul,
>
>
> There are some update. I also test it through replacing mlx5e_rep_setup_tc
> _cb with mlx5e_rep_setup_ft_cb
>
>
> ifconfig mlx_p0 172.168.152.75/24 up
>
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
>
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
> ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
> tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
> nocsum pipe action mirred egress redirect dev tun1
>
> tc filter add dev tun1 pref 2 ingress  protocol ip flower ip_proto tcp src_ip
> 10.0.1.241 dst_ip 10.0.0.75 src_port 5001 dst_port 5002 tcp_flags 0/0x5
> enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe
> action mirred egress redirect dev mlx_pf0vf0
>
>
> If you run this script on the host, and in the virtual machine run "iperf -c
> 10.0.1.241 -i 2 -B 10.0.0.75:5002 -t 1000",
>
> the tcp syn packets will not be offloaded.
>
>
> But if you only run the script without the last filter, as follows, the tcp syn
> packets will be offloaded.
>
> ifconfig mlx_p0 172.168.152.75/24 up
>
> ip l add dev tun1 type gretap external
> tc qdisc add dev tun1 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
>
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw
> ip_proto tcp dst_ip 10.0.1.241 src_ip 10.0.0.75 src_port 5002 dst_port 5001
> tcp_flags 0/0x5  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000
> nocsum pipe action mirred egress redirect dev tun1
>
> I think there is some problem in mlx5e_rep_setup_ft_cb.
>
> On 11/21/2019 9:05 PM, Paul Blakey wrote:
>
>
>         I see, I will test that, and how about normal FWD rules?
>
>         Paul.
>
>
>
>                 -----Original Message-----
>                 From: wenxu <wenxu@ucloud.cn>
>                 Sent: Thursday, November 21, 2019 2:35 PM
>                 To: Paul Blakey <paulb@mellanox.com>
>                 Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch <markb@mellanox.com>
>
>
>                 On 2019/11/21 19:39, Paul Blakey wrote:
>
>                         They are good fixes, exactly what we had when we tested this, thanks.
>
>                         Regarding encap, I don't know what changes you did; how does the encap rule look? Is it a FWD to a vxlan device? If not, it should be, as our driver expects that.
>
>                 It is a FWD to a gretap device.
>
>
>                         I tried it on my setup via tc, by changing the callback of tc (mlx5e_rep_setup_tc_cb) to that of ft (mlx5e_rep_setup_ft_cb), and testing a vxlan encap rule:
>
>                         sudo tc qdisc add dev ens1f0_0 ingress
>                         sudo ifconfig ens1f0 7.7.7.7/24 up
>                         sudo ip link add name vxlan0 type vxlan dev ens1f0 remote 7.7.7.8 dstport 4789 external
>                         sudo ifconfig vxlan0 up
>                         sudo tc filter add dev ens1f0_0 ingress prio 1 chain 0 protocol ip flower dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw action tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 id 1234 dst_port 4789 pipe action mirred egress redirect dev vxlan0
>
>
>                         then tc show:
>                         filter protocol ip pref 1 flower chain 0 handle 0x1 dst_mac aa:bb:cc:dd:ee:ff ip_proto udp skip_sw in_hw in_hw_count 1
>                                 tunnel_key set src_ip 0.0.0.0 dst_ip 7.7.7.8 key_id 1234 dst_port 4789 csum pipe
>                                 Stats: used 119 sec      0 pkt
>                                 mirred (Egress Redirect to device vxlan0)
>                                 Stats: used 119 sec      0 pkt
>
>
>                 Can you send packets that match this offloaded flow to check whether it is really offloaded?
>
>                 In flowtable offload with my patches, both TC_SETUP_BLOCK and TC_SETUP_FT can offload the rule successfully.
>
>                 But in the TC_SETUP_FT case the packets are not really offloaded.
>
>
>                 I will test like you did.
>
>
>
>
>
>
>                                 -----Original Message-----
>                                 From: wenxu <wenxu@ucloud.cn>
>                                 Sent: Thursday, November 21, 2019 10:29 AM
>                                 To: Paul Blakey <paulb@mellanox.com>
>                                 Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch <markb@mellanox.com>
>                                 Subject: Re: Question about flow table offload in mlx5e
>
>
>                                 On 11/21/2019 3:42 PM, Paul Blakey wrote:
>
>                                         Hi,
>
>                                         The original design was for the block setup to use the TC_SETUP_FT type, and the tc event type to use case TC_SETUP_CLSFLOWER.
>
>                                         We will post a patch to change that. I would advise waiting until we fix that 😊
>
>                                         I'm not sure how you get to this function mlx5e_rep_setup_ft_cb() if the nf_flow_table_offload ndo_setup_tc event was TC_SETUP_BLOCK, and not TC_SETUP_FT.
>
>
>                                 Yes, I changed TC_SETUP_BLOCK to TC_SETUP_FT in nf_flow_table_offload_setup.
>
>                                 Two fix patches provided:
>
>                                 http://patchwork.ozlabs.org/patch/1197818/
>
>                                 http://patchwork.ozlabs.org/patch/1197876/
>
>                                 So is this change of mine not correct currently?
>
>
>                                         In our driver en_rep.c we have:
>
>                                         switch (type) {
>                                         case TC_SETUP_BLOCK:
>                                                 return flow_block_cb_setup_simple(type_data,
>                                                                                   &mlx5e_rep_block_tc_cb_list,
>                                                                                   mlx5e_rep_setup_tc_cb,
>                                                                                   priv, priv, true);
>                                         case TC_SETUP_FT:
>                                                 return flow_block_cb_setup_simple(type_data,
>                                                                                   &mlx5e_rep_block_ft_cb_list,
>                                                                                   mlx5e_rep_setup_ft_cb,
>                                                                                   priv, priv, true);
>                                         default:
>                                                 return -EOPNOTSUPP;
>                                         }
>
>                                         In nf_flow_table_offload.c:
>
>                                         bo.binder_type  = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
>                                         bo.extack       = &extack;
>                                         INIT_LIST_HEAD(&bo.cb_list);
>                                         err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>                                         if (err < 0)
>                                                 return err;
>                                         return nf_flow_table_block_setup(flowtable, &bo, cmd);
>                                         }
>
>         EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
>
>
>                                         So unless you changed that as well, you should have gotten to mlx5e_rep_setup_tc_cb and not mlx5e_rep_setup_ft_cb.
>
>                                         Regarding the encap action, there should be no difference on which chain the rule is on.
>
>
>                                 But the same encap rule can be really offloaded when set up through TC_SETUP_BLOCK, while TC_SETUP_FT can't.
>
>                                 So is it a problem of TC_SETUP_FT in mlx5e_rep_setup_ft_cb?
>
>
>
>
>                                         -----Original Message-----
>                                         From: wenxu <wenxu@ucloud.cn>
>                                         Sent: Thursday, November 21, 2019 9:30 AM
>                                         To: Paul Blakey <paulb@mellanox.com>
>                                         Cc: pablo@netfilter.org; netdev@vger.kernel.org; Mark Bloch <markb@mellanox.com>
>                                         Subject: Question about flow table offload in mlx5e
>
>                                         Hi Paul,
>
>                                         The flow table offload in mlx5e is based on TC_SETUP_FT.
>
>                                         It is almost the same as TC_SETUP_BLOCK.
>
>                                         It just sets the MLX5_TC_FLAG(FT_OFFLOAD) flag and changes cls_flower.common.chain_index = FDB_FT_CHAIN;
>
>                                         In the following code, lines 1380 and 1392:
>
>                                         1368 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
>                                         1369                                  void *cb_priv)
>                                         1370 {
>                                         1371         struct flow_cls_offload *f = type_data;
>                                         1372         struct flow_cls_offload cls_flower;
>                                         1373         struct mlx5e_priv *priv = cb_priv;
>                                         1374         struct mlx5_eswitch *esw;
>                                         1375         unsigned long flags;
>                                         1376         int err;
>                                         1377
>                                         1378         flags = MLX5_TC_FLAG(INGRESS) |
>                                         1379                 MLX5_TC_FLAG(ESW_OFFLOAD) |
>                                         1380                 MLX5_TC_FLAG(FT_OFFLOAD);
>                                         1381         esw = priv->mdev->priv.eswitch;
>                                         1382
>                                         1383         switch (type) {
>                                         1384         case TC_SETUP_CLSFLOWER:
>                                         1385                 if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
>                                         1386                         return -EOPNOTSUPP;
>                                         1387
>                                         1388                 /* Re-use tc offload path by moving the ft flow to the
>                                         1389                  * reserved ft chain.
>                                         1390                  */
>                                         1391                 memcpy(&cls_flower, f, sizeof(*f));
>                                         1392                 cls_flower.common.chain_index = FDB_FT_CHAIN;
>                                         1393                 err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower, flags);
>                                         1394                 memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
>
>
>                                         I want to add tunnel offload support in the flow table; I added some patches in nf_flow_table_offload.
>
>                                         I also added the indr setup support in the mlx driver. And now I can do flow table offload with decap.
>
>
>                                         But I meet a problem with the encap. The encap rule can be added in hardware successfully but it can't be offloaded.
>
>                                         But I think the rule I added is correct. If I mask line 1392, the rule can also be added successfully and can be offloaded.
>
>                                         So is there some limit on the encap operation for FT_OFFLOAD in FDB_FT_CHAIN?
>
>
>                                         BR
>
>                                         wenxu
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-27 12:16                       ` wenxu
@ 2019-11-27 13:11                         ` Paul Blakey
  2019-11-27 13:20                           ` Paul Blakey
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Blakey @ 2019-11-27 13:11 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch

On 11/27/2019 2:16 PM, wenxu wrote:

> Sorry, maybe something confused you; ignore my patches.
>
>
> I also did the test like you, routing tc rules to the ft callback.
>
>
> Please also do the following test: mlx_p0 is the pf and mlx_pf0vf0 is the vf.
>
Are you in switchdev mode (via devlink) or the default legacy mode?



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-27 13:11                         ` Paul Blakey
@ 2019-11-27 13:20                           ` Paul Blakey
  2019-11-27 13:45                             ` wenxu
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Blakey @ 2019-11-27 13:20 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netdev, Mark Bloch


On 11/27/2019 3:11 PM, Paul Blakey wrote:
> On 11/27/2019 2:16 PM, wenxu wrote:
>
>> Sorry, maybe something confused you; ignore my patches.
>>
>>
>> I also did the test like you, routing tc rules to the ft callback.
>>
>>
>> Please also do the following test: mlx_p0 is the pf and mlx_pf0vf0 is the vf.
>>
> Are you  in switchdev mode (via devlink) or default legacy mode?
>
>

mlx_pf0vf0 is the representor device created after entering switchdev mode, and eth0 in the vm is the bound mlx5 VF?

Can you run this command:

sudo grep -ri "" /sys/class/net/*/phys_* 2>/dev/null

example:
/sys/class/net/ens1f0_0/phys_port_name:pf0vf0
/sys/class/net/ens1f0_0/phys_switch_id:b828a50003078a24
/sys/class/net/ens1f0_1/phys_port_name:pf0vf1
/sys/class/net/ens1f0_1/phys_switch_id:b828a50003078a24
/sys/class/net/ens1f0/phys_port_name:p0
/sys/class/net/ens1f0/phys_switch_id:b828a50003078a24

and
sudo ls /sys/class/net/*/device/virtfn*/net

example:
/sys/class/net/ens1f0/device/virtfn0/net:
ens1f2

/sys/class/net/ens1f0/device/virtfn1/net:
ens1f3

and even

lspci | grep -i mellanox ; ls -l /sys/class/net
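The phys_port_name convention the commands above rely on can be sketched as a tiny classifier. This is an illustrative helper only (the function name and sample names are mine, not from the thread): "pN" identifies the uplink (PF) representor and "pfXvfY" a VF representor.

```shell
# Classify a netdev by its phys_port_name value.
classify_port() {
    case "$1" in
        pf[0-9]*vf[0-9]*) echo "vf-representor" ;;      # e.g. pf0vf0
        p[0-9]*)          echo "uplink-representor" ;;  # e.g. p0
        *)                echo "unknown" ;;             # plain VF/NIC, no port name
    esac
}

# Sample names taken from the sysfs output shown above.
for name in p0 pf0vf0 pf0vf1 eth0; do
    printf '%s: %s\n' "$name" "$(classify_port "$name")"
done
```

If a device has no phys_port_name at all, it is not a representor, which usually means the card is still in legacy mode.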






Thanks.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-27 13:20                           ` Paul Blakey
@ 2019-11-27 13:45                             ` wenxu
  2019-12-02  3:37                               ` wenxu
  0 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-27 13:45 UTC (permalink / raw)
  To: Paul Blakey; +Cc: pablo, netdev, Mark Bloch


On 2019/11/27 21:20, Paul Blakey wrote:
> On 11/27/2019 3:11 PM, Paul Blakey wrote:
>> On 11/27/2019 2:16 PM, wenxu wrote:
>>
>>> Sorry, maybe something confused you; ignore my patches.
>>>
>>>
>>> I also did the test like you, routing tc rules to the ft callback.
>>>
>>>
>>> Please also do the following test: mlx_p0 is the pf and mlx_pf0vf0
>>> is the vf.
>>>
>> Are you  in switchdev mode (via devlink) or default legacy mode?
>>
>>
> mlx_pf0vf0 is the representor device created after entering switchdev mode, and eth0 in the vm is the bound mlx5 VF?

Yes, mlx_pf0vf0 is the representor and eth0 in the vm is the VF. It is also in switchdev mode.


sudo grep -ri "" /sys/class/net/*/phys_* 2>/dev/null
/sys/class/net/mlx_p0/phys_port_name:p0
/sys/class/net/mlx_p0/phys_switch_id:34ebc100034b6b50
/sys/class/net/mlx_pf0vf0/phys_port_name:pf0vf0
/sys/class/net/mlx_pf0vf0/phys_switch_id:34ebc100034b6b50
/sys/class/net/mlx_pf0vf1/phys_port_name:pf0vf1
/sys/class/net/mlx_pf0vf1/phys_switch_id:34ebc100034b6b50

The problem is that adding the last filter on tun1 causes the outgoing syn packets to no longer be really offloaded.

>
> Can you run this command:
>
> sudo grep -ri "" /sys/class/net/*/phys_* 2>/dev/null
>
> example:
> /sys/class/net/ens1f0_0/phys_port_name:pf0vf0
> /sys/class/net/ens1f0_0/phys_switch_id:b828a50003078a24
> /sys/class/net/ens1f0_1/phys_port_name:pf0vf1
> /sys/class/net/ens1f0_1/phys_switch_id:b828a50003078a24
> /sys/class/net/ens1f0/phys_port_name:p0
> /sys/class/net/ens1f0/phys_switch_id:b828a50003078a24
>
> and
> sudo ls /sys/class/net/*/device/virtfn*/net
>
> example:
> /sys/class/net/ens1f0/device/virtfn0/net:
> ens1f2
>
> /sys/class/net/ens1f0/device/virtfn1/net:
> ens1f3
>
> and even
>
> lspci | grep -i mellanox ; ls -l /sys/class/net
>
>
>
>
>
>
> Thanks.
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Bad performance for VF outgoing in offloaded mode
       [not found]                     ` <84874b42-c525-2149-539d-e7510d15f6a6@mellanox.com>
  2019-11-27 12:16                       ` wenxu
  2019-11-27 12:16                       ` wenxu
@ 2019-11-28  5:03                       ` wenxu
  2019-12-04 13:50                         ` Roi Dayan
  2 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-11-28  5:03 UTC (permalink / raw)
  To: Roi Dayan; +Cc: netdev, saeedm

Hi mellanox team,


I did a performance test for tc offload with upstream kernel:

I set up a vm with a VF as eth0.

In the vm:

ifconfig eth0 10.0.0.75/24 up


On the host, mlx_p0 is the pf representor and mlx_pf0vf0 is the vf representor.

The device is in switchdev mode:

# grep -ri "" /sys/class/net/*/phys_* 2>/dev/null
/sys/class/net/mlx_p0/phys_port_name:p0
/sys/class/net/mlx_p0/phys_switch_id:34ebc100034b6b50
/sys/class/net/mlx_pf0vf0/phys_port_name:pf0vf0
/sys/class/net/mlx_pf0vf0/phys_switch_id:34ebc100034b6b50
/sys/class/net/mlx_pf0vf1/phys_port_name:pf0vf1
/sys/class/net/mlx_pf0vf1/phys_switch_id:34ebc100034b6b50


The tc filters are as follows: just forward ip/arp packets between mlx_p0 and mlx_pf0vf0:

tc qdisc add dev mlx_p0 ingress
tc qdisc add dev mlx_pf0vf0 ingress

tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw action mirred egress redirect dev mlx_p0
tc filter add dev mlx_p0 pref 2 ingress  protocol ip flower skip_sw action mirred egress redirect dev mlx_pf0vf0

tc filter add dev mlx_pf0vf0 pref 1 ingress  protocol arp flower skip_sw action mirred egress redirect dev mlx_p0
tc filter add dev mlx_p0 pref 1 ingress  protocol arp flower skip_sw action mirred egress redirect dev mlx_pf0vf0
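A quick way to confirm that rules like the ones above actually landed in hardware is to check that every installed filter instance reports in_hw in `tc -s filter` output. The helper below is a hypothetical sketch (the function name is mine); here it parses a trimmed sample of the output shown later in this mail rather than a live device.

```shell
# Count filters (lines with a handle) and how many of them are marked in_hw.
check_in_hw() {
    awk '
        /^filter .* handle /   { total++ }   # one line per installed filter instance
        /^[[:space:]]*in_hw/   { in_hw++ }   # hardware-offload marker
        END { printf "%d/%d filters in hardware\n", in_hw, total }
    '
}

# Trimmed sample of the `tc -s filter ls` output from this mail.
check_in_hw <<'EOF'
filter protocol arp pref 1 flower chain 0
filter protocol arp pref 1 flower chain 0 handle 0x1
  eth_type arp
  skip_sw
  in_hw in_hw_count 1
filter protocol ip pref 2 flower chain 0
filter protocol ip pref 2 flower chain 0 handle 0x1
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
EOF
# prints: 2/2 filters in hardware
```

On a live system you would pipe `tc -s filter ls dev mlx_pf0vf0 ingress` into the same helper; with skip_sw rules, anything less than full in_hw coverage means the rule failed to offload.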


The remote server device eth0:

ifconfig eth0 10.0.0.241/24


test case 1:   tcp receive from VF to PF

In the vm: iperf -s

On the remote server:

iperf -c 10.0.0.75 -t 10 -i 2
------------------------------------------------------------
Client connecting to 10.0.0.75, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.241 port 59708 connected with 10.0.0.75 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 2.0 sec  5.40 GBytes  23.2 Gbits/sec
[  3]  2.0- 4.0 sec  5.35 GBytes  23.0 Gbits/sec
[  3]  4.0- 6.0 sec  5.46 GBytes  23.5 Gbits/sec
[  3]  6.0- 8.0 sec  5.10 GBytes  21.9 Gbits/sec
[  3]  8.0-10.0 sec  5.36 GBytes  23.0 Gbits/sec
[  3]  0.0-10.0 sec  26.7 GBytes  22.9 Gbits/sec


Good performance with offload.

# tc -s filter ls dev mlx_p0 ingress
filter protocol arp pref 1 flower chain 0
filter protocol arp pref 1 flower chain 0 handle 0x1
  eth_type arp
  skip_sw
  in_hw in_hw_count 1
    action order 1: mirred (Egress Redirect to device mlx_pf0vf0) stolen
     index 4 ref 1 bind 1 installed 971 sec used 82 sec
     Action statistics:
    Sent 420 bytes 7 pkt (dropped 0, overlimits 0 requeues 0)
    Sent software 0 bytes 0 pkt
    Sent hardware 420 bytes 7 pkt
    backlog 0b 0p requeues 0

filter protocol ip pref 2 flower chain 0
filter protocol ip pref 2 flower chain 0 handle 0x1
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
    action order 1: mirred (Egress Redirect to device mlx_pf0vf0) stolen
     index 2 ref 1 bind 1 installed 972 sec used 67 sec
     Action statistics:
    Sent 79272204362 bytes 91511261 pkt (dropped 0, overlimits 0 requeues 0)
    Sent software 0 bytes 0 pkt
    Sent hardware 79272204362 bytes 91511261 pkt
    backlog 0b 0p requeues 0

#  tc -s filter ls dev mlx_pf0vf0 ingress
filter protocol arp pref 1 flower chain 0
filter protocol arp pref 1 flower chain 0 handle 0x1
  eth_type arp
  skip_sw
  in_hw in_hw_count 1
    action order 1: mirred (Egress Redirect to device mlx_p0) stolen
     index 3 ref 1 bind 1 installed 978 sec used 88 sec
     Action statistics:
    Sent 600 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
    Sent software 0 bytes 0 pkt
    Sent hardware 600 bytes 10 pkt
    backlog 0b 0p requeues 0

filter protocol ip pref 2 flower chain 0
filter protocol ip pref 2 flower chain 0 handle 0x1
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
    action order 1: mirred (Egress Redirect to device mlx_p0) stolen
     index 1 ref 1 bind 1 installed 978 sec used 73 sec
     Action statistics:
    Sent 71556027574 bytes 47805525 pkt (dropped 0, overlimits 0 requeues 0)
    Sent software 0 bytes 0 pkt
    Sent hardware 71556027574 bytes 47805525 pkt
    backlog 0b 0p requeues 0



test case 2:  tcp send from VF to PF

On the remote server: iperf -s

in the vm:

# iperf -c 10.0.0.241 -t 10 -i 2

------------------------------------------------------------
Client connecting to 10.0.0.241, TCP port 5001
TCP window size:  230 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.75 port 53166 connected with 10.0.0.241 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 2.0 sec   939 MBytes  3.94 Gbits/sec
[  3]  2.0- 4.0 sec   944 MBytes  3.96 Gbits/sec
[  3]  4.0- 6.0 sec  1.01 GBytes  4.34 Gbits/sec
[  3]  6.0- 8.0 sec  1.03 GBytes  4.44 Gbits/sec
[  3]  8.0-10.0 sec  1.02 GBytes  4.39 Gbits/sec
[  3]  0.0-10.0 sec  4.90 GBytes  4.21 Gbits/sec


Bad performance with offload. All the packets are offloaded.

Is it an offload problem in the hardware?


BR

wenxu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-11-27 13:45                             ` wenxu
@ 2019-12-02  3:37                               ` wenxu
  2019-12-05 15:17                                 ` Paul Blakey
  0 siblings, 1 reply; 25+ messages in thread
From: wenxu @ 2019-12-02  3:37 UTC (permalink / raw)
  To: Paul Blakey; +Cc: netdev

Hi Paul,


Sorry to trouble you again. I think it is a problem in the ft callback.

Can you help me fix it? Thanks!

I did the test like you, routing tc rules to the ft callback.

# ifconfig mlx_p0 172.168.152.75/24 up
# ip n r 172.168.152.241 lladdr fa:fa:ff:ff:ff:ff dev mlx_p0

# ip l add dev tun1 type gretap external
# tc qdisc add dev tun1 ingress
# tc qdisc add dev mlx_pf0vf0 ingress

# tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000 nocsum pipe action mirred egress redirect dev tun1


In The vm:
# ifconfig eth0 10.0.0.75/24 up
# ip n r 10.0.0.77 lladdr fa:ff:ff:ff:ff:ff dev eth0

# iperf -c 10.0.0.77 -t 100 -i 2

The syn packets can be offloaded successfully.

# tc -s filter ls dev mlx_pf0vf0 ingress
filter protocol ip pref 2 flower chain 0 
filter protocol ip pref 2 flower chain 0 handle 0x1 
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
	action order 1: tunnel_key  set
	src_ip 0.0.0.0
	dst_ip 172.168.152.241
	key_id 1000
	nocsum pipe
	 index 1 ref 1 bind 1 installed 252 sec used 252 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device tun1) stolen
 	index 1 ref 1 bind 1 installed 252 sec used 110 sec
 	Action statistics:
	Sent 3420 bytes 11 pkt (dropped 0, overlimits 0 requeues 0) 
	Sent software 0 bytes 0 pkt
	Sent hardware 3420 bytes 11 pkt
	backlog 0b 0p requeues 0

But Then I add another decap filter on tun1:

tc filter add dev tun1 pref 2 ingress protocol ip flower enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe action mirred egress redirect dev mlx_pf0vf0

# iperf -c 10.0.0.77 -t 100 -i 2

The syn packets can't be offloaded. The tc filter counter also does not increase.


# tc -s filter ls dev mlx_pf0vf0 ingress
filter protocol ip pref 2 flower chain 0 
filter protocol ip pref 2 flower chain 0 handle 0x1 
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
	action order 1: tunnel_key  set
	src_ip 0.0.0.0
	dst_ip 172.168.152.241
	key_id 1000
	nocsum pipe
	 index 1 ref 1 bind 1 installed 320 sec used 320 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device tun1) stolen
 	index 1 ref 1 bind 1 installed 320 sec used 178 sec
 	Action statistics:
	Sent 3420 bytes 11 pkt (dropped 0, overlimits 0 requeues 0) 
	Sent software 0 bytes 0 pkt
	Sent hardware 3420 bytes 11 pkt
	backlog 0b 0p requeues 0

# tc -s filter ls dev tun1 ingress
filter protocol ip pref 2 flower chain 0 
filter protocol ip pref 2 flower chain 0 handle 0x1 
  eth_type ipv4
  enc_src_ip 172.168.152.241
  enc_key_id 1000
  in_hw in_hw_count 1
	action order 1: tunnel_key  unset pipe
	 index 2 ref 1 bind 1 installed 391 sec used 391 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device mlx_pf0vf0) stolen
 	index 2 ref 1 bind 1 installed 391 sec used 391 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
	backlog 0b 0p requeues 0


So there may be some problem in the ft callback setup. When another reverse
decap rule is added on the tunnel device, the encap rule will not offload the packets.
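The symptom above can be seen from counters alone: take two snapshots of the "Sent hardware" packet count around a traffic run, and treat an unchanged counter as "not offloaded". The helper below is a hypothetical sketch (the function name is mine), fed with the static counter line from the output above instead of a live device, so it reproduces the failing case.

```shell
# Sum the "Sent hardware ... pkt" counters out of `tc -s filter` output.
hw_pkts() {
    awk '/Sent hardware/ { sum += $5 } END { print sum + 0 }'
}

# Snapshot "before" the traffic run (counter line from the mail above).
before=$(hw_pkts <<'EOF'
	Sent hardware 3420 bytes 11 pkt
EOF
)
# Snapshot "after": the counter did not move, matching the reported symptom.
after=$(hw_pkts <<'EOF'
	Sent hardware 3420 bytes 11 pkt
EOF
)

if [ "$after" -gt "$before" ]; then
    echo "offloaded: hardware counter grew"
else
    echo "not offloaded: hardware counter unchanged"
fi
# prints: not offloaded: hardware counter unchanged
```

On a live setup, the two snapshots would come from running `tc -s filter ls dev mlx_pf0vf0 ingress | hw_pkts` before and after the iperf run.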

Looking forward to your help. Thanks!


BR
wenxu









^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Bad performance for VF outgoing in offloaded mode
  2019-11-28  5:03                       ` Bad performance for VF outgoing in offloaded mode wenxu
@ 2019-12-04 13:50                         ` Roi Dayan
  2019-12-04 14:32                           ` wenxu
  2019-12-05  3:41                           ` wenxu
  0 siblings, 2 replies; 25+ messages in thread
From: Roi Dayan @ 2019-12-04 13:50 UTC (permalink / raw)
  To: wenxu; +Cc: netdev, Saeed Mahameed



On 2019-11-28 7:03 AM, wenxu wrote:
> Hi mellanox team,
> 
> 
> I did a performance test for tc offload with upstream kernel:
> 
> I set up a vm with a VF as eth0.
> 
> In the vm:
> 
> ifconfig eth0 10.0.0.75/24 up
> 
> 
> On the host the mlx_p0 is the pf representor and mlx_pf0vf0 is the vf representor
> 
> The device is in switchdev mode:
> 
> # grep -ri "" /sys/class/net/*/phys_* 2>/dev/null
> /sys/class/net/mlx_p0/phys_port_name:p0
> /sys/class/net/mlx_p0/phys_switch_id:34ebc100034b6b50
> /sys/class/net/mlx_pf0vf0/phys_port_name:pf0vf0
> /sys/class/net/mlx_pf0vf0/phys_switch_id:34ebc100034b6b50
> /sys/class/net/mlx_pf0vf1/phys_port_name:pf0vf1
> /sys/class/net/mlx_pf0vf1/phys_switch_id:34ebc100034b6b50
> 
> 
> The tc filters are as follows: just forward ip/arp packets between mlx_p0 and mlx_pf0vf0:
> 
> tc qdisc add dev mlx_p0 ingress
> tc qdisc add dev mlx_pf0vf0 ingress
> 
> tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw action mirred egress redirect dev mlx_p0
> tc filter add dev mlx_p0 pref 2 ingress  protocol ip flower skip_sw action mirred egress redirect dev mlx_pf0vf0
> 
> tc filter add dev mlx_pf0vf0 pref 1 ingress  protocol arp flower skip_sw action mirred egress redirect dev mlx_p0
> tc filter add dev mlx_p0 pref 1 ingress  protocol arp flower skip_sw action mirred egress redirect dev mlx_pf0vf0
> 
> 
> The remote server device eth0:
> 
> ifconfig eth0 10.0.0.241/24
> 
> 
> test case 1:   tcp receive from VF to PF
> 
> In the vm: iperf -s
> 
> On the remote server:
> 
> iperf -c 10.0.0.75 -t 10 -i 2
> ------------------------------------------------------------
> Client connecting to 10.0.0.75, TCP port 5001
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.0.0.241 port 59708 connected with 10.0.0.75 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0- 2.0 sec  5.40 GBytes  23.2 Gbits/sec
> [  3]  2.0- 4.0 sec  5.35 GBytes  23.0 Gbits/sec
> [  3]  4.0- 6.0 sec  5.46 GBytes  23.5 Gbits/sec
> [  3]  6.0- 8.0 sec  5.10 GBytes  21.9 Gbits/sec
> [  3]  8.0-10.0 sec  5.36 GBytes  23.0 Gbits/sec
> [  3]  0.0-10.0 sec  26.7 GBytes  22.9 Gbits/sec
> 
> 
> Good performance with offload.
> 
> # tc -s filter ls dev mlx_p0 ingress
> filter protocol arp pref 1 flower chain 0
> filter protocol arp pref 1 flower chain 0 handle 0x1
>   eth_type arp
>   skip_sw
>   in_hw in_hw_count 1
>     action order 1: mirred (Egress Redirect to device mlx_pf0vf0) stolen
>      index 4 ref 1 bind 1 installed 971 sec used 82 sec
>      Action statistics:
>     Sent 420 bytes 7 pkt (dropped 0, overlimits 0 requeues 0)
>     Sent software 0 bytes 0 pkt
>     Sent hardware 420 bytes 7 pkt
>     backlog 0b 0p requeues 0
> 
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   eth_type ipv4
>   skip_sw
>   in_hw in_hw_count 1
>     action order 1: mirred (Egress Redirect to device mlx_pf0vf0) stolen
>      index 2 ref 1 bind 1 installed 972 sec used 67 sec
>      Action statistics:
>     Sent 79272204362 bytes 91511261 pkt (dropped 0, overlimits 0 requeues 0)
>     Sent software 0 bytes 0 pkt
>     Sent hardware 79272204362 bytes 91511261 pkt
>     backlog 0b 0p requeues 0
> 
> #  tc -s filter ls dev mlx_pf0vf0 ingress
> filter protocol arp pref 1 flower chain 0
> filter protocol arp pref 1 flower chain 0 handle 0x1
>   eth_type arp
>   skip_sw
>   in_hw in_hw_count 1
>     action order 1: mirred (Egress Redirect to device mlx_p0) stolen
>      index 3 ref 1 bind 1 installed 978 sec used 88 sec
>      Action statistics:
>     Sent 600 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
>     Sent software 0 bytes 0 pkt
>     Sent hardware 600 bytes 10 pkt
>     backlog 0b 0p requeues 0
> 
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   eth_type ipv4
>   skip_sw
>   in_hw in_hw_count 1
>     action order 1: mirred (Egress Redirect to device mlx_p0) stolen
>      index 1 ref 1 bind 1 installed 978 sec used 73 sec
>      Action statistics:
>     Sent 71556027574 bytes 47805525 pkt (dropped 0, overlimits 0 requeues 0)
>     Sent software 0 bytes 0 pkt
>     Sent hardware 71556027574 bytes 47805525 pkt
>     backlog 0b 0p requeues 0
> 
> 
> 
> test case 2:  tcp send from VF to PF
> 
> On the remote server: iperf -s
> 
> in the vm:
> 
> # iperf -c 10.0.0.241 -t 10 -i 2
> 
> ------------------------------------------------------------
> Client connecting to 10.0.0.241, TCP port 5001
> TCP window size:  230 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.0.0.75 port 53166 connected with 10.0.0.241 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0- 2.0 sec   939 MBytes  3.94 Gbits/sec
> [  3]  2.0- 4.0 sec   944 MBytes  3.96 Gbits/sec
> [  3]  4.0- 6.0 sec  1.01 GBytes  4.34 Gbits/sec
> [  3]  6.0- 8.0 sec  1.03 GBytes  4.44 Gbits/sec
> [  3]  8.0-10.0 sec  1.02 GBytes  4.39 Gbits/sec
> [  3]  0.0-10.0 sec  4.90 GBytes  4.21 Gbits/sec
> 
> 
> Bad performance with offload. All the packets are offloaded.
> 
> Is it an offload problem in the hardware?
> 
> 
> BR
> 
> wenxu
> 
> 

Hi Wenxu,

We haven't noticed this behavior.
Could it be that your VM doesn't have enough resources to generate the traffic?
As a listener it only sends the ACKs.
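One quick way to check that from inside the VM, while the slow run is in progress, is to sample /proc/stat and compute how busy the vCPUs are. A minimal sketch (not from this thread; assumes Linux and Python are available in the guest):

```python
import time

def cpu_times():
    # First line of /proc/stat holds aggregate jiffies per CPU state:
    # user nice system idle iowait irq softirq ...
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]
    vals = list(map(int, fields))
    idle = vals[3] + (vals[4] if len(vals) > 4 else 0)  # idle + iowait
    return idle, sum(vals)

def cpu_utilization(interval=1.0):
    """Percent of non-idle CPU time over `interval` seconds."""
    idle1, total1 = cpu_times()
    time.sleep(interval)
    idle2, total2 = cpu_times()
    total = total2 - total1
    busy = total - (idle2 - idle1)
    return 100.0 * busy / total if total else 0.0

if __name__ == "__main__":
    # If this stays well below 100% during the slow iperf run, the VM
    # has spare CPU and the bottleneck is likely elsewhere.
    print(f"CPU busy: {cpu_utilization():.1f}%")
```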

Thanks,
Roi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Bad performance for VF outgoing in offloaded mode
  2019-12-04 13:50                         ` Roi Dayan
@ 2019-12-04 14:32                           ` wenxu
  2019-12-05  3:41                           ` wenxu
  1 sibling, 0 replies; 25+ messages in thread
From: wenxu @ 2019-12-04 14:32 UTC (permalink / raw)
  To: Roi Dayan; +Cc: netdev, Saeed Mahameed


On 2019/12/4 21:50, Roi Dayan wrote:
>
> On 2019-11-28 7:03 AM, wenxu wrote:
>> [... original test report snipped; quoted in full earlier in the thread ...]
>>
> Hi Wenxu,
>
> We didn't notice this behavior.
> Could it be your VM doesn't have enough resources to generate the traffic?
> As a listener it's only sending the acks.

I don't think so. If I delete the ingress qdisc on mlx_pf0vf0 and set ifconfig mlx_pf0vf0 10.0.0.241/24:

iperf -s on the host.

In the vm:

# iperf -c 10.0.0.241 -t 10 -i 2
------------------------------------------------------------
Client connecting to 10.0.0.241, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.75 port 50960 connected with 10.0.0.241 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 2.0 sec  6.34 GBytes  27.3 Gbits/sec
[  3]  2.0- 4.0 sec  6.45 GBytes  27.7 Gbits/sec
[  3]  4.0- 6.0 sec  6.65 GBytes  28.6 Gbits/sec
[  3]  6.0- 8.0 sec  6.60 GBytes  28.3 Gbits/sec
[  3]  8.0-10.0 sec  6.29 GBytes  27.0 Gbits/sec
[  3]  0.0-10.0 sec  32.3 GBytes  27.8 Gbits/sec


The VM can generate enough traffic. Maybe you can test my case with an upstream kernel.
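To compare the two runs side by side, the per-interval rates can be pulled out of the iperf 2 output. A small sketch (a hypothetical helper, not part of the thread; the sample lines come from the report above):

```python
import re

# Matches iperf 2 interval lines such as:
# [  3]  0.0- 2.0 sec  5.40 GBytes  23.2 Gbits/sec
LINE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec"
)

def interval_rates(report):
    """Return per-interval Gbits/sec figures, excluding the summary line."""
    rates = []
    for m in LINE.finditer(report):
        start, end = float(m.group(1)), float(m.group(2))
        if end - start <= 2.0:          # skip the 0.0-10.0 summary line
            rates.append(float(m.group(3)))
    return rates

offloaded = """\
[  3]  0.0- 2.0 sec   939 MBytes  3.94 Gbits/sec
[  3]  2.0- 4.0 sec   944 MBytes  3.96 Gbits/sec
[  3]  0.0-10.0 sec  4.90 GBytes  4.21 Gbits/sec
"""
rates = interval_rates(offloaded)
print(sum(rates) / len(rates))  # mean of the 2-second intervals
```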

> Thanks,
> Roi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Bad performance for VF outgoing in offloaded mode
  2019-12-04 13:50                         ` Roi Dayan
  2019-12-04 14:32                           ` wenxu
@ 2019-12-05  3:41                           ` wenxu
  1 sibling, 0 replies; 25+ messages in thread
From: wenxu @ 2019-12-05  3:41 UTC (permalink / raw)
  To: Roi Dayan; +Cc: netdev, Saeed Mahameed


On 12/4/2019 9:50 PM, Roi Dayan wrote:
>
> On 2019-11-28 7:03 AM, wenxu wrote:
>> [... original test report snipped; quoted in full earlier in the thread ...]
>>
> Hi Wenxu,
>
> We didn't notice this behavior.
> Could it be your VM doesn't have enough resources to generate the traffic?
> As a listener it's only sending the acks.

Sorry, I found it was a problem with the remote 10.0.0.241 server.

>
> Thanks,
> Roi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question about flow table offload in mlx5e
  2019-12-02  3:37                               ` wenxu
@ 2019-12-05 15:17                                 ` Paul Blakey
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Blakey @ 2019-12-05 15:17 UTC (permalink / raw)
  To: wenxu; +Cc: netdev

On 12/2/2019 5:37 AM, wenxu wrote:

> Hi Paul,
>
>
> Sorry to trouble you again. I think there is a problem in the ft callback.
>
> Can you help me fix it? Thx!
>
> I did the test like you did, with tc rules going through the ft callback.
>
> # ifconfig mlx_p0 172.168.152.75/24 up
> # ip n r 172.16.152.241 lladdr fa:fa:ff:ff:ff:ff dev mlx_p0
>
> # ip l add dev tun1 type gretap external
> # tc qdisc add dev tun1 ingress
> # tc qdisc add dev mlx_pf0vf0 ingress
>
> # tc filter add dev mlx_pf0vf0 pref 2 ingress  protocol ip flower skip_sw  action tunnel_key set dst_ip 172.168.152.241 src_ip 0 id 1000 nocsum pipe action mirred egress redirect dev tun1
>
>
> In The vm:
> # ifconfig eth0 10.0.0.75/24 up
> # ip n r 10.0.0.77 lladdr fa:ff:ff:ff:ff:ff dev eth0
>
> # iperf -c 10.0.0.77 -t 100 -i 2
>
> The SYN packets are offloaded successfully.
>
> # tc -s filter ls dev mlx_pf0vf0 ingress
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>    eth_type ipv4
>    skip_sw
>    in_hw in_hw_count 1
> 	action order 1: tunnel_key  set
> 	src_ip 0.0.0.0
> 	dst_ip 172.168.152.241
> 	key_id 1000
> 	nocsum pipe
> 	 index 1 ref 1 bind 1 installed 252 sec used 252 sec
> 	Action statistics:
> 	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> 	backlog 0b 0p requeues 0
>
> 	action order 2: mirred (Egress Redirect to device tun1) stolen
>   	index 1 ref 1 bind 1 installed 252 sec used 110 sec
>   	Action statistics:
> 	Sent 3420 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
> 	Sent software 0 bytes 0 pkt
> 	Sent hardware 3420 bytes 11 pkt
> 	backlog 0b 0p requeues 0
>
> But Then I add another decap filter on tun1:
>
> tc filter add dev tun1 pref 2 ingress protocol ip flower enc_key_id 1000 enc_src_ip 172.168.152.241 action tunnel_key unset pipe action mirred egress redirect dev mlx_pf0vf0
>
> # iperf -c 10.0.0.77 -t 100 -i 2
>
> The SYN packets can't be offloaded. The tc filter counters also do not increase.
>
>
> # tc -s filter ls dev mlx_pf0vf0 ingress
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>    eth_type ipv4
>    skip_sw
>    in_hw in_hw_count 1
> 	action order 1: tunnel_key  set
> 	src_ip 0.0.0.0
> 	dst_ip 172.168.152.241
> 	key_id 1000
> 	nocsum pipe
> 	 index 1 ref 1 bind 1 installed 320 sec used 320 sec
> 	Action statistics:
> 	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> 	backlog 0b 0p requeues 0
>
> 	action order 2: mirred (Egress Redirect to device tun1) stolen
>   	index 1 ref 1 bind 1 installed 320 sec used 178 sec
>   	Action statistics:
> 	Sent 3420 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
> 	Sent software 0 bytes 0 pkt
> 	Sent hardware 3420 bytes 11 pkt
> 	backlog 0b 0p requeues 0
>
> # tc -s filter ls dev tun1 ingress
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>    eth_type ipv4
>    enc_src_ip 172.168.152.241
>    enc_key_id 1000
>    in_hw in_hw_count 1
> 	action order 1: tunnel_key  unset pipe
> 	 index 2 ref 1 bind 1 installed 391 sec used 391 sec
> 	Action statistics:
> 	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> 	backlog 0b 0p requeues 0
>
> 	action order 2: mirred (Egress Redirect to device mlx_pf0vf0) stolen
>   	index 2 ref 1 bind 1 installed 391 sec used 391 sec
>   	Action statistics:
> 	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> 	backlog 0b 0p requeues 0
>
>
> So there may be some problem with the ft callback setup. When another reverse
> decap rule is added on the tunnel device, the encap rule no longer offloads the packets.
>
> I look forward to your help. Thx!
>
>
> BR
> wenxu
>
Hi, I reproduced it.

I'll find the reason, fix it, and get back to you soon.

We are planning on expanding our supported chain and prio range, and as
part of that we also move the FT offload code a bit.

If what I suspect is happening is indeed the case, that work would fix it anyway.
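In the meantime, one way to confirm where the encap rule stops handling traffic is to snapshot the "Sent hardware" counters from `tc -s filter` before and after a traffic burst and diff them. A rough sketch (the counter strings below are illustrative samples taken from the stats quoted above):

```python
import re

# Matches the per-action hardware counter line printed by `tc -s filter`.
HW = re.compile(r"Sent hardware (\d+) bytes (\d+) pkt")

def hardware_counters(tc_output):
    """Sum all 'Sent hardware' byte/packet counters found in the output."""
    pairs = [(int(b), int(p)) for b, p in HW.findall(tc_output)]
    return sum(b for b, _ in pairs), sum(p for _, p in pairs)

# Illustrative snapshots; in practice capture the full output of
# `tc -s filter ls dev <dev> ingress` before and after a few seconds of iperf.
before = "Sent hardware 3420 bytes 11 pkt"
after = "Sent hardware 4620 bytes 15 pkt"

b0, p0 = hardware_counters(before)
b1, p1 = hardware_counters(after)
print(b1 - b0, p1 - p0)  # a zero delta means the rule is not hit in hardware
```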

Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-19  7:08 [PATCH net-next] ip_gre: Make none-tun-dst gre tunnel keep tunnel info wenxu
2019-11-20  0:39 ` David Miller
2019-11-21  7:30   ` Question about flow table offload in mlx5e wenxu
2019-11-21  7:42     ` Paul Blakey
2019-11-21  8:28       ` wenxu
2019-11-21 11:39         ` Paul Blakey
2019-11-21 11:40           ` Paul Blakey
2019-11-21 12:35           ` wenxu
2019-11-21 13:05             ` Paul Blakey
2019-11-21 13:39               ` wenxu
2019-11-22  6:12               ` wenxu
     [not found]               ` <64285654-bc9a-c76e-5875-dc6e434dc4d4@ucloud.cn>
2019-11-24  8:46                 ` Paul Blakey
2019-11-24 11:14                   ` wenxu
2019-11-26  8:18                   ` wenxu
     [not found]                     ` <84874b42-c525-2149-539d-e7510d15f6a6@mellanox.com>
2019-11-27 12:16                       ` wenxu
2019-11-27 12:16                       ` wenxu
2019-11-27 13:11                         ` Paul Blakey
2019-11-27 13:20                           ` Paul Blakey
2019-11-27 13:45                             ` wenxu
2019-12-02  3:37                               ` wenxu
2019-12-05 15:17                                 ` Paul Blakey
2019-11-28  5:03                       ` Bad performance for VF outgoing in offloaded mode wenxu
2019-12-04 13:50                         ` Roi Dayan
2019-12-04 14:32                           ` wenxu
2019-12-05  3:41                           ` wenxu
