* [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: xiangxia.m.yue @ 2019-05-18  0:54 UTC
To: roid, saeedm; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

In most cases, we use the ConnectX-5 NIC on compute nodes for VMs,
but we also offload forwarding rules to NICs on the gateway node.
On the gateway node, we install multiple NICs and assign them to
different dockers, each with its own net namespace and its own
routing table. In this way, we can dedicate an agent process to each
docker. More dockers mean higher throughput.

Commit abd3277287c7 ("net/mlx5e: Disallow changing name-space for VF representors")
disallowed this, but we can allow it now for the gateway use case.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 91e24f1..15e932f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1409,7 +1409,7 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev)
 	netdev->watchdog_timeo    = 15 * HZ;
 
 
-	netdev->features	 |= NETIF_F_HW_TC | NETIF_F_NETNS_LOCAL;
+	netdev->features	 |= NETIF_F_HW_TC;
 	netdev->hw_features      |= NETIF_F_HW_TC;
 
 	netdev->hw_features    |= NETIF_F_SG;
-- 
1.8.3.1
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Or Gerlitz @ 2019-05-20 20:24 UTC
To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Mon, May 20, 2019 at 3:19 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> In most cases, we use the ConnectX-5 NIC on compute nodes for VMs,
> but we also offload forwarding rules to NICs on the gateway node.
> On the gateway node, we install multiple NICs and assign them to
> different dockers, each with its own net namespace and its own
> routing table. In this way, we can dedicate an agent process to each
> docker. More dockers mean higher throughput.

The vport (uplink and VF) representor netdev stands for the e-switch
side of things. If you put different vport devices into different
namespaces, you will not be able to forward between them. It's the NIC
side of things (the VF netdevice) which can/should be put into namespaces.

For example, with SW veth devices, suppose we have two pairs
(v0, v1), (v2, v3) -- we create a SW switch (linux bridge, ovs) with
the uplink and v0/v2 as ports, all in a single namespace, and we map
v1 and v3 into application containers.

I am missing how you can make any use of vport reps belonging to the
same HW e-switch in different namespaces, maybe send a chart?

> Commit abd3277287c7 ("net/mlx5e: Disallow changing name-space for VF representors")
> disallowed this, but we can allow it now for the gateway use case.
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Tonghao Zhang @ 2019-05-21  4:35 UTC
To: Or Gerlitz; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Tue, May 21, 2019 at 4:24 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> The vport (uplink and VF) representor netdev stands for the e-switch
> side of things. If you put different vport devices into different
> namespaces, you will not be able to forward between them. It's the NIC
> side of things (the VF netdevice) which can/should be put into namespaces.
>
> For example, with SW veth devices, suppose we have two pairs
> (v0, v1), (v2, v3) -- we create a SW switch (linux bridge, ovs) with
> the uplink and v0/v2 as ports, all in a single namespace, and we map
> v1 and v3 into application containers.
>
> I am missing how you can make any use of vport reps belonging to the
> same HW e-switch in different namespaces, maybe send a chart?
+---------------------------------------------------------+
|                                                         |
|                                                         |
|          docker01                 docker02              |
|                                                         |
|     +-----------------+     +------------------+        |
|     |  NIC (rep/vf)   |     |       NIC        |        |
|     |                 |     |                  |  host  |
|     |   +--------+    |     |    +---------+   |        |
|     +-----------------+     +------------------+        |
|         |        |               |         |            |
+---------------------------------------------------------+
          |        |               |         |
          |        |     phy_port2 |         | phy_port3
          |        |               |         |
          |        |               |         |
 phy_port0|        |phy_port1      |         |
          |        |               |         |
          v        +               v         +

For example, there are two NICs (4 physical ports) on the host. We
assign one NIC to docker01 (all reps and VFs of this NIC are placed in
docker01), and the other NIC to docker02. docker01 and docker02 each
run our agent, which uses the tc command to offload the rules. The NIC
in docker01 receives packets from phy_port1, applies QoS and NAT
(pedit action), and then forwards them to phy_port0. The NIC in
docker02 works in the same way.
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Or Gerlitz @ 2019-05-21 16:44 UTC
To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Tue, May 21, 2019 at 7:36 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> For example, there are two NICs (4 physical ports) on the host. We
> assign one NIC to docker01 (all reps and VFs of this NIC are placed in
> docker01), and the other NIC to docker02. docker01 and docker02 each
> run our agent, which uses the tc command to offload the rules. The NIC
> in docker01 receives packets from phy_port1, applies QoS and NAT
> (pedit action), and then forwards them to phy_port0. The NIC in
> docker02 works in the same way.

I see, so in the case you described above, you are going to move **all** the
representors of a certain e-switch into **one** namespace -- this is something
we don't have to block. However, I think we did want to disallow moving a
subset of the port reps into a namespace. Should look into that.

Or.
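The distinction drawn above (all representors of one e-switch in a single namespace is fine, a split subset is not) can be written down as a simple consistency check. The following is a user-space sketch only: `struct demo_rep` and `demo_eswitch_reps_colocated()` are hypothetical names, and nothing like this check is part of the posted patch or, as far as this thread shows, of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: which e-switch a representor belongs to and
 * which namespace it currently lives in. */
struct demo_rep {
	int eswitch_id;
	int netns_id;
};

/*
 * Returns true when every representor of the given e-switch sits in
 * the same namespace (the "all reps moved together" case), and false
 * when the reps of that e-switch are split across namespaces.
 * An e-switch with no representors in the array counts as co-located.
 */
bool demo_eswitch_reps_colocated(const struct demo_rep *reps, size_t n,
				 int eswitch_id)
{
	bool seen = false;
	int netns = 0;

	assert(reps != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (reps[i].eswitch_id != eswitch_id)
			continue;
		if (!seen) {
			/* First rep of this e-switch fixes the expected namespace. */
			seen = true;
			netns = reps[i].netns_id;
		} else if (reps[i].netns_id != netns) {
			return false;
		}
	}
	return true;
}
```

With the gateway layout from the chart, each NIC's representors share one namespace, so the check passes for both e-switches. Moving a single representor elsewhere makes it fail for that e-switch only.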
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Tonghao Zhang @ 2019-05-22  1:25 UTC
To: Or Gerlitz; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Wed, May 22, 2019 at 12:45 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> I see, so in the case you described above, you are going to move **all** the
> representors of a certain e-switch into **one** namespace -- this is something
> we don't have to block. However, I think we did want to disallow moving a
> subset of the port reps into a namespace. Should look into that.

I reviewed the representor code of the Netronome nfp driver; nfp doesn't
set NETIF_F_NETNS_LOCAL in netdev->features. I also made this change in
the OFED code that we use in our production environment, and then sent
this patch upstream.
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Or Gerlitz @ 2019-05-22  4:49 UTC
To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Wed, May 22, 2019 at 4:26 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> I reviewed the representor code of the Netronome nfp driver; nfp doesn't
> set NETIF_F_NETNS_LOCAL in netdev->features. I also made this change in
> the OFED code that we use in our production environment, and then sent
> this patch upstream.

The real question here is whether we can provide the required separation
when vport rep netdevs are put into different namespaces -- this needs
deeper thinking. Technically you can do that with this one-liner patch,
but we have to see if/what assumptions could be broken as a result.
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Tonghao Zhang @ 2019-08-01  0:44 UTC
To: Or Gerlitz; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Wed, May 22, 2019 at 12:49 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> The real question here is whether we can provide the required separation
> when vport rep netdevs are put into different namespaces -- this needs
> deeper thinking. Technically you can do that with this one-liner patch,
> but we have to see if/what assumptions could be broken as a result.

Hi Or,
Can we add a mode parameter that lets the user switch it on/off?
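One way to picture the on/off switch proposed above is a feature mask computed from a single boolean. This is a user-space sketch under loud assumptions: the `allow_netns_move` knob, the `demo_` names, and the bit values are all hypothetical. No such parameter is shown to exist in the driver in this thread, and how it would be exposed (module parameter, devlink, or otherwise) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel feature bits; the values are made up. */
#define DEMO_F_HW_TC       (UINT64_C(1) << 0)
#define DEMO_F_NETNS_LOCAL (UINT64_C(1) << 1)

/*
 * Computes the representor feature mask for a hypothetical
 * "allow_netns_move" knob:
 *   false -> current behaviour, representor pinned to its namespace
 *   true  -> behaviour of the posted patch, representor may be moved
 */
uint64_t demo_rep_features(bool allow_netns_move)
{
	uint64_t features = DEMO_F_HW_TC;

	if (!allow_netns_move)
		features |= DEMO_F_NETNS_LOCAL;

	/* TC offload stays enabled in both modes. */
	assert(features & DEMO_F_HW_TC);
	return features;
}
```

Defaulting the knob to false would keep the existing behaviour for users who never touch it, which is the usual reason to gate a change like this behind a switch.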
* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
From: Or Gerlitz @ 2019-08-04 10:21 UTC
To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Thu, Aug 1, 2019 at 3:44 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> Can we add a mode parameter that lets the user switch it on/off?

The kernel model for a namespace means a completely new copy of the
networking stack, with new routing tables, new neighbour tables,
everything. It also means netdevices in different namespaces can't
communicate with each other. I tend to think that our FW/HW model
doesn't support that, and hence we can't do proper offloading of the
SW model.

I suggest you approach the current maintainers (Roi and Saeed) to see
if they have a different opinion.

Or.
Thread overview: 8+ messages
2019-05-18  0:54 [PATCH] net/mlx5e: Allow removing representors netdev to other namespace xiangxia.m.yue
2019-05-20 20:24 ` Or Gerlitz
2019-05-21  4:35   ` Tonghao Zhang
2019-05-21 16:44     ` Or Gerlitz
2019-05-22  1:25       ` Tonghao Zhang
2019-05-22  4:49         ` Or Gerlitz
2019-08-01  0:44           ` Tonghao Zhang
2019-08-04 10:21             ` Or Gerlitz