* [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
@ 2019-05-18  0:54 xiangxia.m.yue
  2019-05-20 20:24 ` Or Gerlitz
  0 siblings, 1 reply; 8+ messages in thread
From: xiangxia.m.yue @ 2019-05-18  0:54 UTC (permalink / raw)
  To: roid, saeedm; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

In most cases, we use the ConnectX-5 NIC on compute nodes for VMs,
but we also offload forwarding rules to NICs on gateway nodes.
On a gateway node, we install multiple NICs and assign them to
different Docker containers, each with its own net namespace and
routing table. This way, we can run a dedicated agent process in each
container. More containers mean higher throughput.

Commit abd3277287c7 ("net/mlx5e: Disallow changing name-space for VF representors")
disallows this, but we can allow it now for the gateway use case.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 91e24f1..15e932f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1409,7 +1409,7 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev)
 	netdev->watchdog_timeo    = 15 * HZ;
 
 
-	netdev->features	 |= NETIF_F_HW_TC | NETIF_F_NETNS_LOCAL;
+	netdev->features	 |= NETIF_F_HW_TC;
 	netdev->hw_features      |= NETIF_F_HW_TC;
 
 	netdev->hw_features    |= NETIF_F_SG;
-- 
1.8.3.1



* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-05-18  0:54 [PATCH] net/mlx5e: Allow removing representors netdev to other namespace xiangxia.m.yue
@ 2019-05-20 20:24 ` Or Gerlitz
  2019-05-21  4:35   ` Tonghao Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2019-05-20 20:24 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Mon, May 20, 2019 at 3:19 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> At most case, we use the ConnectX-5 NIC on compute node for VMs,
> but we will offload forwarding rules to NICs on gateway node.
> On the gateway node, we will install multiple NICs and set them to
> different dockers which contain different net namespace, different
> routing table. In this way, we can specify the agent process on one
> docker. More dockers mean more high throughput.

The vport (uplink and VF) representor netdev stands for the e-switch
side of things. If you put different vport devices into different
namespaces, you will not be able to forward between them. It's the
NIC side of things (the VF netdevice) which can/should be put into
namespaces.

For example, with SW veth devices, suppose we have two pairs
(v0, v1) and (v2, v3) -- we create a SW switch (Linux bridge, OVS)
with the uplink and v0/v2 as ports, all in a single namespace, and
we map v1 and v3 into application containers.

I am missing how you can make any use of vport reps belonging to
the same HW e-switch in different namespaces. Maybe send a chart?


>
> The commit abd3277287c7 ("net/mlx5e: Disallow changing name-space for VF representors")
> disallow it, but we can change it now for gateway use case.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 91e24f1..15e932f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -1409,7 +1409,7 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev)
>         netdev->watchdog_timeo    = 15 * HZ;
>
>
> -       netdev->features         |= NETIF_F_HW_TC | NETIF_F_NETNS_LOCAL;
> +       netdev->features         |= NETIF_F_HW_TC;
>         netdev->hw_features      |= NETIF_F_HW_TC;
>
>         netdev->hw_features    |= NETIF_F_SG;
> --
> 1.8.3.1
>


* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-05-20 20:24 ` Or Gerlitz
@ 2019-05-21  4:35   ` Tonghao Zhang
  2019-05-21 16:44     ` Or Gerlitz
  0 siblings, 1 reply; 8+ messages in thread
From: Tonghao Zhang @ 2019-05-21  4:35 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Tue, May 21, 2019 at 4:24 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 3:19 PM <xiangxia.m.yue@gmail.com> wrote:
> >
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > At most case, we use the ConnectX-5 NIC on compute node for VMs,
> > but we will offload forwarding rules to NICs on gateway node.
> > On the gateway node, we will install multiple NICs and set them to
> > different dockers which contain different net namespace, different
> > routing table. In this way, we can specify the agent process on one
> > docker. More dockers mean more high throughput.
>
> The vport (uplink and VF) representor netdev stands for the e-switch
> side of things. If you put different
> vport devices to different namespaces, you will not be able to forward
> between them. It's the NIC side of things
> (VF netdevice) which can/should be put to namespaces.
>
> For example, with SW veth devices, suppose I we have two pairs
> (v0,v1), (v2, v3) -- we create
> a SW switch (linux bridge, ovs) with the uplink and v0/v2 as ports all
> in a single name space
> and we map v1 and v3 into application containers.
>
> I am missing how can you make any use with vport reps belonging to the
> same HW e-switch
> on different name-spaces, maybe send chart?
   +---------------------------------------------------------+
   |                                                         |
   |                                                         |
   |       docker01                 docker02                 |
   |                                                         |
   | +-----------------+      +------------------+           |
   | |    NIC (rep/vf) |      |       NIC        |           |
   | |                 |      |                  |   host    |
   | |   +--------+    |      |   +---------+    |           |
   | +-----------------+      +------------------+           |
   |     |        |               |         |                |
   +---------------------------------------------------------+
         |        |               |         |
         |        |         phy_port2       | phy_port3
         |        |               |         |
         |        |               |         |
phy_port0|        |phy_port1      |         |
         |        |               |         |
         v        +               v         +

For example, there are two NICs (4 physical ports) on the host. We assign
one NIC to docker01 (all reps and VFs of this NIC are moved to docker01)
and the other NIC to docker02. docker01 and docker02 each run our
agent, which uses the tc command to offload rules. The NIC in
docker01 receives packets from phy_port1, applies QoS and NAT
(pedit action), and then forwards them to phy_port0.
The NIC in docker02 works the same way.



>
> >
> > The commit abd3277287c7 ("net/mlx5e: Disallow changing name-space for VF representors")
> > disallow it, but we can change it now for gateway use case.
> >
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> > index 91e24f1..15e932f 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> > @@ -1409,7 +1409,7 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev)
> >         netdev->watchdog_timeo    = 15 * HZ;
> >
> >
> > -       netdev->features         |= NETIF_F_HW_TC | NETIF_F_NETNS_LOCAL;
> > +       netdev->features         |= NETIF_F_HW_TC;
> >         netdev->hw_features      |= NETIF_F_HW_TC;
> >
> >         netdev->hw_features    |= NETIF_F_SG;
> > --
> > 1.8.3.1
> >


* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-05-21  4:35   ` Tonghao Zhang
@ 2019-05-21 16:44     ` Or Gerlitz
  2019-05-22  1:25       ` Tonghao Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2019-05-21 16:44 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Tue, May 21, 2019 at 7:36 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> On Tue, May 21, 2019 at 4:24 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> >
> > On Mon, May 20, 2019 at 3:19 PM <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > At most case, we use the ConnectX-5 NIC on compute node for VMs,
> > > but we will offload forwarding rules to NICs on gateway node.
> > > On the gateway node, we will install multiple NICs and set them to
> > > different dockers which contain different net namespace, different
> > > routing table. In this way, we can specify the agent process on one
> > > docker. More dockers mean more high throughput.
> >
> > The vport (uplink and VF) representor netdev stands for the e-switch
> > side of things. If you put different
> > vport devices to different namespaces, you will not be able to forward
> > between them. It's the NIC side of things
> > (VF netdevice) which can/should be put to namespaces.
> >
> > For example, with SW veth devices, suppose I we have two pairs
> > (v0,v1), (v2, v3) -- we create
> > a SW switch (linux bridge, ovs) with the uplink and v0/v2 as ports all
> > in a single name space
> > and we map v1 and v3 into application containers.
> >
> > I am missing how can you make any use with vport reps belonging to the
> > same HW e-switch
> > on different name-spaces, maybe send chart?
>    +---------------------------------------------------------+
>    |                                                         |
>    |                                                         |
>    |       docker01                 docker02                 |
>    |                                                         |
>    | +-----------------+      +------------------+           |
>    | |    NIC (rep/vf) |      |       NIC        |           |
>    | |                 |      |                  |   host    |
>    | |   +--------+    |      |   +---------+    |           |
>    | +-----------------+      +------------------+           |
>    |     |        |               |         |                |
>    +---------------------------------------------------------+
>          |        |               |         |
>          |        |         phy_port2       | phy_port3
>          |        |               |         |
>          |        |               |         |
> phy_port0|        |phy_port1      |         |
>          |        |               |         |
>          v        +               v         +
>
> For example, there are two NIC(4 phy ports) on the host, we set the
> one NIC to docker01(all rep and vf of this nic are set to docker01).
> and other one NIC are set to docker02. The docker01/docker02 run our
> agent which use the tc command to offload the rule. The NIC of
> docker01 will receive packets from phy_port1
> and do the QoS , NAT(pedit action) and then forward them to phy_port0.
> The NIC of docker02 do this in the same way.

I see, so in the case you described above, you are going to move **all** the
representors of a certain e-switch into **one** namespace -- this is something
we don't have to block. However, I think we did want to disallow moving
a subset of the port reps into a namespace. We should look into that.

Or.
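
The distinction drawn above (all representors of an e-switch in one
namespace is acceptable, a subset is not) can be sketched as a simple
consistency check. This is a userspace model only, not mlx5 driver code:
struct sketch_rep, its fields and sketch_eswitch_reps_share_netns() are
hypothetical names invented for illustration, and integer IDs stand in for
the e-switch and for the kernel's namespace objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a vport representor: which e-switch it belongs
 * to and which network namespace it currently lives in. */
struct sketch_rep {
	int eswitch_id;
	int netns_id;
};

/* Returns true when every representor of the given e-switch sits in the
 * same namespace (vacuously true when the e-switch has no representors). */
bool sketch_eswitch_reps_share_netns(const struct sketch_rep *reps,
				     size_t n, int eswitch_id)
{
	bool seen = false;
	int netns_id = 0;

	assert(reps != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (reps[i].eswitch_id != eswitch_id)
			continue;
		if (!seen) {
			/* First representor of this e-switch sets the reference. */
			seen = true;
			netns_id = reps[i].netns_id;
		} else if (reps[i].netns_id != netns_id) {
			/* A subset was moved elsewhere: the e-switch is split. */
			return false;
		}
	}
	return true;
}
```

In the gateway layout from the chart, each NIC's representors would all
carry the same namespace ID, so the check passes per e-switch; moving only
some of one NIC's representors would make it fail.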


* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-05-21 16:44     ` Or Gerlitz
@ 2019-05-22  1:25       ` Tonghao Zhang
  2019-05-22  4:49         ` Or Gerlitz
  0 siblings, 1 reply; 8+ messages in thread
From: Tonghao Zhang @ 2019-05-22  1:25 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Wed, May 22, 2019 at 12:45 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Tue, May 21, 2019 at 7:36 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > On Tue, May 21, 2019 at 4:24 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> > >
> > > On Mon, May 20, 2019 at 3:19 PM <xiangxia.m.yue@gmail.com> wrote:
> > > >
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > At most case, we use the ConnectX-5 NIC on compute node for VMs,
> > > > but we will offload forwarding rules to NICs on gateway node.
> > > > On the gateway node, we will install multiple NICs and set them to
> > > > different dockers which contain different net namespace, different
> > > > routing table. In this way, we can specify the agent process on one
> > > > docker. More dockers mean more high throughput.
> > >
> > > The vport (uplink and VF) representor netdev stands for the e-switch
> > > side of things. If you put different
> > > vport devices to different namespaces, you will not be able to forward
> > > between them. It's the NIC side of things
> > > (VF netdevice) which can/should be put to namespaces.
> > >
> > > For example, with SW veth devices, suppose I we have two pairs
> > > (v0,v1), (v2, v3) -- we create
> > > a SW switch (linux bridge, ovs) with the uplink and v0/v2 as ports all
> > > in a single name space
> > > and we map v1 and v3 into application containers.
> > >
> > > I am missing how can you make any use with vport reps belonging to the
> > > same HW e-switch
> > > on different name-spaces, maybe send chart?
> >    +---------------------------------------------------------+
> >    |                                                         |
> >    |                                                         |
> >    |       docker01                 docker02                 |
> >    |                                                         |
> >    | +-----------------+      +------------------+           |
> >    | |    NIC (rep/vf) |      |       NIC        |           |
> >    | |                 |      |                  |   host    |
> >    | |   +--------+    |      |   +---------+    |           |
> >    | +-----------------+      +------------------+           |
> >    |     |        |               |         |                |
> >    +---------------------------------------------------------+
> >          |        |               |         |
> >          |        |         phy_port2       | phy_port3
> >          |        |               |         |
> >          |        |               |         |
> > phy_port0|        |phy_port1      |         |
> >          |        |               |         |
> >          v        +               v         +
> >
> > For example, there are two NIC(4 phy ports) on the host, we set the
> > one NIC to docker01(all rep and vf of this nic are set to docker01).
> > and other one NIC are set to docker02. The docker01/docker02 run our
> > agent which use the tc command to offload the rule. The NIC of
> > docker01 will receive packets from phy_port1
> > and do the QoS , NAT(pedit action) and then forward them to phy_port0.
> > The NIC of docker02 do this in the same way.
>
> I see, so in the case you described about, you are going to move **all** the
> representors of a certain e-switch into **one** name-space -- this is something
> we don't have to block. However, I think we did wanted to disallow moving
> sub-set of the port reps into a name-space. Should look into that.
I reviewed the representor code in the Netronome nfp driver; nfp doesn't set
NETIF_F_NETNS_LOCAL in netdev->features.
I also made this change in the OFED code that we use in our production
environment, and am now sending the patch upstream.
> Or.


* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-05-22  1:25       ` Tonghao Zhang
@ 2019-05-22  4:49         ` Or Gerlitz
  2019-08-01  0:44           ` Tonghao Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2019-05-22  4:49 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Wed, May 22, 2019 at 4:26 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:

> I review the reps of netronome nfp codes,  nfp does't set the
> NETIF_F_NETNS_LOCAL to netdev->features.
> And I changed the OFED codes which used for our product environment,
> and then send this patch to upstream.

The real question here is if we can provide the required separation when
vport rep netdevs are put into different name-spaces -- this needs deeper
thinking. Technically you can do that with this one liner patch but we have
to see if/what assumptions could be broken as of that.
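
For readers unfamiliar with the flag, here is a minimal userspace sketch of
what the one-liner changes. As far as I understand kernels of this era, the
namespace-change path refuses a device whose features include
NETIF_F_NETNS_LOCAL and returns -EINVAL; dropping the flag removes that
refusal and nothing else. Everything named sketch_* or SKETCH_* below is
hypothetical, and the bit values are placeholders rather than the kernel's
real NETIF_F_* definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch only. */
#define SKETCH_F_HW_TC       (UINT64_C(1) << 0)
#define SKETCH_F_NETNS_LOCAL (UINT64_C(1) << 1)

/* Hypothetical stand-in for struct net_device. */
struct sketch_netdev {
	uint64_t features;
	int netns_id;	/* stand-in for the device's namespace */
};

/* Models the early check in the namespace-change path: a device flagged
 * namespace-local is refused, any other device is moved. */
int sketch_change_netns(struct sketch_netdev *dev, int new_netns_id)
{
	assert(dev != NULL);

	if (dev->features & SKETCH_F_NETNS_LOCAL)
		return -EINVAL;

	dev->netns_id = new_netns_id;
	return 0;
}
```

With the flag set, as before the patch, the move is rejected and the device
stays where it is. With only the HW_TC bit set, as after the patch, the move
succeeds. The sketch says nothing about whether the e-switch still forwards
correctly afterwards, which is the open question raised above.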


* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-05-22  4:49         ` Or Gerlitz
@ 2019-08-01  0:44           ` Tonghao Zhang
  2019-08-04 10:21             ` Or Gerlitz
  0 siblings, 1 reply; 8+ messages in thread
From: Tonghao Zhang @ 2019-08-01  0:44 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Wed, May 22, 2019 at 12:49 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Wed, May 22, 2019 at 4:26 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> > I review the reps of netronome nfp codes,  nfp does't set the
> > NETIF_F_NETNS_LOCAL to netdev->features.
> > And I changed the OFED codes which used for our product environment,
> > and then send this patch to upstream.
>
> The real question here is if we can provide the required separation when
> vport rep netdevs are put into different name-spaces -- this needs deeper
> thinking. Technically you can do that with this one liner patch but we have
> to see if/what assumptions could be broken as of that.
Hi Or,
Can we add a mode parameter that lets the user switch it on/off?


* Re: [PATCH] net/mlx5e: Allow removing representors netdev to other namespace
  2019-08-01  0:44           ` Tonghao Zhang
@ 2019-08-04 10:21             ` Or Gerlitz
  0 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2019-08-04 10:21 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: Roi Dayan, Saeed Mahameed, Linux Netdev List

On Thu, Aug 1, 2019 at 3:44 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> On Wed, May 22, 2019 at 12:49 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> > On Wed, May 22, 2019 at 4:26 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:

> > > I review the reps of netronome nfp codes,  nfp does't set the
> > > NETIF_F_NETNS_LOCAL to netdev->features.
> > > And I changed the OFED codes which used for our product environment,
> > > and then send this patch to upstream.

> > The real question here is if we can provide the required separation when
> > vport rep netdevs are put into different name-spaces -- this needs deeper
> > thinking. Technically you can do that with this one liner patch but we have
> > to see if/what assumptions could be broken as of that.

> Can we add a mode parm for allowing user to switch it off/on ?

The kernel model for namespaces means a completely new copy of the
networking stack, with new routing tables, new neighbour tables,
everything. It also means netdevices in different namespaces can't
communicate with each other. I tend to think that our FW/HW model
doesn't support that, and hence we can't do proper offloading of the
SW model.

I suggest you approach the current maintainers (Roi and Saeed) to see
if they have a different opinion.

Or.

