netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Expose bond_xmit_hash function
@ 2020-01-15  8:01 Maor Gottlieb
  2020-01-15  9:45 ` Jiri Pirko
  0 siblings, 1 reply; 17+ messages in thread
From: Maor Gottlieb @ 2020-01-15  8:01 UTC (permalink / raw)
  To: j.vosburgh, vfalico, andy, netdev
  Cc: Saeed Mahameed, Jason Gunthorpe, Leon Romanovsky, Jiri Pirko,
	Alex Rosenbaum, davem, Mark Zhang

RDMA over Converged Ethernet (RoCE) is a standard protocol which enables 
RDMA’s efficient data transfer over Ethernet networks allowing transport 
offload with hardware RDMA engine implementation.
The RoCE v2 protocol exists on top of either the UDP/IPv4 or the 
UDP/IPv6 protocol:

--------------------------------------------------------------
| L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
--------------------------------------------------------------

When a bond LAG netdev is in use, we would like to have the same hash 
result for RoCE packets as any other UDP packets, for this purpose we 
need to expose the bond_xmit_hash function to external modules.
If no objection, I will push a patch that export this symbol.

Thanks


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15  8:01 Expose bond_xmit_hash function Maor Gottlieb
@ 2020-01-15  9:45 ` Jiri Pirko
  2020-01-15 13:04   ` Maor Gottlieb
  0 siblings, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2020-01-15  9:45 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: j.vosburgh, vfalico, andy, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang

Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>RDMA over Converged Ethernet (RoCE) is a standard protocol which enables 
>RDMA’s efficient data transfer over Ethernet networks allowing transport 
>offload with hardware RDMA engine implementation.
>The RoCE v2 protocol exists on top of either the UDP/IPv4 or the 
>UDP/IPv6 protocol:
>
>--------------------------------------------------------------
>| L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>--------------------------------------------------------------
>
>When a bond LAG netdev is in use, we would like to have the same hash 
>result for RoCE packets as any other UDP packets, for this purpose we 
>need to expose the bond_xmit_hash function to external modules.
>If no objection, I will push a patch that export this symbol.

I don't think it is good idea to do it. It is an internal bond function.
it even accepts "struct bonding *bond". Do you plan to push netdev
struct as an arg instead? What about team? What about OVS bonding?

Also, you don't really need a hash, you need a slave that is going to be
used for a packet xmit.

I think this could work in a generic way:

struct net_device *master_xmit_slave_get(struct net_device *master_dev,
					 struct sk_buff *skb);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15  9:45 ` Jiri Pirko
@ 2020-01-15 13:04   ` Maor Gottlieb
  2020-01-15 14:15     ` Jiri Pirko
  0 siblings, 1 reply; 17+ messages in thread
From: Maor Gottlieb @ 2020-01-15 13:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: j.vosburgh, vfalico, andy, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit


On 1/15/2020 11:45 AM, Jiri Pirko wrote:
> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>> offload with hardware RDMA engine implementation.
>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>> UDP/IPv6 protocol:
>>
>> --------------------------------------------------------------
>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>> --------------------------------------------------------------
>>
>> When a bond LAG netdev is in use, we would like to have the same hash
>> result for RoCE packets as any other UDP packets, for this purpose we
>> need to expose the bond_xmit_hash function to external modules.
>> If no objection, I will push a patch that export this symbol.
> I don't think it is good idea to do it. It is an internal bond function.
> it even accepts "struct bonding *bond". Do you plan to push netdev
> struct as an arg instead? What about team? What about OVS bonding?

No, I am planning to pass the bond struct as an arg. Currently, team 
bonding is not supported in RoCE LAG and I don't see how OVS is related.

>
> Also, you don't really need a hash, you need a slave that is going to be
> used for a packet xmit.
>
> I think this could work in a generic way:
>
> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
> 					 struct sk_buff *skb);

The suggestion is to put this function in the bond driver and call it 
instead of bond_xmit_hash? is it still necessary if I have the bond pointer?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 13:04   ` Maor Gottlieb
@ 2020-01-15 14:15     ` Jiri Pirko
  2020-01-15 14:33       ` Leon Romanovsky
  2020-01-16 14:42       ` Andy Gospodarek
  0 siblings, 2 replies; 17+ messages in thread
From: Jiri Pirko @ 2020-01-15 14:15 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: j.vosburgh, vfalico, andy, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit

Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>
>On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>>> offload with hardware RDMA engine implementation.
>>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>>> UDP/IPv6 protocol:
>>>
>>> --------------------------------------------------------------
>>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>>> --------------------------------------------------------------
>>>
>>> When a bond LAG netdev is in use, we would like to have the same hash
>>> result for RoCE packets as any other UDP packets, for this purpose we
>>> need to expose the bond_xmit_hash function to external modules.
>>> If no objection, I will push a patch that export this symbol.
>> I don't think it is good idea to do it. It is an internal bond function.
>> it even accepts "struct bonding *bond". Do you plan to push netdev
>> struct as an arg instead? What about team? What about OVS bonding?
>
>No, I am planning to pass the bond struct as an arg. Currently, team 

Hmm, that would be ofcourse wrong, as it is internal bonding driver
structure.


>bonding is not supported in RoCE LAG and I don't see how OVS is related.

Should work for all. OVS is related in a sense that you can do bonding
there too.


>
>>
>> Also, you don't really need a hash, you need a slave that is going to be
>> used for a packet xmit.
>>
>> I think this could work in a generic way:
>>
>> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>> 					 struct sk_buff *skb);
>
>The suggestion is to put this function in the bond driver and call it 
>instead of bond_xmit_hash? is it still necessary if I have the bond pointer?

No. This should be in a generic code. No direct calls down to bonding
driver please. Or do you want to load bonding module every time your
module loads?

I thinks this can be implemented with ndo with "master_xmit_slave_get()"
as a wrapper. Masters that support this would just implement the ndo.

>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 14:15     ` Jiri Pirko
@ 2020-01-15 14:33       ` Leon Romanovsky
  2020-01-15 16:48         ` Jiri Pirko
  2020-01-16 14:42       ` Andy Gospodarek
  1 sibling, 1 reply; 17+ messages in thread
From: Leon Romanovsky @ 2020-01-15 14:33 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Maor Gottlieb, j.vosburgh, vfalico, andy, netdev, Saeed Mahameed,
	Jason Gunthorpe, Jiri Pirko, Alex Rosenbaum, davem, Mark Zhang,
	Parav Pandit

On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
> >
> >On 1/15/2020 11:45 AM, Jiri Pirko wrote:
> >> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
> >>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
> >>> RDMA’s efficient data transfer over Ethernet networks allowing transport
> >>> offload with hardware RDMA engine implementation.
> >>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
> >>> UDP/IPv6 protocol:
> >>>
> >>> --------------------------------------------------------------
> >>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
> >>> --------------------------------------------------------------
> >>>
> >>> When a bond LAG netdev is in use, we would like to have the same hash
> >>> result for RoCE packets as any other UDP packets, for this purpose we
> >>> need to expose the bond_xmit_hash function to external modules.
> >>> If no objection, I will push a patch that export this symbol.
> >> I don't think it is good idea to do it. It is an internal bond function.
> >> it even accepts "struct bonding *bond". Do you plan to push netdev
> >> struct as an arg instead? What about team? What about OVS bonding?
> >
> >No, I am planning to pass the bond struct as an arg. Currently, team
>
> Hmm, that would be ofcourse wrong, as it is internal bonding driver
> structure.
>
>
> >bonding is not supported in RoCE LAG and I don't see how OVS is related.
>
> Should work for all. OVS is related in a sense that you can do bonding
> there too.
>
>
> >
> >>
> >> Also, you don't really need a hash, you need a slave that is going to be
> >> used for a packet xmit.
> >>
> >> I think this could work in a generic way:
> >>
> >> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
> >> 					 struct sk_buff *skb);
> >
> >The suggestion is to put this function in the bond driver and call it
> >instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>
> No. This should be in a generic code. No direct calls down to bonding
> driver please. Or do you want to load bonding module every time your
> module loads?
>
> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
> as a wrapper. Masters that support this would just implement the ndo.

We are talking about code sharing and not attempting to solve any
problem other than that.

Right now, we have one of two options:
1. One-to-one copy/paste of that bond_xmit function to RDMA.
2. Add EXPORT_SYMBOL and call from RDMA.

Do you have another solution to our undesire to do copy/paste in mind?

Thanks

>
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 14:33       ` Leon Romanovsky
@ 2020-01-15 16:48         ` Jiri Pirko
  2020-01-15 17:34           ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2020-01-15 16:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Maor Gottlieb, j.vosburgh, vfalico, andy, netdev, Saeed Mahameed,
	Jason Gunthorpe, Jiri Pirko, Alex Rosenbaum, davem, Mark Zhang,
	Parav Pandit

Wed, Jan 15, 2020 at 03:33:23PM CET, leonro@mellanox.com wrote:
>On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
>> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>> >
>> >On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>> >> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>> >>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>> >>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>> >>> offload with hardware RDMA engine implementation.
>> >>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>> >>> UDP/IPv6 protocol:
>> >>>
>> >>> --------------------------------------------------------------
>> >>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>> >>> --------------------------------------------------------------
>> >>>
>> >>> When a bond LAG netdev is in use, we would like to have the same hash
>> >>> result for RoCE packets as any other UDP packets, for this purpose we
>> >>> need to expose the bond_xmit_hash function to external modules.
>> >>> If no objection, I will push a patch that export this symbol.
>> >> I don't think it is good idea to do it. It is an internal bond function.
>> >> it even accepts "struct bonding *bond". Do you plan to push netdev
>> >> struct as an arg instead? What about team? What about OVS bonding?
>> >
>> >No, I am planning to pass the bond struct as an arg. Currently, team
>>
>> Hmm, that would be ofcourse wrong, as it is internal bonding driver
>> structure.
>>
>>
>> >bonding is not supported in RoCE LAG and I don't see how OVS is related.
>>
>> Should work for all. OVS is related in a sense that you can do bonding
>> there too.
>>
>>
>> >
>> >>
>> >> Also, you don't really need a hash, you need a slave that is going to be
>> >> used for a packet xmit.
>> >>
>> >> I think this could work in a generic way:
>> >>
>> >> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>> >> 					 struct sk_buff *skb);
>> >
>> >The suggestion is to put this function in the bond driver and call it
>> >instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>>
>> No. This should be in a generic code. No direct calls down to bonding
>> driver please. Or do you want to load bonding module every time your
>> module loads?
>>
>> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
>> as a wrapper. Masters that support this would just implement the ndo.
>
>We are talking about code sharing and not attempting to solve any
>problem other than that.

Nope Leon, you are wrong. This is not about code sharing. You need to
know exact slave that is going to be used for the xmit. The hashing
function can be setup per bonding instance, it could only L2 for
example.


>
>Right now, we have one of two options:
>1. One-to-one copy/paste of that bond_xmit function to RDMA.
>2. Add EXPORT_SYMBOL and call from RDMA.
>
>Do you have another solution to our undesire to do copy/paste in mind?

I presented it in this thread.


>
>Thanks
>
>>
>> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 16:48         ` Jiri Pirko
@ 2020-01-15 17:34           ` David Ahern
  2020-01-15 18:04             ` Jay Vosburgh
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2020-01-15 17:34 UTC (permalink / raw)
  To: Jiri Pirko, Leon Romanovsky
  Cc: Maor Gottlieb, j.vosburgh, vfalico, andy, netdev, Saeed Mahameed,
	Jason Gunthorpe, Jiri Pirko, Alex Rosenbaum, davem, Mark Zhang,
	Parav Pandit

On 1/15/20 9:48 AM, Jiri Pirko wrote:
>>
>> Right now, we have one of two options:
>> 1. One-to-one copy/paste of that bond_xmit function to RDMA.
>> 2. Add EXPORT_SYMBOL and call from RDMA.
>>
>> Do you have another solution to our undesire to do copy/paste in mind?
> 
> I presented it in this thread.
> 

Something similar is needed for xdp and not necessarily tied to a
specific bond mode. Some time back I was using this as a prototype:

https://github.com/dsahern/linux/commit/2714abc1e629613e3485b7aa860fa3096e273cb2

It is incomplete, but shows the intent - exporting bond_egress_slave for
use by other code to take a bond device and return an egress leg.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 17:34           ` David Ahern
@ 2020-01-15 18:04             ` Jay Vosburgh
  2020-01-15 18:12               ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2020-01-15 18:04 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, Leon Romanovsky, Maor Gottlieb, vfalico, andy,
	netdev, Saeed Mahameed, Jason Gunthorpe, Jiri Pirko,
	Alex Rosenbaum, davem, Mark Zhang, Parav Pandit

David Ahern <dsahern@gmail.com> wrote:

>On 1/15/20 9:48 AM, Jiri Pirko wrote:
>>>
>>> Right now, we have one of two options:
>>> 1. One-to-one copy/paste of that bond_xmit function to RDMA.
>>> 2. Add EXPORT_SYMBOL and call from RDMA.
>>>
>>> Do you have another solution to our undesire to do copy/paste in mind?
>> 
>> I presented it in this thread.

	Having the output of bond_xmit_hash would only allow matching
the egress slave in the RDMA code to the bonding selection if the RDMA
code also peeks into the bonding internal data structures to look at
bond->slave_arr.

	This is true whether it's EXPORTed or duplicated.

>Something similar is needed for xdp and not necessarily tied to a
>specific bond mode. Some time back I was using this as a prototype:
>
>https://github.com/dsahern/linux/commit/2714abc1e629613e3485b7aa860fa3096e273cb2
>
>It is incomplete, but shows the intent - exporting bond_egress_slave for
>use by other code to take a bond device and return an egress leg.

	This seems much less awful, but would it make bonding a
dependency on pretty much everything?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 18:04             ` Jay Vosburgh
@ 2020-01-15 18:12               ` David Ahern
  2020-01-15 20:46                 ` Jiri Pirko
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2020-01-15 18:12 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Jiri Pirko, Leon Romanovsky, Maor Gottlieb, vfalico, andy,
	netdev, Saeed Mahameed, Jason Gunthorpe, Jiri Pirko,
	Alex Rosenbaum, davem, Mark Zhang, Parav Pandit

On 1/15/20 11:04 AM, Jay Vosburgh wrote:
> 
>> Something similar is needed for xdp and not necessarily tied to a
>> specific bond mode. Some time back I was using this as a prototype:
>>
>> https://github.com/dsahern/linux/commit/2714abc1e629613e3485b7aa860fa3096e273cb2
>>
>> It is incomplete, but shows the intent - exporting bond_egress_slave for
>> use by other code to take a bond device and return an egress leg.
> 
> 	This seems much less awful, but would it make bonding a
> dependency on pretty much everything?
> 

The intent is to hide the bond details beyond the general "a bond has
multiple egress paths and we need to pick one". ie., all of the logic
and data structures are still private.

Exporting the function for use by modules is the easy part.

Making it accessible to core code (XDP) means ??? Obviously not a
concern when bond is built in but the usual case is a module. One
solution is to repeat the IPv6 stub format; not great from an indirect
call perspective. I have not followed the work on INDIRECT_CALL to know
if that mitigates the concern about the stub when bond is a module.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 18:12               ` David Ahern
@ 2020-01-15 20:46                 ` Jiri Pirko
  2020-01-15 20:58                   ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2020-01-15 20:46 UTC (permalink / raw)
  To: David Ahern
  Cc: Jay Vosburgh, Leon Romanovsky, Maor Gottlieb, vfalico, andy,
	netdev, Saeed Mahameed, Jason Gunthorpe, Jiri Pirko,
	Alex Rosenbaum, davem, Mark Zhang, Parav Pandit

Wed, Jan 15, 2020 at 07:12:54PM CET, dsahern@gmail.com wrote:
>On 1/15/20 11:04 AM, Jay Vosburgh wrote:
>> 
>>> Something similar is needed for xdp and not necessarily tied to a
>>> specific bond mode. Some time back I was using this as a prototype:
>>>
>>> https://github.com/dsahern/linux/commit/2714abc1e629613e3485b7aa860fa3096e273cb2
>>>
>>> It is incomplete, but shows the intent - exporting bond_egress_slave for
>>> use by other code to take a bond device and return an egress leg.
>> 
>> 	This seems much less awful, but would it make bonding a
>> dependency on pretty much everything?
>> 
>
>The intent is to hide the bond details beyond the general "a bond has
>multiple egress paths and we need to pick one". ie., all of the logic
>and data structures are still private.
>
>Exporting the function for use by modules is the easy part.
>
>Making it accessible to core code (XDP) means ??? Obviously not a
>concern when bond is built in but the usual case is a module. One
>solution is to repeat the IPv6 stub format; not great from an indirect
>call perspective. I have not followed the work on INDIRECT_CALL to know
>if that mitigates the concern about the stub when bond is a module.

Why it can't be an ndo as I previously suggested in this thread? It is
not specific to bond, others might like to fillup this ndo too (team,
ovs, bridge).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 20:46                 ` Jiri Pirko
@ 2020-01-15 20:58                   ` David Ahern
  0 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2020-01-15 20:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jay Vosburgh, Leon Romanovsky, Maor Gottlieb, vfalico, andy,
	netdev, Saeed Mahameed, Jason Gunthorpe, Jiri Pirko,
	Alex Rosenbaum, davem, Mark Zhang, Parav Pandit

On 1/15/20 1:46 PM, Jiri Pirko wrote:
> Wed, Jan 15, 2020 at 07:12:54PM CET, dsahern@gmail.com wrote:
>> On 1/15/20 11:04 AM, Jay Vosburgh wrote:
>>>
>>>> Something similar is needed for xdp and not necessarily tied to a
>>>> specific bond mode. Some time back I was using this as a prototype:
>>>>
>>>> https://github.com/dsahern/linux/commit/2714abc1e629613e3485b7aa860fa3096e273cb2
>>>>
>>>> It is incomplete, but shows the intent - exporting bond_egress_slave for
>>>> use by other code to take a bond device and return an egress leg.
>>>
>>> 	This seems much less awful, but would it make bonding a
>>> dependency on pretty much everything?
>>>
>>
>> The intent is to hide the bond details beyond the general "a bond has
>> multiple egress paths and we need to pick one". ie., all of the logic
>> and data structures are still private.
>>
>> Exporting the function for use by modules is the easy part.
>>
>> Making it accessible to core code (XDP) means ??? Obviously not a
>> concern when bond is built in but the usual case is a module. One
>> solution is to repeat the IPv6 stub format; not great from an indirect
>> call perspective. I have not followed the work on INDIRECT_CALL to know
>> if that mitigates the concern about the stub when bond is a module.
> 
> Why it can't be an ndo as I previously suggested in this thread? It is
> not specific to bond, others might like to fillup this ndo too (team,
> ovs, bridge).
> 

Sure, that is an option to try. I can not remember if I explored that
option previously; too much time has passed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-15 14:15     ` Jiri Pirko
  2020-01-15 14:33       ` Leon Romanovsky
@ 2020-01-16 14:42       ` Andy Gospodarek
  2020-01-16 15:55         ` Maor Gottlieb
  1 sibling, 1 reply; 17+ messages in thread
From: Andy Gospodarek @ 2020-01-16 14:42 UTC (permalink / raw)
  To: Jiri Pirko, Maor Gottlieb
  Cc: Maor Gottlieb, j.vosburgh, vfalico, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit

On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
> >
> >On 1/15/2020 11:45 AM, Jiri Pirko wrote:
> >> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
> >>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
> >>> RDMA’s efficient data transfer over Ethernet networks allowing transport
> >>> offload with hardware RDMA engine implementation.
> >>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
> >>> UDP/IPv6 protocol:
> >>>
> >>> --------------------------------------------------------------
> >>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
> >>> --------------------------------------------------------------
> >>>
> >>> When a bond LAG netdev is in use, we would like to have the same hash
> >>> result for RoCE packets as any other UDP packets, for this purpose we
> >>> need to expose the bond_xmit_hash function to external modules.
> >>> If no objection, I will push a patch that export this symbol.
> >> I don't think it is good idea to do it. It is an internal bond function.
> >> it even accepts "struct bonding *bond". Do you plan to push netdev
> >> struct as an arg instead? What about team? What about OVS bonding?
> >
> >No, I am planning to pass the bond struct as an arg. Currently, team 
> 
> Hmm, that would be ofcourse wrong, as it is internal bonding driver
> structure.
> 
> 
> >bonding is not supported in RoCE LAG and I don't see how OVS is related.
> 
> Should work for all. OVS is related in a sense that you can do bonding
> there too.
> 
> 
> >
> >>
> >> Also, you don't really need a hash, you need a slave that is going to be
> >> used for a packet xmit.
> >>
> >> I think this could work in a generic way:
> >>
> >> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
> >> 					 struct sk_buff *skb);
> >
> >The suggestion is to put this function in the bond driver and call it 
> >instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
> 
> No. This should be in a generic code. No direct calls down to bonding
> driver please. Or do you want to load bonding module every time your
> module loads?
> 
> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
> as a wrapper. Masters that support this would just implement the ndo.

In general I think this is a good idea (though maybe not with an skb as
an arg so we can use it easily within BPF), but I'm not sure if solves
the problem that Maor et al were setting out to solve.

Maor, if you did export bond_xmit_hash() to be used by another driver,
you would presumably have a check in place so if the RoCE and UDP
packets had a different hash function output you would make a change and
be sure that the UDP frames would go out on the same device that the
RoCE traffic would normally use.  Is this correct?  Would you also send
the frames directly on the interface using dev_queue_xmit() and bypass
the bonding driver completely?

I don't think I fundamentally have a problem with this, I just want to
make sure I understand your proposed code-flow.

Thanks!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-16 14:42       ` Andy Gospodarek
@ 2020-01-16 15:55         ` Maor Gottlieb
  2020-01-16 16:00           ` Jay Vosburgh
  0 siblings, 1 reply; 17+ messages in thread
From: Maor Gottlieb @ 2020-01-16 15:55 UTC (permalink / raw)
  To: Andy Gospodarek, Jiri Pirko
  Cc: j.vosburgh, vfalico, netdev, Saeed Mahameed, Jason Gunthorpe,
	Leon Romanovsky, Jiri Pirko, Alex Rosenbaum, davem, Mark Zhang,
	Parav Pandit


On 1/16/2020 4:42 PM, Andy Gospodarek wrote:
> On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
>> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>>> On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>>>> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>>>>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>>>>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>>>>> offload with hardware RDMA engine implementation.
>>>>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>>>>> UDP/IPv6 protocol:
>>>>>
>>>>> --------------------------------------------------------------
>>>>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>>>>> --------------------------------------------------------------
>>>>>
>>>>> When a bond LAG netdev is in use, we would like to have the same hash
>>>>> result for RoCE packets as any other UDP packets, for this purpose we
>>>>> need to expose the bond_xmit_hash function to external modules.
>>>>> If no objection, I will push a patch that export this symbol.
>>>> I don't think it is good idea to do it. It is an internal bond function.
>>>> it even accepts "struct bonding *bond". Do you plan to push netdev
>>>> struct as an arg instead? What about team? What about OVS bonding?
>>> No, I am planning to pass the bond struct as an arg. Currently, team
>> Hmm, that would be ofcourse wrong, as it is internal bonding driver
>> structure.
>>
>>
>>> bonding is not supported in RoCE LAG and I don't see how OVS is related.
>> Should work for all. OVS is related in a sense that you can do bonding
>> there too.
>>
>>
>>>> Also, you don't really need a hash, you need a slave that is going to be
>>>> used for a packet xmit.
>>>>
>>>> I think this could work in a generic way:
>>>>
>>>> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>>>> 					 struct sk_buff *skb);
>>> The suggestion is to put this function in the bond driver and call it
>>> instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>> No. This should be in a generic code. No direct calls down to bonding
>> driver please. Or do you want to load bonding module every time your
>> module loads?
>>
>> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
>> as a wrapper. Masters that support this would just implement the ndo.
> In general I think this is a good idea (though maybe not with an skb as
> an arg so we can use it easily within BPF), but I'm not sure if solves
> the problem that Maor et al were setting out to solve.
>
> Maor, if you did export bond_xmit_hash() to be used by another driver,
> you would presumably have a check in place so if the RoCE and UDP
> packets had a different hash function output you would make a change and
> be sure that the UDP frames would go out on the same device that the
> RoCE traffic would normally use.  Is this correct?  Would you also send
> the frames directly on the interface using dev_queue_xmit() and bypass
> the bonding driver completely?

RoCE packets are UDP. The idea is that the same UDP header (RoCE as 
well) will get the same hash result so they will be transmitted from the 
same port.
The frames will be sent by using the RDMA send API and bypass the 
bonding driver completely.
Is it answer your question?

> I don't think I fundamentally have a problem with this, I just want to
> make sure I understand your proposed code-flow.
>
> Thanks!
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-16 15:55         ` Maor Gottlieb
@ 2020-01-16 16:00           ` Jay Vosburgh
  2020-01-19 14:52             ` Maor Gottlieb
  0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2020-01-16 16:00 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: Andy Gospodarek, Jiri Pirko, vfalico, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit

Maor Gottlieb <maorg@mellanox.com> wrote:

>
>On 1/16/2020 4:42 PM, Andy Gospodarek wrote:
>> On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
>>> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>>>> On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>>>>> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>>>>>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>>>>>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>>>>>> offload with hardware RDMA engine implementation.
>>>>>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>>>>>> UDP/IPv6 protocol:
>>>>>>
>>>>>> --------------------------------------------------------------
>>>>>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>>>>>> --------------------------------------------------------------
>>>>>>
>>>>>> When a bond LAG netdev is in use, we would like to have the same hash
>>>>>> result for RoCE packets as any other UDP packets, for this purpose we
>>>>>> need to expose the bond_xmit_hash function to external modules.
>>>>>> If no objection, I will push a patch that export this symbol.
>>>>> I don't think it is good idea to do it. It is an internal bond function.
>>>>> it even accepts "struct bonding *bond". Do you plan to push netdev
>>>>> struct as an arg instead? What about team? What about OVS bonding?
>>>> No, I am planning to pass the bond struct as an arg. Currently, team
>>> Hmm, that would be ofcourse wrong, as it is internal bonding driver
>>> structure.
>>>
>>>
>>>> bonding is not supported in RoCE LAG and I don't see how OVS is related.
>>> Should work for all. OVS is related in a sense that you can do bonding
>>> there too.
>>>
>>>
>>>>> Also, you don't really need a hash, you need a slave that is going to be
>>>>> used for a packet xmit.
>>>>>
>>>>> I think this could work in a generic way:
>>>>>
>>>>> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>>>>> 					 struct sk_buff *skb);
>>>> The suggestion is to put this function in the bond driver and call it
>>>> instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>>> No. This should be in a generic code. No direct calls down to bonding
>>> driver please. Or do you want to load bonding module every time your
>>> module loads?
>>>
>>> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
>>> as a wrapper. Masters that support this would just implement the ndo.
>> In general I think this is a good idea (though maybe not with an skb as
>> an arg so we can use it easily within BPF), but I'm not sure if solves
>> the problem that Maor et al were setting out to solve.
>>
>> Maor, if you did export bond_xmit_hash() to be used by another driver,
>> you would presumably have a check in place so if the RoCE and UDP
>> packets had a different hash function output you would make a change and
>> be sure that the UDP frames would go out on the same device that the
>> RoCE traffic would normally use.  Is this correct?  Would you also send
>> the frames directly on the interface using dev_queue_xmit() and bypass
>> the bonding driver completely?
>
>RoCE packets are UDP. The idea is that the same UDP header (RoCE as 
>well) will get the same hash result so they will be transmitted from the 
>same port.
>The frames will be sent by using the RDMA send API and bypass the 
>bonding driver completely.
>Is it answer your question?

	If the RDMA send bypasses bonding, how will you insure that the
same hash result maps to the same underlying interface for both bonding
and RDMA?

	-J

>> I don't think I fundamentally have a problem with this, I just want to
>> make sure I understand your proposed code-flow.
>>
>> Thanks!
>>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-16 16:00           ` Jay Vosburgh
@ 2020-01-19 14:52             ` Maor Gottlieb
  2020-01-20 18:43               ` Jay Vosburgh
  0 siblings, 1 reply; 17+ messages in thread
From: Maor Gottlieb @ 2020-01-19 14:52 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Andy Gospodarek, Jiri Pirko, vfalico, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit


On 1/16/2020 6:00 PM, Jay Vosburgh wrote:
> Maor Gottlieb <maorg@mellanox.com> wrote:
>
>> On 1/16/2020 4:42 PM, Andy Gospodarek wrote:
>>> On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
>>>> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>>>>> On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>>>>>> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>>>>>>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>>>>>>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>>>>>>> offload with hardware RDMA engine implementation.
>>>>>>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>>>>>>> UDP/IPv6 protocol:
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>>>>>>> --------------------------------------------------------------
>>>>>>>
>>>>>>> When a bond LAG netdev is in use, we would like to have the same hash
>>>>>>> result for RoCE packets as any other UDP packets, for this purpose we
>>>>>>> need to expose the bond_xmit_hash function to external modules.
>>>>>>> If no objection, I will push a patch that export this symbol.
>>>>>> I don't think it is good idea to do it. It is an internal bond function.
>>>>>> it even accepts "struct bonding *bond". Do you plan to push netdev
>>>>>> struct as an arg instead? What about team? What about OVS bonding?
>>>>> No, I am planning to pass the bond struct as an arg. Currently, team
>>>> Hmm, that would be ofcourse wrong, as it is internal bonding driver
>>>> structure.
>>>>
>>>>
>>>>> bonding is not supported in RoCE LAG and I don't see how OVS is related.
>>>> Should work for all. OVS is related in a sense that you can do bonding
>>>> there too.
>>>>
>>>>
>>>>>> Also, you don't really need a hash, you need a slave that is going to be
>>>>>> used for a packet xmit.
>>>>>>
>>>>>> I think this could work in a generic way:
>>>>>>
>>>>>> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>>>>>> 					 struct sk_buff *skb);
>>>>> The suggestion is to put this function in the bond driver and call it
>>>>> instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>>>> No. This should be in a generic code. No direct calls down to bonding
>>>> driver please. Or do you want to load bonding module every time your
>>>> module loads?
>>>>
>>>> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
>>>> as a wrapper. Masters that support this would just implement the ndo.
>>> In general I think this is a good idea (though maybe not with an skb as
>>> an arg so we can use it easily within BPF), but I'm not sure if solves
>>> the problem that Maor et al were setting out to solve.
>>>
>>> Maor, if you did export bond_xmit_hash() to be used by another driver,
>>> you would presumably have a check in place so if the RoCE and UDP
>>> packets had a different hash function output you would make a change and
>>> be sure that the UDP frames would go out on the same device that the
>>> RoCE traffic would normally use.  Is this correct?  Would you also send
>>> the frames directly on the interface using dev_queue_xmit() and bypass
>>> the bonding driver completely?
>> RoCE packets are UDP. The idea is that the same UDP header (RoCE as
>> well) will get the same hash result so they will be transmitted from the
>> same port.
>> The frames will be sent by using the RDMA send API and bypass the
>> bonding driver completely.
>> Is it answer your question?
> 	If the RDMA send bypasses bonding, how will you insure that the
> same hash result maps to the same underlying interface for both bonding
> and RDMA?
>
> 	-J

In RoCE, the affinity is determined while the HW resources are created 
and will be modified globally in run time to track the active salves.
If we get the slave result, all the UDP packets will have the same 
affinity as RoCE.
The downside is that all RoCE HW resources will be stuck with the 
original affinity port and not move to the re-activate slave once it 
goes up. Another disadvantage, when both ports are down, we still need 
to create the RoCE HW resources with a given port affinity. Both 
problems are solved by exporting the hash.

>
>>> I don't think I fundamentally have a problem with this, I just want to
>>> make sure I understand your proposed code-flow.
>>>
>>> Thanks!
>>>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-19 14:52             ` Maor Gottlieb
@ 2020-01-20 18:43               ` Jay Vosburgh
  2020-01-22  7:53                 ` Maor Gottlieb
  0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2020-01-20 18:43 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: Andy Gospodarek, Jiri Pirko, vfalico, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit

Maor Gottlieb <maorg@mellanox.com> wrote:

>
>On 1/16/2020 6:00 PM, Jay Vosburgh wrote:
>> Maor Gottlieb <maorg@mellanox.com> wrote:
>>
>>> On 1/16/2020 4:42 PM, Andy Gospodarek wrote:
>>>> On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
>>>>> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>>>>>> On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>>>>>>> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>>>>>>>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>>>>>>>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>>>>>>>> offload with hardware RDMA engine implementation.
>>>>>>>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>>>>>>>> UDP/IPv6 protocol:
>>>>>>>>
>>>>>>>> --------------------------------------------------------------
>>>>>>>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>>>>>>>> --------------------------------------------------------------
>>>>>>>>
>>>>>>>> When a bond LAG netdev is in use, we would like to have the same hash
>>>>>>>> result for RoCE packets as any other UDP packets, for this purpose we
>>>>>>>> need to expose the bond_xmit_hash function to external modules.
>>>>>>>> If no objection, I will push a patch that export this symbol.
>>>>>>> I don't think it is good idea to do it. It is an internal bond function.
>>>>>>> it even accepts "struct bonding *bond". Do you plan to push netdev
>>>>>>> struct as an arg instead? What about team? What about OVS bonding?
>>>>>> No, I am planning to pass the bond struct as an arg. Currently, team
>>>>> Hmm, that would be ofcourse wrong, as it is internal bonding driver
>>>>> structure.
>>>>>
>>>>>
>>>>>> bonding is not supported in RoCE LAG and I don't see how OVS is related.
>>>>> Should work for all. OVS is related in a sense that you can do bonding
>>>>> there too.
>>>>>
>>>>>
>>>>>>> Also, you don't really need a hash, you need a slave that is going to be
>>>>>>> used for a packet xmit.
>>>>>>>
>>>>>>> I think this could work in a generic way:
>>>>>>>
>>>>>>> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>>>>>>> 					 struct sk_buff *skb);
>>>>>> The suggestion is to put this function in the bond driver and call it
>>>>>> instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>>>>> No. This should be in a generic code. No direct calls down to bonding
>>>>> driver please. Or do you want to load bonding module every time your
>>>>> module loads?
>>>>>
>>>>> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
>>>>> as a wrapper. Masters that support this would just implement the ndo.
>>>> In general I think this is a good idea (though maybe not with an skb as
>>>> an arg so we can use it easily within BPF), but I'm not sure if solves
>>>> the problem that Maor et al were setting out to solve.
>>>>
>>>> Maor, if you did export bond_xmit_hash() to be used by another driver,
>>>> you would presumably have a check in place so if the RoCE and UDP
>>>> packets had a different hash function output you would make a change and
>>>> be sure that the UDP frames would go out on the same device that the
>>>> RoCE traffic would normally use.  Is this correct?  Would you also send
>>>> the frames directly on the interface using dev_queue_xmit() and bypass
>>>> the bonding driver completely?
>>> RoCE packets are UDP. The idea is that the same UDP header (RoCE as
>>> well) will get the same hash result so they will be transmitted from the
>>> same port.
>>> The frames will be sent by using the RDMA send API and bypass the
>>> bonding driver completely.
>>> Is it answer your question?
>> 	If the RDMA send bypasses bonding, how will you insure that the
>> same hash result maps to the same underlying interface for both bonding
>> and RDMA?
>>
>> 	-J
>
>In RoCE, the affinity is determined while the HW resources are created 
>and will be modified globally in run time to track the active salves.
>If we get the slave result, all the UDP packets will have the same 
>affinity as RoCE.

	How do you "track the active slaves?"

	What I want to know is whether or not the RoCE code is going to
peek into the bonding internal data structures to look at, e.g.,
bond->slave_arr.  I don't see an obvious way to insure a duplication of
bonding's hash to slave mapping without knowing the placement of the
slaves in slave_arr.

>The downside is that all RoCE HW resources will be stuck with the 
>original affinity port and not move to the re-activate slave once it 
>goes up. Another disadvantage, when both ports are down, we still need 
>to create the RoCE HW resources with a given port affinity. Both 
>problems are solved by exporting the hash.

	This sounds like a different explanation; above, you said things
are "modified globally in run time" but here the mappings are static.  I
don't see an explanation of how having the hash solves these problems,
or why it would be better than Jiri's suggestion of an .ndo or generic
wrapper.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Expose bond_xmit_hash function
  2020-01-20 18:43               ` Jay Vosburgh
@ 2020-01-22  7:53                 ` Maor Gottlieb
  0 siblings, 0 replies; 17+ messages in thread
From: Maor Gottlieb @ 2020-01-22  7:53 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Andy Gospodarek, Jiri Pirko, vfalico, netdev, Saeed Mahameed,
	Jason Gunthorpe, Leon Romanovsky, Jiri Pirko, Alex Rosenbaum,
	davem, Mark Zhang, Parav Pandit


On 1/20/2020 8:43 PM, Jay Vosburgh wrote:
> Maor Gottlieb <maorg@mellanox.com> wrote:
>
>> On 1/16/2020 6:00 PM, Jay Vosburgh wrote:
>>> Maor Gottlieb <maorg@mellanox.com> wrote:
>>>
>>>> On 1/16/2020 4:42 PM, Andy Gospodarek wrote:
>>>>> On Wed, Jan 15, 2020 at 03:15:35PM +0100, Jiri Pirko wrote:
>>>>>> Wed, Jan 15, 2020 at 02:04:49PM CET, maorg@mellanox.com wrote:
>>>>>>> On 1/15/2020 11:45 AM, Jiri Pirko wrote:
>>>>>>>> Wed, Jan 15, 2020 at 09:01:43AM CET, maorg@mellanox.com wrote:
>>>>>>>>> RDMA over Converged Ethernet (RoCE) is a standard protocol which enables
>>>>>>>>> RDMA’s efficient data transfer over Ethernet networks allowing transport
>>>>>>>>> offload with hardware RDMA engine implementation.
>>>>>>>>> The RoCE v2 protocol exists on top of either the UDP/IPv4 or the
>>>>>>>>> UDP/IPv6 protocol:
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> | L2 | L3 | UDP |IB BTH | Payload| ICRC | FCS |
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> When a bond LAG netdev is in use, we would like to have the same hash
>>>>>>>>> result for RoCE packets as any other UDP packets, for this purpose we
>>>>>>>>> need to expose the bond_xmit_hash function to external modules.
>>>>>>>>> If no objection, I will push a patch that export this symbol.
>>>>>>>> I don't think it is good idea to do it. It is an internal bond function.
>>>>>>>> it even accepts "struct bonding *bond". Do you plan to push netdev
>>>>>>>> struct as an arg instead? What about team? What about OVS bonding?
>>>>>>> No, I am planning to pass the bond struct as an arg. Currently, team
>>>>>> Hmm, that would be ofcourse wrong, as it is internal bonding driver
>>>>>> structure.
>>>>>>
>>>>>>
>>>>>>> bonding is not supported in RoCE LAG and I don't see how OVS is related.
>>>>>> Should work for all. OVS is related in a sense that you can do bonding
>>>>>> there too.
>>>>>>
>>>>>>
>>>>>>>> Also, you don't really need a hash, you need a slave that is going to be
>>>>>>>> used for a packet xmit.
>>>>>>>>
>>>>>>>> I think this could work in a generic way:
>>>>>>>>
>>>>>>>> struct net_device *master_xmit_slave_get(struct net_device *master_dev,
>>>>>>>> 					 struct sk_buff *skb);
>>>>>>> The suggestion is to put this function in the bond driver and call it
>>>>>>> instead of bond_xmit_hash? is it still necessary if I have the bond pointer?
>>>>>> No. This should be in a generic code. No direct calls down to bonding
>>>>>> driver please. Or do you want to load bonding module every time your
>>>>>> module loads?
>>>>>>
>>>>>> I thinks this can be implemented with ndo with "master_xmit_slave_get()"
>>>>>> as a wrapper. Masters that support this would just implement the ndo.
>>>>> In general I think this is a good idea (though maybe not with an skb as
>>>>> an arg so we can use it easily within BPF), but I'm not sure if solves
>>>>> the problem that Maor et al were setting out to solve.
>>>>>
>>>>> Maor, if you did export bond_xmit_hash() to be used by another driver,
>>>>> you would presumably have a check in place so if the RoCE and UDP
>>>>> packets had a different hash function output you would make a change and
>>>>> be sure that the UDP frames would go out on the same device that the
>>>>> RoCE traffic would normally use.  Is this correct?  Would you also send
>>>>> the frames directly on the interface using dev_queue_xmit() and bypass
>>>>> the bonding driver completely?
>>>> RoCE packets are UDP. The idea is that the same UDP header (RoCE as
>>>> well) will get the same hash result so they will be transmitted from the
>>>> same port.
>>>> The frames will be sent by using the RDMA send API and bypass the
>>>> bonding driver completely.
>>>> Is it answer your question?
>>> 	If the RDMA send bypasses bonding, how will you insure that the
>>> same hash result maps to the same underlying interface for both bonding
>>> and RDMA?
>>>
>>> 	-J
>> In RoCE, the affinity is determined while the HW resources are created
>> and will be modified globally in run time to track the active salves.
>> If we get the slave result, all the UDP packets will have the same
>> affinity as RoCE.
> 	How do you "track the active slaves?"
>
> 	What I want to know is whether or not the RoCE code is going to
> peek into the bonding internal data structures to look at, e.g.,
> bond->slave_arr.  I don't see an obvious way to insure a duplication of
> bonding's hash to slave mapping without knowing the placement of the
> slaves in slave_arr.
>
>> The downside is that all RoCE HW resources will be stuck with the
>> original affinity port and not move to the re-activate slave once it
>> goes up. Another disadvantage, when both ports are down, we still need
>> to create the RoCE HW resources with a given port affinity. Both
>> problems are solved by exporting the hash.
> 	This sounds like a different explanation; above, you said things
> are "modified globally in run time" but here the mappings are static.  I
> don't see an explanation of how having the hash solves these problems,
> or why it would be better than Jiri's suggestion of an .ndo or generic
> wrapper.
>
> 	-J


I will try to clarify it.
In RoCE, the tx affinity is selected in the control path.  In case that 
port is going down, then the port could be remapped to active port.
If I have the hash result, then I can do modulo with the number of the 
device ports, no matter if the port is active or not. Outgoing traffic 
will go through this port unless the port was remapped. So I can 
guarantee that the same RoCE packets will be transmitted from the same 
port, but because we are not going to look into the slave_arr, it isn't 
insured that same RoCE and non-RoCE UDP packets will have the same affinity.

What about extending the .ndo suggestion to get a flag that returns the 
tx slave assume all slaves are available?

>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-01-22  7:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15  8:01 Expose bond_xmit_hash function Maor Gottlieb
2020-01-15  9:45 ` Jiri Pirko
2020-01-15 13:04   ` Maor Gottlieb
2020-01-15 14:15     ` Jiri Pirko
2020-01-15 14:33       ` Leon Romanovsky
2020-01-15 16:48         ` Jiri Pirko
2020-01-15 17:34           ` David Ahern
2020-01-15 18:04             ` Jay Vosburgh
2020-01-15 18:12               ` David Ahern
2020-01-15 20:46                 ` Jiri Pirko
2020-01-15 20:58                   ` David Ahern
2020-01-16 14:42       ` Andy Gospodarek
2020-01-16 15:55         ` Maor Gottlieb
2020-01-16 16:00           ` Jay Vosburgh
2020-01-19 14:52             ` Maor Gottlieb
2020-01-20 18:43               ` Jay Vosburgh
2020-01-22  7:53                 ` Maor Gottlieb

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).