* RE: Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module
@ 2022-05-20 15:13 Bernard Metzler
  2022-05-23  1:39 ` Cheng Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Bernard Metzler @ 2022-05-20 15:13 UTC (permalink / raw)
  To: Cheng Xu, Jason Gunthorpe, Tom Talpey
  Cc: dledford, leon, linux-rdma, KaiShen, tonylu


> -----Original Message-----
> From: Cheng Xu <chengyou@linux.alibaba.com>
> Sent: Friday, 20 May 2022 09:04
> To: Bernard Metzler <BMT@zurich.ibm.com>; Jason Gunthorpe
> <jgg@nvidia.com>; Tom Talpey <tom@talpey.com>
> Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org;
> KaiShen@linux.alibaba.com; tonylu@linux.alibaba.com
> Subject: [EXTERNAL] Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the
> erdma module
> 
> 
> 
> On 5/20/22 12:20 AM, Bernard Metzler wrote:
> >
> >
> 
> <...>
> 
> >>> As far as I know, an iWarp device has only one GID entry, which is
> >>> generated from the MAC address.
> >>>
> >>> For iWarp, the CM part of the core code resolves the address, finds
> >>> the route with the help of the kernel's net subsystem, and then
> >>> obtains the correct ibdev by GID matching. GID matching in iWarp is
> >>> in effect MAC address matching.
> >>>
> >>> In other words, for iWarp devices the core code doesn't handle IP
> >>> addressing directly; that is done by calling net APIs. The netdev
> >>> set by ib_device_set_netdev is not used in iWarp's CM process.
> >>>
> >>> The bound netdev in iWarp devices mainly serves two purposes:
> >>>    1) generating GID0 from the netdev's MAC address;
> >>>    2) getting the port state and attributes.
> >>>
> >>> For 1), the erdma device is also bound to its net device by MAC
> >>> address, which can be obtained from our PCIe BAR registers.
> >>> For 2), erdma can also get this information itself, possibly more
> >>> accurately. For example, erdma can have a different MTU from
> >>> virtio-net in our cloud.
> >>>
> >>> For RoCEv2, I know that it has many GIDs, some of which are generated
> >>> from IP addresses, and that IP addressing is handled in core code.
> >>
> >> Bernard, Tom what do you think?
> >>
> >> Jason
> >
> > I think iWarp (and now RoCEv2 with its UDP dependency) drivers
> > produce GIDs mostly to satisfy the current RDMA CM infrastructure,
> > which depends on this type of unique identifier, inherited from IB.
> > Imo, more natural would be to implement IP based RDMA protocols
> > connection management by relying on IP addresses.
> >
> > Sorry for asking again - why does erdma not need to link with a netdev?
> > Can erdma exist without using a netdev?
> 
> Actually, erdma also needs a net device to bind to, and it has one.
> 
> These days I'm trying to find an acceptable way to get a reference to
> the bound netdev, e.g. the 'struct net_device' pointer. Other RDMA
> drivers can get a reference to their bound netdev easily (most RDMA
> devices are based on the extended aux devices), but it is a little
> more complex for erdma, because erdma and its bound net device are two
> separate PCIe devices.
> 
> Then I found that the netdev reference held in the ibdev is rarely used
> in core code for iWarp devices; GID0 is the key attribute (as you and
> Tom mentioned, it exists for historical compatibility reasons, but I
> think that is another story).
> 

Yes, I think this is right.

If you are saying you can get away with a NULL netdev at the CM core,
then I think that's fine?
Of course the erdma driver must somehow keep track of the state of
its associated network device - like catching up with link status -
and must provide related information/events to the RDMA core.
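
To make the link-status point concrete, here is a minimal userspace sketch (not kernel code) of how a driver might translate changes on its associated net device into events for the RDMA core. All `sk_`/`SK_` names are hypothetical stand-ins for the kernel's NETDEV_* notifier events and IB event types, and the mapping itself is an assumption for illustration, not what erdma does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's NETDEV_* notifier events. */
enum sk_netdev_event {
	SK_NETDEV_UP,
	SK_NETDEV_DOWN,
	SK_NETDEV_CHANGEADDR,
	SK_NETDEV_OTHER,
};

/* Hypothetical stand-ins for IB event types reported to the RDMA core. */
enum sk_ib_event {
	SK_IB_EVENT_NONE,
	SK_IB_EVENT_PORT_ACTIVE,
	SK_IB_EVENT_PORT_ERR,
	SK_IB_EVENT_GID_CHANGE,
};

/* Minimal driver-private state: only the tracked port state. */
struct sk_rdma_dev {
	bool port_active;
};

/*
 * Update the tracked port state for one netdev event and return the
 * event the driver would forward to the RDMA core (NONE if nothing
 * needs to be reported).
 */
enum sk_ib_event sk_on_netdev_event(struct sk_rdma_dev *dev,
				    enum sk_netdev_event ev)
{
	assert(dev != NULL);

	switch (ev) {
	case SK_NETDEV_UP:
		dev->port_active = true;
		return SK_IB_EVENT_PORT_ACTIVE;
	case SK_NETDEV_DOWN:
		dev->port_active = false;
		return SK_IB_EVENT_PORT_ERR;
	case SK_NETDEV_CHANGEADDR:
		/* GID0 is derived from the MAC, so a MAC change alters it. */
		return SK_IB_EVENT_GID_CHANGE;
	default:
		return SK_IB_EVENT_NONE;
	}
}
```

In a real driver a handler like this would typically be hooked up through register_netdevice_notifier(), and the resulting event passed on with ib_dispatch_event().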

> So, there are two choices for erdma: enumerate the net devices and find
> the matching one, or never call ib_device_set_netdev. The second one
> needs less code.
> 
> The second way can't work for RoCE. But it works for iWarp (I've tested
> it), since the netdev reference is rarely used for iWarp in core code,
> as I said in my last reply.
> 
> In short, the question discussed here is: is it acceptable for an iWarp
> driver not to hold the netdev reference in core code (even though it
> does have a netdev bound to it)? Or is it necessary to call
> ib_device_set_netdev to set the bound netdev for an iWarp driver?
> 
> You and Tom are both specialists in iWarp; your opinions are important.
> 
> Thanks very much
> Cheng Xu
> 
> 
> >
> > Thanks,
> > Bernard.
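
The first option discussed above (enumerate the net devices and pick the one whose MAC matches) and the MAC-derived GID0 can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: the `sk_` names are hypothetical, the device table stands in for the kernel's netdev list, and the GID layout (MAC in the leading 6 bytes, zero padding after it) is assumed from what existing iWarp drivers report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_ALEN 6	/* length of an Ethernet MAC address */
#define SK_GID_LEN  16	/* length of an RDMA GID */

/* Hypothetical stand-in for a net device: just a name and a MAC. */
struct sk_netdev {
	const char *name;
	uint8_t mac[SK_ETH_ALEN];
};

/* Build an iWarp-style GID0: MAC in the first 6 bytes, the rest zero. */
void sk_gid0_from_mac(const uint8_t mac[SK_ETH_ALEN],
		      uint8_t gid[SK_GID_LEN])
{
	assert(mac != NULL && gid != NULL);
	memset(gid, 0, SK_GID_LEN);
	memcpy(gid, mac, SK_ETH_ALEN);
}

/*
 * Walk the device table and return the first entry whose MAC equals
 * the one read from the RDMA device, or NULL if none matches.
 */
const struct sk_netdev *sk_find_netdev_by_mac(const struct sk_netdev *devs,
					      size_t n,
					      const uint8_t mac[SK_ETH_ALEN])
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(devs[i].mac, mac, SK_ETH_ALEN) == 0)
			return &devs[i];
	}
	return NULL;
}
```

With this layout, comparing two GID0 values is equivalent to comparing the two MAC addresses, which is the sense in which GID matching for iWarp is MAC matching.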


* Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module
  2022-05-20 15:13 Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module Bernard Metzler
@ 2022-05-23  1:39 ` Cheng Xu
  2022-05-23 13:25   ` Tom Talpey
  0 siblings, 1 reply; 4+ messages in thread
From: Cheng Xu @ 2022-05-23  1:39 UTC (permalink / raw)
  To: Bernard Metzler, Jason Gunthorpe, Tom Talpey
  Cc: dledford, leon, linux-rdma, KaiShen, tonylu



On 5/20/22 11:13 PM, Bernard Metzler wrote:
> 
> 
> Yes, I think this is right.
> 
> If you are saying you can go away with a NULL netdev at CM core, then
> I think that's fine?
> Of course the erdma driver must somehow keep track of the state of
> its associated network device - like catching up with link status -
> and must provide related information/events to the RDMA core.
> 

All right, got it. I'd like to take a reference to the bound netdev in
our probe routine, and will send v9 patches.

Thanks for your time looking at this, Jason, Bernard and Tom.

Cheng Xu



* Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module
  2022-05-23  1:39 ` Cheng Xu
@ 2022-05-23 13:25   ` Tom Talpey
  2022-05-24  3:09     ` Cheng Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Talpey @ 2022-05-23 13:25 UTC (permalink / raw)
  To: Cheng Xu, Bernard Metzler, Jason Gunthorpe
  Cc: dledford, leon, linux-rdma, KaiShen, tonylu

On 5/22/2022 9:39 PM, Cheng Xu wrote:
> 
> 
> 
> All right, got it. I'd like to take a reference to the bound netdev in
> our probe routine, and will send v9 patches.

Cheng Xu, by this do you mean you'll drop the reference after the probe?
In addition to watching for link status changes, I'd expect you also
will want to monitor ARP/ND changes, maybe even perform those in the
netdev stack rather than your firmware. Does your adapter not also
function as a normal NIC?

Tom.

> Thanks for your time looking at this, Jason, Bernard and Tom.
> 
> Cheng Xu
> 
> 


* Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module
  2022-05-23 13:25   ` Tom Talpey
@ 2022-05-24  3:09     ` Cheng Xu
  0 siblings, 0 replies; 4+ messages in thread
From: Cheng Xu @ 2022-05-24  3:09 UTC (permalink / raw)
  To: Tom Talpey, Bernard Metzler, Jason Gunthorpe
  Cc: dledford, leon, linux-rdma, KaiShen, tonylu



On 5/23/22 9:25 PM, Tom Talpey wrote:
> On 5/22/2022 9:39 PM, Cheng Xu wrote:
>>
>>
>> All right, got it. I'd like to take a reference to the bound netdev in
>> our probe routine, and will send v9 patches.
> 
> Cheng Xu, by this do you mean you'll drop the reference after the probe?

No, we will keep the reference for the whole lifetime of the erdma
device, as other physical RNICs do.
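
As a rough userspace illustration of that lifetime rule (take the reference at probe time, drop it only at remove time), consider the sketch below. The `sk_` names and the plain integer counter are hypothetical; in the kernel the corresponding helpers would be dev_hold() and dev_put(), and this is not the actual erdma code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device with a simple reference count. */
struct sk_ref_netdev {
	int refcnt;
};

/* Hypothetical stand-in for the RDMA device's private state. */
struct sk_erdma_dev {
	struct sk_ref_netdev *netdev;
};

/* Probe: take one reference on the bound netdev and remember it. */
int sk_probe(struct sk_erdma_dev *dev, struct sk_ref_netdev *nd)
{
	assert(dev != NULL);
	if (nd == NULL)
		return -ENODEV;	/* no matching net device was found */
	nd->refcnt++;
	dev->netdev = nd;
	return 0;
}

/* Remove: drop the reference taken in sk_probe(), exactly once. */
void sk_remove(struct sk_erdma_dev *dev)
{
	assert(dev != NULL);
	if (dev->netdev != NULL) {
		dev->netdev->refcnt--;
		dev->netdev = NULL;
	}
}
```

Between probe and remove the pointer stays valid, so link state and attributes can be read from the netdev at any time.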

> In addition to watching for link status changes, I'd expect you also
> will want to monitor ARP/ND changes, maybe even perform those in the
> netdev stack rather than your firmware.

Since we will keep the netdev reference, we don't need to do this.
Anyway, thank you very much for the suggestion.

> ... Does your adapter not also
> function as a normal NIC?

From the OS point of view, erdma has no NIC function itself. Our HW does
have one, but it is exposed through another PCI device, e.g. the
virtio-net device in our cloud today. In other words, our NIC and RDMA
devices are separate PCIe devices in the OS, but they are both emulated
by the same hardware.

In more detail, our scenario is cloud virtualization. We have a piece of
hardware (called MOC, a chipset for cloud virtualization in Alibaba
Cloud, a kind of DPU/IPU) which provides IO acceleration. MOC emulates
several kinds of IO devices (virtio-net, virtio-blk, nvme and erdma) to
provide IO services to VMs or bare-metal instances. In our cloud, each
VM or bare-metal instance already has a virtio-net device to reach the
VPC network service. erdma aims to provide RDMA service in the same VPC
network. So erdma does not need a NIC function of its own; it would be
redundant, since virtio-net already provides it.

Thanks,
Cheng Xu

> Tom.
> 
>> Thanks for your time looking at this, Jason, Bernard and Tom.
>>
>> Cheng Xu
>>
>>
