* [PATCH for-next v7 00/12] Elastic RDMA Adapter (ERDMA) driver
@ 2022-04-21  7:17 Cheng Xu
  2022-04-21  7:17 ` [PATCH for-next v7 01/12] RDMA: Add ERDMA to rdma_driver_id definition Cheng Xu
                   ` (12 more replies)
  0 siblings, 13 replies; 42+ messages in thread
From: Cheng Xu @ 2022-04-21  7:17 UTC (permalink / raw)
  To: jgg, dledford, leon; +Cc: linux-rdma, KaiShen, chengyou, tonylu, BMT

Hello all,

This v7 patch set introduces the Elastic RDMA Adapter (ERDMA) driver,
which was released by Alibaba at the Apsara Conference 2021. The PR for
the ERDMA userspace provider has already been created [1].

ERDMA enables large-scale RDMA acceleration in the Alibaba ECS
environment, initially offered on g7re instances. It can significantly
improve the efficiency of large-scale distributed computing and
communication, and it scales dynamically with the cluster size of
Alibaba Cloud.

ERDMA is an RDMA networking adapter based on the Alibaba MOC hardware. It
works in the VPC network environment (overlay network) and uses the iWarp
transport protocol. ERDMA supports reliable connection (RC), and supports
both kernel space and user space verbs. We already support HPC/AI
applications with libfabric, NoF and some other internal verbs libraries,
such as xrdma, epsl, etc.

For an ECS instance with RDMA enabled, our MOC hardware generates two
kinds of PCI devices: one for ERDMA, and one for the original net device
(virtio-net). They are separate PCI devices; the "rdma link" command,
together with a filter inside our rdma_link_ops.newlink implementation,
binds them together properly.
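
A minimal user-space sketch of the kind of filter described above, assuming (as discussed later in this thread) that the binding is decided by comparing the MAC address reported by the ERDMA function with that of the candidate net device. All types and functions here (`demo_erdma_dev`, `demo_netdev`, `demo_newlink_filter()`, `demo_newlink()`) are hypothetical stand-ins, not the driver's real code:

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical model of the ERDMA PCI function. peer_addr stands for the
 * MAC address of the paired net device, read from a device register. */
struct demo_erdma_dev {
	unsigned char peer_addr[DEMO_ETH_ALEN];
	const char *bound_netdev; /* NULL until a netdev is bound */
};

/* Hypothetical model of a candidate net device (e.g. virtio-net). */
struct demo_netdev {
	const char *name;
	unsigned char perm_addr[DEMO_ETH_ALEN];
};

/* Filter: accept only the netdev whose MAC equals the one reported by
 * the ERDMA function. */
bool demo_newlink_filter(const struct demo_erdma_dev *dev,
			 const struct demo_netdev *ndev)
{
	return memcmp(dev->peer_addr, ndev->perm_addr, DEMO_ETH_ALEN) == 0;
}

/* Newlink-style handler: bind on a match. Returns 0 on success, or -1
 * if the filter rejects the candidate (the device stays unbound). */
int demo_newlink(struct demo_erdma_dev *dev, const struct demo_netdev *ndev)
{
	if (!demo_newlink_filter(dev, ndev))
		return -1;
	dev->bound_netdev = ndev->name;
	return 0;
}
```

For example, calling `demo_newlink()` with a net device whose MAC differs returns -1 and leaves the ERDMA device unbound, so only the matching virtio-net device can be paired with it.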

Besides, this patch set contains a change in iw_query_port to fix this
issue [2]. This change lets the device drivers decide the return value of
iw_query_port when the attached netdev is NULL. After this change, erdma
can register the device successfully in the PCI probe function, and keep
the port state invalid until a netdev is bound to it.
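
The following user-space C sketch models the idea behind the iw_query_port change: the core fills in netdev-derived attributes only when a netdev is attached, and otherwise lets the driver's query_port callback decide the result. It is not the actual patch; every name in it (`demo_ibdev`, `demo_core_query_port()`, `demo_drv_query_port()`) is hypothetical, and the port states only loosely mirror enum ib_port_state:

```c
#include <stddef.h>
#include <string.h>

/* Simplified port states; values follow IB_PORT_DOWN / IB_PORT_ACTIVE. */
enum demo_port_state { DEMO_PORT_DOWN = 1, DEMO_PORT_ACTIVE = 4 };

struct demo_port_attr {
	enum demo_port_state state;
	unsigned int mtu;
};

struct demo_netdev {
	unsigned int mtu;
	int carrier_ok;
};

struct demo_ibdev {
	struct demo_netdev *netdev; /* NULL until a netdev is bound */
	/* Driver callback, always consulted so the driver decides the result. */
	int (*query_port)(struct demo_ibdev *dev, struct demo_port_attr *attr);
};

/* Core-side query: use the netdev only if one is attached, then defer to
 * the driver instead of failing early with an error. */
int demo_core_query_port(struct demo_ibdev *dev, struct demo_port_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	if (dev->netdev) {
		attr->mtu = dev->netdev->mtu;
		attr->state = dev->netdev->carrier_ok ? DEMO_PORT_ACTIVE
						      : DEMO_PORT_DOWN;
	}
	return dev->query_port(dev, attr);
}

/* Hypothetical driver callback: always succeed, but report the port as
 * down while no netdev is bound. */
int demo_drv_query_port(struct demo_ibdev *dev, struct demo_port_attr *attr)
{
	if (!dev->netdev)
		attr->state = DEMO_PORT_DOWN;
	return 0;
}
```

With no netdev bound, the query succeeds and reports the port as down; once a netdev is attached, its MTU and carrier state show through.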

Changes in v7:
- Fix a wrong doorbell records' address calculation issue in
  erdma_create_qp.
- Fix a race condition when reporting the IW_CM_EVENT_CONNECT_REQUEST
  event in cm.
- Add the missing mmap_free implementation.
- Remove unnecessary reference to erdma_dev in erdma_ucontext.

Changes in v6:
- Rebase to the latest for-next code, and solve the compilation issues.

Fixed issues or changes in v5:
- Rename the reserved fields of structure definitions to improve
  readability.
- Remove some magic numbers and unnecessary initializations.
- Fix some coding style format issues.
- Fix some typos in comments.
- No casting in the assignment if the function's returned pointer is
  "void *".
- Rewrite the polling functions (cmdq cq, verbs cq, aeq and ceq), which
  all check the valid bit to get the next valid QE. The new
  implementation is simpler. Thanks to Wenpeng.
- Fix an issue reported by kernel test robot.
- Some minor changes in code (such as removing SRQ definitions since we do
  not support it yet).
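
To illustrate the valid-bit polling mentioned above, here is a minimal single-threaded C sketch, assuming a power-of-two queue depth and a valid bit whose expected value flips on every lap around the ring. The names (`demo_queue`, `demo_produce()`, `demo_next_valid()`) are hypothetical; a real driver reads DMA memory written by the device and would also need a read barrier (e.g. dma_rmb()) after seeing a valid entry, which this sketch omits:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_QDEPTH 4U /* must be a power of two */

struct demo_qe {
	uint32_t data;
	uint8_t valid; /* meaning of 0/1 alternates on each lap */
};

struct demo_queue {
	struct demo_qe ring[DEMO_QDEPTH]; /* starts zeroed */
	uint32_t pi; /* free-running producer index (models the device) */
	uint32_t ci; /* free-running consumer index */
};

/* Value marking a fresh entry at this index: 1 on even laps, 0 on odd. */
static uint8_t demo_expected_valid(uint32_t idx)
{
	return (idx & DEMO_QDEPTH) ? 0 : 1;
}

/* Models the device posting an entry; the valid bit is written last. */
void demo_produce(struct demo_queue *q, uint32_t data)
{
	struct demo_qe *e = &q->ring[q->pi & (DEMO_QDEPTH - 1)];

	e->data = data;
	e->valid = demo_expected_valid(q->pi);
	q->pi++;
}

/* Return the next valid entry, or NULL if none is ready. The consumer
 * only checks the valid bit and never reads the producer index. */
struct demo_qe *demo_next_valid(struct demo_queue *q)
{
	struct demo_qe *e = &q->ring[q->ci & (DEMO_QDEPTH - 1)];

	if (e->valid != demo_expected_valid(q->ci))
		return NULL;
	/* A real driver would issue a read barrier here before reading
	 * the rest of the entry. */
	q->ci++;
	return e;
}
```

Stale entries left over from the previous lap are rejected because their valid bit still carries the old phase. Producer-side overflow protection is not modelled.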

Fixed issues in v4:
- Fix some typos.
- Use the __GFP_ZERO flag in dma_alloc_coherent instead of memset after
  buffer allocation.
- Use a single polling function for AEQ and CEQ; previously there were two.
- Fix wrong iov_num when calling kernel_sendmsg.
- Add necessary comments in erdma_cm.
- Remove duplicated check in MPA processing function.
- Always return 0 in erdma_query_port.
- Return the error code directly in init_kernel_qp, instead of assigning
  it to "ret" and then returning "ret".

Fixed issues or changes in v3:
- Change the column limit from 100 to 80 characters.
- Remove unnecessary field or structure definitions in erdma.h.
- Use exact types (bool, unsigned int) instead of "int" in erdma_dev.
- Make the ibdev and the PCI device share the same lifecycle. ERDMA keeps
  an invalid port state until it is bound to the corresponding netdev.
- ib_core: allow query_port when netdev is NULL for iWarp devices.
- Move large inline functions in erdma.h to .c files.
- Use dev_{info, warn, err} or ibdev_{info, warn, err} instead of
  pr_{info, warn, err} function calls.
- Remove print function calls in userspace-triggered paths.
- Add necessary comments in CM part.
- Remove unused entries in map_cqe_opcode[] table.
- Use rdma_is_kernel_res instead of self-definitions.
- Remove unused resource counters in erdma_dev.
- Use pgprot_device instead of pgprot_noncached in erdma_mmap.
- Remove the disassociate_ucontext interface implementation.

Fixed issues in v2:
- No "extern" to function declarations.
- No inline functions in .c files, no void casting for functions with
  return values.
- Rewrite the code (mainly CM and CM-related parts), originally based on
  an old siw version, on top of siw's newest kernel version.
- Remove debugfs.
- Fix issues reported by kernel test robot.
- Use RDMA_NLDEV_CMD_NEWLINK instead of binding in net notifiers.

[1] https://github.com/linux-rdma/rdma-core/pull/1126
[2] https://lore.kernel.org/all/20220118141324.GF8034@ziepe.ca/

Thanks,
Cheng Xu

Cheng Xu (12):
  RDMA: Add ERDMA to rdma_driver_id definition
  RDMA/core: Allow calling query_port when netdev isn't attached in
    iWarp
  RDMA/erdma: Add the hardware related definitions
  RDMA/erdma: Add main include file
  RDMA/erdma: Add cmdq implementation
  RDMA/erdma: Add event queue implementation
  RDMA/erdma: Add verbs header file
  RDMA/erdma: Add verbs implementation
  RDMA/erdma: Add connection management (CM) support
  RDMA/erdma: Add the erdma module
  RDMA/erdma: Add the ABI definitions
  RDMA/erdma: Add driver to kernel build environment

 MAINTAINERS                               |    8 +
 drivers/infiniband/Kconfig                |    1 +
 drivers/infiniband/core/device.c          |    7 +-
 drivers/infiniband/hw/Makefile            |    1 +
 drivers/infiniband/hw/erdma/Kconfig       |   12 +
 drivers/infiniband/hw/erdma/Makefile      |    4 +
 drivers/infiniband/hw/erdma/erdma.h       |  287 ++++
 drivers/infiniband/hw/erdma/erdma_cm.c    | 1435 ++++++++++++++++++++
 drivers/infiniband/hw/erdma/erdma_cm.h    |  168 +++
 drivers/infiniband/hw/erdma/erdma_cmdq.c  |  497 +++++++
 drivers/infiniband/hw/erdma/erdma_cq.c    |  205 +++
 drivers/infiniband/hw/erdma/erdma_eq.c    |  334 +++++
 drivers/infiniband/hw/erdma/erdma_hw.h    |  504 +++++++
 drivers/infiniband/hw/erdma/erdma_main.c  |  625 +++++++++
 drivers/infiniband/hw/erdma/erdma_qp.c    |  564 ++++++++
 drivers/infiniband/hw/erdma/erdma_verbs.c | 1461 +++++++++++++++++++++
 drivers/infiniband/hw/erdma/erdma_verbs.h |  342 +++++
 include/uapi/rdma/erdma-abi.h             |   49 +
 include/uapi/rdma/ib_user_ioctl_verbs.h   |    1 +
 19 files changed, 6504 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/erdma/Kconfig
 create mode 100644 drivers/infiniband/hw/erdma/Makefile
 create mode 100644 drivers/infiniband/hw/erdma/erdma.h
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cm.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cm.h
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cmdq.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cq.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_eq.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_hw.h
 create mode 100644 drivers/infiniband/hw/erdma/erdma_main.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_qp.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_verbs.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_verbs.h
 create mode 100644 include/uapi/rdma/erdma-abi.h

-- 
2.27.0


* RE: Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module
@ 2022-05-20 15:13 Bernard Metzler
  2022-05-23  1:39 ` Cheng Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Bernard Metzler @ 2022-05-20 15:13 UTC (permalink / raw)
  To: Cheng Xu, Jason Gunthorpe, Tom Talpey
  Cc: dledford, leon, linux-rdma, KaiShen, tonylu


> -----Original Message-----
> From: Cheng Xu <chengyou@linux.alibaba.com>
> Sent: Friday, 20 May 2022 09:04
> To: Bernard Metzler <BMT@zurich.ibm.com>; Jason Gunthorpe
> <jgg@nvidia.com>; Tom Talpey <tom@talpey.com>
> Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org;
> KaiShen@linux.alibaba.com; tonylu@linux.alibaba.com
> Subject: [EXTERNAL] Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the
> erdma module
> 
> 
> 
> On 5/20/22 12:20 AM, Bernard Metzler wrote:
> >
> >
> 
> <...>
> 
> >>> As far as I know, an iWarp device has only one GID entry, which is
> >>> generated from the MAC address.
> >>>
> >>> For iWarp, the CM part in core code resolves the address, finds the
> >>> route with the help of the kernel's net subsystem, and then obtains
> >>> the correct ibdev by GID matching. GID matching in iWarp is in fact
> >>> MAC address matching.
> >>>
> >>> In other words, for iWarp devices, the core code doesn't handle IP
> >>> addressing directly; it is done by calling net APIs. The netdev set
> >>> by ib_device_set_netdev is not used in iWarp's CM process.
> >>>
> >>> The bound netdev in iWarp devices mainly has two purposes:
> >>>    1). generating GID0, using the netdev's MAC address.
> >>>    2). getting the port state and attributes.
> >>>
> >>> For 1), the erdma device is also bound to the net device by MAC
> >>> address, which can be obtained from our PCIe BAR registers.
> >>> For 2), erdma can also get the information, possibly more
> >>> accurately. For example, erdma can have a different MTU from
> >>> virtio-net in our cloud.
> >>>
> >>> For RoCEv2, I know that it has many GIDs, some of which are
> >>> generated from IP addresses, and the core code handles IP
> >>> addressing.
> >>
> >> Bernard, Tom what do you think?
> >>
> >> Jason
> >
> > I think iWarp (and now RoCEv2 with its UDP dependency) drivers
> > produce GIDs mostly to satisfy the current RDMA CM infrastructure,
> > which depends on this type of unique identifier, inherited from IB.
> > IMO, it would be more natural to implement connection management for
> > IP-based RDMA protocols by relying on IP addresses.
> >
> > Sorry for asking again - why does erdma not need to link with a netdev?
> > Can erdma exist without using a netdev?
> 
> Actually, erdma also needs a bound net device, and it does have one.
> 
> These days I’m trying to find an acceptable way to get a reference to
> the bound netdev, e.g. the 'struct net_device' pointer. Other RDMA
> drivers can easily get a reference to their bound netdevs (most RDMA
> devices are based on extended aux devices), but it is a little more
> complex for erdma, because erdma and its bound net device are two
> separate PCIe devices.
> 
> Then I found that the netdev reference held in the ibdev is rarely used
> in core code for iWarp devices; GID0 is the key attribute (as you and
> Tom mentioned, it stems from the historical need for compatibility,
> but I think that is another story).
> 

Yes, I think this is right.

If you are saying you can get away with a NULL netdev at CM core, then
I think that's fine?
Of course the erdma driver must somehow keep track of the state of
its associated network device - like catching up with link status -
and must provide related information/events to the RDMA core.
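
For reference, a minimal C sketch of how an iWarp GID0 can be derived from a MAC address and then used for a CM-style device lookup. It assumes the layout siw uses (GID zeroed, MAC copied into the first six bytes); `demo_gid`, `demo_make_gid0()` and `demo_find_dev_by_gid()` are hypothetical names, not core or driver APIs:

```c
#include <string.h>

#define DEMO_ETH_ALEN 6
#define DEMO_GID_LEN 16

/* 128-bit GID, mirroring the raw[16] view of union ib_gid. */
struct demo_gid {
	unsigned char raw[DEMO_GID_LEN];
};

/* Build GID0: zero the whole GID, then copy the MAC into bytes 0..5. */
void demo_make_gid0(struct demo_gid *gid, const unsigned char *mac)
{
	memset(gid->raw, 0, DEMO_GID_LEN);
	memcpy(gid->raw, mac, DEMO_ETH_ALEN);
}

/* After routing has resolved the outgoing interface's MAC (turned into
 * 'want'), find the device whose GID0 matches. Returns index or -1. */
int demo_find_dev_by_gid(const struct demo_gid *gids, int ndevs,
			 const struct demo_gid *want)
{
	for (int i = 0; i < ndevs; i++)
		if (memcmp(gids[i].raw, want->raw, DEMO_GID_LEN) == 0)
			return i;
	return -1;
}
```

Because GID0 is just the zero-padded MAC under this assumption, GID matching reduces to MAC matching, which is the basis of the argument in this thread that the CM path does not need the netdev pointer stored in the ibdev.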

> So, there are two choices for erdma: enumerate net devices and find the
> matching one, or never call ib_device_set_netdev. The second one needs
> less code.
> 
> The second way can't work for RoCE, but it works for iWarp (I've
> tested it), since the netdev reference is rarely used for iWarp in core
> code, as I said in my last reply.
> 
> In short, the question discussed here is: is it acceptable for an
> iWarp driver not to hold the netdev reference in core code (even though
> it does have a bound netdev)? Or is it necessary to call
> ib_device_set_netdev to set the bound netdev for an iWarp driver?
> 
> You and Tom are both iWarp specialists, so your opinions are important.
> 
> Thanks very much
> Cheng Xu
> 
> 
> >
> > Thanks,
> > Bernard.


Thread overview: 42+ messages
2022-04-21  7:17 [PATCH for-next v7 00/12] Elastic RDMA Adapter (ERDMA) driver Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 01/12] RDMA: Add ERDMA to rdma_driver_id definition Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 02/12] RDMA/core: Allow calling query_port when netdev isn't attached in iWarp Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 03/12] RDMA/erdma: Add the hardware related definitions Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 04/12] RDMA/erdma: Add main include file Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 05/12] RDMA/erdma: Add cmdq implementation Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 06/12] RDMA/erdma: Add event queue implementation Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 07/12] RDMA/erdma: Add verbs header file Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 08/12] RDMA/erdma: Add verbs implementation Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 09/12] RDMA/erdma: Add connection management (CM) support Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module Cheng Xu
2022-05-10 13:17   ` Jason Gunthorpe
2022-05-16  3:15     ` Cheng Xu
2022-05-16 11:49       ` Jason Gunthorpe
2022-05-16 12:37         ` Cheng Xu
2022-05-16 12:40         ` Cheng Xu
2022-05-16 13:59           ` Cheng Xu
2022-05-16 14:07             ` Jason Gunthorpe
2022-05-16 15:14               ` Cheng Xu
2022-05-16 17:31                 ` Jason Gunthorpe
2022-05-17  1:53                   ` Cheng Xu
2022-05-18  8:30     ` Cheng Xu
2022-05-18 14:46       ` Jason Gunthorpe
2022-05-18 16:24         ` Cheng Xu
2022-05-18 16:31           ` Jason Gunthorpe
2022-05-19 16:20             ` Bernard Metzler
2022-05-19 18:51               ` Tom Talpey
2022-05-20  7:03               ` Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 11/12] RDMA/erdma: Add the ABI definitions Cheng Xu
2022-04-21  7:17 ` [PATCH for-next v7 12/12] RDMA/erdma: Add driver to kernel build environment Cheng Xu
2022-05-10 13:18   ` Jason Gunthorpe
2022-05-16  3:40     ` Cheng Xu
2022-05-16  7:11       ` Cheng Xu
2022-05-16 10:07         ` Cheng Xu
2022-05-16 14:13       ` Jason Gunthorpe
2022-05-16 14:41         ` Cheng Xu
2022-05-10 12:50 ` [PATCH for-next v7 00/12] Elastic RDMA Adapter (ERDMA) driver Jason Gunthorpe
2022-05-16  2:30   ` Cheng Xu
2022-05-16 14:13     ` Jason Gunthorpe
2022-05-20 15:13 Re: [PATCH for-next v7 10/12] RDMA/erdma: Add the erdma module Bernard Metzler
2022-05-23  1:39 ` Cheng Xu
2022-05-23 13:25   ` Tom Talpey
2022-05-24  3:09     ` Cheng Xu
