linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Creating new RDMA driver for habanalabs
@ 2021-08-22  9:40 Oded Gabbay
  2021-08-22 11:32 ` Leon Romanovsky
  2021-08-22 22:31 ` Jason Gunthorpe
  0 siblings, 2 replies; 12+ messages in thread
From: Oded Gabbay @ 2021-08-22  9:40 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Greg Kroah-Hartman

Hi Jason,

I think that about a year ago we talked about the custom RDMA code of
habanalabs. I tried to upstream it and you, rightfully, rejected that.

Now that I have enough b/w to do this work, I want to start writing a
proper RDMA driver for the habanalabs Gaudi device, which I will be
able to upstream to the infiniband subsystem.

I don't know if you remember but the Gaudi h/w is somewhat limited in
its RDMA capabilities. We are not selling a stand-alone NIC :) We just
use RDMA (or more precisely, ROCEv2) to connect between Gaudi devices.

I'm sure I will have more specific questions down the line, but I had
hoped you could point me to a basic/not-too-complex existing driver
that I can use as a modern template. I'm also aware that I will need
to write matching code in rdma-core.

Also, I would like to add we will use the auxiliary bus feature to
connect between this driver, the main (compute) driver and the
Ethernet driver (which we are going to publish soon I hope).

Thanks,
Oded

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2021-08-22  9:40 Creating new RDMA driver for habanalabs Oded Gabbay
@ 2021-08-22 11:32 ` Leon Romanovsky
  2021-08-22 22:31 ` Jason Gunthorpe
  1 sibling, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2021-08-22 11:32 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Jason Gunthorpe, linux-rdma, Greg Kroah-Hartman

On Sun, Aug 22, 2021 at 12:40:26PM +0300, Oded Gabbay wrote:
> Hi Jason,
> 
> I think that about a year ago we talked about the custom RDMA code of
> habanalabs. I tried to upstream it and you, rightfully, rejected that.
> 
> Now that I have enough b/w to do this work, I want to start writing a
> proper RDMA driver for the habanalabs Gaudi device, which I will be
> able to upstream to the infiniband subsystem.
> 
> I don't know if you remember but the Gaudi h/w is somewhat limited in
> its RDMA capabilities. We are not selling a stand-alone NIC :) We just
> use RDMA (or more precisely, ROCEv2) to connect between Gaudi devices.
> 
> I'm sure I will have more specific questions down the line, but I had
> hoped you could point me to a basic/not-too-complex existing driver
> that I can use as a modern template. I'm also aware that I will need
> to write matching code in rdma-core.

drivers/infiniband/hw/efa can be seen as a good example.

> 
> Also, I would like to add we will use the auxiliary bus feature to
> connect between this driver, the main (compute) driver and the
> Ethernet driver (which we are going to publish soon I hope).
> 
> Thanks,
> Oded

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2021-08-22  9:40 Creating new RDMA driver for habanalabs Oded Gabbay
  2021-08-22 11:32 ` Leon Romanovsky
@ 2021-08-22 22:31 ` Jason Gunthorpe
  2021-08-23  8:53   ` Oded Gabbay
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2021-08-22 22:31 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-rdma, Greg Kroah-Hartman

On Sun, Aug 22, 2021 at 12:40:26PM +0300, Oded Gabbay wrote:
> Hi Jason,
> 
> I think that about a year ago we talked about the custom RDMA code of
> habanalabs. I tried to upstream it and you, rightfully, rejected that.
> 
> Now that I have enough b/w to do this work, I want to start writing a
> proper RDMA driver for the habanalabs Gaudi device, which I will be
> able to upstream to the infiniband subsystem.
> 
> I don't know if you remember but the Gaudi h/w is somewhat limited in
> its RDMA capabilities. We are not selling a stand-alone NIC :) We just
> use RDMA (or more precisely, ROCEv2) to connect between Gaudi devices.
> 
> I'm sure I will have more specific questions down the line, but I had
> hoped you could point me to a basic/not-too-complex existing driver
> that I can use as a modern template. I'm also aware that I will need
> to write matching code in rdma-core.
> 
> Also, I would like to add we will use the auxiliary bus feature to
> connect between this driver, the main (compute) driver and the
> Ethernet driver (which we are going to publish soon I hope).

It sounds fine, as Leon mentions EFA is a good starting point for
something simple but non-spec compliant

If I recall properly you'll want to have some special singular PD for
the HW and some specialty QPs?

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2021-08-22 22:31 ` Jason Gunthorpe
@ 2021-08-23  8:53   ` Oded Gabbay
  2021-08-23 13:04     ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Oded Gabbay @ 2021-08-23  8:53 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Greg Kroah-Hartman

On Mon, Aug 23, 2021 at 1:31 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Aug 22, 2021 at 12:40:26PM +0300, Oded Gabbay wrote:
> > Hi Jason,
> >
> > I think that about a year ago we talked about the custom RDMA code of
> > habanalabs. I tried to upstream it and you, rightfully, rejected that.
> >
> > Now that I have enough b/w to do this work, I want to start writing a
> > proper RDMA driver for the habanalabs Gaudi device, which I will be
> > able to upstream to the infiniband subsystem.
> >
> > I don't know if you remember but the Gaudi h/w is somewhat limited in
> > its RDMA capabilities. We are not selling a stand-alone NIC :) We just
> > use RDMA (or more precisely, ROCEv2) to connect between Gaudi devices.
> >
> > I'm sure I will have more specific questions down the line, but I had
> > hoped you could point me to a basic/not-too-complex existing driver
> > that I can use as a modern template. I'm also aware that I will need
> > to write matching code in rdma-core.
> >
> > Also, I would like to add we will use the auxiliary bus feature to
> > connect between this driver, the main (compute) driver and the
> > Ethernet driver (which we are going to publish soon I hope).
>
> It sounds fine, as Leon mentions EFA is a good starting point for
> something simple but non-spec compliant
>
> If I recall properly you'll want to have some special singular PD for
> the HW and some specialty QPs?
>
> Jason

Yes, we will have a singular PD.
Regarding the QPs, I don't think we have anything special there, but I
might be proven wrong.
I was worried about reg_mr but I think we found a solution for that.

I may be ahead of myself a little, but one of the issues I will need
help with is how to handle ports that are not exposed to the
Networking/Ethernet subsystem.
In a box with Gaudis, some ports are connected back-to-back (between
Gaudi devices) and some are exposed externally.

The ports that are exposed externally will be registered as an
Ethernet device and will be also handled by the Ethernet driver. I
think that is pretty much standard.

However, the "internal" ports won't be registered as an Ethernet
device, as we don't want to expose them to the user as an interface.
They are used only for back-to-back communication between Gaudi
devices inside the same box. You can imagine them to be similar to
NVlink, but instead of a proprietary protocol, they run ROCEv2.
Registering them to netdev creates a very poor user experience and
potentially degrades the host CPU performance (I can elaborate more on
that).

For those ports, we want to prevent the user from sending raw Ethernet
data (opening a socket). We also want to avoid the need for the user
to handle them with ifconfig/ethtool/etc. We only want to expose the
IBverbs interface to those ports.

Do you see any issue with that ?

Thanks,
Oded

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2021-08-23  8:53   ` Oded Gabbay
@ 2021-08-23 13:04     ` Jason Gunthorpe
  2021-08-23 14:19       ` Oded Gabbay
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2021-08-23 13:04 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-rdma, Greg Kroah-Hartman

On Mon, Aug 23, 2021 at 11:53:48AM +0300, Oded Gabbay wrote:

> Do you see any issue with that ?

It should work out, without a netdev you have to be more careful about
addressing and can't really use the IP addressing modes. But you'd
have a singular hardwired roce gid in this case and act more like an
IB device than a roce device.

Where you might start to run into trouble is you probably want to put
all these ports under a single struct ib_device and we've been moving
away from having significant per-port differences. But I suspect it
can still work out.

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2021-08-23 13:04     ` Jason Gunthorpe
@ 2021-08-23 14:19       ` Oded Gabbay
  2022-07-06  8:59         ` Oded Gabbay
  0 siblings, 1 reply; 12+ messages in thread
From: Oded Gabbay @ 2021-08-23 14:19 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Greg Kroah-Hartman

On Mon, Aug 23, 2021 at 4:04 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Aug 23, 2021 at 11:53:48AM +0300, Oded Gabbay wrote:
>
> > Do you see any issue with that ?
>
> It should work out, without a netdev you have to be more careful about
> addressing and can't really use the IP addressing modes. But you'd
> have a singular hardwired roce gid in this case and act more like an
> IB device than a roce device.
>
> Where you might start to run into trouble is you probably want to put
> all these ports under a single struct ib_device and we've been moving
> away from having significant per-port differences. But I suspect it
> can still work out.
>
> Jason

ok, thanks for all the info.
I will go look at the efa driver.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2021-08-23 14:19       ` Oded Gabbay
@ 2022-07-06  8:59         ` Oded Gabbay
  2022-07-06 16:24           ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Oded Gabbay @ 2022-07-06  8:59 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Greg Kroah-Hartman

On Mon, Aug 23, 2021 at 5:19 PM Oded Gabbay <ogabbay@kernel.org> wrote:
>
> On Mon, Aug 23, 2021 at 4:04 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Aug 23, 2021 at 11:53:48AM +0300, Oded Gabbay wrote:
> >
> > > Do you see any issue with that ?
> >
> > It should work out, without a netdev you have to be more careful about
> > addressing and can't really use the IP addressing modes. But you'd
> > have a singular hardwired roce gid in this case and act more like an
> > IB device than a roce device.
> >
> > Where you might start to run into trouble is you probably want to put
> > all these ports under a single struct ib_device and we've been moving
> > away from having significant per-port differences. But I suspect it
> > can still work out.
> >
> > Jason
>
> ok, thanks for all the info.
> I will go look at the efa driver.
>
> Thanks,
> Oded

Hi Jason.

So it took a *bit* longer than expected due to higher-priority tasks,
but in the last month we did a thorough investigation of how our h/w maps
to the IBverbs API and it appears we have a few constraints that are
not quite common.

Tackling these constraints can affect the basic design of the driver or
even be a non-starter for this entire endeavor.

Therefore, I would like to list the major constraints and get your opinion
whether they are significant, and if so, how to tackle them.

To understand the context of these constraints, I would like to first say
that the Gaudi NICs were designed primarily as a form of a scale-out fabric
for doing Deep-Learning training across thousands of Gaudi devices.

This means that the designated deployment is one where the entire network
is composed of Gaudi NICs, and L2/L3 switches. Doing interoperability with
other NICs was not the main goal, although we did manage to
work vs. a MLNX RDMA NIC in the lab.

In addition, I would like to remind you that each Gaudi has multiple NIC
ports, but from our perspective they are all used for the same purpose.
i.e. We are using ALL the Gaudi NIC ports for a single user process
to distribute its Deep-Learning training workload.

Due to that, we would want to put all the ports under a single struct ib_device,
as you said it yourself in your original email a year ago.
I haven't written this as a h/w constraint, but this is very important
for us from a system/deployment perspective. I would go on to say it
is pretty much
mandatory.

The major constraints are:

1. Support only RDMA WRITE operation. We do not support READ, SEND or RECV.
    This means that many existing open source tests in rdma-core are not
    compatible. e.g. rc_pingpong.c will not work. I guess we will need to
    implement different tests and submit them ? Do you have a
different idea/suggestion ?

2. As you mentioned in the original email, we support only a single PD.
   I don't see any major implication regarding this constraint but please
   correct me if you think otherwise.

3. MR limitation on the rkey that is received from the remote connection
   during connection creation. The limitation is that our h/w extracts
   the rkey from the QP h/w context and not from the WQE when sending packets.
   This means that we may associate only a single remote MR per QP.

   Moreover, we also have an MR limitation on the rkey that we can give to the
   remote side. Our h/w extracts the rkey from QP h/w context and not
from the received
   packets. This means we give the same rkey for all MRs that we create per QP.

   Do you see any issue here with these two limitations ? One thing we noted is
   that we need to somehow configure the rkey in our h/w QP context, while today
   the API doesn't allow it.

   These limitations are not relevant to a deployment where all the NICs are
   Gaudi NICs, because we can use a single rkey for all MRs.

4. We do not support all the flags in the reg_mr API. e.g. we don't
   support IBV_ACCESS_LOCAL_WRITE. I'm not sure what the
   implication is here.

5. Our h/w contains several accelerations we would like to utilize.
   e.g. we have a h/w mechanism for accelerating collective operations
   on multiple RDMA NICs. These accelerations will require either extensions
   to current APIs, or some dedicated APIs. For example, one of the
   accelerations requires that the user will create a QP with the same
   index on all the Gaudi NICs.

Those are the major constraints. We have a few others but imo they are less
severe and can be discussed when we upstream the code.

btw, due to the large effort, we will do this conversion only for
Gaudi2 (and beyond).
Gaudi1 will continue to use our proprietary, not-upstreamed, kernel driver uAPI.

Appreciate your help on this.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2022-07-06  8:59         ` Oded Gabbay
@ 2022-07-06 16:24           ` Jason Gunthorpe
  2022-07-07  9:30             ` Oded Gabbay
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2022-07-06 16:24 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-rdma, Greg Kroah-Hartman

On Wed, Jul 06, 2022 at 11:59:14AM +0300, Oded Gabbay wrote:

> Due to that, we would want to put all the ports under a single struct ib_device,
> as you said it yourself in your original email a year ago.

Yes

> The major constraints are:
> 
> 1. Support only RDMA WRITE operation. We do not support READ, SEND or RECV.
>     This means that many existing open source tests in rdma-core are not
>     compatible. e.g. rc_pingpong.c will not work. I guess we will need to
>     implement different tests and submit them ? Do you have a
> different idea/suggestion ?

I would suggest following what EFA did and just using your own unique
QP with dv accessors to create it. A QP that can only do RDMA WRITE is
not IBA compliant and shouldn't be created by a standard verbs call.
 
> 2. As you mentioned in the original email, we support only a single PD.
>    I don't see any major implication regarding this constraint but please
>    correct me if you think otherwise.

Seems fine

> 3. MR limitation on the rkey that is received from the remote connection
>    during connection creation. The limitation is that our h/w extracts
>    the rkey from the QP h/w context and not from the WQE when sending packets.
>    This means that we may associate only a single remote MR per QP.

It seems OK in the context above where you have your own QP type and
obviouly your specila RDMA WRITE poster will not take in an rkey as
any argument.

>    Do you see any issue here with these two limitations ? One thing we noted is
>    that we need to somehow configure the rkey in our h/w QP context, while today
>    the API doesn't allow it.

When you add your own dv qp create function it will take in the
required rkey during qp creation.
 
>    These limitations are not relevant to a deployment where all the NICs are
>    Gaudi NICs, because we can use a single rkey for all MRs.

Er, that is weird, did you mean to say you have only one MR per PD and
that it always has a fixed value?
 
> 4. We do not support all the flags in the reg_mr API. e.g. we don't
>    support IBV_ACCESS_LOCAL_WRITE. I'm not sure what the
>    implication is here.

It is OK, since you can't issue a local operation WQE anyhow you can
just ignore the flag.

> 5. Our h/w contains several accelerations we would like to utilize.
>    e.g. we have a h/w mechanism for accelerating collective operations
>    on multiple RDMA NICs. These accelerations will require either extensions
>    to current APIs, or some dedicated APIs. For example, one of the
>    accelerations requires that the user will create a QP with the same
>    index on all the Gaudi NICs.

Use your DV interface to do these kinds of things

Thanks,
Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2022-07-06 16:24           ` Jason Gunthorpe
@ 2022-07-07  9:30             ` Oded Gabbay
  2022-07-08 13:29               ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Oded Gabbay @ 2022-07-07  9:30 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Greg Kroah-Hartman

On Wed, Jul 6, 2022 at 7:24 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jul 06, 2022 at 11:59:14AM +0300, Oded Gabbay wrote:
>
> > Due to that, we would want to put all the ports under a single struct ib_device,
> > as you said it yourself in your original email a year ago.
>
> Yes
>
> > The major constraints are:
> >
> > 1. Support only RDMA WRITE operation. We do not support READ, SEND or RECV.
> >     This means that many existing open source tests in rdma-core are not
> >     compatible. e.g. rc_pingpong.c will not work. I guess we will need to
> >     implement different tests and submit them ? Do you have a
> > different idea/suggestion ?
>
> I would suggest following what EFA did and just using your own unique
> QP with dv accessors to create it. A QP that can only do RDMA WRITE is
> not IBA compliant and shouldn't be created by a standard verbs call.
>
> > 2. As you mentioned in the original email, we support only a single PD.
> >    I don't see any major implication regarding this constraint but please
> >    correct me if you think otherwise.
>
> Seems fine
>
> > 3. MR limitation on the rkey that is received from the remote connection
> >    during connection creation. The limitation is that our h/w extracts
> >    the rkey from the QP h/w context and not from the WQE when sending packets.
> >    This means that we may associate only a single remote MR per QP.
>
> It seems OK in the context above where you have your own QP type and
> obviouly your specila RDMA WRITE poster will not take in an rkey as
> any argument.
>
> >    Do you see any issue here with these two limitations ? One thing we noted is
> >    that we need to somehow configure the rkey in our h/w QP context, while today
> >    the API doesn't allow it.
>
> When you add your own dv qp create function it will take in the
> required rkey during qp creation.
>
> >    These limitations are not relevant to a deployment where all the NICs are
> >    Gaudi NICs, because we can use a single rkey for all MRs.
>
> Er, that is weird, did you mean to say you have only one MR per PD and
> that it always has a fixed value?
Not exactly. We have multiple MRs per PD, but the driver assigns the
same rkey (fixed value) for all created MRs. Our h/w matches the rkey
with the one that is written in the QP. The rkey is not part of the actual
MMU translation that is done inside our h/w. The MMU translation is
done using the PD (we call it ASID - address space ID) and Address.

>
> > 4. We do not support all the flags in the reg_mr API. e.g. we don't
> >    support IBV_ACCESS_LOCAL_WRITE. I'm not sure what the
> >    implication is here.
>
> It is OK, since you can't issue a local operation WQE anyhow you can
> just ignore the flag.
>
> > 5. Our h/w contains several accelerations we would like to utilize.
> >    e.g. we have a h/w mechanism for accelerating collective operations
> >    on multiple RDMA NICs. These accelerations will require either extensions
> >    to current APIs, or some dedicated APIs. For example, one of the
> >    accelerations requires that the user will create a QP with the same
> >    index on all the Gaudi NICs.
>
> Use your DV interface to do these kinds of things

Great!
We will start to move forward using this approach.
I imagine we will have something to show in a couple of months.

Thanks,
Oded

>
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2022-07-07  9:30             ` Oded Gabbay
@ 2022-07-08 13:29               ` Jason Gunthorpe
  2022-07-10  7:30                 ` Oded Gabbay
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:29 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-rdma, Greg Kroah-Hartman

On Thu, Jul 07, 2022 at 12:30:03PM +0300, Oded Gabbay wrote:
> > >    These limitations are not relevant to a deployment where all the NICs are
> > >    Gaudi NICs, because we can use a single rkey for all MRs.
> >
> > Er, that is weird, did you mean to say you have only one MR per PD and
> > that it always has a fixed value?

> Not exactly. We have multiple MRs per PD, but the driver assigns the
> same rkey (fixed value) for all created MRs. Our h/w matches the rkey
> with the one that is written in the QP. The rkey is not part of the actual
> MMU translation that is done inside our h/w. The MMU translation is
> done using the PD (we call it ASID - address space ID) and Address.

I don't understand this at all - how can you have multiple MRs if
there is only one ASID per PD? The MR is logically the ASID since the
MR is the verbs model for MMU translation.

So, if you have one ASID per PD and multiple MRs, what are the MRs
supposed to be?

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2022-07-08 13:29               ` Jason Gunthorpe
@ 2022-07-10  7:30                 ` Oded Gabbay
  2022-07-21 18:42                   ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Oded Gabbay @ 2022-07-10  7:30 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Greg Kroah-Hartman

On Fri, Jul 8, 2022 at 4:29 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Jul 07, 2022 at 12:30:03PM +0300, Oded Gabbay wrote:
> > > >    These limitations are not relevant to a deployment where all the NICs are
> > > >    Gaudi NICs, because we can use a single rkey for all MRs.
> > >
> > > Er, that is weird, did you mean to say you have only one MR per PD and
> > > that it always has a fixed value?
>
> > Not exactly. We have multiple MRs per PD, but the driver assigns the
> > same rkey (fixed value) for all created MRs. Our h/w matches the rkey
> > with the one that is written in the QP. The rkey is not part of the actual
> > MMU translation that is done inside our h/w. The MMU translation is
> > done using the PD (we call it ASID - address space ID) and Address.
>
> I don't understand this at all - how can you have multiple MRs if
> there is only one ASID per PD? The MR is logically the ASID since the
> MR is the verbs model for MMU translation.

We don't follow the MR verbs model. This is the meaning of the
hardware constraint I wrote imo.
Our MMU does a pgt walk that starts with ASID and then just goes
according to the virtual address, same as regular CPU does.
The key is not a part of the pgt.
The ASID represents different processes, but because we decided long
ago we support only a single user process,
we only allocate a single ASID, which will translate to a single PD in
our IBverb driver.

>
> So, if you have one ASID per PD and multiple MRs, what are the MRs
> supposed to be?
>
> Jason

Per my understanding, the MRs are meant to notify the driver that the
user would like the h/w MMU to be familiar with these memory regions.
As we also need to pin them, it is preferable to have multiple small
MRs than a single very large MR.
The fact that the key that is returned is the same for all memory
regions shouldn't affect the user. Our MMU will be able to do the
translation correctly using only the ASID+address.
In addition, because we also have on-device memory (HBM), we would
like to allow the user to register memory regions in that memory. So
we need to support at least two MRs.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Creating new RDMA driver for habanalabs
  2022-07-10  7:30                 ` Oded Gabbay
@ 2022-07-21 18:42                   ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2022-07-21 18:42 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-rdma, Greg Kroah-Hartman

On Sun, Jul 10, 2022 at 10:30:41AM +0300, Oded Gabbay wrote:

> Per my understanding, the MRs are meant to notify the driver that the
> user would like the h/w MMU to be familiar with these memory regions.
> As we also need to pin them, it is preferable to have multiple small
> MRs than a single very large MR.
> The fact that the key that is returned is the same for all memory
> regions shouldn't affect the user. Our MMU will be able to do the
> translation correctly using only the ASID+address.
> In addition, because we also have on-device memory (HBM), we would
> like to allow the user to register memory regions in that memory. So
> we need to support at least two MRs.

I think it would make sense to stick with a single MR and just have
some DV operation to remap portions of it. It achieves the same thing
for you without creating a verbs confusion with multiple MRs that are
actually the same MR.

mlx5 already has a dv API that is sort of like this, so there is
precedent.

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-07-21 18:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-22  9:40 Creating new RDMA driver for habanalabs Oded Gabbay
2021-08-22 11:32 ` Leon Romanovsky
2021-08-22 22:31 ` Jason Gunthorpe
2021-08-23  8:53   ` Oded Gabbay
2021-08-23 13:04     ` Jason Gunthorpe
2021-08-23 14:19       ` Oded Gabbay
2022-07-06  8:59         ` Oded Gabbay
2022-07-06 16:24           ` Jason Gunthorpe
2022-07-07  9:30             ` Oded Gabbay
2022-07-08 13:29               ` Jason Gunthorpe
2022-07-10  7:30                 ` Oded Gabbay
2022-07-21 18:42                   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).