* [RFC] RDMA verbs transport design notes
@ 2015-10-29 19:41 Dennis Dalessandro
From: Dennis Dalessandro @ 2015-10-29 19:41 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w,
mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w,
monis-VPRAkNaXOzVWk0Htik3J/w
Hi Folks,
I previously posted a notice about the very beginnings of the rdmavt
driver, which consolidates the software verbs code of multiple drivers [1].
I have now pushed another set of updates to a GitHub repo [2] which
contains more details. This latest batch of patches is a stubbed out
version with annotations in the comments describing what the interaction
between drivers and rdmavt will look like. Look for lines like:
VT-DRIVER-API
The following is a summary of the current posted code and the direction
which we are thinking of going based on knowledge of qib and hfi1 drivers.
Feedback and suggestions are welcome.
Design Goals
------------
- Remove duplication of software verbs code present in multiple drivers.
- Do not regress performance.
Registration and general code flow
----------------------------------
Instead of registering directly with the IB core as they do now, drivers
will register with rdmavt, referred to as rvt in the code. Drivers will
build up the ib_device_attr and pass it in to the registration by way of
the rvt_dev_info struct. This struct will also contain any other driver
specific settings that rvt needs to know about.
Currently allocation of the ib_device is done by the driver. This is merely
a stepping stone, and eventually the allocation will move up to rvt. The
driver should eventually not need to know about the ib_device structure,
other than for those functions it chooses to override.
In addition to describing its properties, drivers will supply a mapping of
function pointers for use by rvt. The idea is that most of the verbs code
lives in rvt, but there are some device specific functions which drivers
will need to perform, such as pushing packets to the wire. Rvt will
accomplish its tasks by calling into the drivers for these.
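As a rough illustration of the registration flow described above, here is a
user-space sketch. It is not the code in the repo: every name prefixed with
sketch_ or demo_ is hypothetical, the attribute fields are an arbitrary
subset, and the real rvt_dev_info layout may look quite different.

```c
#include <stddef.h>

/* Hypothetical stand-in for the attributes a driver describes. */
struct sketch_dev_attr {
    int max_qp;
    int max_mtu;
};

/* Hypothetical device-specific hooks (only one shown here). */
struct sketch_driver_ops {
    int (*do_send)(void *priv, const void *pkt, size_t len);
};

/* Loosely modeled on the rvt_dev_info idea; field names are assumptions. */
struct sketch_rvt_dev_info {
    struct sketch_dev_attr attr;
    struct sketch_driver_ops ops;
    void *priv;      /* driver private data, handed back on every call */
    int registered;
};

/* Registration: refuse a driver that cannot put packets on the wire. */
int sketch_rvt_register(struct sketch_rvt_dev_info *rdi)
{
    if (!rdi || !rdi->ops.do_send)
        return -1;
    rdi->registered = 1;
    return 0;
}

/* rvt owns the verbs logic, then calls down into the driver to send. */
int sketch_rvt_post_send(struct sketch_rvt_dev_info *rdi,
                         const void *pkt, size_t len)
{
    if (!rdi->registered)
        return -1;
    return rdi->ops.do_send(rdi->priv, pkt, len);
}

/* Example driver hook: counts bytes instead of touching hardware. */
int demo_do_send(void *priv, const void *pkt, size_t len)
{
    (void)pkt;
    *(size_t *)priv += len;
    return 0;
}
```

The driver only fills in the struct and registers; all later traffic
originates in rvt and reaches the driver through the function pointer.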
There are also times when drivers will need to call back into rvt. We
should aim to limit this as much as possible. For things like a packet
arriving from the wire we have no choice but for the driver to initiate the
processing and call into (or signal in some way) rvt.
Driver override
---------------
Drivers need to be able to override functions that would normally be
handled by rvt. In the current set of patches this is accomplished by
filling in a value in the ib_device_attr function pointer map. If the value
is NULL then rvt uses its own function; otherwise rvt is bypassed by the
core and the driver is called directly. Performance optimization is one
reason to override; another is incremental development, since we can move a
driver over to rvt in stages.
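The NULL-means-default rule above could be sketched as follows. This is a
simplified user-space model with a single verb; the struct, the function
names, and the return codes are all hypothetical and only exist to show
which side ends up handling the call.

```c
#include <stddef.h>

/* Hypothetical markers so a caller can see who handled the verb. */
enum sketch_handler { SKETCH_BY_RVT = 1, SKETCH_BY_DRIVER = 2 };

struct sketch_qp {
    int qpn;
};

/* Hypothetical one-entry function pointer map. */
struct sketch_verbs_map {
    int (*post_send)(struct sketch_qp *qp);  /* NULL: use rvt's version */
};

/* Default implementation living in rvt. */
int sketch_rvt_default_post_send(struct sketch_qp *qp)
{
    (void)qp;
    return SKETCH_BY_RVT;
}

/* A driver's own version, supplied as an override. */
int sketch_drv_post_send(struct sketch_qp *qp)
{
    (void)qp;
    return SKETCH_BY_DRIVER;
}

/*
 * At registration rvt fills in only the slots the driver left NULL.
 * Slots the driver filled are untouched, so the core calls the driver
 * directly and rvt is bypassed for that verb.
 */
void sketch_rvt_fill_defaults(struct sketch_verbs_map *map)
{
    if (!map->post_send)
        map->post_send = sketch_rvt_default_post_send;
}
```

A driver being converted in stages would start with most slots filled in
and set them to NULL one at a time as the rvt versions become usable.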
Driver provided functionality
-----------------------------
This list will likely grow as the code evolves, but as a first pass these
are the things which I see as needing to be provided by the driver:
query_port_state()
    Returns pretty much what is in ib_port_attr
    Will differ based on driver
set_link_state()
    Lets rvt have the driver set the state of the link
get_lid()
    Provides the LID
qp_mtu()
    Determines the MTU from the SL (the MTU varies per VL in OPA)
make_qpn()
    QPN ranges differ for drivers
flush_qp()
    Flush out all pending operations for a QP that have not made it to
    the wire, and wait for that flush to finish.
do_send()
    Take a fully constructed packet and place it on the wire
Perhaps other functions as well, for things like maintaining MAD counters.
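One way to picture the list above is as a table of hooks that rvt checks at
registration time. The sketch below is user-space C and purely illustrative:
the signatures, the priv argument, and the helper names are assumptions, not
what the posted patches define.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down port state. */
struct sketch_port_state {
    int link_up;
    unsigned int lid;
};

/* One slot per function in the list; all signatures are guesses. */
struct sketch_driver_provided {
    int (*query_port_state)(void *priv, struct sketch_port_state *out);
    int (*set_link_state)(void *priv, int state);
    unsigned int (*get_lid)(void *priv);
    int (*qp_mtu)(void *priv, int sl);
    int (*make_qpn)(void *priv);
    void (*flush_qp)(void *priv, int qpn);
    int (*do_send)(void *priv, const void *pkt, size_t len);
};

/* Count the hooks a driver has not supplied yet. */
int sketch_count_missing_hooks(const struct sketch_driver_provided *ops)
{
    int missing = 0;

    missing += !ops->query_port_state;
    missing += !ops->set_link_state;
    missing += !ops->get_lid;
    missing += !ops->qp_mtu;
    missing += !ops->make_qpn;
    missing += !ops->flush_qp;
    missing += !ops->do_send;
    return missing;
}

/* Example hook: hands out QPNs from a driver-chosen starting point. */
int demo_make_qpn(void *priv)
{
    int *next = priv;

    return (*next)++;
}
```

A check like this would let rvt reject a half-converted driver early rather
than hitting a NULL pointer on the data path.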
Driver notification or upcall to rvt
------------------------------------
Certain things will require the driver to notify the rvt or execute some
function. For instance, the driver needs to hand the packet to rvt after it
pulls it off the wire.
There are also events which the driver needs to let rvt know have happened:
things that currently generate IB_EVENT_PORT_ERROR, IB_EVENT_PORT_ACTIVE,
etc. There are likely other events as well.
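The two kinds of upcall described above (handing over a received packet and
reporting a port event) might look roughly like this. It is a user-space
sketch under assumed names; the enum values merely mirror the idea of
IB_EVENT_PORT_ACTIVE and IB_EVENT_PORT_ERROR and are not the kernel
constants.

```c
#include <stddef.h>

/* Hypothetical event codes, standing in for the IB_EVENT_PORT_* ones. */
enum sketch_port_event {
    SKETCH_PORT_ACTIVE,
    SKETCH_PORT_ERROR
};

/* Hypothetical state that rvt keeps per device. */
struct sketch_rvt_dev {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    int port_active;
};

/* Upcall 1: the driver pulled a packet off the wire and hands it to rvt. */
void sketch_rvt_rcv_packet(struct sketch_rvt_dev *rvt,
                           const void *pkt, size_t len)
{
    (void)pkt;              /* real code would parse headers here */
    rvt->rx_packets++;
    rvt->rx_bytes += len;
}

/* Upcall 2: the driver tells rvt that something happened on the port. */
void sketch_rvt_port_event(struct sketch_rvt_dev *rvt,
                           enum sketch_port_event ev)
{
    rvt->port_active = (ev == SKETCH_PORT_ACTIVE);
}
```

Keeping the upcall surface this small is in line with the goal of limiting
how often drivers need to call back into rvt.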
Next steps
----------
We will continue posting code to GitHub [2] while we field feedback. Note
the repo has been moved since my previous announcement [1]. I have placed it
under my GitHub account. Once there is more significant development and folks are
generally happy with the design we will begin posting to this mailing list
(linux-rdma).
The current branch on [2] is rdmavt-v1. I'll bump this whenever a rebase
is needed.
[1] http://marc.info/?l=linux-rdma&m=144563342718705&w=2
[2] https://github.com/ddalessa/kernel/tree/rdmavt-v1
Thanks
-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [RFC] RDMA verbs transport design notes
@ 2015-11-02 13:46 ` Moni Shoua
From: Moni Shoua @ 2015-11-02 13:46 UTC (permalink / raw)
To: Dennis Dalessandro
Cc: linux-rdma, Weiny, Ira, mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w
On Thu, Oct 29, 2015 at 9:41 PM, Dennis Dalessandro
<dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Hi Folks,
>
> I had previously posted a notice about the very beginnings of the rdmavt
> driver which is the software verbs consolidation for multiple drivers [1].
> I have now pushed another set of updates to a GitHub repo [2] which
> contains more details. What this latest batch of patches entails is a
> stubbed out version with annotations in the comments as to what the
> interaction between drivers and rdmavt will look like. Look for lines like:
>
Thanks Denny
We will pull from GitHub, learn the design and hopefully contribute.
Some minor comments below
> VT-DRIVER-API
>
> The following is a summary of the current posted code and the direction
> which we are thinking of going based on knowledge of qib and hfi1 drivers.
> Feedback and suggestions are welcome.
>
> Design Goals
> ------------
> - Remove duplication of software verbs code present in multiple drivers.
> - Do not regress performance.
>
> Registration and general code flow
> ----------------------------------
> Instead of registering directly with the IB core like they do now, drivers
> will register with the rdmavt, referred to as rvt in the code. Drivers will
> build up the ib_device_attr and pass in to the registration by way of the
> rvt_dev_info struct. This will also contain any other driver specific
> settings that rvt will need to know about.
>
> Currently allocation of the ib_device is done by the driver. This is merely
> a stepping stone, and eventually the allocation will move up to rvt. The
> driver should not need to know about the ib_device structure eventually,
> other than for those functions it chooses to override
>
> In addition to describing its properties, drivers will supply a mapping of
> function pointers for use by rvt. The idea is that most of the verbs code
> lives in rvt, but there are some device specific functions which drivers
> will need to perform, such as pushing packets to the wire. Rvt will
> accomplish its tasks by calling into the drivers for these.
>
> There are also times when drivers will need to call back into rvt. We should
> aim to limit this as much as possible. For things like a packet arriving
> from the wire we have no choice but for the driver to initiate the
> processing and call into (or signal in some way) rvt.
>
> Driver override
> ---------------
> Drivers need to be able to override functions that would normally be done by
> the rvt. In the current set of patches this is accomplished by filling in a
> value in the ib_device_attr function pointer map. If the value is NULL then
> rvt uses its function, otherwise rvt is bypassed by the core and the driver
> is called directly. Performance optimizations could be one reason, another
> is incremental development. We can work on moving a driver over to rvt in
> stages.
>
> Driver provided functionality
> -----------------------------
> This list will likely grow as the code evolves but as a first pass through
> these are the things which I see as needing to be provided by the driver:
>
> query_port_state()
> Returns pretty much what is in ib_port_attr
> Will differ based on driver
>
> set_link_state()
> For rvt to have the driver set the state of the link
>
> get_lid()
> Provides the LID
>
LID is a special case of L2 address (MAC is another special case)
Maybe change this to het_l2()?
> qp_mtu()
> Using the SL determines the MTU (this varies per VL in OPA)
>
Again, this is too tied to the InfiniBand protocol.
Also, it doesn't make sense to me that a driver won't know how to
create a QP (this is done by the rvt) but will know how to answer
about QP mtu. Does it?
> make_qpn()
> QPN ranges differ for drivers
>
> flush_qp()
> Flush out all pending operations for a QP that have not made it to the
> wire, and wait for that flush to finish.
Again, needs generalization
> do_send()
> Take a fully constructed packet and place on the wire
>
The hardest operation of all IMO.
Should be efficient but yet general
> Other functions for things like maintaining MAD counters perhaps.
>
> Driver notification or upcall to rvt
> ------------------------------------
> Certain things will require the driver to notify the rvt or execute some
> function. For instance, the driver needs to hand the packet to rvt after it
> pulls it off the wire.
>
> There are also events which the driver needs to let rvt know have happened.
> Things that currently generate IB_EVENT_PORT_ERROR, or IB_EVENT_PORT_ACTIVE,
> etc. There are likely other events as well.
>
> Next steps
> ----------
> We will continue posting code to GitHub [2] while we field feedback. Note
> the repo has been moved from my previous announcement [1]. I have placed it
> under my GitHub. Once there is more significant development and folks are
> generally happy with the design we will begin posting to this mailing list
> (linux-rdma).
What's the minimal progress in the rvt and the drivers before you
think it's ready for posting to the list?
>
> The current branch on [2] is rdmavt-v1. I'll bump this whenever a rebase
> is needed.
>
> [1] http://marc.info/?l=linux-rdma&m=144563342718705&w=2
> [2] https://github.com/ddalessa/kernel/tree/rdmavt-v1
>
> Thanks
>
> -Denny
* Re: [RFC] RDMA verbs transport design notes
@ 2015-11-02 16:34 ` Dennis Dalessandro
From: Dennis Dalessandro @ 2015-11-02 16:34 UTC (permalink / raw)
To: Moni Shoua
Cc: linux-rdma, Weiny, Ira, mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w
On Mon, Nov 02, 2015 at 03:46:52PM +0200, Moni Shoua wrote:
>> get_lid()
>> Provides the LID
>>
>LID is a special case of L2 address (MAC is another special case)
>Maybe change this to het_l2()?
I'm not particularly tied to the name, we can certainly change it to
something else. I assume that's a typo and should be "get_l2()"?
>> qp_mtu()
>> Using the SL determines the MTU (this varies per VL in OPA)
>>
>Again, this is too tied to the InfiniBand protocol.
>Also, It doesn't make sense to me that a driver won't know how to
>create a QP (this is done by the rvt) but will know how to answer
>about QP mtu. Does it?
Perhaps that should have been written as "validate_mtu()", to take the user
supplied MTU and ensure that it is valid. Keep in mind that OPA can have an
MTU that varies across VLs. For those drivers which do not support this, the
function basically becomes a no-op.
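To make the idea concrete, a validate_mtu() hook could be sketched as below.
This is user-space C with made-up names and an assumed VL count of 8; the
real signature and the way per-VL limits are stored are still open.

```c
#include <stddef.h>

#define SKETCH_NUM_VLS 8    /* assumed VL count, for illustration only */

/* Hypothetical per-device MTU settings. */
struct sketch_mtu_dev {
    int per_vl_mtu;                 /* nonzero if the MTU varies per VL */
    int vl_mtu[SKETCH_NUM_VLS];     /* limit for each VL when it does */
};

/*
 * Returns the MTU if it is acceptable for the given VL, -1 otherwise.
 * For a device without per-VL MTUs this is effectively a no-op.
 */
int sketch_validate_mtu(const struct sketch_mtu_dev *dev, int vl, int mtu)
{
    if (!dev->per_vl_mtu)
        return mtu;
    if (vl < 0 || vl >= SKETCH_NUM_VLS)
        return -1;
    return mtu <= dev->vl_mtu[vl] ? mtu : -1;
}
```

With this shape rvt never needs to know why an MTU was rejected, only that
the driver refused it.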
>> flush_qp()
>> Flush out all pending operations for a QP that have not made it to the
>> wire, and wait for that flush to finish.
>Again, needs generalization
Sure, we can work on that.
>> do_send()
>> Take a fully constructed packet and place on the wire
>>
>The hardest operation of all IMO.
>Should be efficient but yet general
We are on the same page here.
>> Next steps
>> ----------
>> We will continue posting code to GitHub [2] while we field feedback. Note
>> the repo has been moved from my previous announcement [1]. I have placed it
>> under my GitHub. Once there is more significant development and folks are
>> generally happy with the design we will begin posting to this mailing list
>> (linux-rdma).
>What's the minimal progress in the rvt and the drivers before you
>think it's ready for posting to the list?
I don't think we have a hard-set minimal state before we are ready to post
to the list. Let's see how things shape up over the next couple of weeks,
and if we as a community like the direction of the code and think it's
ready to post, we can surely do so.
-Denny