All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
@ 2013-03-12 19:35 Boris Chiu
       [not found] ` <513F8386.5070208-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Boris Chiu @ 2013-03-12 19:35 UTC (permalink / raw)
  To: iw >> Ira Weiny; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Brendan Doyle <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Brendan Doyle <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 src/mad.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/src/mad.c b/src/mad.c
index 70a69dd..fb5f5eb 100644
--- a/src/mad.c
+++ b/src/mad.c
@@ -62,6 +62,9 @@ uint64_t mad_trid(void)
 		trid = random();
 	}
 	next = ++trid | (base << 32);
+#if defined(__SVR4) && defined(__sun)
+	next &= 0x00ffffffffffffff;
+#endif
 	return next;
 }
 
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found] ` <513F8386.5070208-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2013-03-20 22:11   ` Ira Weiny
       [not found]     ` <CAMLCd9LOuajya0_s4NhuUD4hUtHnc_NLi8sELnsEprash6t5Eg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ira Weiny @ 2013-03-20 22:11 UTC (permalink / raw)
  To: Boris Chiu; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Why does this need to be done in user space?  This seems like
something which should be done in the kernel for all Transaction ID's
which might be sent to the umad layer.

Ira

On Tue, Mar 12, 2013 at 12:35 PM, Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> From: Brendan Doyle <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>
> Signed-off-by: Brendan Doyle <brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
> src/mad.c |    3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/src/mad.c b/src/mad.c
> index 70a69dd..fb5f5eb 100644
> --- a/src/mad.c
> +++ b/src/mad.c
> @@ -62,6 +62,9 @@ uint64_t mad_trid(void)
>                 trid = random();
>         }
>         next = ++trid | (base << 32);
> +#if defined(__SVR4) && defined(__sun)
> +       next &= 0x00ffffffffffffff;
> +#endif
>         return next;
> }
>
> --
> 1.7.1
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]     ` <CAMLCd9LOuajya0_s4NhuUD4hUtHnc_NLi8sELnsEprash6t5Eg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-03-20 22:33       ` Jason Gunthorpe
       [not found]         ` <514A3996.601@oracle.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2013-03-20 22:33 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Boris Chiu, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 20, 2013 at 03:11:51PM -0700, Ira Weiny wrote:
> Why does this need to be done in user space?  This seems like
> something which should be done in the kernel for all Transaction ID's
> which might be sent to the umad layer.

Agreed, if Solaris is going to emulate the Linux umad dev FD it should
be done properly. The kernel overrides some bits and returns the TRID
it actually put on the wire after the MAD is sent.

Linux is already doing this to add per-process uniqueness, per-guest
SRIOV uniqueness should use the same mechanism.

Thus, this masking is either redundant, or hiding a kernel bug..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]           ` <514A3996.601-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2013-03-21  0:20             ` brendan doyle
       [not found]               ` <514A525F.7030706-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: brendan doyle @ 2013-03-21  0:20 UTC (permalink / raw)
  To: Boris Chiu, Weiny, Ira, Jason Gunthorpe
  Cc: Pramod Gunjikar, linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 20/03/2013 22:35, Boris Chiu wrote:
> fyi,
>
> Boris
>
>
> -------- Original Message --------
> Subject: 	Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by 
> solaris SRIOV driver
> Date: 	Wed, 20 Mar 2013 16:33:22 -0600
> From: 	Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
> To: 	Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: 	Boris Chiu <boris.chiu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>
>
>
> On Wed, Mar 20, 2013 at 03:11:51PM -0700, Ira Weiny wrote:
> > Why does this need to be done in user space?  This seems like
> > something which should be done in the kernel for all Transaction ID's
> > which might be sent to the umad layer.

So the reason we "reserved" not set the upper two bytes of TID in the 
user space
is so that libs and/or applications that rely on TID matching won't 
break. If we
changed the TID in the kernel then the response TID returned to the 
lib/application
would not match the request TID.

Currently the only place I see TID check is in libibmad's _do_madrpc(), 
and here it
effectively does something similar where it truncates the TID to 
32bits.  We send a
MAD with a full 64 bit TID, record the lower 32 bits of the TID, on on 
receipt of the
response check that the lower 32 bits of TID are the same, the upper 32 
could be
completely different.

>
> Agreed, if Solaris is going to emulate the Linux umad dev FD it should
> be done properly. The kernel overrides some bits and returns the TRID
> it actually put on the wire after the MAD is sent.

I don't see where the TID sent is returned to the caller of umad_send(), 
I see
that we pass down a MAD over the fd to the kernel that has a full 64bit 
TID and
then I see that the kernel over writes the upper 32 bits, but I don't 
see how that
is communicated back to the sender of the MAD, so if the sender were doing
64 bit TID matching it would not find its MAD, the upper 32bits it set 
are lost.

Note in what we propose, in the userland we just reserve the upper byte, 
it is the
kernel that then uses/sets and unsets this upper byte to include a VF id so
MADs send on a VF device can get tunneled to the PF and received on a PF 
can be
directed to the correct VF, the upper byte is cleared before handing 
back to the
userland so that it can do 64 bit TID matching and the TID it specified in
the request MAD is the same as the one it gets in the response MAD.

>
> Linux is already doing this to add per-process uniqueness, per-guest
> SRIOV uniqueness should use the same mechanism.

The way it is done in Linux effectively makes the TID a 32bit entity from
the userland perspective.


>
> Thus, this masking is either redundant, or hiding a kernel bug..

I might say it is the other way around. In Linux from the userland the 
TID you get in
a response MAD is not the same as the one you specified in the request MAD.

I'm not debating that the kernel should use upper bits of the TID to demux
per process, per guest, that is what we do in Solaris, what I'm saying
is that those bits used by the kernel should be "reserved"  in userland
so that the TID of its request and response are the same across all 64 bits.

Brendan

> Jason
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]               ` <514A525F.7030706-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2013-03-21  0:51                 ` Weiny, Ira
       [not found]                   ` <2807E5FD2F6FDA4886F6618EAC48510EBB428F-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2013-03-21  5:07                 ` Jason Gunthorpe
  1 sibling, 1 reply; 10+ messages in thread
From: Weiny, Ira @ 2013-03-21  0:51 UTC (permalink / raw)
  To: brendan doyle, Boris Chiu, Jason Gunthorpe
  Cc: Pramod Gunjikar, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org)

> -----Original Message-----
> From: brendan doyle [mailto:brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> 
> 
> On 20/03/2013 22:35, Boris Chiu wrote:
> > fyi,
> >
> > Boris
> >
> >
> > -------- Original Message --------
> > Subject: 	Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by
> > solaris SRIOV driver
> >
> >
> >
> > On Wed, Mar 20, 2013 at 03:11:51PM -0700, Ira Weiny wrote:
> > > Why does this need to be done in user space?  This seems like
> > > something which should be done in the kernel for all Transaction
> > > ID's which might be sent to the umad layer.
> 
> So the reason we "reserved" not set the upper two bytes of TID

You mean 1 byte?

>  in the user
> space is so that libs and/or applications that rely on TID matching won't break.
> If we changed the TID in the kernel then the response TID returned to the
> lib/application would not match the request TID.

While this makes sense I can't believe that libibmad is the only place which TID's get generated.  Do you also do this in OpenSM?

> 
> Currently the only place I see TID check is in libibmad's _do_madrpc(), and
> here it effectively does something similar where it truncates the TID to
> 32bits.  We send a MAD with a full 64 bit TID, record the lower 32 bits of the
> TID, on on receipt of the response check that the lower 32 bits of TID are the
> same, the upper 32 could be completely different.
> 

Right, it ignores the upper 32bits because it knows the umad layer reserves the upper 32bits.  So for libibmad your patch does not fix anything unless you are doing some 64bit compare of TID's above the libibmad layer?


> >
> > Agreed, if Solaris is going to emulate the Linux umad dev FD it should
> > be done properly. The kernel overrides some bits and returns the TRID
> > it actually put on the wire after the MAD is sent.
> 
> I don't see where the TID sent is returned to the caller of umad_send(), I see
> that we pass down a MAD over the fd to the kernel that has a full 64bit TID
> and then I see that the kernel over writes the upper 32 bits, but I don't see
> how that is communicated back to the sender of the MAD, so if the sender
> were doing
> 64 bit TID matching it would not find its MAD, the upper 32bits it set are lost.

I believe you are correct.  It is just up to the user of umad to "know" that the upper 32bits are not valid.  While unfortunate, this can only really be fixed through a major change in the umad ABI and/or documentation of that library.

> 
> Note in what we propose, in the userland we just reserve the upper byte, it
> is the kernel that then uses/sets and unsets this upper byte to include a VF id
> so MADs send on a VF device can get tunneled to the PF and received on a
> PF can be directed to the correct VF, the upper byte is cleared before
> handing back to the userland so that it can do 64 bit TID matching and the TID
> it specified in the request MAD is the same as the one it gets in the response
> MAD.

This makes sense but I worry about users who do not use libibmad.  Frankly, I think people should be moving away from using libibmad.  It is very complicated and undocumented.  Putting transaction ID processing like this should probably be in libibumad since it reserves and uses bits from the TID.

> 
> >
> > Linux is already doing this to add per-process uniqueness, per-guest
> > SRIOV uniqueness should use the same mechanism.
> 
> The way it is done in Linux effectively makes the TID a 32bit entity from the
> userland perspective.
> 

Yep, and I wish it was better documented and/or done a different way.

> 
> >
> > Thus, this masking is either redundant, or hiding a kernel bug..
> 
> I might say it is the other way around. In Linux from the userland the TID you
> get in a response MAD is not the same as the one you specified in the
> request MAD.

And this is yet another unfortunate "gotcha" of the libibumad interface.

> 
> I'm not debating that the kernel should use upper bits of the TID to demux
> per process, per guest, that is what we do in Solaris, what I'm saying is that
> those bits used by the kernel should be "reserved"  in userland so that the
> TID of its request and response are the same across all 64 bits.

So in Solaris do you combine the reservation of umad and this SRIOV like this:

0xXX YYYYYY ZZZZZZZZ

Where XX is the VF mask YYYYYY is the agent mask and ZZZZZZZZ is the "actual" TID?

It seems like we should standardize on some format and then put this functionality into libibumad where all software can consistently use it.

Or leave things the way they are and define the TID to be 32bits in userspace and leave the rest up to the kernel.  In that respect libibmad probably should not be generating 64bit TID's.

I think I would rather see a patch which generates a 32bit TID from mad_trid.  Would you be happy with that?

Ira

> 
> Brendan
> 
> > Jason
> >
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]                   ` <2807E5FD2F6FDA4886F6618EAC48510EBB428F-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-03-21  1:23                     ` brendan doyle
  0 siblings, 0 replies; 10+ messages in thread
From: brendan doyle @ 2013-03-21  1:23 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Boris Chiu, Jason Gunthorpe, Pramod Gunjikar,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org)


On 21/03/2013 00:51, Weiny, Ira wrote:
>> -----Original Message-----
>> From: brendan doyle [mailto:brendan.doyle-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
>>
>>
>> On 20/03/2013 22:35, Boris Chiu wrote:
>>> fyi,
>>>
>>> Boris
>>>
>>>
>>> -------- Original Message --------
>>> Subject: 	Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by
>>> solaris SRIOV driver
>>>
>>>
>>>
>>> On Wed, Mar 20, 2013 at 03:11:51PM -0700, Ira Weiny wrote:
>>>> Why does this need to be done in user space?  This seems like
>>>> something which should be done in the kernel for all Transaction
>>>> ID's which might be sent to the umad layer.
>> So the reason we "reserved" not set the upper two bytes of TID
> You mean 1 byte?

yeah sorry typo..

>>   in the user
>> space is so that libs and/or applications that rely on TID matching won't break.
>> If we changed the TID in the kernel then the response TID returned to the
>> lib/application would not match the request TID.
> While this makes sense I can't believe that libibmad is the only place which TID's get generated.  Do you also do this in OpenSM?

I didn't personal check, butt I ask a colleague who was making the 
change to check this
specifically, and the answer was that all tids were set/generated via 
mad_trid().

>> Currently the only place I see TID check is in libibmad's _do_madrpc(), and
>> here it effectively does something similar where it truncates the TID to
>> 32bits.  We send a MAD with a full 64 bit TID, record the lower 32 bits of the
>> TID, on on receipt of the response check that the lower 32 bits of TID are the
>> same, the upper 32 could be completely different.
>>
> Right, it ignores the upper 32bits because it knows the umad layer reserves the upper 32bits.  So for libibmad your patch does not fix anything unless you are doing some 64bit compare of TID's above the libibmad layer?

we are not so, i agree you could say that patch is moot, but I do think 
that tids
can be handled better than they are, and that if we expect the kernel to use
some of the upper bytes, then they should be marked reserved in userland.


>>> Agreed, if Solaris is going to emulate the Linux umad dev FD it should
>>> be done properly. The kernel overrides some bits and returns the TRID
>>> it actually put on the wire after the MAD is sent.
>> I don't see where the TID sent is returned to the caller of umad_send(), I see
>> that we pass down a MAD over the fd to the kernel that has a full 64bit TID
>> and then I see that the kernel over writes the upper 32 bits, but I don't see
>> how that is communicated back to the sender of the MAD, so if the sender
>> were doing
>> 64 bit TID matching it would not find its MAD, the upper 32bits it set are lost.
> I believe you are correct.  It is just up to the user of umad to "know" that the upper 32bits are not valid.  While unfortunate, this can only really be fixed through a major change in the umad ABI and/or documentation of that library.

noted.

>> Note in what we propose, in the userland we just reserve the upper byte, it
>> is the kernel that then uses/sets and unsets this upper byte to include a VF id
>> so MADs send on a VF device can get tunneled to the PF and received on a
>> PF can be directed to the correct VF, the upper byte is cleared before
>> handing back to the userland so that it can do 64 bit TID matching and the TID
>> it specified in the request MAD is the same as the one it gets in the response
>> MAD.
> This makes sense but I worry about users who do not use libibmad.  Frankly, I think people should be moving away from using libibmad.  It is very complicated and undocumented.  Putting transaction ID processing like this should probably be in libibumad since it reserves and uses bits from the TID.

Humm, so may be getting beyond the scope of this patch, I think there 
are a lot of good
parts to libibmad, especially the packing and unpacking/parsing of the 
MAD pkt, and
if we were to drop libibmad these should be rolled into libibumd. 
Included in that should
be a function to get a tid, and that all users of the umad interface 
must get their tid using
that function, then we can enforce a consistent policy for tids.

>>> Linux is already doing this to add per-process uniqueness, per-guest
>>> SRIOV uniqueness should use the same mechanism.
>> The way it is done in Linux effectively makes the TID a 32bit entity from the
>> userland perspective.
>>
> Yep, and I wish it was better documented and/or done a different way.
>
>>> Thus, this masking is either redundant, or hiding a kernel bug..
>> I might say it is the other way around. In Linux from the userland the TID you
>> get in a response MAD is not the same as the one you specified in the
>> request MAD.
> And this is yet another unfortunate "gotcha" of the libibumad interface.
>
>> I'm not debating that the kernel should use upper bits of the TID to demux
>> per process, per guest, that is what we do in Solaris, what I'm saying is that
>> those bits used by the kernel should be "reserved"  in userland so that the
>> TID of its request and response are the same across all 64 bits.
> So in Solaris do you combine the reservation of umad and this SRIOV like this:
>
> 0xXX YYYYYY ZZZZZZZZ
>
> Where XX is the VF mask YYYYYY is the agent mask and ZZZZZZZZ is the "actual" TID?
Actually we just reserve the upper byte, we currently don't include an 
agent mask, but
that's something we could certainly add.

>
> It seems like we should standardize on some format and then put this functionality into libibumad where all software can consistently use it.

I'd agree.

>
> Or leave things the way they are and define the TID to be 32bits in userspace and leave the rest up to the kernel.  In that respect libibmad probably should not be generating 64bit TID's.

So I think that could be a short term thing, until we have an agreed 
standard as above.

>
> I think I would rather see a patch which generates a 32bit TID from mad_trid.  Would you be happy with that?

yeup, I'll ask Boris to rework the patch.

Thanks

>
> Ira
>
>> Brendan
>>
>>> Jason
>>>
>>>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]               ` <514A525F.7030706-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2013-03-21  0:51                 ` Weiny, Ira
@ 2013-03-21  5:07                 ` Jason Gunthorpe
       [not found]                   ` <20130321050709.GA20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2013-03-21  5:07 UTC (permalink / raw)
  To: brendan doyle
  Cc: Boris Chiu, Weiny, Ira, Pramod Gunjikar,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 21, 2013 at 12:20:47AM +0000, brendan doyle wrote:

> >On Wed, Mar 20, 2013 at 03:11:51PM -0700, Ira Weiny wrote:
> >> Why does this need to be done in user space?  This seems like
> >> something which should be done in the kernel for all Transaction ID's
> >> which might be sent to the umad layer.
> 
> So the reason we "reserved" not set the upper two bytes of TID in
> the user space is so that libs and/or applications that rely on TID
> matching won't break. If we changed the TID in the kernel then the
> response TID returned to the lib/application would not match the
> request TID.

This is already the case, only the lower 32 bits are available for
application use, the upper 32 bits are all reserved for the kernel,
and you can learn what they are by doing a SMP to the local SMA. IIRC
some of my stuff captures the upper 32 bits when it gets the first
reply - but all userspace demux is only done on the low 32 bits
anyhow.

It is mysterious why umad sends in all 64 bits, but it isn't
harmful. If you are going to change it then send 32 bits only.
 
> >Agreed, if Solaris is going to emulate the Linux umad dev FD it should
> >be done properly. The kernel overrides some bits and returns the TRID
> >it actually put on the wire after the MAD is sent.
> 
> I don't see where the TID sent is returned to the caller of
> umad_send(), I see that we pass down a MAD over the fd to the kernel
> that has a full 64bit TID and then I see that the kernel over writes
> the upper 32 bits, but I don't see how that is communicated back to
> the sender of the MAD, so if the sender were doing 64 bit TID
> matching it would not find its MAD, the upper 32bits it set are
> lost.

Certainly, the umad API is problematic in many ways :|

> Note in what we propose, in the userland we just reserve the upper
> byte, it is the kernel that then uses/sets and unsets this upper
> byte to include a VF id so MADs send on a VF device can get tunneled
> to the PF and received on a PF can be directed to the correct VF,
> the upper byte is cleared before handing back to the userland so
> that it can do 64 bit TID matching and the TID it specified in the
> request MAD is the same as the one it gets in the response MAD.

Masking the TID off in messages coming from the kernel to umad is a
really bad idea, the TID returned to user space should exactly match
the on-the-wire value. This is why 64 bits are allocated for a 32 bit
value. Otherwise you have to create ugly special cases for requests vs
reply vs trap and it precludes userspace producing sensible packet
traces and the like.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]                   ` <20130321050709.GA20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-03-21 18:14                     ` brendan doyle
       [not found]                       ` <514B4E17.5020703-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: brendan doyle @ 2013-03-21 18:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boris Chiu, Weiny, Ira, Pramod Gunjikar,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Ok so I think we are essentially on the same page...

> This is already the case, only the lower 32 bits are available for
> application use, the upper 32 bits are all reserved for the kernel,
This is not an IB spec defined delineation, but an OF one. Perhaps there 
was some
community discussion to architect it this way but it is not documented. 
At the
very least this patch should document this.

So how about modifying the patch such that mad_trid() is something like:

#define GET_IB_USERLAND_TID(tid) (tid & 0x00000000ffffffff)
/*
   * Generate the  64 bit MAD transaction ID. The upper 32 bits are 
reserved for use
   * by the kernel. We clear the upper 32 bits here, but MADs received 
from the kernel
   * may contain kernel specific data in these bits, consequently 
userland TID matching
   * should only be done on the lower 32 bits.
   */
uint64_t mad_trid(void)
{
         static uint64_t trid;
         uint64_t next;

         if (!trid) {
                 srandom((int)time(0) * getpid());
                 trid = random();
         }
         next = ++trid;
         next = GET_IB_USERLAND_TID(next);
         return next;
}

and in _do_madrpc

-uint32_t trid;          /* only low 32 bits */
+uint32_t trid;          /* only low 32 bits - see mad_trid() */

Brendan

On 21/03/2013 05:07, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2013 at 12:20:47AM +0000, brendan doyle wrote:
>
>>> On Wed, Mar 20, 2013 at 03:11:51PM -0700, Ira Weiny wrote:
>>>> Why does this need to be done in user space?  This seems like
>>>> something which should be done in the kernel for all Transaction ID's
>>>> which might be sent to the umad layer.
>> So the reason we "reserved" not set the upper two bytes of TID in
>> the user space is so that libs and/or applications that rely on TID
>> matching won't break. If we changed the TID in the kernel then the
>> response TID returned to the lib/application would not match the
>> request TID.
> This is already the case, only the lower 32 bits are available for
> application use, the upper 32 bits are all reserved for the kernel,
> and you can learn what they are by doing a SMP to the local SMA. IIRC
> some of my stuff captures the upper 32 bits when it gets the first
> reply - but all userspace demux is only done on the low 32 bits
> anyhow.
>
> It is mysterious why umad sends in all 64 bits, but it isn't
> harmful. If you are going to change it then send 32 bits only.
>   
>>> Agreed, if Solaris is going to emulate the Linux umad dev FD it should
>>> be done properly. The kernel overrides some bits and returns the TRID
>>> it actually put on the wire after the MAD is sent.
>> I don't see where the TID sent is returned to the caller of
>> umad_send(), I see that we pass down a MAD over the fd to the kernel
>> that has a full 64bit TID and then I see that the kernel over writes
>> the upper 32 bits, but I don't see how that is communicated back to
>> the sender of the MAD, so if the sender were doing 64 bit TID
>> matching it would not find its MAD, the upper 32bits it set are
>> lost.
> Certainly, the umad API is problematic in many ways :|
>
>> Note in what we propose, in the userland we just reserve the upper
>> byte, it is the kernel that then uses/sets and unsets this upper
>> byte to include a VF id so MADs send on a VF device can get tunneled
>> to the PF and received on a PF can be directed to the correct VF,
>> the upper byte is cleared before handing back to the userland so
>> that it can do 64 bit TID matching and the TID it specified in the
>> request MAD is the same as the one it gets in the response MAD.
> Masking the TID off in messages coming from the kernel to umad is a
> really bad idea, the TID returned to user space should exactly match
> the on-the-wire value. This is why 64 bits are allocated for a 32 bit
> value. Otherwise you have to create ugly special cases for requests vs
> reply vs trap and it precludes userspace producing sensible packet
> traces and the like.
>
> Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]                       ` <514B4E17.5020703-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2013-03-21 18:45                         ` Jason Gunthorpe
       [not found]                           ` <20130321184507.GB8044-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2013-03-21 18:45 UTC (permalink / raw)
  To: brendan doyle
  Cc: Boris Chiu, Weiny, Ira, Pramod Gunjikar,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 21, 2013 at 06:14:47PM +0000, brendan doyle wrote:
> Ok so I think we are essentially on the same page...
> 
> >This is already the case, only the lower 32 bits are available for
> >application use, the upper 32 bits are all reserved for the kernel,

> This is not an IB spec defined delineation, but an OF one. Perhaps
> there was some community discussion to architect it this way but it
> is not documented. At the very least this patch should document
> this.

Well, it is how /dev/umad works - IBA does not define the umad api,
Solaris is copying the Linux umad API, so it should work the same..

There is no need to specify the format of the upper 32 bits, it is
kernel specific and has no impact on the application. The kernel
should use those bits as necessary to demux flows between users of
umad, be they processes or VMs.

> So how about modifying the patch such that mad_trid() is something like:

Looks like a reasonable clarification of existing usage to me.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: Re: [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver
       [not found]                           ` <20130321184507.GB8044-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-03-21 20:40                             ` brendan doyle
  0 siblings, 0 replies; 10+ messages in thread
From: brendan doyle @ 2013-03-21 20:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boris Chiu, Weiny, Ira, Pramod Gunjikar,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 21/03/2013 18:45, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2013 at 06:14:47PM +0000, brendan doyle wrote:
>> Ok so I think we are essentially on the same page...
>>
>>> This is already the case, only the lower 32 bits are available for
>>> application use, the upper 32 bits are all reserved for the kernel,
>> This is not an IB spec defined delineation, but an OF one. Perhaps
>> there was some community discussion to architect it this way but it
>> is not documented. At the very least this patch should document
>> this.
> Well, it is how /dev/umad works - IBA does not define the umad api,
> Solaris is copying the Linux umad API, so it should work the same..
>
> There is no need to specify the format of the upper 32 bits, it is
> kernel specific and has no impact on the application. The kernel
> should use those bits as necessary to demux flows between users of
> umad, be they processes or VMs.
>
>> So how about modifying the patch such that mad_trid() is something like:
> Looks like a reasonable clarification of existing usage to me.

Great, so we'll modify the patch and resubmit.

Thanks

>
> Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-03-21 20:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-12 19:35 [PATCH] libibmad: To reserve upper 8 bits of tid used by solaris SRIOV driver Boris Chiu
     [not found] ` <513F8386.5070208-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-20 22:11   ` Ira Weiny
     [not found]     ` <CAMLCd9LOuajya0_s4NhuUD4hUtHnc_NLi8sELnsEprash6t5Eg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-20 22:33       ` Jason Gunthorpe
     [not found]         ` <514A3996.601@oracle.com>
     [not found]           ` <514A3996.601-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21  0:20             ` Fwd: " brendan doyle
     [not found]               ` <514A525F.7030706-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21  0:51                 ` Weiny, Ira
     [not found]                   ` <2807E5FD2F6FDA4886F6618EAC48510EBB428F-8k97q/ur5Z1cIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-03-21  1:23                     ` brendan doyle
2013-03-21  5:07                 ` Jason Gunthorpe
     [not found]                   ` <20130321050709.GA20882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 18:14                     ` brendan doyle
     [not found]                       ` <514B4E17.5020703-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-21 18:45                         ` Jason Gunthorpe
     [not found]                           ` <20130321184507.GB8044-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-03-21 20:40                             ` brendan doyle

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.