* RFC: Immediate data support for SRP
@ 2015-07-16 15:25 Bart Van Assche
       [not found] ` <55A7CCF1.3080201-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2015-07-16 15:25 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello,

As you probably know, for write requests "immediate data" means sending 
the data in the same packet as the write command instead of sending it 
as a separate packet. This approach improves performance and reduces 
latency. Although support for immediate data has not been standardized, 
it is easy to add to the SRP initiator and target drivers. 
Implementations exist in the ib_srp-backport initiator driver and the 
SCST SRP target driver (see also 
https://github.com/bvanassche/ib_srp-backport and 
http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/). These 
implementations have been available for a considerable time, work 
reliably, are backwards compatible and support zero-copy. Since using 
immediate data provides a measurable performance improvement, I'm 
wondering whether it would be acceptable to add support for immediate 
data to the SRP drivers in the Linux kernel tree (ib_srp and ib_srpt)?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: RFC: Immediate data support for SRP
       [not found] ` <55A7CCF1.3080201-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-07-19 16:07   ` Sagi Grimberg
       [not found]     ` <CACaajQu9QBo7robmrFF7hev-xS=c3VD9qQTn5DKuuObk5aU_Kg@mail.gmail.com>
       [not found]     ` <55ABCB34.1000506-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 2 replies; 10+ messages in thread
From: Sagi Grimberg @ 2015-07-19 16:07 UTC (permalink / raw)
  To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 7/16/2015 6:25 PM, Bart Van Assche wrote:
> Hello,

Hi Bart,

I agree it would definitely help as the lack of immediate data
emphasizes the additional latency of doing rdma reads.

>
> As you probably know for write requests "immediate data" means sending
> the data in the same packet as the write command instead of sending it
> as a separate packet. This approach improves performance and reduces
> latency. Although support for immediate data has not been standardized,

Has anyone tried to get it into the standard? It would seem beneficial
to just add it. I wouldn't want this to be a Linux-only feature...

> it is easy to add to the SRP initiator and target drivers.
> Implementations exist in the ib_srp-backport initiator driver and the
> SCST SRP target driver (see also
> https://github.com/bvanassche/ib_srp-backport and
> http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/). These
> implementations are available since considerable time, work reliably,
> are backwards compatible and support zero-copy. Since using immediate
> data provides a measurable performance improvement I'm wondering whether
> it would be acceptable to add support for immediate data to the SRP
> drivers in the Linux kernel tree (ib_srp and ib_srpt) ?

Was this tested against any other array besides SCST and LIO? I know
people are using SRP against various arrays (Oracle ZFS, RamSan TMS,
DDN Fusion, NIMBUS, NetApp E5400A...). They're not running upstream
kernels, but they will catch up at some point...

Sagi.

* Re: RFC: Immediate data support for SRP
       [not found]     ` <55ABCB34.1000506-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-19 21:43       ` Or Gerlitz
       [not found]         ` <CAJ3xEMiz=n_uP9TQ6YsGN+omzsN3Zrz5skDEq00+fE188z0erA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-07-21  0:03       ` Bart Van Assche
  1 sibling, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2015-07-19 21:43 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 19, 2015 at 7:07 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On 7/16/2015 6:25 PM, Bart Van Assche wrote:

> I agree it would definitely help as the lack of immediate data
> emphasizes the additional latency of doing rdma reads.

Sagi, do we have black box evidence from iSER showing notable
(results? setup?) IO latency improvement from using immediate data vs.
RDMA read?

Or.

* Re: RFC: Immediate data support for SRP
       [not found]         ` <CAJ3xEMiz=n_uP9TQ6YsGN+omzsN3Zrz5skDEq00+fE188z0erA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-07-20  9:44           ` Sagi Grimberg
       [not found]             ` <55ACC2FA.8030202-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2015-07-20  9:44 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 7/20/2015 12:43 AM, Or Gerlitz wrote:
> On Sun, Jul 19, 2015 at 7:07 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>> On 7/16/2015 6:25 PM, Bart Van Assche wrote:
>
>> I agree it would definitely help as the lack of immediate data
>> emphasizes the additional latency of doing rdma reads.
>
> Sagi, do we have black box evidence from iSER showing notable
> (results? setup?) IO latency improvement from using immediate data vs.
> RDMA read?

I've seen it. The LIO target has better write performance due to
ImmediateData. I also have a patch in the pipe that optimizes this
flow at the target side, which improves performance by up to 40% for
512B-8K IOs.

Sagi.

* Re: RFC: Immediate data support for SRP
       [not found]             ` <55ACC2FA.8030202-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-20 15:26               ` Or Gerlitz
       [not found]                 ` <55AD133D.2040204-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2015-07-20 15:26 UTC (permalink / raw)
  To: Sagi Grimberg, Or Gerlitz
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 7/20/2015 12:44 PM, Sagi Grimberg wrote:
> On 7/20/2015 12:43 AM, Or Gerlitz wrote:
>> On Sun, Jul 19, 2015 at 7:07 PM, Sagi Grimberg 
>> <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>> On 7/16/2015 6:25 PM, Bart Van Assche wrote:
>>
>>> I agree it would definitely help as the lack of immediate data
>>> emphasizes the additional latency of doing rdma reads.
>>
>> Sagi, do we have black box evidence from iSER showing notable
>> (results? setup?) IO latency improvement from using immediate data vs.
>> RDMA read?
>
> I've seen it. The LIO target has a better write performance due to 
> ImmediateData. 

Numberz, please...

Also, do you see any gain with TGT too? If not, what's your thinking re 
the LIO vs TGT difference?

> I also have a patch in the pipe that optimize the this flow at the 
> target side which improves up to 40% for 512B-8K IOs.
>

So you have 140% better IOPS with immediate-data vs. non immediate 
data?! numberz?

Or.

* Re: RFC: Immediate data support for SRP
       [not found]       ` <CACaajQu9QBo7robmrFF7hev-xS=c3VD9qQTn5DKuuObk5aU_Kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-07-20 23:49         ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2015-07-20 23:49 UTC (permalink / raw)
  To: Vasiliy Tolstov, Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 07/19/2015 02:25 PM, Vasiliy Tolstov wrote:
>  > On 7/16/2015 6:25 PM, Bart Van Assche wrote:
>  >> it is easy to add to the SRP initiator and target drivers.
>  >> Implementations exist in the ib_srp-backport initiator driver and the
>  >> SCST SRP target driver (see also
>  >> https://github.com/bvanassche/ib_srp-backport and
>  >> http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/). These
>  >> implementations are available since considerable time, work reliably,
>  >> are backwards compatible and support zero-copy. Since using immediate
>  >> data provides a measurable performance improvement I'm wondering whether
>  >> it would be acceptable to add support for immediate data to the SRP
>  >> drivers in the Linux kernel tree (ib_srp and ib_srpt) ?
>
> Is this possible in userspace? I'm working on software defined storage
> with rdma support, and I think this is usable.

Hello Vasiliy,

The feature of the RDMA API that was used to implement sending inline 
data, namely passing a scatterlist to ib_post_send() with more than one 
element, is also available in the user space RDMA API. I think it would 
be easy to add inline data support in a user space implementation of 
SRP. Not that it really matters, but your e-mail made me wonder whether 
the source code of the user space SRP initiator implementation you are 
working on is available somewhere, and if so, under which license?
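
The gather behaviour mentioned above (one send whose scatterlist has more than one element) can be modelled as follows. This is a minimal sketch and not RDMA code: post_send_gather() is a hypothetical stand-in for ib_post_send() or its user space counterpart, and the 16-byte header is a placeholder rather than a real SRP_CMD layout.

```python
def post_send_gather(sg_list):
    """Hypothetical model of a send with a multi-element gather list.

    The buffers are transmitted as one message, in list order, so the
    command header and the write data travel together.
    """
    return b"".join(sg_list)


# Placeholder command header; the real SRP_CMD layout is not modelled here.
header = bytes(16)
payload = bytearray(b"data to write")

# Element 0 points at the header, element 1 at the data buffer. The
# memoryviews reference the buffers instead of copying them up front.
sg_list = [memoryview(header), memoryview(payload)]
message = post_send_gather(sg_list)
```

The point of the second gather element is that the data buffer does not have to be copied behind the command header before the send is posted.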

Bart.

* Re: RFC: Immediate data support for SRP
       [not found]     ` <55ABCB34.1000506-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-07-19 21:43       ` Or Gerlitz
@ 2015-07-21  0:03       ` Bart Van Assche
       [not found]         ` <55AD8C3A.9070403-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2015-07-21  0:03 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 07/19/2015 09:07 AM, Sagi Grimberg wrote:
> On 7/16/2015 6:25 PM, Bart Van Assche wrote:
>> As you probably know for write requests "immediate data" means sending
>> the data in the same packet as the write command instead of sending it
>> as a separate packet. This approach improves performance and reduces
>> latency. Although support for immediate data has not been standardized,
>
> Has anyone tried to get it into the standard? It would seem beneficial
> to just add it. I wouldn't want this to be a Linux compatible only
> feature...

Earlier today I asked the SanDisk representative in the ANSI T10 
committee for his opinion about standardizing support for immediate data 
for the SRP protocol. I'm now waiting for his reply.

>> it is easy to add to the SRP initiator and target drivers.
>> Implementations exist in the ib_srp-backport initiator driver and the
>> SCST SRP target driver (see also
>> https://github.com/bvanassche/ib_srp-backport and
>> http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/). These
>> implementations are available since considerable time, work reliably,
>> are backwards compatible and support zero-copy. Since using immediate
>> data provides a measurable performance improvement I'm wondering whether
>> it would be acceptable to add support for immediate data to the SRP
>> drivers in the Linux kernel tree (ib_srp and ib_srpt) ?
>
> Was this tested against any other array besides SCST and LIO? I know
> people are using SRP against various arrays (Oracle ZFS, RamSan TMS,
> DDN Fusion, NIMBUS, NetApp E5400A...). Their not running upstream
> kernels, but they will catch up at some point...

Not yet, but the existing implementation is backwards compatible. The 
SRP initiator sets a flag in the login request to ask the target 
driver to enable immediate data (SRP_BUF_FORMAT_IMM), and immediate 
data is enabled only if the target driver sends a login response in 
which the same flag is set. Immediate data could only be enabled 
erroneously if an SRP target sets flags in its login reply other than 
those that have been standardized by the ANSI T10 committee.
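
The login handshake described above can be sketched as follows. This is an illustration under stated assumptions, not the ib_srp-backport or SCST code: only the flag name SRP_BUF_FORMAT_IMM comes from this thread, while the bit values and both function names are made up for the example.

```python
# Bit values are assumptions for illustration only; the thread names the
# SRP_BUF_FORMAT_IMM flag but does not give its value.
SRP_BUF_FORMAT_DIRECT = 1 << 1
SRP_BUF_FORMAT_INDIRECT = 1 << 2
SRP_BUF_FORMAT_IMM = 1 << 3


def build_login_request_formats(want_immediate):
    """Hypothetical helper: buffer-format flags the initiator puts in its login request."""
    formats = SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT
    if want_immediate:
        formats |= SRP_BUF_FORMAT_IMM
    return formats


def immediate_data_enabled(requested_formats, response_formats):
    """Enable immediate data only if the initiator asked for it AND the
    target set the same flag in its login response."""
    return bool(requested_formats & response_formats & SRP_BUF_FORMAT_IMM)


request = build_login_request_formats(want_immediate=True)
legacy_response = SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT
enabled_with_legacy_target = immediate_data_enabled(request, legacy_response)
```

A target that does not know the flag leaves it clear in its response, so the initiator falls back to the standardized buffer formats.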

Bart.


* Re: RFC: Immediate data support for SRP
       [not found]         ` <55AD8C3A.9070403-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-07-21  9:17           ` Sagi Grimberg
       [not found]             ` <55AE0E30.40404-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2015-07-21  9:17 UTC (permalink / raw)
  To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 7/21/2015 3:03 AM, Bart Van Assche wrote:
> On 07/19/2015 09:07 AM, Sagi Grimberg wrote:
>> On 7/16/2015 6:25 PM, Bart Van Assche wrote:
>>> As you probably know for write requests "immediate data" means sending
>>> the data in the same packet as the write command instead of sending it
>>> as a separate packet. This approach improves performance and reduces
>>> latency. Although support for immediate data has not been standardized,
>>
>> Has anyone tried to get it into the standard? It would seem beneficial
>> to just add it. I wouldn't want this to be a Linux compatible only
>> feature...
>
> Earlier today I have asked the SanDisk representative in the ANSI T10
> committee for his opinion about standardizing support for immediate data
> for the SRP protocol. I'm now waiting for his reply.

OK, I think that if we add it to upstream drivers we should be committed
to getting that into the standard.

>
>>> it is easy to add to the SRP initiator and target drivers.
>>> Implementations exist in the ib_srp-backport initiator driver and the
>>> SCST SRP target driver (see also
>>> https://github.com/bvanassche/ib_srp-backport and
>>> http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/). These
>>> implementations are available since considerable time, work reliably,
>>> are backwards compatible and support zero-copy. Since using immediate
>>> data provides a measurable performance improvement I'm wondering whether
>>> it would be acceptable to add support for immediate data to the SRP
>>> drivers in the Linux kernel tree (ib_srp and ib_srpt) ?
>>
>> Was this tested against any other array besides SCST and LIO? I know
>> people are using SRP against various arrays (Oracle ZFS, RamSan TMS,
>> DDN Fusion, NIMBUS, NetApp E5400A...). Their not running upstream
>> kernels, but they will catch up at some point...
>
> Not yet, but the existing implementation is backwards compatible. The
> SRP initiator sets a flag in the login request to request the target
> driver to enable immediate data (SRP_BUF_FORMAT_IMM), and only if the
> target driver sends a login response in which the same flag is set
> immediate data is enabled.

I understand, but if I'm not mistaken, setting a reserved field is also
a spec violation. Who knows what might happen if arrays see a value
in that reserved field.

Also, what is the imm_length? hard-coded? or do you specify that in the
login request as well? if so, is the target allowed to lower that value
(i.e. is it negotiable)?

This is why I think the standard is important here.

Having said that, I'm 100% on board with this feature.

Cheers,

Sagi.

* Re: RFC: Immediate data support for SRP
       [not found]                 ` <55AD133D.2040204-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-21  9:25                   ` Sagi Grimberg
  0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2015-07-21  9:25 UTC (permalink / raw)
  To: Or Gerlitz, Or Gerlitz; +Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA


>
> So you have 140% better IOPS with immediate-data vs. non immediate
> data?! numberz?

No, the improvement comes from avoiding the memory copy from the
pre-posted receive buffer (with immediate-data) to an allocated buffer.
Instead, the receive buffer is handed to the backend to do IO.

This shows up to 40% improvement (vs. immediate data with copy on the
target side).

I'll post numbers with the patches.

Sagi.

* Re: RFC: Immediate data support for SRP
       [not found]             ` <55AE0E30.40404-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-21 15:35               ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2015-07-21 15:35 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 07/21/2015 02:17 AM, Sagi Grimberg wrote:
> Also, what is the imm_length? hard-coded? or do you specify that in the
> login request as well? if so, is the target allowed to lower that value
> (i.e. is it negotiable)?

Hello Sagi,

As you probably know, SRP initiator and target drivers negotiate the 
MAXIMUM INITIATOR TO TARGET IU LENGTH (MITTIL) and MAXIMUM TARGET TO 
INITIATOR IU LENGTH (MTTIIL) parameters during login. The choice we made 
is that the initiator must only send immediate data if the size of the 
resulting message does not exceed the MITTIL. Since an SRP target driver 
is only allowed to send back a MITTIL value that is larger than or equal 
to what an initiator requested, the maximum immediate data size is 
controlled by the target driver.
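
The size rule described above can be sketched as follows. This is a minimal illustration and not the driver code: both function names are hypothetical, and header_len stands for whatever the command information unit occupies before the data, which this thread does not specify.

```python
def can_send_immediate(data_len, header_len, mittil):
    """Return True if the command plus its write data fits in one IU.

    mittil is the MAXIMUM INITIATOR TO TARGET IU LENGTH from the login
    response; header_len is an assumed size for the command portion.
    """
    return header_len + data_len <= mittil


def max_immediate_data(header_len, mittil):
    """Largest write payload that may travel as immediate data."""
    return max(0, mittil - header_len)


# Example values chosen only for illustration.
fits = can_send_immediate(data_len=4096, header_len=80, mittil=8192)
limit = max_immediate_data(header_len=80, mittil=8192)
```

Writes larger than the limit would still use the existing path in which the target fetches the data with an RDMA read.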

Bart.
