* RDMA power failure write atomicity
@ 2016-03-10 23:45 Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2016-03-10 23:45 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello,

I'm currently considering using NVDIMM behind RDMA and wonder what RDMA's power-failure
write atomicity is. That is, what minimal size and alignment are guaranteed to be written
atomically in the face of a power failure (or some other similar failure), i.e. either
written in full or not written at all?

For memory writes on Intel, it is 8 bytes with 8-byte alignment. Is there anything like
this for RDMA? Or do different vendors/implementations make such different promises
that you cannot assume anything larger than 1 byte?

I can't find this information anywhere.
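A minimal sketch, for illustration only, of the property being asked about: whether a write of `length` bytes at `addr` falls entirely inside one aligned atomic unit. It assumes the 8-byte, 8-byte-aligned unit mentioned above for Intel CPU stores; whether RDMA offers any such unit is exactly the open question, and the helper name `fits_one_atomic_unit` is hypothetical.

```python
def fits_one_atomic_unit(addr: int, length: int, unit: int = 8) -> bool:
    """Return True if [addr, addr + length) lies inside a single
    unit-sized, unit-aligned block (assumed to be power-fail atomic)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_block = addr // unit
    last_block = (addr + length - 1) // unit
    return first_block == last_block


# An aligned 8-byte store fits one unit; the same store shifted by 4 bytes
# straddles two units, and a 10-byte write can never fit in one.
print(fits_one_atomic_unit(0x1000, 8))   # True
print(fits_one_atomic_unit(0x1004, 8))   # False
print(fits_one_atomic_unit(0x1000, 10))  # False
```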

Thanks,
Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: RDMA power failure write atomicity
@ 2016-03-11  1:33   ` Asgeir Eiriksson
From: Asgeir Eiriksson @ 2016-03-11  1:33 UTC
  To: Vladislav Bolkhovitin; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Vladislav,

This is an area of active R&D.

You might be interested in the following (at ietf.org):

    Title    : RDMA Durable Write Commit
    Authors  : Tom Talpey
               Jim Pinkerton
    Filename : draft-talpey-rdma-commit-00.txt
    Pages    : 24
    Date     : 2016-02-19

Regards,

Asgeir

* Re: RDMA power failure write atomicity
@ 2016-03-12  0:26       ` Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2016-03-12  0:26 UTC
  To: Asgeir Eiriksson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

I'm aware of this proposal. Unfortunately, it is quite orthogonal to my question,
because it is about how to ensure the persistence of RDMA writes. The atomicity it
mentions, like general RDMA atomicity, is atomicity with regard to parallel commands
acting on the same locations. However, I'm asking about power-failure atomicity, which
is something different.

For instance, suppose you are doing an RDMA WRITE of 10 bytes of data. If a power
failure happens while this operation is in progress, what data will end up at the
target location? All 10 bytes new? All 10 bytes old? Or a mix of 5 bytes new and 5
bytes old? By power-failure atomicity I mean a guarantee that the data is either old
or new, never a mix of old and new data.
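The 10-byte example above can be modeled as follows. This is a hypothetical sketch, not a description of any NIC or platform: it assumes the target is split into aligned blocks of `unit` bytes, that each block either fully persists or not at all, and that nothing orders the blocks relative to one another. It enumerates every image the region could hold after the failure and counts the torn ones (neither all-old nor all-new).

```python
from itertools import product


def crash_images(old: bytes, new: bytes, offset: int, unit: int) -> set[bytes]:
    """All images `old` can hold after a power failure interrupts writing
    `new` at `offset`, assuming each unit-aligned block persists atomically
    and independently (no ordering between blocks)."""
    end = offset + len(new)
    # Split the written range at unit-aligned boundaries.
    chunks = []
    start = offset
    while start < end:
        stop = min((start // unit + 1) * unit, end)
        chunks.append((start, stop))
        start = stop
    images = set()
    # Every combination of persisted / not-persisted blocks is possible.
    for persisted in product((False, True), repeat=len(chunks)):
        image = bytearray(old)
        for (lo, hi), done in zip(chunks, persisted):
            if done:
                image[lo:hi] = new[lo - offset:hi - offset]
        images.add(bytes(image))
    return images


old = b"O" * 16
new = b"N" * 10
for unit in (8, 1):
    images = crash_images(old, new, 0, unit)
    torn = [img for img in images if img[:10] not in (old[:10], new)]
    print(f"unit={unit}: {len(images)} possible images, {len(torn)} torn")
# unit=8: 4 possible images, 2 torn
# unit=1: 1024 possible images, 1022 torn
```

Even with an 8-byte unit a 10-byte write can be torn at the 8-byte boundary; the unit only bounds where the tear can happen.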

Thanks,
Vlad

* Re: RDMA power failure write atomicity
@ 2016-03-12  1:14           ` Anuj Kalia
From: Anuj Kalia @ 2016-03-12  1:14 UTC
  To: Vladislav Bolkhovitin; +Cc: Asgeir Eiriksson, linux-rdma-u79uwXL29TY76Z2rM5mHXA

There are several factors that make this problem hard. On many modern
servers, DMA data is written to the last-level cache via DDIO, i.e., it
will not reach the NVDIMM unless the remote CPU flushes the cache or
the relevant cache lines. On servers where data is written to DRAM (or to
an NVDIMM attached to the memory bus), the data can (probably) still be
buffered by the CPU's memory controller.

I am not sure how much control RDMA NICs have over these factors.
AFAIK, there is no PCIe command to flush either cache lines or memory
controller buffers, so flushing to DRAM is beyond what RDMA NICs
can currently accomplish.
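To make the DDIO point concrete: if the RDMA write lands in the last-level cache, the remote CPU has to flush every cache line the write touched. A sketch of that arithmetic, assuming 64-byte cache lines; `lines_to_flush` is a hypothetical helper. On x86, each returned address would then be written back with an instruction such as CLWB or CLFLUSHOPT followed by SFENCE, which this model does not attempt.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes


def lines_to_flush(addr: int, length: int, line: int = CACHE_LINE) -> list[int]:
    """Line-aligned start addresses of every cache line overlapped by
    [addr, addr + length)."""
    if length <= 0:
        return []
    first = addr - addr % line
    last_byte = addr + length - 1
    last = last_byte - last_byte % line
    return list(range(first, last + 1, line))


# A 100-byte RDMA write at 0x1030 touches three 64-byte lines.
print([hex(a) for a in lines_to_flush(0x1030, 100)])
# ['0x1000', '0x1040', '0x1080']
```

Note that flushing only addresses persistence; it says nothing about whether a line that is only partly written back before a power failure can be torn.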

--Anuj (rdma_guy)

* Re: RDMA power failure write atomicity
@ 2016-03-12  1:54               ` Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2016-03-12  1:54 UTC
  To: Anuj Kalia; +Cc: Asgeir Eiriksson, linux-rdma-u79uwXL29TY76Z2rM5mHXA


Anuj Kalia wrote on 03/11/2016 05:14 PM:
> There are several factors that make this problem hard. For many modern
> servers, DMA data is written to last level cache via DDIO, i.e., it
> will not go to the NVDIMM unless the remote CPU flushes the cache /
> cache lines. On servers where data is written to DRAM (or to an NVDIMM
> attached to memory bus), the data can (probably) still be buffered by
> the CPU's memory controller.

It sounds to me like it should then have the regular CPU write-atomicity properties,
i.e. on 64-bit Intel: 8 bytes with 8-byte alignment.

> I am not sure how much control RDMA NICs have over these factors.
> AFAIK, there is no PCIe command to flush either cache lines or memory
> controller buffers, so flushing to DRAM is beyond what RDMA NICs
> can currently accomplish.

Yes, but flushing data is beyond my question, which is only about what pattern of
eventual data you can see after a power failure, with or without flushing.

If there are no minimal power-failure atomicity guarantees at all, that would
effectively rule out any write-in-place into NVRAM/PMEM, because you could end up with
a mix of old and new, hence corrupted, data. You would never be able to atomically
switch a pointer, so even the classical "write to a new location, flush, then switch
the pointer to the new data" approach would no longer work. As a result, the value of
what is proposed in draft-talpey-rdma-commit-00.txt (which is a very good proposal)
would be significantly lower, because, unless I'm missing something, the only
remaining use case for RDMA writes that bypass the remote CPU and still withstand a
power failure is log replication with each entry protected by a checksum, so that on
recovery after a power failure you can identify the last, corrupted record. However,
record compaction would still have to be done via the remote CPU (no bypassing),
because only the CPU can switch pointers in NVRAM/PMEM atomically with respect to
power failure.
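The one remaining use case, a replicated log with a checksum per entry, can be sketched as follows: recovery keeps every record up to the first one that fails validation. This is a minimal model under stated assumptions (a hypothetical record layout of `<length><crc32><payload>`, non-empty payloads, unwritten space reading as zeros), not a format defined by the draft or by any RDMA specification.

```python
import struct
import zlib

# Hypothetical record layout: <payload length><CRC32 of payload><payload>.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one checksum-protected record to a log image."""
    if not payload:
        raise ValueError("payloads must be non-empty")  # see recover()
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> list[bytes]:
    """Scan from the start and keep records until the first one whose header
    or checksum does not validate (torn or never written)."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start, end = pos + HEADER.size, pos + HEADER.size + length
        # A zero length means unwritten (zeroed) space, not an empty record.
        if length == 0 or end > len(log) or zlib.crc32(log[start:end]) != crc:
            break
        records.append(bytes(log[start:end]))
        pos = end
    return records


log = bytearray()
for entry in (b"set x=1", b"set y=2", b"set z=3"):
    append_record(log, entry)

# Model a power failure that tore the last record: its final 4 bytes still
# hold the old (zeroed) contents of the target region.
torn = bytes(log[:-4]) + b"\x00" * 4
print(recover(bytes(log)))  # [b'set x=1', b'set y=2', b'set z=3']
print(recover(torn))        # [b'set x=1', b'set y=2']
```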

So, it seems to me that something minimal, like 8 bytes, must be defined. I wonder
whether it has already been defined. It looks like it has not.
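The classical pointer-switch approach depends on exactly such a minimal unit. The sketch below models only its last step, assuming the new copy has already been written and flushed: it lists the values an 8-byte root pointer could hold after a crash if the pointer store itself persisted in `unit`-byte pieces. The addresses and the little-endian layout are hypothetical. With an 8-byte unit the pointer is always either old or new; with smaller units it can point to neither copy.

```python
import struct

# Hypothetical addresses of the old and new copies of the data. The new copy
# is assumed to be fully written and flushed before the pointer switch.
OLD_PTR = 0x00007F0000001000
NEW_PTR = 0x00007F1234565000


def possible_pointers(unit: int) -> set[int]:
    """Values the 8-byte root pointer can hold after a crash during the
    'switch pointer' store, if the store persists in `unit`-byte pieces that
    each may or may not reach the media (little-endian layout assumed)."""
    old = struct.pack("<Q", OLD_PTR)
    new = struct.pack("<Q", NEW_PTR)
    pieces = 8 // unit
    results = set()
    # Each bit of `mask` says whether the corresponding piece persisted.
    for mask in range(1 << pieces):
        image = bytearray(old)
        for i in range(pieces):
            if mask & (1 << i):
                image[i * unit:(i + 1) * unit] = new[i * unit:(i + 1) * unit]
        results.add(struct.unpack("<Q", image)[0])
    return results


for unit in (8, 4, 1):
    ptrs = possible_pointers(unit)
    print(unit, len(ptrs), ptrs <= {OLD_PTR, NEW_PTR})
# 8 2 True
# 4 4 False
# 1 16 False
```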

Thanks,
Vlad
