From: Boaz Harrosh <boaz@plexistor.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	lsf-pc@lists.linux-foundation.org,
	Dan Williams <dan.j.williams@intel.com>,
	Yigal Korman <yigal@plexistor.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Linux RDMA Mailing List <linux-rdma@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Ric Wheeler <rwheeler@redhat.com>
Subject: [LSF/MM TOPIC/ATTEND] RDMA passive target
Date: Wed, 27 Jan 2016 18:54:30 +0200	[thread overview]
Message-ID: <56A8F646.5020003@plexistor.com> (raw)
In-Reply-To: <06414D5A-0632-4C74-B76C-038093E8AED3@oracle.com>

On 01/25/2016 11:19 PM, Chuck Lever wrote:
> I'd like to propose a discussion of how to take advantage of
> persistent memory in network-attached storage scenarios.
> 
> RDMA runs on high speed network fabrics and offloads data
> transfer from host CPUs. Thus it is a good match to the
> performance characteristics of persistent memory.
> 
> Today Linux supports iSER, SRP, and NFS/RDMA on RDMA
> fabrics. What kind of changes are needed in the Linux I/O
> stack (in particular, storage targets) and in these storage
> protocols to get the most benefit from ultra-low latency
> storage?
> 
> There have been recent proposals about how storage protocols
> and implementations might need to change (eg. Tom Talpey's
> SNIA proposals for changing to a push data transfer model,
> Sagi's proposal to utilize DAX under the NFS/RDMA server,
> and my proposal for a new pNFS layout to drive RDMA data
> transfer directly).
> 
> The outcome of the discussion would be to understand what
> people are working on now and what is the desired
> architectural approach in order to determine where storage
> developers should be focused.
> 
> This could be either a BoF or a session during the main
> tracks. There is sure to be a narrow segment of each
> track's attendees that would have interest in this topic.
> 

I would like to attend this discussion, and also to talk about
a target we have been developing and using that we would like
to propose as a standard Linux driver.
(It would be very important for me to also attend the other
 pmem talks at LSF, as well as some of the MM and FS talks
 proposed so far.)

RDMA passive target
~~~~~~~~~~~~~~~~~~~

The idea is to have a storage brick that exports a very
low-level, pure RDMA API to access its memory-based storage.
The brick might be battery-backed volatile memory, or
pmem based. In either case the brick can offer a much higher
capacity than memory by "tiering" to slower media,
which is enabled by the API.

The API is simple (a rough C sketch follows the list):

1. Alloc_2M_block_at_virtual_address (ADDR_64_BIT)
   ADDR_64_BIT is any virtual address and defines the logical ID of the block.
   If the ID is already allocated, an error is returned.
   If storage is exhausted, return => ENOSPC
2. Free_2M_block_at_virtual_address (ADDR_64_BIT)
   Space for the logical ID is returned to the free store and the ID becomes
   free for a new allocation.
3. map_virtual_address(ADDR_64_BIT, flags) => RDMA handle
   A previously allocated virtual address is locked in memory and an RDMA
   handle is returned.
   Flags: read-only, read-write, shared, and so on...
4. unmap_virtual_address(ADDR_64_BIT)
   At this point the brick can write the data to slower storage if memory
   space is needed. The RDMA handle from [3] is revoked.
5. List_mapped_IDs
   An extent-based list of all allocated ranges. (This is usually used on
   mount or after a crash.)

The dumb brick is not the network allocator / storage manager at all, and it
is not a smart target / server like an iSER target or a pNFS DS. A SW-defined
application can do that on top of the dumb brick. The motivation is a low-level,
very low latency API + library, which can be built upon for higher protocols or
used directly for a very low latency cluster.
The brick does, however, manage a virtual allocation map of the logical-to-physical
mapping of the 2M blocks.
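
Purely as an illustration (not taken from the existing implementation), that
map could be as simple as a per-block record keyed by the logical address
shifted by the 2M block size:

	/* Illustrative sketch only: one way the brick could track its
	 * logical-to-physical map of 2M blocks. */
	#include <stdint.h>
	#include <stdbool.h>

	#define BRICK_BLOCK_SHIFT 21	/* 2M blocks */

	struct brick_block {
		uint64_t phys_block;	/* index into the pmem / battery-backed store */
		uint64_t rdma_handle;	/* valid only while mapped */
		bool     mapped;	/* locked in memory and exposed over RDMA */
	};

	/* Logical ID -> struct brick_block, e.g. a radix tree keyed by
	 * (logical_addr >> BRICK_BLOCK_SHIFT); consulted on map/unmap,
	 * walked in extent order to answer List_mapped_IDs. */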

Currently both drivers, initiator and target, are in the kernel, but with the
latest advancements by Dan Williams this could be implemented in user mode as
well. Almost.

The "almost" is because:
1. If the target is over a /dev/pmemX, then all is fine: we have 2M contiguous
   memory blocks.
2. If the target is over an FS, we have a proposal pending for an falloc_2M_flag
   to ask the FS for contiguous 2M allocations only. If any of the 2M allocations
   fails, falloc returns ENOSPC. This way we guarantee that each 2M block can be
   mapped by a single RDMA handle.
   An FS for this purpose is nice for over-allocated / dynamic space usage by
   a target and other resources in the server.
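
For case [2], a user-mode target's allocation path might look roughly like the
sketch below. The flag name and value stand in for the pending falloc_2M_flag
proposal; no such flag exists upstream today.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <errno.h>

	/* Placeholder for the proposed flag; not an existing fallocate mode. */
	#ifndef FALLOC_FL_2M_CHUNKS
	#define FALLOC_FL_2M_CHUNKS 0x100
	#endif

	#define BLOCK_2M (2UL * 1024 * 1024)

	/* Back one logical 2M block with a physically contiguous FS extent,
	 * so it can later be covered by a single RDMA handle. */
	static int alloc_backing_block(int fd, off_t block_index)
	{
		if (fallocate(fd, FALLOC_FL_2M_CHUNKS,
			      block_index * (off_t)BLOCK_2M, BLOCK_2M) < 0)
			return -errno;	/* -ENOSPC: no contiguous 2M extent available */
		return 0;
	}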


RDMA Initiator
~~~~~~~~~~~~~~~~~~~

The initiator is just a simple library. Both user-mode and kernel-side versions
should be available, for direct access to the RDMA passive brick.
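
As a minimal user-mode sketch, and assuming the brick's RDMA handle translates
to a (remote address, rkey) pair in verbs terms, the initiator's write path
could be as small as the following. Connection setup, memory registration, and
completion handling are omitted; this is not the actual library.

	#include <stdint.h>
	#include <infiniband/verbs.h>

	/* Write a local buffer into a mapped 2M block on the brick. */
	static int brick_rdma_write(struct ibv_qp *qp, struct ibv_mr *local_mr,
				    void *local_buf, uint32_t len,
				    uint64_t remote_addr, uint32_t rkey)
	{
		struct ibv_sge sge = {
			.addr   = (uintptr_t)local_buf,
			.length = len,
			.lkey   = local_mr->lkey,
		};
		struct ibv_send_wr wr = {
			.opcode     = IBV_WR_RDMA_WRITE,
			.sg_list    = &sge,
			.num_sge    = 1,
			.send_flags = IBV_SEND_SIGNALED,
			.wr.rdma = {
				.remote_addr = remote_addr, /* inside the mapped 2M block */
				.rkey        = rkey,        /* from the brick's map call */
			},
		};
		struct ibv_send_wr *bad_wr;

		return ibv_post_send(qp, &wr, &bad_wr);
	}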

Thanks.
Boaz

> --
> Chuck Lever
> 

