From: Matias Bjorling <m@bjorling.me>
To: Christoph Hellwig <hch@infradead.org>
Cc: axboe@fb.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	keith.busch@intel.com, javier@paletta.io
Subject: Re: [PATCH 1/5 v2] blk-mq: Add prep/unprep support
Date: Sat, 18 Apr 2015 08:45:19 +0200
Message-ID: <5531FD7F.8070809@bjorling.me>
In-Reply-To: <20150417174630.GA10249@infradead.org>

On 17-04-2015 at 19:46, Christoph Hellwig wrote:
> On Fri, Apr 17, 2015 at 10:15:46AM +0200, Matias Bjørling wrote:
>> Just the prep/unprep, or other pieces as well?
>
> All of it - it's functionality that lies logically below the block
> layer, so that's where it should be handled.
>
> In fact it should probably work similar to the mtd subsystem - that is
> have its own API for low level drivers, and just export a block driver
> as one consumer on the top side.

The low-level drivers will be NVMe and vendors' own PCIe drivers. This
is very generic in nature; each driver would duplicate the same work.
Both could have normal and open-channel drives attached.

I'd like to keep blk-mq in the loop. I don't think it will be pretty to
have two data paths in the drivers. With blk-mq, bios are split/merged
on the way down. Thus, the actual physical addresses needed aren't known
before the IO is diced to the right size.
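
To illustrate the ordering, here is a minimal user-space sketch in C of
"split first, resolve physical addresses afterwards". It is not blk-mq
code. The names split_and_map and lookup_ppa, the page size, and the
placeholder mapping are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORS_PER_PAGE 8u /* assumed: 8 x 512 B sectors per flash page */

struct chunk {
    uint64_t lba;        /* first logical sector of the chunk */
    uint32_t nr_sectors; /* chunk length, never crosses a page boundary */
    uint64_t ppa;        /* physical page address, resolved after the split */
};

/* Hypothetical translation; a real target would consult its mapping table. */
static uint64_t lookup_ppa(uint64_t logical_page)
{
    return 1000 + logical_page; /* placeholder mapping for illustration */
}

/*
 * Split [lba, lba + nr_sectors) at flash-page boundaries and resolve one
 * physical address per chunk. Returns the number of chunks stored in out,
 * or 0 if out cannot hold them all.
 */
static size_t split_and_map(uint64_t lba, uint32_t nr_sectors,
                            struct chunk *out, size_t max_chunks)
{
    size_t n = 0;

    assert(out != NULL);
    while (nr_sectors > 0) {
        uint32_t offset = (uint32_t)(lba % SECTORS_PER_PAGE);
        uint32_t room = SECTORS_PER_PAGE - offset;
        uint32_t len = nr_sectors < room ? nr_sectors : room;

        if (n == max_chunks)
            return 0;

        /* Only now that the chunk size is final can the address be looked up. */
        out[n].lba = lba;
        out[n].nr_sectors = len;
        out[n].ppa = lookup_ppa(lba / SECTORS_PER_PAGE);
        n++;

        lba += len;
        nr_sectors -= len;
    }
    return n;
}
```

A 12-sector request starting at sector 6 ends up as three chunks, each
with its own physical address, which is why a driver cannot pick the
addresses before the request has its final shape.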

The reason it shouldn't be under a single block device is that a
target should be able to provide a global address space. That allows the
address space to grow/shrink dynamically with the disks: a continuously
growing address space, where disks can be added/removed as requirements
grow or flash ages. Not on a sector level, but on a flash block level.
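
As a rough sketch of what "global address space at flash block level"
could mean, the C fragment below keeps one table of flash blocks that
spans several devices. It is only an illustration under assumptions: the
struct layout, the function names (space_add_dev, space_retire_blk,
space_remove_dev) and the fixed table size are hypothetical, and it
ignores migrating live data off a block before it is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLKS 64 /* assumed size of the illustrative address space */

struct flash_blk {
    int dev_id;       /* device the block lives on */
    uint32_t dev_blk; /* block index within that device */
    int present;      /* nonzero while the slot maps a usable block */
};

struct global_space {
    struct flash_blk slot[MAX_BLKS]; /* slot index = global block address */
    size_t nr_present;               /* number of mapped blocks */
};

/* Map nr blocks of a newly attached device; returns how many fit. */
static size_t space_add_dev(struct global_space *gs, int dev_id, uint32_t nr)
{
    uint32_t added = 0;

    assert(gs != NULL);
    for (size_t i = 0; i < MAX_BLKS && added < nr; i++) {
        if (gs->slot[i].present)
            continue;
        gs->slot[i].dev_id = dev_id;
        gs->slot[i].dev_blk = added;
        gs->slot[i].present = 1;
        added++;
    }
    gs->nr_present += added;
    return added;
}

/* Drop one worn-out block; returns 0 on success, -1 if it is not mapped. */
static int space_retire_blk(struct global_space *gs, int dev_id,
                            uint32_t dev_blk)
{
    assert(gs != NULL);
    for (size_t i = 0; i < MAX_BLKS; i++) {
        struct flash_blk *b = &gs->slot[i];

        if (b->present && b->dev_id == dev_id && b->dev_blk == dev_blk) {
            b->present = 0;
            gs->nr_present--;
            return 0;
        }
    }
    return -1;
}

/* Unmap every block of a detached device; returns how many were dropped. */
static size_t space_remove_dev(struct global_space *gs, int dev_id)
{
    size_t dropped = 0;

    assert(gs != NULL);
    for (size_t i = 0; i < MAX_BLKS; i++) {
        if (gs->slot[i].present && gs->slot[i].dev_id == dev_id) {
            gs->slot[i].present = 0;
            dropped++;
        }
    }
    gs->nr_present -= dropped;
    return dropped;
}
```

The space changes in units of whole flash blocks: attaching a device
grows it, and retiring an aged block or detaching a device shrinks it,
without any notion of sectors.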

>
>> In the future, applications can have an API to get/put flash block directly.
>> (using the blk_nvm_[get/put]_blk interface).
>
> s/application/filesystem/?
>

Applications. The goal is that key-value stores, e.g. RocksDB,
Aerospike, Ceph and similar, have direct access to flash storage. There
won't be a kernel file system in between.

The get/put interface can be seen as a space reservation interface that
defines where a given process is allowed to access the storage media.
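
The reservation idea can be sketched in a few lines of user-space C.
This is not the blk_nvm_[get/put]_blk interface from the patch set,
whose signatures are not shown here. The pool_* names, the pool size and
the use of an integer process id as owner are assumptions for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define NR_BLKS 4     /* assumed number of flash blocks in the pool */
#define BLK_FREE (-1) /* owner value of an unreserved block */

struct blk_pool {
    int owner[NR_BLKS]; /* owning process id per block, or BLK_FREE */
};

static void pool_init(struct blk_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < NR_BLKS; i++)
        p->owner[i] = BLK_FREE;
}

/* "get": reserve a free block for pid; returns its index, or -1 if none. */
static int pool_get_blk(struct blk_pool *p, int pid)
{
    assert(p != NULL && pid >= 0);
    for (int i = 0; i < NR_BLKS; i++) {
        if (p->owner[i] == BLK_FREE) {
            p->owner[i] = pid;
            return i;
        }
    }
    return -1;
}

/* Access check: nonzero only if blk is currently reserved by pid. */
static int pool_may_access(const struct blk_pool *p, int pid, int blk)
{
    assert(p != NULL);
    return pid >= 0 && blk >= 0 && blk < NR_BLKS && p->owner[blk] == pid;
}

/* "put": release blk; only its owner may do so. Returns 0, or -1 on error. */
static int pool_put_blk(struct blk_pool *p, int pid, int blk)
{
    if (!pool_may_access(p, pid, blk))
        return -1;
    p->owner[blk] = BLK_FREE;
    return 0;
}
```

A process that has done a "get" on a block is the only one that passes
the access check for it, and the block becomes available to others only
after the owner does a "put".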

It can also be seen this way: we provide a block allocator in the
kernel, while applications implement the rest of the "file system" in
user space, specially optimized for their data structures. This makes a
lot of sense for a small subset of database applications (LSM trees,
fractal trees, etc.).

