From: Matias Bjorling <m@bjorling.me>
To: Christoph Hellwig <hch@infradead.org>
Cc: axboe@fb.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, keith.busch@intel.com, javier@paletta.io
Subject: Re: [PATCH 1/5 v2] blk-mq: Add prep/unprep support
Date: Sat, 18 Apr 2015 08:45:19 +0200	[thread overview]
Message-ID: <5531FD7F.8070809@bjorling.me> (raw)
In-Reply-To: <20150417174630.GA10249@infradead.org>

Den 17-04-2015 kl. 19:46 skrev Christoph Hellwig:
> On Fri, Apr 17, 2015 at 10:15:46AM +0200, Matias Bjørling wrote:
>> Just the prep/unprep, or other pieces as well?
>
> All of it - it's functionality that lies logically below the block
> layer, so that's where it should be handled.
>
> In fact it should probably work similar to the mtd subsystem - that is
> have its own API for low level drivers, and just export a block driver
> as one consumer on the top side.

The low-level drivers will be NVMe and vendors' own PCI-e drivers. They are very generic in nature, so each driver would duplicate the same work, and both could have normal and open-channel drives attached. I'd like to keep blk-mq in the loop; I don't think it would be pretty to have two data paths in the drivers.

With blk-mq, bios are split/merged on the way down, so the actual physical addresses aren't known before the IO is diced to the right size.

The reason it shouldn't sit under a single block device is that a target should be able to provide a global address space. That allows the address space to grow/shrink dynamically with the disks: a continuously growing address space where disks can be added/removed as requirements grow or flash ages - not on a sector level, but on a flash-block level.

>
>> In the future, applications can have an API to get/put flash block directly.
>> (using the blk_nvm_[get/put]_blk interface).
>
> s/application/filesystem/?
>

Applications. The goal is that key-value stores, e.g. RocksDB, Aerospike, Ceph and similar, have direct access to flash storage, with no kernel file system in between.

The get/put interface can be seen as a space-reservation interface that governs where a given process is allowed to access the storage media. It can also be seen this way: we provide a block allocator in the kernel, while applications implement the rest of the "file system" in user space, specifically optimized for their data structures. This makes a lot of sense for a small subset of database applications (LSM trees, fractal trees, etc.).
Thread overview: 53+ messages (duplicate cross-list entries omitted)
2015-04-15 12:34 [PATCH 0/5 v2] Support for Open-Channel SSDs Matias Bjørling
2015-04-15 12:34 ` [PATCH 1/5 v2] blk-mq: Add prep/unprep support Matias Bjørling
2015-04-17  6:34   ` Christoph Hellwig
2015-04-17  8:15     ` Matias Bjørling
2015-04-17 17:46       ` Christoph Hellwig
2015-04-18  6:45         ` Matias Bjorling [this message]
2015-04-18 20:16           ` Christoph Hellwig
2015-04-19 18:12             ` Matias Bjorling
2015-04-15 12:34 ` [PATCH 2/5 v2] blk-mq: Support for Open-Channel SSDs Matias Bjørling
2015-04-16  9:10   ` Paul Bolle
2015-04-16 10:23     ` Matias Bjørling
2015-04-16 11:34       ` Paul Bolle
2015-04-16 13:29         ` Matias Bjørling
2015-04-15 12:34 ` [PATCH 3/5 v2] lightnvm: RRPC target Matias Bjørling
2015-04-16  9:12   ` Paul Bolle
2015-04-15 12:34 ` [PATCH 4/5 v2] null_blk: LightNVM support Matias Bjørling
2015-04-15 12:34 ` [PATCH 5/5 v2] nvme: LightNVM support Matias Bjørling
2015-04-16 14:55   ` Keith Busch
2015-04-16 15:14     ` Javier González
2015-04-16 15:52       ` Keith Busch
2015-04-16 16:01         ` James R. Bergsten
2015-04-16 16:12           ` Keith Busch
2015-04-16 17:17             ` Matias Bjorling