From: "Matias Bjørling" <m@bjorling.me>
To: lsf-pc@lists.linux-foundation.org
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	linux-nvme@lists.infradead.org
Subject: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os
Date: Mon, 2 Jan 2017 22:06:14 +0100
Message-ID: <05204e9d-ed4d-f97a-88f0-41b5e008af43@bjorling.me>

Hi,

The open-channel SSD subsystem is maturing, and drives are beginning to
become available on the market. The open-channel SSD interface is very
similar to the one exposed by SMR hard drives. Both expose a set of
chunks (zones), and zones are managed using open/close logic.
The main difference is that open-channel SSDs additionally expose
multiple sets of zones through a hierarchical interface, which covers a
number of levels (X channels, Y LUNs per channel, Z zones per LUN).
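
For illustration, here is a minimal sketch of how such a geometry might
be described on the host side. The struct and field names are
hypothetical and do not correspond to any existing kernel or NVMe
structure:

#include <stdint.h>

/* Hypothetical description of an OCSSD's hierarchical geometry. */
struct ocssd_geometry {
	uint32_t num_channels;      /* X: channels on the device */
	uint32_t luns_per_channel;  /* Y: LUNs (parallel units) per channel */
	uint32_t zones_per_lun;     /* Z: zones (chunks) per LUN */
	uint64_t sectors_per_zone;  /* zone size in logical blocks */
};

/* Total number of zones exposed across all parallel units. */
static inline uint64_t ocssd_total_zones(const struct ocssd_geometry *g)
{
	return (uint64_t)g->num_channels * g->luns_per_channel *
	       g->zones_per_lun;
}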

Given that the SMR interface is similar to the OCSSD interface, I would
like to propose a discussion at LSF/MM to align the efforts and
establish a clear path forward:

1. SMR Compatibility

Can the SMR host interface be adapted to open-channel SSDs? For example,
the interface may be exposed as a single-level set of zones, which
ignores the channel and LUN concepts for simplicity. Another approach
might be to extend the SMR implementation's sysfs entries to expose the
hierarchy of the device (channels with X LUNs, where each LUN has a set
of zones).

2. How to expose the tens to hundreds of LUNs that OCSSDs have?

An open-channel SSD typically has 64-256 LUNs, each acting as a
parallel unit. How can these be exposed efficiently?

One may expose these as separate namespaces/partitions. For a DAS
enclosure with 24 drives, that would be 1,536-6,144 separate LUNs
(64-256 per drive) to manage. That many LUNs would flood the host with
gendisk instances. If we do go that route, however, we get an excellent
1:1 mapping between the SMR interface and the OCSSD interface.

On the other hand, one could expose the device LUNs within a single LBA
address space and lay the LUNs out linearly. In that case, the block
layer could expose a variable that enables applications to understand
this hierarchy, mainly the channels and their LUNs. Any warm feelings
towards this?
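
As a rough sketch of the linear layout, assuming the hypothetical
geometry struct above, the LBA of a sector could be derived from its
position in the hierarchy (channel-major ordering is just one possible
choice):

/* Sketch only: map (channel, lun, zone, sector) to an LBA in a single
 * linear address space, assuming the hypothetical struct ocssd_geometry.
 */
static inline uint64_t ocssd_linear_lba(const struct ocssd_geometry *g,
					uint32_t channel, uint32_t lun,
					uint32_t zone, uint64_t sector)
{
	uint64_t lun_index = (uint64_t)channel * g->luns_per_channel + lun;
	uint64_t zone_index = lun_index * g->zones_per_lun + zone;

	return zone_index * g->sectors_per_zone + sector;
}

With such a mapping, an application that knows the geometry can target a
specific parallel unit by picking LBAs inside that LUN's range.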

Currently, a shortcut is taken with the geometry and hierarchy, which
are exposed through the /lightnvm sysfs entries. These (or a subset
thereof) could be moved to the block layer /queue directory.
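
A userspace consumer of such entries could then be as simple as the
sketch below. The attribute names and their presence under
/sys/block/<dev>/queue/ are assumptions for illustration; no such
entries exist today:

#include <stdio.h>

/* Sketch: read a hypothetical geometry attribute from the block queue
 * sysfs directory. Attribute names are illustrative only.
 */
static int read_queue_attr(const char *dev, const char *attr,
			   unsigned long *val)
{
	char path[256];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

int main(void)
{
	unsigned long channels;

	if (read_queue_attr("nvme0n1", "num_channels", &channels) == 0)
		printf("channels: %lu\n", channels);
	return 0;
}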

If the LUNs are kept exposed through the same gendisk, vector I/Os
become a viable path:

3. Vector I/Os

To derive parallelism from an open-channel SSD (and from SSDs in
general), one needs to access the parallel units concurrently.
Parallelism is achieved either by issuing separate I/Os to each LUN
(similar to driving multiple SSDs today) or by introducing a vector
interface (encapsulating a list of LBAs, a length, and a data buffer)
into the kernel. The latter approach allows I/Os to be vectorized and
sent as a single unit to hardware.
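
As a strawman for what such a vector command could carry (purely
illustrative; this is not the existing LightNVM ioctl or any NVMe
structure):

#include <stdint.h>

/* Strawman vector I/O descriptor: a scattered list of LBAs, possibly
 * spanning several LUNs, serviced by a single command.
 */
struct vec_io {
	uint64_t *lba_list;    /* LBAs to read or write */
	uint32_t  nr_lbas;     /* number of entries in lba_list */
	uint32_t  sector_size; /* bytes per LBA */
	void     *data;        /* buffer of nr_lbas * sector_size bytes */
	uint8_t   opcode;      /* read or write */
};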

Implementing this in generic block layer code might be overkill if only
open-channel SSDs use it. I would like to hear about other use cases
(e.g., preadv/pwritev, file systems, virtio?) that could take advantage
of vectored I/Os. If it makes sense, at which level should it be
implemented: bio/request level, SGLs, or a new structure?

Device drivers that support vectored I/Os should be able to opt into the
interface, while the block layer could automatically unroll vectors into
individual I/Os for device drivers that lack support.
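
A minimal sketch of such a fallback, assuming the strawman vec_io
descriptor above and a hypothetical per-sector submit helper:

/* Sketch: if the driver cannot accept a vector command, unroll it into
 * one I/O per LBA. submit_single() is a hypothetical callback, not an
 * existing block layer function.
 */
static int vec_io_fallback(const struct vec_io *vio,
			   int (*submit_single)(uint64_t lba, void *buf,
						uint32_t len, uint8_t op))
{
	uint32_t i;

	for (i = 0; i < vio->nr_lbas; i++) {
		char *buf = (char *)vio->data + (uint64_t)i * vio->sector_size;
		int ret = submit_single(vio->lba_list[i], buf,
					vio->sector_size, vio->opcode);
		if (ret)
			return ret;
	}
	return 0;
}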

What is the history of vector I/Os in the Linux kernel? What were the
reasons that such an interface was not adopted in the past?

I will post RFC SMR patches before LSF/MM, so that we have firm ground
to discuss how this may be integrated.

-- Besides OCSSDs, I would also like to participate in the discussions
of XCOPY, NVMe, multipath, and multi-queue interrupt management.

-Matias

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
