* [LSF/MM/BPF Topic] Towards more useful nvme-passthrough
From: Kanchan Joshi @ 2021-06-09 10:50 UTC
  To: lsf-pc, linux-nvme, io-uring, linux-block
  Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, Kanchan Joshi

Background & objectives:
------------------------

The NVMe passthrough interface

Good part: it allows new device features to be used (at least in raw
form) without first having to build block-generic commands, in-kernel
users, emulations and file-generic user interfaces, all of which take
time to evolve.
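To make the "raw form" point concrete, here is a minimal sketch of filling a passthrough descriptor for an NVMe Read and issuing it through the existing synchronous ioctl. It assumes the Linux UAPI header `<linux/nvme_ioctl.h>` (`struct nvme_passthru_cmd`, `NVME_IOCTL_IO_CMD`). The helper names `build_nvme_read` and `nvme_read_sync` are hypothetical, and the opcode and dword layout follow the NVMe Read command as defined in the NVMe specifications.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

/* Read opcode from the NVMe I/O command set. */
#define NVME_CMD_READ 0x02

/*
 * Hypothetical helper: fill `cmd` for an NVMe Read of `nblocks` (>= 1)
 * logical blocks of `lba_size` bytes, starting at `slba`, into `buf`.
 * The caller must make `buf` at least nblocks * lba_size bytes long.
 */
void build_nvme_read(struct nvme_passthru_cmd *cmd, uint32_t nsid,
                     uint64_t slba, uint16_t nblocks, void *buf,
                     uint32_t lba_size)
{
    memset(cmd, 0, sizeof(*cmd));
    cmd->opcode   = NVME_CMD_READ;
    cmd->nsid     = nsid;                            /* should match the namespace behind the fd */
    cmd->addr     = (uint64_t)(uintptr_t)buf;        /* user buffer address */
    cmd->data_len = (uint32_t)nblocks * lba_size;
    cmd->cdw10    = (uint32_t)(slba & 0xffffffffu);  /* starting LBA, bits 31:0 */
    cmd->cdw11    = (uint32_t)(slba >> 32);          /* starting LBA, bits 63:32 */
    cmd->cdw12    = (uint32_t)(nblocks - 1u);        /* NLB is a 0's based count */
}

/*
 * Synchronous submission: the calling thread sleeps until the device
 * completes the command. Returns 0 on success, a positive NVMe status
 * on a device error, or -1 with errno set.
 */
int nvme_read_sync(int fd, struct nvme_passthru_cmd *cmd)
{
    return ioctl(fd, NVME_IOCTL_IO_CMD, cmd);
}
```

The driver forwards these dwords to the device largely as-is, which is what lets a new opcode be exercised before any block-layer support exists.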

Bad part: the passthrough interface has remained tied to a synchronous
ioctl, which is a blocker for performance-centric use cases. User space
can take the pain of implementing async-over-sync on its own, but that
makes little sense in a world that already has io_uring.
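The async-over-sync workaround usually means parking one thread in each blocking ioctl. The sketch below, using POSIX threads, shows its shape. `blocking_submit_fn`, `struct inflight`, `fake_submit` and `async_over_sync` are hypothetical names, and `fake_submit` stands in for a real `ioctl(fd, NVME_IOCTL_IO_CMD, ...)` call so the sketch runs without a device.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical hook for a blocking submit, e.g. a wrapper around
 * ioctl(fd, NVME_IOCTL_IO_CMD, cmd). Returns the command status. */
typedef int (*blocking_submit_fn)(void *cmd);

struct inflight {
    pthread_t          tid;
    blocking_submit_fn submit;
    void              *cmd;
    int                status;   /* written by the worker thread */
};

/* Demo stand-in for the blocking call: reports the int stored in cmd.
 * A real passthrough ioctl would sleep here until completion. */
int fake_submit(void *cmd)
{
    return *(int *)cmd;
}

static void *worker(void *arg)
{
    struct inflight *io = arg;
    io->status = io->submit(io->cmd);   /* only this thread blocks */
    return NULL;
}

/*
 * "Submit" n commands by starting one thread per command, then "reap"
 * them by joining every thread. Returns how many completed with
 * status 0, or -1 if not all threads could be started.
 */
int async_over_sync(struct inflight *ios, size_t n)
{
    size_t started = 0;
    int ok = 0;

    while (started < n &&
           pthread_create(&ios[started].tid, NULL, worker, &ios[started]) == 0)
        started++;

    for (size_t i = 0; i < started; i++) {
        pthread_join(ios[i].tid, NULL);
        if (ios[i].status == 0)
            ok++;
    }
    return started == n ? ok : -1;
}
```

Every in-flight command costs a thread and its stack, so QD 128 means 128 threads sleeping in the kernel; a native async path avoids that per-command thread.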

Passthrough is lean in the sense that it cuts through layers of
abstraction and reaches NVMe quickly. One objective here is to build a
scalable passthrough path that can readily be used to experiment with
new and emerging NVMe features. Another is to match or surpass existing
raw/direct block I/O performance with this new in-kernel path.

Recent developments:
--------------------
- NVMe now has a per-namespace char interface that remains available and
  usable even for unsupported features and new command sets [1].

- Jens has proposed 'uring_cmd', an async-ioctl-like facility in io_uring.
  It opens up new possibilities (beyond storage); async passthrough is one
  of them. The last posted version is V4 [2].

- I have posted work on async NVMe passthrough over the block device [3].
  The posted work is at V4 (in sync with the infrastructure of [2]).
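For reference, the per-namespace char interface from [1] shows up as `/dev/ngXnY` next to the block device `/dev/nvmeXnY`. A minimal sketch of locating and opening it follows; the helper names are hypothetical, and it assumes the X/Y instance numbering mirrors that of the block device. Passthrough ioctls such as `NVME_IOCTL_IO_CMD` can then be issued on the returned descriptor.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the generic char-device path,
 * e.g. (0, 1) -> "/dev/ng0n1". Returns 0 on success, -1 if `buf`
 * is too small to hold the whole path. */
int nvme_generic_path(char *buf, size_t len, unsigned ctrl, unsigned ns)
{
    int n = snprintf(buf, len, "/dev/ng%un%u", ctrl, ns);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Open the char device read-write. Returns an fd, or -1 (errno is set
 * by open(), or left unchanged if the path did not fit). */
int open_nvme_generic(unsigned ctrl, unsigned ns)
{
    char path[32];

    if (nvme_generic_path(path, sizeof(path), ctrl, ns) != 0)
        return -1;
    return open(path, O_RDWR);
}
```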

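Early performance numbers:
--------------------------
fio, randread, 4k bs, 1 job
Throughput in KIOPS at varying queue depth (QD).
Sync-PT = synchronous passthrough ioctl, io_uring = regular io_uring
block I/O, Async-PT = async passthrough over uring_cmd.

  QD   Sync-PT   io_uring   Async-PT
   1      10.8       10.6       10.6
   2      10.9       24.5         24
   4      10.6         45         46
   8      10.9         90         89
  16      11.0        169        170
  32      10.6        308        307
  64      10.8        503        506
 128      10.9        592        596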
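One way to read the table is through Little's law (in-flight I/Os = throughput x latency). The small sketch below applies it to the numbers above. It assumes fio's QD equals the number of commands actually in flight. That assumption does not hold for Sync-PT: a single job blocked in a synchronous ioctl can keep only one command outstanding. This would explain why that column stays near 10.8 KIOPS, roughly 93 us per command, at every QD. The array and function names are illustrative.

```c
#include <stddef.h>

/* KIOPS from the table above (fio randread, 4k bs, 1 job). */
static const unsigned qd[]       = {1, 2, 4, 8, 16, 32, 64, 128};
static const double   sync_pt[]  = {10.8, 10.9, 10.6, 10.9, 11.0, 10.6, 10.8, 10.9};
static const double   async_pt[] = {10.6, 24, 46, 89, 170, 307, 506, 596};

/* Little's law: latency = in-flight / throughput.
 * With throughput in KIOPS, latency in microseconds = depth / kiops * 1000. */
double implied_latency_us(unsigned depth, double kiops)
{
    return (double)depth / kiops * 1000.0;
}

/* Implied per-command latency for Async-PT at table row i. */
double async_latency_us(size_t i)
{
    return implied_latency_us(qd[i], async_pt[i]);
}

/* Async-PT throughput relative to Sync-PT at table row i. */
double async_speedup(size_t i)
{
    return async_pt[i] / sync_pt[i];
}
```

Read this way, Async-PT keeps pace with io_uring while its implied latency rises from about 94 us at QD 1 to about 215 us at QD 128. At QD 128 it delivers roughly 55x the Sync-PT throughput.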

Further steps/discussion points:
--------------------------------
1. Async passthrough over the NVMe char-dev
It is ready to receive feedback, but I am not sure whether the community
would like to look at it before the uring_cmd infrastructure is settled.

2. Once the above is in shape, bring other performance-centric features
of io_uring to this path:
A. SQPoll and registered files: already functional.
B. Passthrough polling: this can be enabled for the block device and
looks feasible for the char interface as well. Keith recently posted
patches enabling polling for user passthrough [4].
C. Pre-mapped buffers: the early thought is to let io_uring register the
buffers, and add a new passthrough ioctl/uring_cmd in the driver that
does everything passthrough does except pinning/unpinning the pages.

3. What else in the "io_uring->nvme->[block-layer]->nvme" path can be
optimized?
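Item 2.C is about removing per-command page pinning. The back-of-the-envelope sketch below estimates what that saves by counting one pin and one unpin per page. Assuming the common zero-copy case, per-command mapping makes the count grow with the number of commands. Buffers registered once with io_uring are pinned at registration and unpinned at unregistration. The function names are hypothetical, and the sketch counts operations only; it says nothing about their actual cost.

```c
#include <stdint.h>

/* Number of pages covered by a buffer of `len` bytes at address `addr`;
 * a buffer that straddles a page boundary covers one extra page. */
uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
    if (len == 0)
        return 0;
    return (addr + len - 1) / page_size - addr / page_size + 1;
}

/* Per-command mapping: pages pinned at submission, unpinned at completion. */
uint64_t pin_ops_per_command(uint64_t n_cmds, uint64_t pages_per_cmd)
{
    return 2 * n_cmds * pages_per_cmd;
}

/* Registered pool: pinned once at registration and unpinned once at
 * unregistration, independent of how many commands reuse it. */
uint64_t pin_ops_registered(uint64_t pool_bufs, uint64_t pages_per_buf)
{
    return 2 * pool_bufs * pages_per_buf;
}
```

For example, a million page-aligned 4 KiB reads cost 2,000,000 pin/unpin operations with per-command mapping, versus 256 for a registered pool of 128 single-page buffers.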

Ideally I'd like to cover a good deal of ground before December, but
there seem to be plenty of possibilities on this path. Discussion would
help decide how best to move forward and cement the ideas.

[1] https://lore.kernel.org/linux-nvme/20210421074504.57750-1-minwoo.im.dev@gmail.com/
[2] https://lore.kernel.org/linux-nvme/20210317221027.366780-1-axboe@kernel.dk/
[3] https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@samsung.com/
[4] https://lore.kernel.org/linux-block/20210517171443.GB2709391@dhcp-10-100-145-180.wdc.com/#t


* Re: [LSF/MM/BPF Topic] Towards more useful nvme-passthrough
From: Hannes Reinecke @ 2021-06-24  9:24 UTC
  To: Kanchan Joshi, lsf-pc, linux-nvme, io-uring, linux-block, Doug Gilbert
  Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr

On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> [...]

I do like the idea.

What I would like to see is the uring_cmd infrastructure made generally
available, so that we can port the SCSI sg asynchronous interface over
to it.
Doug Gilbert has been fighting a lone battle to improve the sg
asynchronous interface, as the current one is deemed a security hazard.
But in the absence of a generic interface he had to design his own
ioctls, with all the expected pushback.
Plus there are only so many people who care about sg internals :-(

Being able to use uring_cmd would be a neat way out of this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [LSF/MM/BPF Topic] Towards more useful nvme-passthrough
From: Luis Chamberlain @ 2022-03-03  0:45 UTC
  To: Hannes Reinecke
  Cc: Kanchan Joshi, lsf-pc, linux-nvme, io-uring, linux-block,
	Doug Gilbert, axboe, hch, kbusch, javier, anuj20.g, joshiiitr

On Thu, Jun 24, 2021 at 11:24:27AM +0200, Hannes Reinecke wrote:
> On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> > [...]
> I do like the idea.
> 
> What I would like to see is the uring_cmd infrastructure made generally
> available, so that we can port the SCSI sg asynchronous interface over
> to it.

What prevents you from doing this already? I think we just need more
patch reviews for the generic io_uring cmd patches, no?

 Luis
