From: Kanchan Joshi <joshi.k@samsung.com>
To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bcrl@kvack.org
Cc: hch@infradead.org, Damien.LeMoal@wdc.com, asml.silence@gmail.com,
	linux-fsdevel@vger.kernel.org, mb@lightnvm.io,
	linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	io-uring@vger.kernel.org, linux-block@vger.kernel.org,
	Kanchan Joshi <joshi.k@samsung.com>
Subject: [PATCH v3 0/4] zone-append support in io-uring and aio
Date: Mon,  6 Jul 2020 00:17:46 +0530
Message-ID: <1593974870-18919-1-git-send-email-joshi.k@samsung.com>
In-Reply-To: CGME20200705185204epcas5p3adeb4fc3473c5fc0472a7396783c5267@epcas5p3.samsung.com

Changes since v2:
- Use file append infra (O_APPEND/RWF_APPEND) to trigger zone-append
  (Christoph, Wilcox)
- Added Block I/O path changes (Damien). Avoided append split into multi-bio.
- Added patch to extend zone-append in block-layer to support bvec iov_iter.
  Append using io-uring fixed-buffer is enabled with this.
- Made io-uring support code more concise, added changes mentioned by Pavel.

v2: https://lore.kernel.org/io-uring/1593105349-19270-1-git-send-email-joshi.k@samsung.com/

Changes since v1:
- No new opcodes in uring or aio. Use RWF_ZONE_APPEND flag instead.
- linux-aio changes vanish because of no new opcode
- Fixed the overflow and other issues mentioned by Damien
- Simplified uring support code, fixed the issues mentioned by Pavel
- Added error checks for io-uring fixed-buffer and sync kiocb

v1: https://lore.kernel.org/io-uring/1592414619-5646-1-git-send-email-joshi.k@samsung.com/

Cover letter (updated):

This patchset enables zone-append through io-uring and linux-aio on the block
I/O path. The purpose is to let applications that use a zoned block device
directly consume zone-append.
An application can send a write with the existing O_APPEND or RWF_APPEND; on a
zoned block device this triggers zone-append, while on a regular block device
the existing behavior is retained. However, the infra allows zone-append to be
triggered on any file if FMODE_ZONE_APPEND (a new kernel-only fmode) is set
during open.
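
As an illustration of the per-I/O variant, here is a minimal sketch of how a
linux-aio submission could carry the existing RWF_APPEND flag. The helper name
prep_append_iocb() is hypothetical, the fallback #define only covers old
headers, and the io_setup(2)/io_submit(2) calls are left out. How the patchset
interprets aio_offset on a zoned device is not shown here, so the sketch simply
leaves it at 0.

```c
#include <linux/aio_abi.h>   /* struct iocb, IOCB_CMD_PWRITE; pulls in linux/fs.h */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010   /* fallback for headers older than Linux 4.16 */
#endif

/*
 * Hypothetical helper (not part of the patchset): fill an iocb describing
 * one write that asks for append semantics through aio_rw_flags.
 */
static void prep_append_iocb(struct iocb *cb, int fd, const void *buf,
                             size_t len, uint64_t user_data)
{
        memset(cb, 0, sizeof(*cb));
        cb->aio_data = user_data;               /* returned in io_event.data */
        cb->aio_lio_opcode = IOCB_CMD_PWRITE;   /* no new opcode is needed */
        cb->aio_fildes = (uint32_t)fd;
        cb->aio_buf = (uint64_t)(uintptr_t)buf;
        cb->aio_nbytes = len;
        cb->aio_offset = 0;                     /* assumption: left at 0 in this sketch */
        cb->aio_rw_flags = RWF_APPEND;          /* existing per-I/O append flag */
}
```

The same flag value would go into sqe->rw_flags for an io-uring write; opening
the device with O_APPEND is the per-file alternative.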

With zone-append, the written location within the zone is known only after
completion. So apart from the usual return value of the write, an additional
means is needed to obtain the actual written location.

In aio, this is returned to the application in the res2 field of io_event:

struct io_event {
        __u64           data;           /* the data field from the iocb */
        __u64           obj;            /* what iocb this event came from */
        __s64           res;            /* result code for this event */
        __s64           res2;           /* secondary result */
};

In io-uring, cqe->flags is repurposed for the zone-append result:

struct io_uring_cqe {
        __u64   user_data;      /* sqe->data submission passed back */
        __s32   res;            /* result code for this event */
        __u32   flags;
};

The 32-bit flags field is not sufficient to cover, as a byte offset, the zone
size represented by chunk_sectors.
Discussions on the LKML led to the following ways to go about it:
Option 1: Return zone-relative offset in sector/512b unit
Option 2: Return zone-relative offset in bytes
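
To make the size problem concrete, here is a small sketch of the arithmetic,
assuming the zone size is a 32-bit count of 512-byte sectors (as chunk_sectors
is). The helper name byte_offset_fits_u32() is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 512-byte sectors */

/*
 * True if every zone-relative byte offset inside a zone of zone_sectors
 * sectors can be stored in a 32-bit field. A sector offset always fits,
 * because the zone size itself is a 32-bit sector count.
 */
static bool byte_offset_fits_u32(uint32_t zone_sectors)
{
        uint64_t zone_bytes = (uint64_t)zone_sectors << SECTOR_SHIFT;

        /* offsets range over [0, zone_bytes), so the limit is 2^32 bytes */
        return zone_bytes <= (uint64_t)UINT32_MAX + 1;
}
```

With these assumptions a 256 MiB zone (524288 sectors) fits in bytes, while an
8 GiB zone (16777216 sectors) does not.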

With option #1, the io-uring changes remain minimal and relatively clean, and
extra checks and conversions are avoided in the I/O path. A change to the
ki_complete interface is also avoided (its last parameter, ret2, is of type
long, which cannot store the return value in bytes). The downside of this
choice is that the return value is in 512b units and not in bytes. To hide
that, a wrapper needs to be written in user space that converts the cqe->flags
value to bytes and combines it with the zone start.
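
A minimal sketch of such a user-space wrapper, assuming the application already
knows the start of the target zone in bytes. The function name
zone_append_location() and its parameters are illustrative and not part of any
existing library.

```c
#include <stdint.h>

/*
 * Convert the zone-relative offset reported in cqe->flags (512-byte units,
 * per option #1) to an absolute byte position on the device.
 * zone_start_bytes is assumed to be tracked by the application.
 */
static inline uint64_t zone_append_location(uint64_t zone_start_bytes,
                                            uint32_t cqe_flags)
{
        /* widen before shifting so large sector offsets do not overflow */
        return zone_start_bytes + ((uint64_t)cqe_flags << 9);
}
```

The wrapper would be called only for completions with cqe->res >= 0; a negative
cqe->res still carries the error code.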

Option #2 requires pulling some bits from cqe->res and combining those with
cqe->flags to store the result in bytes. This bitwise scattering needs to be
done by the kernel in the I/O path, and the application still needs a
relatively heavyweight wrapper to assemble the pieces so that both cqe->res and
the append location are derived correctly.
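
For comparison, here is a sketch of what the option #2 scattering could look
like. The bit layout is purely an assumption made for this example (it is not
taken from the patchset or the list discussion): cqe->flags holds offset bits
0..31, bits 22..30 of cqe->res hold offset bits 32..40, and bits 0..21 of
cqe->res hold the byte count. All function names are hypothetical.

```c
#include <stdint.h>

#define RES_LEN_BITS 22                          /* assumed: byte count in res[21:0] */
#define RES_LEN_MASK ((1u << RES_LEN_BITS) - 1)
#define OFF_HI_MASK  0x1ffu                      /* assumed: offset[40:32] in res[30:22] */

/* Kernel-side half of the sketch: scatter the byte offset over res and flags. */
static int32_t pack_res(uint32_t nbytes, uint64_t off)
{
        uint32_t hi = (uint32_t)(off >> 32) & OFF_HI_MASK;

        return (int32_t)((hi << RES_LEN_BITS) | (nbytes & RES_LEN_MASK));
}

static uint32_t pack_flags(uint64_t off)
{
        return (uint32_t)off;                    /* low 32 bits of the offset */
}

/*
 * User-space half: returns a negative res unchanged (error), otherwise 0
 * with the byte count and the zone-relative byte offset reassembled.
 */
static int unpack_cqe(int32_t res, uint32_t flags,
                      uint32_t *nbytes, uint64_t *off)
{
        if (res < 0)
                return res;

        *nbytes = (uint32_t)res & RES_LEN_MASK;
        *off = ((uint64_t)((uint32_t)res >> RES_LEN_BITS) << 32) | flags;
        return 0;
}
```

Even in this toy layout the byte count is capped at 4 MiB - 1 and every
consumer of cqe->res has to mask it, which is the kind of overhead option #1
avoids.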

The patchset picks option #1.

Kanchan Joshi (2):
  fs: introduce FMODE_ZONE_APPEND and IOCB_ZONE_APPEND
  block: enable zone-append for iov_iter of bvec type

Selvakumar S (2):
  block: add zone append handling for direct I/O path
  io_uring: add support for zone-append

 block/bio.c        | 31 ++++++++++++++++++++++++++++---
 fs/block_dev.c     | 49 ++++++++++++++++++++++++++++++++++++++++---------
 fs/io_uring.c      | 21 +++++++++++++++++++--
 include/linux/fs.h | 14 ++++++++++++--
 4 files changed, 99 insertions(+), 16 deletions(-)

-- 
2.7.4

