io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kanchan Joshi <joshi.k@samsung.com>
To: Christoph Hellwig <hch@lst.de>
Cc: axboe@kernel.dk, kbusch@kernel.org, asml.silence@gmail.com,
	io-uring@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, sbates@raithlin.com,
	logang@deltatee.com, pankydev8@gmail.com, javier@javigon.com,
	mcgrof@kernel.org, a.manzanares@samsung.com, joshiiitr@gmail.com,
	anuj20.g@samsung.com
Subject: Re: [PATCH 05/17] nvme: wire-up support for async-passthru on char-device.
Date: Mon, 14 Mar 2022 21:53:56 +0530	[thread overview]
Message-ID: <20220314162356.GA13902@test-zns> (raw)
In-Reply-To: <20220311070148.GA17881@lst.de>

[-- Attachment #1: Type: text/plain, Size: 6563 bytes --]

On Fri, Mar 11, 2022 at 08:01:48AM +0100, Christoph Hellwig wrote:
>On Tue, Mar 08, 2022 at 08:50:53PM +0530, Kanchan Joshi wrote:
>> +/*
>> + * This overlays struct io_uring_cmd pdu.
>> + * Expect build errors if this grows larger than that.
>> + */
>> +struct nvme_uring_cmd_pdu {
>> +	u32 meta_len;
>> +	union {
>> +		struct bio *bio;
>> +		struct request *req;
>> +	};
>> +	void *meta; /* kernel-resident buffer */
>> +	void __user *meta_buffer;
>> +} __packed;
>
>Why is this marked __packed?
Did not like doing it, but had to.
If not packed, this takes 32 bytes of space. While driver-pdu in struct
io_uring_cmd can take max 30 bytes. Packing nvme-pdu brought it down to
28 bytes, which fits and gives 2 bytes back.

For quick reference - 
struct io_uring_cmd {
        struct file *              file;                 /*     0     8 */
        void *                     cmd;                  /*     8     8 */
        union {
                void *             bio;                  /*    16     8 */
                void               (*driver_cb)(struct io_uring_cmd *); /*    16     8 */
        };                                               /*    16     8 */
        u32                        flags;                /*    24     4 */
        u32                        cmd_op;               /*    28     4 */
        u16                        cmd_len;              /*    32     2 */
        u16                        unused;               /*    34     2 */
        u8                         pdu[28];              /*    36    28 */

        /* size: 64, cachelines: 1, members: 8 */
};
io_uring_cmd struct goes into the first cacheline of io_kiocb.
Last field is pdu, taking 28 bytes. Will be 30 if I evaporate above
field.
nvme-pdu after packing:
struct nvme_uring_cmd_pdu {
        u32                        meta_len;             /*     0     4 */
        union {
                struct bio *       bio;                  /*     4     8 */
                struct request *   req;                  /*     4     8 */
        };                                               /*     4     8 */
        void *                     meta;                 /*    12     8 */
        void *                     meta_buffer;          /*    20     8 */

        /* size: 28, cachelines: 1, members: 4 */
        /* last cacheline: 28 bytes */
} __attribute__((__packed__));

>In general I'd be much more happy if the meta elelements were a
>io_uring-level feature handled outside the driver and typesafe in
>struct io_uring_cmd, with just a normal private data pointer for the
>actual user, which would remove all the crazy casting.

Not sure if I got your point.

+static struct nvme_uring_cmd_pdu *nvme_uring_cmd_pdu(struct io_uring_cmd *ioucmd)
+{
+       return (struct nvme_uring_cmd_pdu *)&ioucmd->pdu;
+}
+
+static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd)
+{
+       struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);

Do you mean crazy casting inside nvme_uring_cmd_pdu()?
Somehow this looks sane to me (perhaps because it used to be crazier
earlier).

And on moving meta elements outside the driver, my worry is that it
reduces scope of uring-cmd infra and makes it nvme passthru specific.
At this point uring-cmd is still generic async ioctl/fsctl facility
which may find other users (than nvme-passthru) down the line. 
Organization of fields within "struct io_uring_cmd" is around the rule
that a field is kept out (of 28 bytes pdu) only if is accessed by both
io_uring and driver. 

>> +static void nvme_end_async_pt(struct request *req, blk_status_t err)
>> +{
>> +	struct io_uring_cmd *ioucmd = req->end_io_data;
>> +	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
>> +	/* extract bio before reusing the same field for request */
>> +	struct bio *bio = pdu->bio;
>> +
>> +	pdu->req = req;
>> +	req->bio = bio;
>> +	/* this takes care of setting up task-work */
>> +	io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb);
>
>This is a bit silly.  First we defer the actual request I/O completion
>from the block layer to a different CPU or softirq and then we have
>another callback here.  I think it would be much more useful if we
>could find a way to enhance blk_mq_complete_request so that it could
>directly complet in a given task.  That would also be really nice for
>say normal synchronous direct I/O.

I see, so there is room for adding some efficiency.
Hope it will be ok if I carry this out as a separate effort.
Since this is about touching blk_mq_complete_request at its heart, and
improving sync-direct-io, this does not seem best-fit and slow this
series down.

FWIW, I ran the tests with counters inside blk_mq_complete_request_remote()

        if (blk_mq_complete_need_ipi(rq)) {
                blk_mq_complete_send_ipi(rq);
                return true;
        }

        if (rq->q->nr_hw_queues == 1) {
                blk_mq_raise_softirq(rq);
                return true;
        }
Deferring by ipi or softirq never occured. Neither for block nor for
char. Softirq is obvious since I was not running against scsi (or nvme with
single queue). I could not spot whether this is really a overhead, at
least for nvme.


>> +	if (ioucmd) { /* async dispatch */
>> +		if (cmd->common.opcode == nvme_cmd_write ||
>> +				cmd->common.opcode == nvme_cmd_read) {
>
>No we can't just check for specific commands in the passthrough handler.

Right. This is for inline-cmd approach. 
Last two patches of the series undo this (for indirect-cmd).
I will do something about it.

>> +			nvme_setup_uring_cmd_data(req, ioucmd, meta, meta_buffer,
>> +					meta_len);
>> +			blk_execute_rq_nowait(req, 0, nvme_end_async_pt);
>> +			return 0;
>> +		} else {
>> +			/* support only read and write for now. */
>> +			ret = -EINVAL;
>> +			goto out_meta;
>> +		}
>
>Pleae always handle error in the first branch and don't bother with an
>else after a goto or return.

Yes, that'll be better.
>> +static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
>> +{
>> +	int ret;
>> +
>> +	BUILD_BUG_ON(sizeof(struct nvme_uring_cmd_pdu) > sizeof(ioucmd->pdu));
>> +
>> +	switch (ioucmd->cmd_op) {
>> +	case NVME_IOCTL_IO64_CMD:
>> +		ret = nvme_user_cmd64(ns->ctrl, ns, NULL, ioucmd);
>> +		break;
>> +	default:
>> +		ret = -ENOTTY;
>> +	}
>> +
>> +	if (ret >= 0)
>> +		ret = -EIOCBQUEUED;
>
>That's a weird way to handle the returns.  Just return -EIOCBQUEUED
>directly from the handler (which as said before should be split from
>the ioctl handler anyway).

Indeed. That will make it cleaner.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



  reply	other threads:[~2022-03-14 17:10 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20220308152651epcas5p1ebd2dc7fa01db43dd587c228a3695696@epcas5p1.samsung.com>
2022-03-08 15:20 ` [PATCH 00/17] io_uring passthru over nvme Kanchan Joshi
     [not found]   ` <CGME20220308152653epcas5p10c31f58cf6bff125cc0baa176b4d4fac@epcas5p1.samsung.com>
2022-03-08 15:20     ` [PATCH 01/17] io_uring: add support for 128-byte SQEs Kanchan Joshi
     [not found]   ` <CGME20220308152655epcas5p4ae47d715e1c15069e97152dcd283fd40@epcas5p4.samsung.com>
2022-03-08 15:20     ` [PATCH 02/17] fs: add file_operations->async_cmd() Kanchan Joshi
     [not found]   ` <CGME20220308152658epcas5p3929bd1fcf75edc505fec71901158d1b5@epcas5p3.samsung.com>
2022-03-08 15:20     ` [PATCH 03/17] io_uring: add infra and support for IORING_OP_URING_CMD Kanchan Joshi
2022-03-11  1:51       ` Luis Chamberlain
2022-03-11  2:43         ` Jens Axboe
2022-03-11 17:11           ` Luis Chamberlain
2022-03-11 18:47             ` Paul Moore
2022-03-11 20:57               ` Luis Chamberlain
2022-03-11 21:03                 ` Paul Moore
2022-03-14 16:25             ` Casey Schaufler
2022-03-14 16:32               ` Luis Chamberlain
2022-03-14 18:05                 ` Casey Schaufler
2022-03-14 19:40                   ` Luis Chamberlain
     [not found]   ` <CGME20220308152700epcas5p4130d20119a3a250a2515217d6552f668@epcas5p4.samsung.com>
2022-03-08 15:20     ` [PATCH 04/17] nvme: modify nvme_alloc_request to take an additional parameter Kanchan Joshi
2022-03-11  6:38       ` Christoph Hellwig
     [not found]   ` <CGME20220308152702epcas5p1eb1880e024ac8b9531c85a82f31a4e78@epcas5p1.samsung.com>
2022-03-08 15:20     ` [PATCH 05/17] nvme: wire-up support for async-passthru on char-device Kanchan Joshi
2022-03-10  0:02       ` Clay Mayers
2022-03-10  8:32         ` Kanchan Joshi
2022-03-11  7:01       ` Christoph Hellwig
2022-03-14 16:23         ` Kanchan Joshi [this message]
2022-03-15  8:54           ` Christoph Hellwig
2022-03-16  7:27             ` Kanchan Joshi
2022-03-24  6:22               ` Christoph Hellwig
2022-03-24 17:45                 ` Kanchan Joshi
2022-03-11 17:56       ` Luis Chamberlain
2022-03-11 18:53         ` Paul Moore
2022-03-11 21:02           ` Luis Chamberlain
2022-03-13 21:53       ` Sagi Grimberg
2022-03-14 17:54         ` Kanchan Joshi
2022-03-15  9:02           ` Sagi Grimberg
2022-03-16  9:21             ` Kanchan Joshi
2022-03-16 10:56               ` Sagi Grimberg
2022-03-16 11:51                 ` Kanchan Joshi
2022-03-16 13:52                   ` Sagi Grimberg
2022-03-16 14:35                     ` Jens Axboe
2022-03-16 14:50                       ` Sagi Grimberg
2022-03-24  6:20                         ` Christoph Hellwig
2022-03-24 10:42                           ` Sagi Grimberg
2022-03-22 15:18       ` Clay Mayers
2022-03-22 16:57         ` Kanchan Joshi
     [not found]   ` <CGME20220308152704epcas5p16610e1f50672b25fa1df5f7c5c261bb5@epcas5p1.samsung.com>
2022-03-08 15:20     ` [PATCH 06/17] io_uring: prep for fixed-buffer enabled uring-cmd Kanchan Joshi
     [not found]   ` <CGME20220308152707epcas5p430127761a7fd4bf90c2501eabe9ee96e@epcas5p4.samsung.com>
2022-03-08 15:20     ` [PATCH 07/17] io_uring: add support for uring_cmd with fixed-buffer Kanchan Joshi
     [not found]   ` <CGME20220308152709epcas5p1f9d274a0214dc462c22c278a72d8697c@epcas5p1.samsung.com>
2022-03-08 15:20     ` [PATCH 08/17] nvme: enable passthrough " Kanchan Joshi
2022-03-10  8:32       ` Christoph Hellwig
2022-03-11  6:43       ` Christoph Hellwig
2022-03-14 13:06         ` Kanchan Joshi
2022-03-15  8:55           ` Christoph Hellwig
2022-03-14 12:18       ` Ming Lei
2022-03-14 13:09         ` Kanchan Joshi
     [not found]   ` <CGME20220308152711epcas5p31de5d63f5de91fae94e61e5c857c0f13@epcas5p3.samsung.com>
2022-03-08 15:20     ` [PATCH 09/17] io_uring: plug for async bypass Kanchan Joshi
2022-03-10  8:33       ` Christoph Hellwig
2022-03-14 14:33         ` Ming Lei
2022-03-15  8:56           ` Christoph Hellwig
2022-03-11 17:15       ` Luis Chamberlain
     [not found]   ` <CGME20220308152714epcas5p4c5a0d16512fd7054c9a713ee28ede492@epcas5p4.samsung.com>
2022-03-08 15:20     ` [PATCH 10/17] block: wire-up support for plugging Kanchan Joshi
2022-03-10  8:34       ` Christoph Hellwig
2022-03-10 12:40         ` Kanchan Joshi
2022-03-14 14:40           ` Ming Lei
2022-03-21  7:02             ` Kanchan Joshi
2022-03-23  1:27               ` Ming Lei
2022-03-23  1:41                 ` Jens Axboe
2022-03-23  1:58                   ` Jens Axboe
2022-03-23  2:10                     ` Ming Lei
2022-03-23  2:17                       ` Jens Axboe
     [not found]   ` <CGME20220308152716epcas5p3d38d2372c184259f1a10c969f7e4396f@epcas5p3.samsung.com>
2022-03-08 15:20     ` [PATCH 11/17] block: factor out helper for bio allocation from cache Kanchan Joshi
2022-03-10  8:35       ` Christoph Hellwig
2022-03-10 12:25         ` Kanchan Joshi
2022-03-24  6:30           ` Christoph Hellwig
2022-03-24 17:45             ` Kanchan Joshi
2022-03-25  5:38               ` Christoph Hellwig
     [not found]   ` <CGME20220308152718epcas5p3afd2c8a628f4e9733572cbb39270989d@epcas5p3.samsung.com>
2022-03-08 15:21     ` [PATCH 12/17] nvme: enable bio-cache for fixed-buffer passthru Kanchan Joshi
2022-03-11  6:48       ` Christoph Hellwig
2022-03-14 18:18         ` Kanchan Joshi
2022-03-15  8:57           ` Christoph Hellwig
     [not found]   ` <CGME20220308152720epcas5p19653942458e160714444942ddb8b8579@epcas5p1.samsung.com>
2022-03-08 15:21     ` [PATCH 13/17] nvme: allow user passthrough commands to poll Kanchan Joshi
2022-03-08 17:08       ` Keith Busch
2022-03-09  7:03         ` Kanchan Joshi
2022-03-11  6:49           ` Christoph Hellwig
     [not found]   ` <CGME20220308152723epcas5p34460b4af720e515317f88dbb78295f06@epcas5p3.samsung.com>
2022-03-08 15:21     ` [PATCH 14/17] io_uring: add polling support for uring-cmd Kanchan Joshi
2022-03-11  6:50       ` Christoph Hellwig
2022-03-14 10:16         ` Kanchan Joshi
2022-03-15  8:57           ` Christoph Hellwig
2022-03-16  5:09             ` Kanchan Joshi
2022-03-24  6:30               ` Christoph Hellwig
     [not found]   ` <CGME20220308152725epcas5p36d1ce3269a47c1c22cc0d66bdc2b9eb3@epcas5p3.samsung.com>
2022-03-08 15:21     ` [PATCH 15/17] nvme: wire-up polling for uring-passthru Kanchan Joshi
     [not found]   ` <CGME20220308152727epcas5p20e605718dd99e97c94f9232d40d04d95@epcas5p2.samsung.com>
2022-03-08 15:21     ` [PATCH 16/17] io_uring: add support for non-inline uring-cmd Kanchan Joshi
     [not found]   ` <CGME20220308152729epcas5p17e82d59c68076eb46b5ef658619d65e3@epcas5p1.samsung.com>
2022-03-08 15:21     ` [PATCH 17/17] nvme: enable non-inline passthru commands Kanchan Joshi
2022-03-10  8:36       ` Christoph Hellwig
2022-03-10 11:50         ` Kanchan Joshi
2022-03-10 14:19           ` Christoph Hellwig
2022-03-10 18:43             ` Kanchan Joshi
2022-03-11  6:27               ` Christoph Hellwig
2022-03-22 17:10                 ` Kanchan Joshi
2022-03-24  6:32                   ` Christoph Hellwig
2022-03-25 13:39                     ` Kanchan Joshi
2022-03-28  4:44                       ` Kanchan Joshi
2022-03-30 12:59                         ` Christoph Hellwig
2022-03-30 13:02                       ` Christoph Hellwig
2022-03-30 13:14                         ` Kanchan Joshi
2022-04-01  1:25                           ` Jens Axboe
2022-04-01  2:33                             ` Kanchan Joshi
2022-04-01  2:44                               ` Jens Axboe
2022-04-01  3:05                                 ` Jens Axboe
2022-04-01  6:32                                 ` Kanchan Joshi
2022-04-19 17:31                                 ` Kanchan Joshi
2022-04-19 18:19                                   ` Jens Axboe
     [not found]                                     ` <CGME20220420152003epcas5p3991e6941773690bcb425fd9d817105c3@epcas5p3.samsung.com>
2022-04-20 15:14                                       ` Kanchan Joshi
2022-04-20 15:28                                         ` Kanchan Joshi
2022-04-01  1:23                         ` Jens Axboe
2022-04-01  1:22                   ` Jens Axboe
2022-04-01  6:29                     ` Kanchan Joshi
2022-03-24 21:09       ` Clay Mayers
2022-03-24 23:36         ` Jens Axboe
2022-03-10  8:29   ` [PATCH 00/17] io_uring passthru over nvme Christoph Hellwig
2022-03-10 10:05     ` Kanchan Joshi
2022-03-11 16:43       ` Luis Chamberlain
2022-03-11 23:35         ` Adam Manzanares
2022-03-12  2:27           ` Adam Manzanares
2022-03-13  5:07             ` Kanchan Joshi
2022-03-14 20:30               ` Adam Manzanares
2022-03-13  5:10         ` Kanchan Joshi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220314162356.GA13902@test-zns \
    --to=joshi.k@samsung.com \
    --cc=a.manzanares@samsung.com \
    --cc=anuj20.g@samsung.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=javier@javigon.com \
    --cc=joshiiitr@gmail.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=logang@deltatee.com \
    --cc=mcgrof@kernel.org \
    --cc=pankydev8@gmail.com \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).