linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Javier Gonzalez <javier@javigon.com>
To: Kanchan Joshi <joshiiitr@gmail.com>
Cc: Keith Busch <kbusch@kernel.org>,
	David Fugate <david.fugate@linux.intel.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"Damien.LeMoal@wdc.com" <Damien.LeMoal@wdc.com>,
	SelvaKumar S <selvakuma.s1@samsung.com>,
	"sagi@grimberg.me" <sagi@grimberg.me>,
	Kanchan Joshi <joshi.k@samsung.com>,
	"johannes.thumshirn@wdc.com" <johannes.thumshirn@wdc.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Nitesh Shetty <nj.shetty@samsung.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/2] nvme: add emulation for zone-append
Date: Thu, 20 Aug 2020 10:14:43 +0200	[thread overview]
Message-ID: <20200820081443.fwu7z3webjkhpeuv@mpHalley.local> (raw)
In-Reply-To: <CA+1E3rJgvvebEex-zVBGtqK_BAaQUtcwq9nDoFm+Rd=DD20zxg@mail.gmail.com>

On 20.08.2020 13:07, Kanchan Joshi wrote:
>On Thu, Aug 20, 2020 at 3:22 AM Keith Busch <kbusch@kernel.org> wrote:
>>
>> On Wed, Aug 19, 2020 at 01:11:58PM -0600, David Fugate wrote:
>> > Intel does not support making *optional* NVMe spec features *required*
>> > by the NVMe driver.
>>
>> This is inaccurate. As another example, the spec optionally allows a
>> zone size to be a power of 2, but linux requires a power of 2 if you
>> want to use it with this driver.
>>
>> > Provided there's no glaring technical issues
>>
>> There are many. Some may be improved through a serious review process,
>> but the mess it makes out of the fast path is measurably harmful to
>> devices that don't subscribe to this. That issue is not so easily
>> remedied.
>>
>> Since this patch is a copy of the scsi implementation, the reasonable
>> thing is take this fight to the generic block layer for a common
>> solution. We're not taking this in the nvme driver.
>
>I sincerely want to minimize any adverse impact to the fast-path of
>non-zoned devices.
>My understanding of that aspect is (I feel it is good to confirm,
>irrespective of the future of this patch):
>
>1. Submission path:
>There is no extra code for non-zoned device IO. For append, there is
>this "ns->append_emulate = 1" check.
>Snippet -
>        case REQ_OP_ZONE_APPEND:
>-               ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
>+               if (!nvme_is_append_emulated(ns))
>+                       ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
>+               else {
>+                       /* prepare append like write, and adjust lba
>afterwards */
>
>2. Completion:
> Not as clean as submission for sure.
>The extra check is "ns && ns->append_emulate == 1" for completions if
>CONFIG_ZONED is enabled.
>A slightly better completion-handling version (than the submitted patch) is -
>
>-       } else if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
>-                  req_op(req) == REQ_OP_ZONE_APPEND) {
>-               req->__sector = nvme_lba_to_sect(req->q->queuedata,
>-                       le64_to_cpu(nvme_req(req)->result.u64));
>+       } else if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>+               struct nvme_ns *ns = req->q->queuedata;
>+               /* append-emulation requires wp update for some cmds*/
>+               if (ns && nvme_is_append_emulated(ns)) {
>+                       if (nvme_need_zone_wp_update(req))
>+                               nvme_zone_wp_update(ns, req, status);
>+               }
>+               else if (req_op(req) == REQ_OP_ZONE_APPEND)
>+                       req->__sector = nvme_lba_to_sect(ns,
>+                                       le64_to_cpu(nvme_req(req)->result.u64));
>
>Am I missing any other fast-path pain-points?
>
>A quick 1 minute 4K randwrite run (QD 64, 4 jobs,libaio) shows :
>before: IOPS=270k, BW=1056MiB/s (1107MB/s)(61.9GiB/60002msec)
>after:  IOPS=270k, BW=1055MiB/s (1106MB/s)(61.8GiB/60005msec)

It is good to use the QEMU "simulation" path that we implemented to test
performance with different delays, etc., but for these numbers to make
sense we need to put them in contrast to the simulated NAND speed, etc.

>
>This maynot be the best test to see the cost, and I am happy to
>conduct more and improvise.
>
>As for the volume of the code - it is same as SCSI emulation. And I
>can make efforts to reduce that by moving common code to blk-zone, and
>reuse in SCSI/NVMe emulation.
>In the patch I tried to isolate append-emulation by keeping everything
>into "nvme_za_emul". It contains nothing nvme'ish except nvme_ns,
>which can be turned into "driver_data".
>
>+#ifdef CONFIG_BLK_DEV_ZONED
>+struct nvme_za_emul {
>+       unsigned int nr_zones;
>+       spinlock_t zones_wp_offset_lock;
>+       u32 *zones_wp_offset;
>+       u32 *rev_wp_offset;
>+       struct work_struct zone_wp_offset_work;
>+       char *zone_wp_update_buf;
>+       struct mutex rev_mutex;
>+       struct nvme_ns *ns;
>+};
>+#endif
>
>Will that be an acceptable line of further work/discussions?

I know we spent time enabling this path, but I don't think that moving
the discussion to the block layer will have much more benefit.

Let's keep the support for these non-append devices in xNVMe and focus
on the support for the append-enabled ones in Linux. We have a lot of
good stuff in the backlog that we can start pushing.


  reply	other threads:[~2020-08-20  8:15 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20200818053247epcas5p262c5fd7e207dfa5145011c4329cf239d@epcas5p2.samsung.com>
2020-08-18  5:29 ` [PATCH 0/2] enable append-emulation for ZNS Kanchan Joshi
     [not found]   ` <CGME20200818053252epcas5p4ee61d64bba5f6a131105e40330984f5e@epcas5p4.samsung.com>
2020-08-18  5:29     ` [PATCH 1/2] nvme: set io-scheduler requirement " Kanchan Joshi
2020-08-18  7:11       ` Christoph Hellwig
2020-08-19  9:26         ` Kanchan Joshi
2020-08-19  9:38           ` Damien Le Moal
2020-08-19 10:31             ` Kanchan Joshi
2020-08-19 11:17               ` Damien Le Moal
2020-09-07  7:00                 ` Kanchan Joshi
2020-09-07  8:22                   ` Damien Le Moal
2020-09-07 11:23                     ` Kanchan Joshi
2020-09-07 11:37                       ` Damien Le Moal
2020-09-07 11:54                         ` Kanchan Joshi
2020-09-07 12:53                           ` Damien Le Moal
     [not found]   ` <CGME20200818053256epcas5p46d0b66b3702192eb6617c8bba334c15f@epcas5p4.samsung.com>
2020-08-18  5:29     ` [PATCH 2/2] nvme: add emulation for zone-append Kanchan Joshi
2020-08-18  7:12       ` Christoph Hellwig
2020-08-18  9:50         ` Javier Gonzalez
2020-08-18 10:51           ` Matias Bjørling
2020-08-18 18:11             ` Javier Gonzalez
2020-08-18 15:50           ` Christoph Hellwig
2020-08-18 18:04             ` Javier Gonzalez
2020-08-19  7:40               ` Christoph Hellwig
2020-08-19  8:33                 ` Javier Gonzalez
2020-08-19  9:14                   ` Damien Le Moal
2020-08-19 10:43                     ` Christoph Hellwig
2020-08-20  6:45                       ` Javier Gonzalez
2020-08-19 10:49                   ` Christoph Hellwig
2020-08-18 16:58           ` Keith Busch
2020-08-18 17:29             ` Javier Gonzalez
2020-08-18 17:39               ` Keith Busch
2020-08-18 18:13                 ` Javier Gonzalez
2020-08-19 19:11         ` David Fugate
2020-08-19 19:25           ` Jens Axboe
2020-08-19 21:54             ` David Fugate
2020-08-19 22:10               ` Keith Busch
2020-08-19 23:43                 ` David Fugate
2020-08-20  3:45                   ` Keith Busch
2020-08-20 23:26                     ` David Fugate
2020-08-20  5:51                   ` Christoph Hellwig
2020-08-20  6:37             ` Javier Gonzalez
2020-08-20  6:52               ` Christoph Hellwig
2020-08-20  8:03                 ` Javier Gonzalez
2020-08-19 21:42           ` Keith Busch
2020-08-20  7:37             ` Kanchan Joshi
2020-08-20  8:14               ` Javier Gonzalez [this message]
2020-08-20  5:29           ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200820081443.fwu7z3webjkhpeuv@mpHalley.local \
    --to=javier@javigon.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=david.fugate@linux.intel.com \
    --cc=hch@lst.de \
    --cc=johannes.thumshirn@wdc.com \
    --cc=joshi.k@samsung.com \
    --cc=joshiiitr@gmail.com \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=nj.shetty@samsung.com \
    --cc=sagi@grimberg.me \
    --cc=selvakuma.s1@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).