From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Kanchan Joshi <joshiiitr@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>, Kanchan Joshi <joshi.k@samsung.com>, Jens Axboe <axboe@kernel.dk>, "sagi@grimberg.me" <sagi@grimberg.me>, Johannes Thumshirn <Johannes.Thumshirn@wdc.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>, Keith Busch <kbusch@kernel.org>, Selvakumar S <selvakuma.s1@samsung.com>, Javier Gonzalez <javier.gonz@samsung.com>, Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [PATCH 1/2] nvme: set io-scheduler requirement for ZNS
Date: Mon, 7 Sep 2020 12:53:20 +0000
Message-ID: <CY4PR04MB375105E77F87B60E74E025BAE7280@CY4PR04MB3751.namprd04.prod.outlook.com>
In-Reply-To: CA+1E3rJtGTt6gss-5uAjzQ+BAXWTOTcjUzyFToz-QWbEfkLkaA@mail.gmail.com

On 2020/09/07 20:54, Kanchan Joshi wrote:
> On Mon, Sep 7, 2020 at 5:07 PM Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
>>
>> On 2020/09/07 20:24, Kanchan Joshi wrote:
>>> On Mon, Sep 7, 2020 at 1:52 PM Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
>>>>
>>>> On 2020/09/07 16:01, Kanchan Joshi wrote:
>>>>>> Even for SMR, the user is free to set the elevator to none, which disables zone
>>>>>> write locking. Issuing writes correctly then becomes the responsibility of the
>>>>>> application. This can be useful for settings that for instance use NCQ I/O
>>>>>> priorities, which give better results when "none" is used.
>>>>>
>>>>> Was it not a problem that even if the application is sending writes
>>>>> correctly, scheduler may not preserve the order.
>>>>> And even when none is being used, re-queue can happen which may lead
>>>>> to different ordering.
>>>>
>>>> "Issuing writes correctly" means doing small writes, one per zone at most. In
>>>> that case, it does not matter if the block layer reorders writes. Per zone, they
>>>> will still be sequential.
>>>>
>>>>>> As far as I know, zoned drives are always used in tightly controlled
>>>>>> environments. Problems like "does not know what other applications would be
>>>>>> doing" are non-existent. Setting up the drive correctly for the use case at hand
>>>>>> is a sysadmin/server setup problem, based on *the* application (singular)
>>>>>> requirements.
>>>>>
>>>>> Fine.
>>>>> But what about the null-block-zone which sets MQ-deadline but does not
>>>>> actually use write-lock to avoid race among multiple appends on a
>>>>> zone.
>>>>> Does that deserve a fix?
>>>>
>>>> In nullblk, commands are executed under a spinlock. So there is no concurrency
>>>> problem. The spinlock serializes the execution of all commands. null_blk zone
>>>> append emulation thus does not need to take the scheduler level zone write lock
>>>> like scsi does.
>>>
>>> I do not see spinlock for that. There is one "nullb->lock", but its
>>> scope is limited to memory-backed handling.
>>> For concurrent zone-appends on a zone, multiple threads may set the
>>> "same" write-pointer into incoming request(s).
>>> Are you referring to any other spinlock that can avoid "same wp being
>>> returned to multiple threads".
>>
>> Checking again, it looks like you are correct. nullb->lock is indeed only used
>> for processing read/write with memory backing turned on.
>> We either need to extend that spinlock use, or add one to protect the zone array
>> when doing zoned commands and checks of read/write against a zone wp.
>> Care to send a patch ? I can send one too.
>
> Sure, I can send.
> Do you think it is not OK to use zone write-lock (same like SCSI
> emulation) instead of introducing a new spinlock?

Zone write locking will not protect against reads or zone management commands
executed concurrently with writes. Only concurrent writes to the same zone are
serialized by the scheduler zone write locking, and that locking is not used at
all if the user sets the scheduler to none.

A lock that serializes all access to, and all changes of, the zone array is
needed.

-- 
Damien Le Moal
Western Digital Research
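To illustrate the point about a zone-array lock, here is a minimal userspace sketch. It is not the null_blk code: every name in it (`sim_dev`, `sim_zone_append`, `sim_zone_reset`, `sim_read_ok`, the `SIM_*` sizes) is hypothetical, and a pthread mutex stands in for the kernel spinlock. One device-wide lock guards the whole zone array, so zone appends, zone resets and read checks against the write pointer are all serialized with each other, which a scheduler-level zone write lock would not do (and which disappears entirely with the "none" scheduler).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an emulated zoned device; not null_blk code. */
#define SIM_NR_ZONES 4
#define SIM_ZONE_SIZE 1024 /* sectors per zone */
#define SIM_NR_THREADS 8
#define SIM_APPENDS_PER_THREAD 16
#define SIM_APPEND_LEN 8 /* sectors per append */

struct sim_zone {
	uint64_t start; /* first sector of the zone */
	uint64_t wp;    /* write pointer */
};

struct sim_dev {
	pthread_mutex_t lock; /* stands in for a spinlock guarding zones[] */
	struct sim_zone zones[SIM_NR_ZONES];
};

static void sim_dev_init(struct sim_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	for (int i = 0; i < SIM_NR_ZONES; i++) {
		dev->zones[i].start = (uint64_t)i * SIM_ZONE_SIZE;
		dev->zones[i].wp = dev->zones[i].start;
	}
}

/* Zone append: sample and advance the wp under the lock, so no two appenders
 * get the same sector. Returns the written sector, or -1 if the zone is full. */
static int64_t sim_zone_append(struct sim_dev *dev, unsigned int zno, uint64_t nr_sectors)
{
	int64_t sector = -1;

	assert(zno < SIM_NR_ZONES);
	struct sim_zone *z = &dev->zones[zno];
	pthread_mutex_lock(&dev->lock);
	if (z->wp + nr_sectors <= z->start + SIM_ZONE_SIZE) {
		sector = (int64_t)z->wp;
		z->wp += nr_sectors;
	}
	pthread_mutex_unlock(&dev->lock);
	return sector;
}

/* Zone reset: a zone management command, serialized against appends too. */
static void sim_zone_reset(struct sim_dev *dev, unsigned int zno)
{
	pthread_mutex_lock(&dev->lock);
	dev->zones[zno].wp = dev->zones[zno].start;
	pthread_mutex_unlock(&dev->lock);
}

/* Read check against the wp, also done under the lock. */
static bool sim_read_ok(struct sim_dev *dev, unsigned int zno, uint64_t sector, uint64_t nr_sectors)
{
	pthread_mutex_lock(&dev->lock);
	bool ok = sector >= dev->zones[zno].start &&
		  sector + nr_sectors <= dev->zones[zno].wp;
	pthread_mutex_unlock(&dev->lock);
	return ok;
}

struct sim_worker {
	struct sim_dev *dev;
	int64_t sectors[SIM_APPENDS_PER_THREAD];
};

static void *sim_append_worker(void *arg)
{
	struct sim_worker *w = arg;

	for (int i = 0; i < SIM_APPENDS_PER_THREAD; i++)
		w->sectors[i] = sim_zone_append(w->dev, 0, SIM_APPEND_LEN);
	return NULL;
}

/* Races SIM_NR_THREADS appenders on zone 0; true if all sectors are unique. */
static bool sim_run_concurrent_appends(struct sim_dev *dev)
{
	pthread_t tids[SIM_NR_THREADS];
	struct sim_worker workers[SIM_NR_THREADS];
	bool seen[SIM_ZONE_SIZE / SIM_APPEND_LEN] = { false };

	for (int t = 0; t < SIM_NR_THREADS; t++) {
		workers[t].dev = dev;
		pthread_create(&tids[t], NULL, sim_append_worker, &workers[t]);
	}
	for (int t = 0; t < SIM_NR_THREADS; t++)
		pthread_join(tids[t], NULL);
	for (int t = 0; t < SIM_NR_THREADS; t++)
		for (int i = 0; i < SIM_APPENDS_PER_THREAD; i++) {
			int64_t s = workers[t].sectors[i];
			if (s < 0 || s % SIM_APPEND_LEN || seen[s / SIM_APPEND_LEN])
				return false;
			seen[s / SIM_APPEND_LEN] = true;
		}
	return true;
}
```

Here `SIM_NR_THREADS * SIM_APPENDS_PER_THREAD * SIM_APPEND_LEN` equals `SIM_ZONE_SIZE`, so the race fills zone 0 exactly and every append gets a distinct sector. Dropping the lock/unlock pair in `sim_zone_append` makes duplicate sectors possible, which is the "same wp being returned to multiple threads" problem raised above. A per-zone lock would be a possible finer-grained refinement; a single lock simply keeps the sketch close to what is proposed in this message.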