From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Heiner Litz <hlitz@ucsc.edu>, Keith Busch <kbusch@kernel.org>
Cc: "Javier González" <javier@javigon.com>,
    "Matias Bjørling" <mb@lightnvm.io>,
    "Matias Bjorling" <Matias.Bjorling@wdc.com>,
    "Christoph Hellwig" <hch@lst.de>,
    "Keith Busch" <Keith.Busch@wdc.com>,
    "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
    "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
    "Sagi Grimberg" <sagi@grimberg.me>,
    "Jens Axboe" <axboe@kernel.dk>,
    "Hans Holmberg" <Hans.Holmberg@wdc.com>,
    "Dmitry Fomichev" <Dmitry.Fomichev@wdc.com>,
    "Ajay Joshi" <Ajay.Joshi@wdc.com>,
    "Aravind Ramesh" <Aravind.Ramesh@wdc.com>,
    "Niklas Cassel" <Niklas.Cassel@wdc.com>,
    "Judy Brock" <judy.brock@samsung.com>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces
Date: Thu, 18 Jun 2020 05:15:35 +0000
Message-ID: <CY4PR04MB3751E6A6D6F04285CAB18514E79B0@CY4PR04MB3751.namprd04.prod.outlook.com>
In-Reply-To: CAJbgVnVKqDobpX8iwqRVeDqvmfdEd-uRzNFC2z5U03X9E3Pi_w@mail.gmail.com

On 2020/06/18 13:24, Heiner Litz wrote:
> What is the purpose of making zones larger than the erase block size
> of flash? And why are large writes fundamentally unreasonable?

It is up to the drive vendor to decide how zones are mapped onto flash
media. Different mappings give different properties for different use
cases. Zones, in many cases, will be much larger than an erase block
due to striping across many dies, for example. Erase block size also
tends to grow over time with new media generations.

The block layer management of zoned block devices also applies to SMR
HDDs, which can have any zone size they want. This is not all about
flash.

As for large writes, they may not be possible due to memory
fragmentation and/or the limited SGL size of the drive interface. E.g.
AHCI maxes out at 168 segments, most HBAs are at best 256, etc.

> I don't see why it should be a fundamental problem for e.g. RocksDB to
> issue single zone-sized writes (whatever the zone size is because
> RocksDB needs to cope with it). The write buffer exists as a level in
> DRAM anyways and increasing write latency will not matter either.

RocksDB is an application, so of course it is free to issue a single
write() call with a buffer size equal to the zone size. But due to the
buffer mapping limitations stated above, there is a very high
probability that this single zone-sized write operation will end up
being split into multiple write commands in the kernel.

>
> On Wed, Jun 17, 2020 at 6:55 PM Keith Busch <kbusch@kernel.org> wrote:
>>
>> On Wed, Jun 17, 2020 at 04:44:23PM -0700, Heiner Litz wrote:
>>> Mandating zone-sized writes would address all problems with ease and
>>> reduce request rate and overheads in the kernel.
>>
>> Yikes, no. Typical zone sizes are much too large for that to be
>> reasonable.
>

--
Damien Le Moal
Western Digital Research
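
Editorial illustration (not part of the original message): the splitting argument above can be made concrete with a rough back-of-the-envelope sketch. It assumes the worst case of fully fragmented memory, where each scatter-gather segment maps a single 4 KiB page, and it ignores every other limit the kernel or device may impose. The 256 MiB zone size and the function name `min_write_commands` are hypothetical, chosen only for the example; the segment counts (168 and 256) are the ones quoted in the message.

```python
def min_write_commands(zone_bytes, max_segments, segment_bytes=4096):
    """Lower bound on the number of write commands needed for one
    zone-sized write, assuming each segment maps exactly one
    segment_bytes-sized chunk (worst-case memory fragmentation)."""
    max_cmd_bytes = max_segments * segment_bytes
    # Integer ceiling division avoids floating-point rounding.
    return (zone_bytes + max_cmd_bytes - 1) // max_cmd_bytes


if __name__ == "__main__":
    zone = 256 * 1024 * 1024  # hypothetical zone size, not from the message
    for name, segments in (("AHCI (168 segments)", 168),
                           ("HBA (256 segments)", 256)):
        print(name, min_write_commands(zone, segments))
```

Under these assumptions a single zone-sized write() cannot reach the device as one command: it is split into hundreds of commands, which is the point the message makes about mandating zone-sized writes.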