All of lore.kernel.org
 help / color / mirror / Atom feed
From: Heiner Litz <hlitz@ucsc.edu>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Keith Busch" <kbusch@kernel.org>,
	"Javier González" <javier@javigon.com>,
	"Matias Bjørling" <mb@lightnvm.io>,
	"Matias Bjorling" <Matias.Bjorling@wdc.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Keith Busch" <Keith.Busch@wdc.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Hans Holmberg" <Hans.Holmberg@wdc.com>,
	"Dmitry Fomichev" <Dmitry.Fomichev@wdc.com>,
	"Ajay Joshi" <Ajay.Joshi@wdc.com>,
	"Aravind Ramesh" <Aravind.Ramesh@wdc.com>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Judy Brock" <judy.brock@samsung.com>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces
Date: Thu, 18 Jun 2020 13:47:20 -0700	[thread overview]
Message-ID: <CAJbgVnVnqGQiLx1PctDhAKkjLXRKFwr00tdTPJjkaXZDc+V6Bg@mail.gmail.com> (raw)
In-Reply-To: <CY4PR04MB3751E6A6D6F04285CAB18514E79B0@CY4PR04MB3751.namprd04.prod.outlook.com>

Thanks Damien,
the striping explanation makes sense. In this case will rephase to: It
is sufficient to support large enough un-splittable writes to achieve
full per-zone bandwidth with a single writer/single QD.

My main point is: There is no fundamental reason for splitting up
requests intermittently just to re-assemble them in the same form
later.

On Wed, Jun 17, 2020 at 10:15 PM Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
>
> On 2020/06/18 13:24, Heiner Litz wrote:
> > What is the purpose of making zones larger than the erase block size
> > of flash? And why are large writes fundamentally unreasonable?
>
> It is up to the drive vendor to decide how zones are mapped onto flash media.
> Different mapping give different properties for different use cases. Zones, in
> many cases, will be much larger than an erase block due to stripping across many
> dies for example. And erase block size also has a tendency to grow over time
> with new media generations.
> The block layer management of zoned block devices also applies to SMR HDDs,
> which can have any zone size they want. This is not all about flash.
>
> As for large writes, they may not be possible due to memory fragmentation and/or
> limited SGL size of the drive interface. E.g. AHCI max out at 168 segments, most
> HBAs are at best 256, etc.
>
> > I don't see why it should be a fundamental problem for e.g. RocksDB to
> > issue single zone-sized writes (whatever the zone size is because
> > RocksDB needs to cope with it). The write buffer exists as a level in
> > DRAM anyways and increasing write latency will not matter either.
>
> Rocksdb is an application, so of course it is free to issue a single write()
> call with a buffer size equal to the zone size. But due to the buffer mapping
> limitations stated above, there is a very high probability that this single
> zone-sized large write operation will end-up being split into multiple write
> commands in the kernel.
>
> >
> > On Wed, Jun 17, 2020 at 6:55 PM Keith Busch <kbusch@kernel.org> wrote:
> >>
> >> On Wed, Jun 17, 2020 at 04:44:23PM -0700, Heiner Litz wrote:
> >>> Mandating zone-sized writes would address all problems with ease and
> >>> reduce request rate and overheads in the kernel.
> >>
> >> Yikes, no. Typical zone sizes are much to large for that to be
> >> reasonable.
> >
>
>
> --
> Damien Le Moal
> Western Digital Research

WARNING: multiple messages have this Message-ID (diff)
From: Heiner Litz <hlitz@ucsc.edu>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Javier González" <javier@javigon.com>,
	"Ajay Joshi" <Ajay.Joshi@wdc.com>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Keith Busch" <Keith.Busch@wdc.com>,
	"Dmitry Fomichev" <Dmitry.Fomichev@wdc.com>,
	"Aravind Ramesh" <Aravind.Ramesh@wdc.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"Hans Holmberg" <Hans.Holmberg@wdc.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Matias Bjørling" <mb@lightnvm.io>,
	"Judy Brock" <judy.brock@samsung.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Matias Bjorling" <Matias.Bjorling@wdc.com>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces
Date: Thu, 18 Jun 2020 13:47:20 -0700	[thread overview]
Message-ID: <CAJbgVnVnqGQiLx1PctDhAKkjLXRKFwr00tdTPJjkaXZDc+V6Bg@mail.gmail.com> (raw)
In-Reply-To: <CY4PR04MB3751E6A6D6F04285CAB18514E79B0@CY4PR04MB3751.namprd04.prod.outlook.com>

Thanks Damien,
the striping explanation makes sense. In this case will rephase to: It
is sufficient to support large enough un-splittable writes to achieve
full per-zone bandwidth with a single writer/single QD.

My main point is: There is no fundamental reason for splitting up
requests intermittently just to re-assemble them in the same form
later.

On Wed, Jun 17, 2020 at 10:15 PM Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
>
> On 2020/06/18 13:24, Heiner Litz wrote:
> > What is the purpose of making zones larger than the erase block size
> > of flash? And why are large writes fundamentally unreasonable?
>
> It is up to the drive vendor to decide how zones are mapped onto flash media.
> Different mapping give different properties for different use cases. Zones, in
> many cases, will be much larger than an erase block due to stripping across many
> dies for example. And erase block size also has a tendency to grow over time
> with new media generations.
> The block layer management of zoned block devices also applies to SMR HDDs,
> which can have any zone size they want. This is not all about flash.
>
> As for large writes, they may not be possible due to memory fragmentation and/or
> limited SGL size of the drive interface. E.g. AHCI max out at 168 segments, most
> HBAs are at best 256, etc.
>
> > I don't see why it should be a fundamental problem for e.g. RocksDB to
> > issue single zone-sized writes (whatever the zone size is because
> > RocksDB needs to cope with it). The write buffer exists as a level in
> > DRAM anyways and increasing write latency will not matter either.
>
> Rocksdb is an application, so of course it is free to issue a single write()
> call with a buffer size equal to the zone size. But due to the buffer mapping
> limitations stated above, there is a very high probability that this single
> zone-sized large write operation will end-up being split into multiple write
> commands in the kernel.
>
> >
> > On Wed, Jun 17, 2020 at 6:55 PM Keith Busch <kbusch@kernel.org> wrote:
> >>
> >> On Wed, Jun 17, 2020 at 04:44:23PM -0700, Heiner Litz wrote:
> >>> Mandating zone-sized writes would address all problems with ease and
> >>> reduce request rate and overheads in the kernel.
> >>
> >> Yikes, no. Typical zone sizes are much to large for that to be
> >> reasonable.
> >
>
>
> --
> Damien Le Moal
> Western Digital Research

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply	other threads:[~2020-06-18 20:47 UTC|newest]

Thread overview: 192+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-15 23:34 [PATCH 0/5] nvme support for zoned namespace command set Keith Busch
2020-06-15 23:34 ` Keith Busch
2020-06-15 23:34 ` [PATCH 1/5] block: add capacity field to zone descriptors Keith Busch
2020-06-15 23:34   ` Keith Busch
2020-06-15 23:49   ` Chaitanya Kulkarni
2020-06-15 23:49     ` Chaitanya Kulkarni
2020-06-16 10:28   ` Javier González
2020-06-16 10:28     ` Javier González
2020-06-16 13:47   ` Daniel Wagner
2020-06-16 13:47     ` Daniel Wagner
2020-06-16 13:54   ` Johannes Thumshirn
2020-06-16 13:54     ` Johannes Thumshirn
2020-06-16 15:41   ` Martin K. Petersen
2020-06-16 15:41     ` Martin K. Petersen
2020-06-15 23:34 ` [PATCH 2/5] null_blk: introduce zone capacity for zoned device Keith Busch
2020-06-15 23:34   ` Keith Busch
2020-06-15 23:46   ` Chaitanya Kulkarni
2020-06-15 23:46     ` Chaitanya Kulkarni
2020-06-16 14:18   ` Daniel Wagner
2020-06-16 14:18     ` Daniel Wagner
2020-06-16 15:48   ` Martin K. Petersen
2020-06-16 15:48     ` Martin K. Petersen
2020-06-15 23:34 ` [PATCH 3/5] nvme: implement I/O Command Sets Command Set support Keith Busch
2020-06-15 23:34   ` Keith Busch
2020-06-16 10:33   ` Javier González
2020-06-16 10:33     ` Javier González
2020-06-16 17:14     ` Niklas Cassel
2020-06-16 17:14       ` Niklas Cassel
2020-06-16 15:58   ` Martin K. Petersen
2020-06-16 15:58     ` Martin K. Petersen
2020-06-16 17:01     ` Keith Busch
2020-06-16 17:01       ` Keith Busch
2020-06-17  9:50       ` Niklas Cassel
2020-06-17  9:50         ` Niklas Cassel
2020-06-16 17:06     ` Niklas Cassel
2020-06-16 17:06       ` Niklas Cassel
2020-06-17  2:01       ` Martin K. Petersen
2020-06-17  2:01         ` Martin K. Petersen
2020-06-15 23:34 ` [PATCH 4/5] nvme: support for multi-command set effects Keith Busch
2020-06-15 23:34   ` Keith Busch
2020-06-16 10:34   ` Javier González
2020-06-16 10:34     ` Javier González
2020-06-16 16:03   ` Martin K. Petersen
2020-06-16 16:03     ` Martin K. Petersen
2020-06-15 23:34 ` [PATCH 5/5] nvme: support for zoned namespaces Keith Busch
2020-06-15 23:34   ` Keith Busch
2020-06-16 10:41   ` Javier González
2020-06-16 10:41     ` Javier González
2020-06-16 11:18     ` Matias Bjørling
2020-06-16 11:18       ` Matias Bjørling
2020-06-16 12:00       ` Javier González
2020-06-16 12:00         ` Javier González
2020-06-16 12:06         ` Matias Bjørling
2020-06-16 12:06           ` Matias Bjørling
2020-06-16 12:24           ` Javier González
2020-06-16 12:24             ` Javier González
2020-06-16 12:27             ` Matias Bjørling
2020-06-16 12:27               ` Matias Bjørling
2020-06-16 12:35             ` Damien Le Moal
2020-06-16 12:35               ` Damien Le Moal
     [not found]               ` <CGME20200616130815uscas1p1be34e5fceaa548eac31fb30790a689d4@uscas1p1.samsung.com>
2020-06-16 13:08                 ` Judy Brock
2020-06-16 13:08                   ` Judy Brock
2020-06-16 13:32                   ` Matias Bjørling
2020-06-16 13:32                     ` Matias Bjørling
2020-06-16 13:34                   ` Damien Le Moal
2020-06-16 13:34                     ` Damien Le Moal
2020-06-16 14:16               ` Javier González
2020-06-16 14:16                 ` Javier González
2020-06-16 14:42                 ` Damien Le Moal
2020-06-16 14:42                   ` Damien Le Moal
2020-06-16 15:02                   ` Javier González
2020-06-16 15:02                     ` Javier González
2020-06-16 15:20                     ` Matias Bjørling
2020-06-16 15:20                       ` Matias Bjørling
2020-06-16 16:03                       ` Javier González
2020-06-16 16:03                         ` Javier González
2020-06-16 16:07                         ` Matias Bjorling
2020-06-16 16:07                           ` Matias Bjorling
2020-06-16 16:21                           ` Javier González
2020-06-16 16:21                             ` Javier González
2020-06-16 16:25                             ` Matias Bjørling
2020-06-16 16:25                               ` Matias Bjørling
2020-06-16 15:48                     ` Keith Busch
2020-06-16 15:48                       ` Keith Busch
2020-06-16 15:55                       ` Javier González
2020-06-16 15:55                         ` Javier González
2020-06-16 16:04                         ` Matias Bjorling
2020-06-16 16:04                           ` Matias Bjorling
2020-06-16 16:07                         ` Keith Busch
2020-06-16 16:07                           ` Keith Busch
2020-06-16 16:13                           ` Javier González
2020-06-16 16:13                             ` Javier González
2020-06-17  0:38                             ` Damien Le Moal
2020-06-17  0:38                               ` Damien Le Moal
2020-06-17  6:18                               ` Javier González
2020-06-17  6:18                                 ` Javier González
2020-06-17  6:54                                 ` Damien Le Moal
2020-06-17  6:54                                   ` Damien Le Moal
2020-06-17  7:11                                   ` Javier González
2020-06-17  7:11                                     ` Javier González
2020-06-17  7:29                                     ` Damien Le Moal
2020-06-17  7:29                                       ` Damien Le Moal
2020-06-17  7:34                                       ` Javier González
2020-06-17  7:34                                         ` Javier González
2020-06-17  0:14                     ` Damien Le Moal
2020-06-17  0:14                       ` Damien Le Moal
2020-06-17  6:09                       ` Javier González
2020-06-17  6:09                         ` Javier González
2020-06-17  6:47                         ` Damien Le Moal
2020-06-17  6:47                           ` Damien Le Moal
2020-06-17  7:02                           ` Javier González
2020-06-17  7:02                             ` Javier González
2020-06-17  7:24                             ` Damien Le Moal
2020-06-17  7:24                               ` Damien Le Moal
2020-06-17  7:29                               ` Javier González
2020-06-17  7:29                                 ` Javier González
     [not found]         ` <CGME20200616123503uscas1p22ce22054a1b4152a20437b5abdd55119@uscas1p2.samsung.com>
2020-06-16 12:35           ` Judy Brock
2020-06-16 12:35             ` Judy Brock
2020-06-16 12:37             ` Damien Le Moal
2020-06-16 12:37               ` Damien Le Moal
2020-06-16 12:37             ` Matias Bjørling
2020-06-16 12:37               ` Matias Bjørling
2020-06-16 13:12               ` Judy Brock
2020-06-16 13:12                 ` Judy Brock
2020-06-16 13:18                 ` Judy Brock
2020-06-16 13:18                   ` Judy Brock
2020-06-16 13:32                   ` Judy Brock
2020-06-16 13:32                     ` Judy Brock
2020-06-16 13:39                     ` Damien Le Moal
2020-06-16 13:39                       ` Damien Le Moal
2020-06-17  7:43     ` Christoph Hellwig
2020-06-17  7:43       ` Christoph Hellwig
2020-06-17 12:01       ` Martin K. Petersen
2020-06-17 12:01         ` Martin K. Petersen
2020-06-17 15:00         ` Javier González
2020-06-17 15:00           ` Javier González
2020-06-17 14:42       ` Javier González
2020-06-17 14:42         ` Javier González
2020-06-17 17:57         ` Matias Bjørling
2020-06-17 17:57           ` Matias Bjørling
2020-06-17 18:28           ` Javier González
2020-06-17 18:28             ` Javier González
2020-06-17 18:55             ` Matias Bjorling
2020-06-17 18:55               ` Matias Bjorling
2020-06-17 19:09               ` Javier González
2020-06-17 19:09                 ` Javier González
2020-06-17 19:23                 ` Matias Bjørling
2020-06-17 19:23                   ` Matias Bjørling
2020-06-17 19:40                   ` Javier González
2020-06-17 19:40                     ` Javier González
2020-06-17 23:44                     ` Heiner Litz
2020-06-17 23:44                       ` Heiner Litz
2020-06-18  1:55                       ` Keith Busch
2020-06-18  1:55                         ` Keith Busch
2020-06-18  4:24                         ` Heiner Litz
2020-06-18  4:24                           ` Heiner Litz
2020-06-18  5:15                           ` Damien Le Moal
2020-06-18  5:15                             ` Damien Le Moal
2020-06-18 20:47                             ` Heiner Litz [this message]
2020-06-18 20:47                               ` Heiner Litz
2020-06-18 21:04                               ` Matias Bjorling
2020-06-18 21:04                                 ` Matias Bjorling
2020-06-18 21:19                               ` Keith Busch
2020-06-18 21:19                                 ` Keith Busch
2020-06-18 22:05                                 ` Heiner Litz
2020-06-18 22:05                                   ` Heiner Litz
2020-06-19  0:57                                   ` Damien Le Moal
2020-06-19  0:57                                     ` Damien Le Moal
2020-06-19 10:29                                   ` Matias Bjorling
2020-06-19 10:29                                     ` Matias Bjorling
2020-06-19 18:08                                     ` Heiner Litz
2020-06-19 18:08                                       ` Heiner Litz
2020-06-19 18:10                                       ` Keith Busch
2020-06-19 18:10                                         ` Keith Busch
2020-06-19 18:17                                         ` Heiner Litz
2020-06-19 18:17                                           ` Heiner Litz
2020-06-19 18:22                                           ` Keith Busch
2020-06-19 18:22                                             ` Keith Busch
2020-06-19 18:25                                           ` Matias Bjørling
2020-06-19 18:25                                             ` Matias Bjørling
2020-06-19 18:40                                             ` Heiner Litz
2020-06-19 18:40                                               ` Heiner Litz
2020-06-19 18:18                                       ` Matias Bjørling
2020-06-19 18:18                                         ` Matias Bjørling
2020-06-20  6:33                                       ` Christoph Hellwig
2020-06-20  6:33                                         ` Christoph Hellwig
2020-06-20 17:52                                         ` Heiner Litz
2020-06-20 17:52                                           ` Heiner Litz
2020-06-22 14:01                                           ` Christoph Hellwig
2022-03-02 21:11                   ` Luis Chamberlain
2020-06-17  2:08   ` Martin K. Petersen
2020-06-17  2:08     ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJbgVnVnqGQiLx1PctDhAKkjLXRKFwr00tdTPJjkaXZDc+V6Bg@mail.gmail.com \
    --to=hlitz@ucsc.edu \
    --cc=Ajay.Joshi@wdc.com \
    --cc=Aravind.Ramesh@wdc.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Dmitry.Fomichev@wdc.com \
    --cc=Hans.Holmberg@wdc.com \
    --cc=Keith.Busch@wdc.com \
    --cc=Matias.Bjorling@wdc.com \
    --cc=Niklas.Cassel@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=javier@javigon.com \
    --cc=judy.brock@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=mb@lightnvm.io \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.