From: David Sterba <dsterba@suse.cz>
To: Pankaj Raghav <p.raghav@samsung.com>
Cc: jaegeuk@kernel.org, hare@suse.de, dsterba@suse.com,
	axboe@kernel.dk, hch@lst.de, damien.lemoal@opensource.wdc.com,
	snitzer@kernel.org, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	bvanassche@acm.org, linux-fsdevel@vger.kernel.org,
	matias.bjorling@wdc.com, Jens Axboe <axboe@fb.com>,
	gost.dev@samsung.com, jonathan.derrick@linux.dev,
	jiangbo.365@bytedance.com, linux-nvme@lists.infradead.org,
	dm-devel@redhat.com, Naohiro Aota <naohiro.aota@wdc.com>,
	linux-kernel@vger.kernel.org, Johannes Thumshirn <jth@kernel.org>,
	Sagi Grimberg <sagi@grimberg.me>,
	Alasdair Kergon <agk@redhat.com>,
	linux-block@vger.kernel.org, Chaitanya Kulkarni <kch@nvidia.com>,
	Keith Busch <kbusch@kernel.org>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 00/11] support non power of 2 zoned devices
Date: Fri, 6 May 2022 12:00:55 +0200
Message-ID: <20220506100054.GZ18596@suse.cz>
In-Reply-To: <20220506081105.29134-1-p.raghav@samsung.com>

On Fri, May 06, 2022 at 10:10:54AM +0200, Pankaj Raghav wrote:
> - Open issue:
> * btrfs superblock location for zoned devices is expected to be in 0,
>   512GB(mirror) and 4TB(mirror) in the device. Zoned devices with po2
>   zone size will naturally align with these superblock location but non
>   po2 devices will not align with 512GB and 4TB offset.
> 
>   The current approach for npo2 devices is to place the superblock mirror
>   zones near 512GB and 4TB that is **aligned to the zone size**.

I don't like that. The offsets have been chosen so that the values are
fixed and also future-proof in case the zone size increases
significantly. The natural alignment of the pow2 zones makes it fairly
trivial.

If I understand correctly what you suggest, it would mean that if the
zone size is e.g. 5G and a zone starts at 510G, then the superblock
should start at 510G, right? And with another device that has a 7G zone
size the nearest multiple is 511G. And so on.
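
The arithmetic behind the 510G and 511G figures can be checked with a short sketch. It assumes that "near" means rounding the fixed offset down to the start of the zone that contains it, which matches both numbers above; the helper name zone_aligned_offset is made up for illustration and is not taken from the patch series.

```python
GIB = 1 << 30

# Fixed mirror offsets quoted in the thread: 512GB and 4TB.
MIRROR_OFFSETS = (512 * GIB, 4096 * GIB)


def zone_aligned_offset(target: int, zone_size: int) -> int:
    """Round target down to the start of the zone that contains it.

    Hypothetical helper: an assumed reading of "near ... aligned to the
    zone size", not code from the kernel.
    """
    return (target // zone_size) * zone_size


if __name__ == "__main__":
    for zone_gib in (4, 5, 7):
        starts = [zone_aligned_offset(t, zone_gib * GIB) // GIB
                  for t in MIRROR_OFFSETS]
        print(f"zone size {zone_gib}G: mirrors at {starts[0]}G and {starts[1]}G")
```

With a 4G (pow2) zone the mirrors stay at 512G and 4096G, while 5G and 7G zones move the first mirror to 510G and 511G respectively, so the location becomes a function of the device.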

That makes it all less predictable, depending on the physical device
constraints that are affecting the logical data structures of the
filesystem. We tried to avoid that with pow2, the only thing that
depends on the device is that the range from the super block offsets is
always 2 zones.

I really want to keep the offsets for all zoned devices the same and
adapt the code that's handling the writes. This is possible with the
non-pow2 too, the first write is set to the expected offset, leaving the
beginning of the zone unused.
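
As a rough illustration of the fixed-offset alternative, the sketch below keeps the mirror at exactly 512G and only computes which zone contains it and how much of that zone's start would be left unused. The function name is hypothetical, and how the skipped range would be handled on a sequential-write zone is not covered in this message, so the sketch deliberately stops at the arithmetic.

```python
GIB = 1 << 30


def fixed_offset_placement(offset: int, zone_size: int) -> tuple:
    """Return (zone index, unused bytes before the offset in that zone).

    Hypothetical helper for illustration only; it does not model write
    pointers or any other zoned-device constraint.
    """
    zone_no, unused = divmod(offset, zone_size)
    return zone_no, unused


if __name__ == "__main__":
    for zone_gib in (4, 5, 7):
        zone_no, unused = fixed_offset_placement(512 * GIB, zone_gib * GIB)
        print(f"zone size {zone_gib}G: zone {zone_no}, "
              f"{unused // GIB}G unused at zone start")
```

The superblock offset is identical for every zone size; only the zone index and the unused gap depend on the device.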

>   This
>   is of no issue for normal operation as we keep track where the superblock
>   mirror are placed but this can cause an issue with recovery tools for
>   zoned devices as they expect mirror superblock to be in 512GB and 4TB.

Yeah, the tools need to be updated: btrfs-progs and the suite of blk*
tools in util-linux.

>   Note that ATM, recovery tools such as `btrfs check` does not work for
>   image dumps for zoned devices even for po2 zone sizes.

I thought this worked, but if you find something that does not, please
report it to Johannes or Naohiro.
