All of lore.kernel.org
 help / color / mirror / Atom feed
* [LSF/MM/BPF BoF] BoF for Zoned Storage
@ 2022-03-03  0:56 Luis Chamberlain
  2022-03-03  1:03 ` Luis Chamberlain
                   ` (8 more replies)
  0 siblings, 9 replies; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-03  0:56 UTC (permalink / raw)
  To: linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty, mcgrof

Thinking proactively about LSFMM, regarding just Zone storage..

I'd like to propose a BoF for Zoned Storage. The point of it is
to address the existing point points we have and take advantage of
having folks in the room we can likely settle on things faster which
otherwise would take years.

I'll throw at least one topic out:

  * Raw access for zone append for microbenchmarks:
  	- are we really happy with the status quo?
	- if not what outlets do we have?

I think the nvme passthrogh stuff deserves it's own shared
discussion though and should not make it part of the BoF.

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
@ 2022-03-03  1:03 ` Luis Chamberlain
  2022-03-03  1:33 ` Bart Van Assche
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-03  1:03 UTC (permalink / raw)
  To: linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of

                          ^pain points

> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>   * Raw access for zone append for microbenchmarks:
>   	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.
> 
>   Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
  2022-03-03  1:03 ` Luis Chamberlain
@ 2022-03-03  1:33 ` Bart Van Assche
  2022-03-03  4:31   ` Matias Bjørling
  2022-03-03  5:32 ` Javier González
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2022-03-03  1:33 UTC (permalink / raw)
  To: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/2/22 16:56, Luis Chamberlain wrote:
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>    * Raw access for zone append for microbenchmarks:
>    	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.

Since I'm working on zoned storage I'd like to participate.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  1:33 ` Bart Van Assche
@ 2022-03-03  4:31   ` Matias Bjørling
  2022-03-03  5:21     ` Adam Manzanares
  0 siblings, 1 reply; 59+ messages in thread
From: Matias Bjørling @ 2022-03-03  4:31 UTC (permalink / raw)
  To: Bart Van Assche, Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Javier González, Damien Le Moal, Adam Manzanares,
	Keith Busch, Johannes Thumshirn, Naohiro Aota, Pankaj Raghav,
	Kanchan Joshi, Nitesh Shetty

Good idea to bring up zoned storage topics. I'd like to participate as well.

Best, Matias

-----Original Message-----
From: Bart Van Assche <bvanassche@acm.org> 
Sent: Thursday, 3 March 2022 02.33
To: Luis Chamberlain <mcgrof@kernel.org>; linux-block@vger.kernel.org; linux-fsdevel@vger.kernel.org; lsf-pc@lists.linux-foundation.org
Cc: Matias Bjørling <Matias.Bjorling@wdc.com>; Javier González <javier.gonz@samsung.com>; Damien Le Moal <Damien.LeMoal@wdc.com>; Adam Manzanares <a.manzanares@samsung.com>; Keith Busch <Keith.Busch@wdc.com>; Johannes Thumshirn <Johannes.Thumshirn@wdc.com>; Naohiro Aota <Naohiro.Aota@wdc.com>; Pankaj Raghav <pankydev8@gmail.com>; Kanchan Joshi <joshi.k@samsung.com>; Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [LSF/MM/BPF BoF] BoF for Zoned Storage

On 3/2/22 16:56, Luis Chamberlain wrote:
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is to 
> address the existing point points we have and take advantage of having 
> folks in the room we can likely settle on things faster which 
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>    * Raw access for zone append for microbenchmarks:
>    	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared discussion 
> though and should not make it part of the BoF.

Since I'm working on zoned storage I'd like to participate.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  4:31   ` Matias Bjørling
@ 2022-03-03  5:21     ` Adam Manzanares
  0 siblings, 0 replies; 59+ messages in thread
From: Adam Manzanares @ 2022-03-03  5:21 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Bart Van Assche, Luis Chamberlain, linux-block, linux-fsdevel,
	lsf-pc, Javier González, Damien Le Moal, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On Thu, Mar 03, 2022 at 04:31:58AM +0000, Matias Bjørling wrote:
> Good idea to bring up zoned storage topics. I'd like to participate as well.
> 
> Best, Matias

I'm also working on zoned storage ATM so count me in as well.

> 
> -----Original Message-----
> From: Bart Van Assche <bvanassche@acm.org> 
> Sent: Thursday, 3 March 2022 02.33
> To: Luis Chamberlain <mcgrof@kernel.org>; linux-block@vger.kernel.org; linux-fsdevel@vger.kernel.org; lsf-pc@lists.linux-foundation.org
> Cc: Matias Bjørling <Matias.Bjorling@wdc.com>; Javier González <javier.gonz@samsung.com>; Damien Le Moal <Damien.LeMoal@wdc.com>; Adam Manzanares <a.manzanares@samsung.com>; Keith Busch <Keith.Busch@wdc.com>; Johannes Thumshirn <Johannes.Thumshirn@wdc.com>; Naohiro Aota <Naohiro.Aota@wdc.com>; Pankaj Raghav <pankydev8@gmail.com>; Kanchan Joshi <joshi.k@samsung.com>; Nitesh Shetty <nj.shetty@samsung.com>
> Subject: Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
> 
> On 3/2/22 16:56, Luis Chamberlain wrote:
> > Thinking proactively about LSFMM, regarding just Zone storage..
> > 
> > I'd like to propose a BoF for Zoned Storage. The point of it is to 
> > address the existing point points we have and take advantage of having 
> > folks in the room we can likely settle on things faster which 
> > otherwise would take years.
> > 
> > I'll throw at least one topic out:
> > 
> >    * Raw access for zone append for microbenchmarks:
> >    	- are we really happy with the status quo?
> > 	- if not what outlets do we have?
> > 
> > I think the nvme passthrogh stuff deserves it's own shared discussion 
> > though and should not make it part of the BoF.
> 
> Since I'm working on zoned storage I'd like to participate.
> 
> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
  2022-03-03  1:03 ` Luis Chamberlain
  2022-03-03  1:33 ` Bart Van Assche
@ 2022-03-03  5:32 ` Javier González
  2022-03-03  6:29   ` Javier González
  2022-03-03  7:21 ` Hannes Reinecke
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Javier González @ 2022-03-03  5:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty


> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
> 
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>  * Raw access for zone append for microbenchmarks:
>      - are we really happy with the status quo?
>    - if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.
> 
>  Luis

Thanks for proposing this, Luis. 

I’d like to join this discussion too. 

Thanks,
Javier 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  5:32 ` Javier González
@ 2022-03-03  6:29   ` Javier González
  2022-03-03  7:54     ` Pankaj Raghav
                       ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Javier González @ 2022-03-03  6:29 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Damien Le Moal, Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 03.03.2022 06:32, Javier González wrote:
>
>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
>>
>> Thinking proactively about LSFMM, regarding just Zone storage..
>>
>> I'd like to propose a BoF for Zoned Storage. The point of it is
>> to address the existing point points we have and take advantage of
>> having folks in the room we can likely settle on things faster which
>> otherwise would take years.
>>
>> I'll throw at least one topic out:
>>
>>  * Raw access for zone append for microbenchmarks:
>>      - are we really happy with the status quo?
>>    - if not what outlets do we have?
>>
>> I think the nvme passthrogh stuff deserves it's own shared
>> discussion though and should not make it part of the BoF.
>>
>>  Luis
>
>Thanks for proposing this, Luis.
>
>I’d like to join this discussion too.
>
>Thanks,
>Javier

Let me expand a bit on this. There is one topic that I would like to
cover in this session:

   - PO2 zone sizes
       In the past weeks we have been talking to Damien and Matias around
       the constraint that we currently have for PO2 zone sizes. While
       this has not been an issue for SMR HDDs, the gap that ZNS
       introduces between zone capacity and zone size causes holes in the
       address space. This unmapped LBA space has been the topic of
       discussion with several ZNS adopters.

       One of the things to note here is that even if the zone size is a
       PO2, the zone capacity is typically not. This means that even when
       we can use shifts to move around zones, the actual data placement
       algorithms need to deal with arbitrary sizes. So at the end of the
       day applications that use a contiguous address space - like in a
       conventional block device -, will have to deal with this.

       Since chunk_sectors is no longer required to be a PO2, we have
       started the work in removing this constraint. We are working in 2
       phases:

         1. Add an emulation layer in NVMe driver to simulate PO2 devices
	when the HW presents a zone_capacity = zone_size. This is a
	product of one of Damien's early concerns about supporting
	existing applications and FSs that work under the PO2
	assumption. We will post these patches in the next few days.

         2. Remove the PO2 constraint from the block layer and add
	support for arbitrary zone support in btrfs. This will allow the
	raw block device to be present for arbitrary zone sizes (and
	capacities) and btrfs will be able to use it natively.

	For completeness, F2FS works natively in PO2 zone sizes, so we
	will not do work here for now, as the changes will not bring any
	benefit. For F2FS, the emulation layer will help use devices
	that do not have PO2 zone sizes.

      We are working towards having at least a RFC of (2) before LSF/MM.
      Since this is a topic that involves several parties across the
      stack, I believe that a F2F conversation will help laying the path
      forward.

Thanks,
Javier


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
                   ` (2 preceding siblings ...)
  2022-03-03  5:32 ` Javier González
@ 2022-03-03  7:21 ` Hannes Reinecke
  2022-03-03  8:55   ` Damien Le Moal
  2022-03-03  7:38 ` Kanchan Joshi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Hannes Reinecke @ 2022-03-03  7:21 UTC (permalink / raw)
  To: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 3/3/22 01:56, Luis Chamberlain wrote:
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>    * Raw access for zone append for microbenchmarks:
>    	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.
> 
Yeah, count me in.
But we need Matias to be present; otherwise we'll just grope in the dark :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
                   ` (3 preceding siblings ...)
  2022-03-03  7:21 ` Hannes Reinecke
@ 2022-03-03  7:38 ` Kanchan Joshi
  2022-03-03  8:43 ` Johannes Thumshirn
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Kanchan Joshi @ 2022-03-03  7:38 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Thu, Mar 3, 2022 at 6:51 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> Thinking proactively about LSFMM, regarding just Zone storage..
>
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
>
> I'll throw at least one topic out:
>
>   * Raw access for zone append for microbenchmarks:
>         - are we really happy with the status quo?
>         - if not what outlets do we have?

Many choices were discussed/implemented in my last attempt to do
append via io_uring/aio.
Without consensus though. Perhaps F2F can help reaching it. I'd like
to join this discussion.

> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.

Yes indeed, there is more to it.
We may anyway have to forgo append (and other-commands requiring extra
result) in the first drop of uring-passthru at least. For good
reasons. But let's discuss that in a separate thread along with some
code.

Thanks,
-- 
Kanchan

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  6:29   ` Javier González
@ 2022-03-03  7:54     ` Pankaj Raghav
  2022-03-03  9:49     ` Damien Le Moal
  2022-03-03 16:12     ` Himanshu Madhani
  2 siblings, 0 replies; 59+ messages in thread
From: Pankaj Raghav @ 2022-03-03  7:54 UTC (permalink / raw)
  To: Javier González, Luis Chamberlain
  Cc: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Kanchan Joshi, Nitesh Shetty

Thanks Luis and Javier for the proposal.
On Thu, Mar 03, 2022 at 07:29:50AM +0100, Javier González wrote:
> On 03.03.2022 06:32, Javier González wrote:
> Let me expand a bit on this. There is one topic that I would like to
> cover in this session:
> 
>   - PO2 zone sizes
As I am working on this topic, I would like to join as well.

It could be tricky for me to be present there physically due to some
constraints so will there be a possibility to also join also virtually?

--
Pankaj

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
                   ` (4 preceding siblings ...)
  2022-03-03  7:38 ` Kanchan Joshi
@ 2022-03-03  8:43 ` Johannes Thumshirn
  2022-03-03 18:20 ` Viacheslav Dubeyko
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Johannes Thumshirn @ 2022-03-03  8:43 UTC (permalink / raw)
  To: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 03/03/2022 01:57, Luis Chamberlain wrote:
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>   * Raw access for zone append for microbenchmarks:
>   	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.
> 

Working on zoned storage (Block and FS side) for quite some 
time now, so please count me in.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  7:21 ` Hannes Reinecke
@ 2022-03-03  8:55   ` Damien Le Moal
  0 siblings, 0 replies; 59+ messages in thread
From: Damien Le Moal @ 2022-03-03  8:55 UTC (permalink / raw)
  To: Hannes Reinecke, Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 2022/03/03 9:21, Hannes Reinecke wrote:
> On 3/3/22 01:56, Luis Chamberlain wrote:
>> Thinking proactively about LSFMM, regarding just Zone storage..
>>
>> I'd like to propose a BoF for Zoned Storage. The point of it is
>> to address the existing point points we have and take advantage of
>> having folks in the room we can likely settle on things faster which
>> otherwise would take years.
>>
>> I'll throw at least one topic out:
>>
>>    * Raw access for zone append for microbenchmarks:
>>    	- are we really happy with the status quo?
>> 	- if not what outlets do we have?
>>
>> I think the nvme passthrogh stuff deserves it's own shared
>> discussion though and should not make it part of the BoF.
>>
> Yeah, count me in.
> But we need Matias to be present; otherwise we'll just grope in the dark :-)

I will be around too :)




-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  6:29   ` Javier González
  2022-03-03  7:54     ` Pankaj Raghav
@ 2022-03-03  9:49     ` Damien Le Moal
  2022-03-03 14:55       ` Adam Manzanares
  2022-03-03 16:12     ` Himanshu Madhani
  2 siblings, 1 reply; 59+ messages in thread
From: Damien Le Moal @ 2022-03-03  9:49 UTC (permalink / raw)
  To: Javier González, Luis Chamberlain
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 2022/03/03 8:29, Javier González wrote:
> On 03.03.2022 06:32, Javier González wrote:
>>
>>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
>>>
>>> Thinking proactively about LSFMM, regarding just Zone storage..
>>>
>>> I'd like to propose a BoF for Zoned Storage. The point of it is
>>> to address the existing point points we have and take advantage of
>>> having folks in the room we can likely settle on things faster which
>>> otherwise would take years.
>>>
>>> I'll throw at least one topic out:
>>>
>>>  * Raw access for zone append for microbenchmarks:
>>>      - are we really happy with the status quo?
>>>    - if not what outlets do we have?
>>>
>>> I think the nvme passthrogh stuff deserves it's own shared
>>> discussion though and should not make it part of the BoF.
>>>
>>>  Luis
>>
>> Thanks for proposing this, Luis.
>>
>> I’d like to join this discussion too.
>>
>> Thanks,
>> Javier
> 
> Let me expand a bit on this. There is one topic that I would like to
> cover in this session:
> 
>    - PO2 zone sizes
>        In the past weeks we have been talking to Damien and Matias around
>        the constraint that we currently have for PO2 zone sizes. While
>        this has not been an issue for SMR HDDs, the gap that ZNS
>        introduces between zone capacity and zone size causes holes in the
>        address space. This unmapped LBA space has been the topic of
>        discussion with several ZNS adopters.
> 
>        One of the things to note here is that even if the zone size is a
>        PO2, the zone capacity is typically not. This means that even when
>        we can use shifts to move around zones, the actual data placement
>        algorithms need to deal with arbitrary sizes. So at the end of the
>        day applications that use a contiguous address space - like in a
>        conventional block device -, will have to deal with this.

"the actual data placement algorithms need to deal with arbitrary sizes"

???

No it does not. With zone cap < zone size, the amount of sectors that can be
used within a zone may be smaller than the zone size, but:
1) Writes still must be issued at the WP location so choosing a zone for writing
data has the same constraint regardless of the zone capacity: Do I have enough
usable sectors left in the zone ?
2) Reading after the WP is not useful (if not outright stupid), regardless of
where the last usable sector in the zone is (at zone start + zone size or at
zone start + zone cap).

And talking about "use a contiguous address space" is in my opinion nonsense in
the context of zoned storage since by definition, everything has to be managed
using zones as units. The only sensible range for a "contiguous address space"
is "zone start + min(zone cap, zone size)".

>        Since chunk_sectors is no longer required to be a PO2, we have
>        started the work in removing this constraint. We are working in 2
>        phases:
> 
>          1. Add an emulation layer in NVMe driver to simulate PO2 devices
> 	when the HW presents a zone_capacity = zone_size. This is a
> 	product of one of Damien's early concerns about supporting
> 	existing applications and FSs that work under the PO2
> 	assumption. We will post these patches in the next few days.
> 
>          2. Remove the PO2 constraint from the block layer and add
> 	support for arbitrary zone support in btrfs. This will allow the
> 	raw block device to be present for arbitrary zone sizes (and
> 	capacities) and btrfs will be able to use it natively.

Zone sizes cannot be arbitrary in btrfs since block groups must be a multiple of
64K. So constraints remain and should be enforced, at least by btrfs.

> 
> 	For completeness, F2FS works natively in PO2 zone sizes, so we
> 	will not do work here for now, as the changes will not bring any
> 	benefit. For F2FS, the emulation layer will help use devices
> 	that do not have PO2 zone sizes.
> 
>       We are working towards having at least a RFC of (2) before LSF/MM.
>       Since this is a topic that involves several parties across the
>       stack, I believe that a F2F conversation will help laying the path
>       forward.
> 
> Thanks,
> Javier
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  9:49     ` Damien Le Moal
@ 2022-03-03 14:55       ` Adam Manzanares
  2022-03-03 15:22         ` Damien Le Moal
  0 siblings, 1 reply; 59+ messages in thread
From: Adam Manzanares @ 2022-03-03 14:55 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Javier González, Luis Chamberlain, linux-block,
	linux-fsdevel, lsf-pc, Matias Bjørling, Bart Van Assche,
	Keith Busch, Johannes Thumshirn, Naohiro Aota, Pankaj Raghav,
	Kanchan Joshi, Nitesh Shetty

On Thu, Mar 03, 2022 at 09:49:06AM +0000, Damien Le Moal wrote:
> On 2022/03/03 8:29, Javier González wrote:
> > On 03.03.2022 06:32, Javier González wrote:
> >>
> >>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
> >>>
> >>> Thinking proactively about LSFMM, regarding just Zone storage..
> >>>
> >>> I'd like to propose a BoF for Zoned Storage. The point of it is
> >>> to address the existing point points we have and take advantage of
> >>> having folks in the room we can likely settle on things faster which
> >>> otherwise would take years.
> >>>
> >>> I'll throw at least one topic out:
> >>>
> >>>  * Raw access for zone append for microbenchmarks:
> >>>      - are we really happy with the status quo?
> >>>    - if not what outlets do we have?
> >>>
> >>> I think the nvme passthrogh stuff deserves it's own shared
> >>> discussion though and should not make it part of the BoF.
> >>>
> >>>  Luis
> >>
> >> Thanks for proposing this, Luis.
> >>
> >> I’d like to join this discussion too.
> >>
> >> Thanks,
> >> Javier
> > 
> > Let me expand a bit on this. There is one topic that I would like to
> > cover in this session:
> > 
> >    - PO2 zone sizes
> >        In the past weeks we have been talking to Damien and Matias around
> >        the constraint that we currently have for PO2 zone sizes. While
> >        this has not been an issue for SMR HDDs, the gap that ZNS
> >        introduces between zone capacity and zone size causes holes in the
> >        address space. This unmapped LBA space has been the topic of
> >        discussion with several ZNS adopters.
> > 
> >        One of the things to note here is that even if the zone size is a
> >        PO2, the zone capacity is typically not. This means that even when
> >        we can use shifts to move around zones, the actual data placement
> >        algorithms need to deal with arbitrary sizes. So at the end of the
> >        day applications that use a contiguous address space - like in a
> >        conventional block device -, will have to deal with this.
> 
> "the actual data placement algorithms need to deal with arbitrary sizes"
> 
> ???
> 
> No it does not. With zone cap < zone size, the amount of sectors that can be
> used within a zone may be smaller than the zone size, but:
> 1) Writes still must be issued at the WP location so choosing a zone for writing
> data has the same constraint regardless of the zone capacity: Do I have enough
> usable sectors left in the zone ?

Are you saying holes are irrelevant because an application has to know the 
status of a zone by querying the device for the zone status before using a zone
and at that point it should know a start LBA? I see your point here but we have
to assume things to arrive at this conclusion.

Let's think of another scenario where the drive is managed by a user space 
application that knows the status of zones and picks a zone because it knows 
it is free. To calculate the start offset in terms of LBAs the application has 
to use the difference in zone_size and zone_cap to calculate the write offset
in terms of LBAs. 

My argument is that the zone_size is a construct conceived to make a ZNS zone
a power of 2 that creates a hole in the LBA space. Applications don't want
to deal with the power of 2 constraint and neither do devices. It seems like
the existing zoned kernel infrastructure, which made sense for SMR, pushed 
this constraint onto devices and onto users. Arguments can be made for where 
complexity should lie, but I don't think this decision made things easier for
someone to use a ZNS SSD as a block device. 

> 2) Reading after the WP is not useful (if not outright stupid), regardless of
> where the last usable sector in the zone is (at zone start + zone size or at
> zone start + zone cap).

Of course but the with po2 you force useless LBA space even if you fill a zone.


> 
> And talking about "use a contiguous address space" is in my opinion nonsense in
> the context of zoned storage since by definition, everything has to be managed
> using zones as units. The only sensible range for a "contiguous address space"
> is "zone start + min(zone cap, zone size)".

Definitely disagree with this given previous arguments. This is a construct 
forced upon us because of zoned storage legacy.

> 
> >        Since chunk_sectors is no longer required to be a PO2, we have
> >        started the work in removing this constraint. We are working in 2
> >        phases:
> > 
> >          1. Add an emulation layer in NVMe driver to simulate PO2 devices
> > 	when the HW presents a zone_capacity = zone_size. This is a
> > 	product of one of Damien's early concerns about supporting
> > 	existing applications and FSs that work under the PO2
> > 	assumption. We will post these patches in the next few days.
> > 
> >          2. Remove the PO2 constraint from the block layer and add
> > 	support for arbitrary zone support in btrfs. This will allow the
> > 	raw block device to be present for arbitrary zone sizes (and
> > 	capacities) and btrfs will be able to use it natively.
> 
> Zone sizes cannot be arbitrary in btrfs since block groups must be a multiple of
> 64K. So constraints remain and should be enforced, at least by btrfs.

I don't think we should base a lot of decisions on the work that has gone into 
btrfs. I think it is very promising, but I don't think it is settled that it 
is the only way people will consume ZNS SSDs.

> 
> > 
> > 	For completeness, F2FS works natively in PO2 zone sizes, so we
> > 	will not do work here for now, as the changes will not bring any
> > 	benefit. For F2FS, the emulation layer will help use devices
> > 	that do not have PO2 zone sizes.
> > 
> >       We are working towards having at least a RFC of (2) before LSF/MM.
> >       Since this is a topic that involves several parties across the
> >       stack, I believe that a F2F conversation will help laying the path
> >       forward.
> > 
> > Thanks,
> > Javier
> > 
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 14:55       ` Adam Manzanares
@ 2022-03-03 15:22         ` Damien Le Moal
  2022-03-03 17:10           ` Adam Manzanares
  0 siblings, 1 reply; 59+ messages in thread
From: Damien Le Moal @ 2022-03-03 15:22 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: Javier González, Luis Chamberlain, linux-block,
	linux-fsdevel, lsf-pc, Matias Bjørling, Bart Van Assche,
	Keith Busch, Johannes Thumshirn, Naohiro Aota, Pankaj Raghav,
	Kanchan Joshi, Nitesh Shetty

On 2022/03/03 16:55, Adam Manzanares wrote:
> On Thu, Mar 03, 2022 at 09:49:06AM +0000, Damien Le Moal wrote:
>> On 2022/03/03 8:29, Javier González wrote:
>>> On 03.03.2022 06:32, Javier González wrote:
>>>>
>>>>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
>>>>>
>>>>> Thinking proactively about LSFMM, regarding just Zone storage..
>>>>>
>>>>> I'd like to propose a BoF for Zoned Storage. The point of it is
>>>>> to address the existing point points we have and take advantage of
>>>>> having folks in the room we can likely settle on things faster which
>>>>> otherwise would take years.
>>>>>
>>>>> I'll throw at least one topic out:
>>>>>
>>>>>  * Raw access for zone append for microbenchmarks:
>>>>>      - are we really happy with the status quo?
>>>>>    - if not what outlets do we have?
>>>>>
>>>>> I think the nvme passthrogh stuff deserves it's own shared
>>>>> discussion though and should not make it part of the BoF.
>>>>>
>>>>>  Luis
>>>>
>>>> Thanks for proposing this, Luis.
>>>>
>>>> I’d like to join this discussion too.
>>>>
>>>> Thanks,
>>>> Javier
>>>
>>> Let me expand a bit on this. There is one topic that I would like to
>>> cover in this session:
>>>
>>>    - PO2 zone sizes
>>>        In the past weeks we have been talking to Damien and Matias around
>>>        the constraint that we currently have for PO2 zone sizes. While
>>>        this has not been an issue for SMR HDDs, the gap that ZNS
>>>        introduces between zone capacity and zone size causes holes in the
>>>        address space. This unmapped LBA space has been the topic of
>>>        discussion with several ZNS adopters.
>>>
>>>        One of the things to note here is that even if the zone size is a
>>>        PO2, the zone capacity is typically not. This means that even when
>>>        we can use shifts to move around zones, the actual data placement
>>>        algorithms need to deal with arbitrary sizes. So at the end of the
>>>        day applications that use a contiguous address space - like in a
>>>        conventional block device -, will have to deal with this.
>>
>> "the actual data placement algorithms need to deal with arbitrary sizes"
>>
>> ???
>>
>> No it does not. With zone cap < zone size, the amount of sectors that can be
>> used within a zone may be smaller than the zone size, but:
>> 1) Writes still must be issued at the WP location so choosing a zone for writing
>> data has the same constraint regardless of the zone capacity: Do I have enough
>> usable sectors left in the zone ?
> 
> Are you saying holes are irrelevant because an application has to know the 
> status of a zone by querying the device for the zone status before using a zone
> and at that point it should know a start LBA? I see your point here but we have
> to assume things to arrive at this conclusion.

Of course holes are relevant. But their presence does not complicate anything
because the basic management of zones already has to deal with 2 sector ranges
in any zone: sectors that have already been written and the one that have not.
The "hole" for zone capacity != zone size case is simply another range to be
ignored.

And the only thing I am assuming here is that the software has a decent design,
that is, it is indeed zone aware and manages zones (their state and wp
position). That does not mean that one needs to do a report zones before every
IO (well, dumb application can do that if they want). Zone management is
initialized using a report zone command information but can be then be cached on
the host dram in any form suitable for the application.

> 
> Let's think of another scenario where the drive is managed by a user space 
> application that knows the status of zones and picks a zone because it knows 
> it is free. To calculate the start offset in terms of LBAs the application has 
> to use the difference in zone_size and zone_cap to calculate the write offset
> in terms of LBAs. 

What ? This does not make sense. The application simply needs to know the
current "soft" wp position and issue writes at that position and increment it
right away with the number of sectors written. Once that position reaches zone
cap, the zone is full. The hole behind that can be ignored. What is difficult
with this ? This is zone storage use 101.

> My argument is that the zone_size is a construct conceived to make a ZNS zone
> a power of 2 that creates a hole in the LBA space. Applications don't want
> to deal with the power of 2 constraint and neither do devices. It seems like
> the existing zoned kernel infrastructure, which made sense for SMR, pushed 
> this constraint onto devices and onto users. Arguments can be made for where 
> complexity should lie, but I don't think this decision made things easier for
> someone to use a ZNS SSD as a block device.

"Applications don't want to deal with the power of 2 constraint"

Well, we definitely are not talking to the same users then. Because I heard the
contrary from users who have actually deployed zoned storage at scale. And there
is nothing to deal with power of 2. This is not a constraint in itself. A
particular zone size is the constraint and for that, users are indeed never
satisfied (some want smaller zones, other bigger zones). So far, power of 2 size
has been mostly irrelevant or actually required because everybody understands
the CPU load benefits of bit shift arithmetic as opposed to CPU cycle hungry
multiplications and divisions.

> 
>> 2) Reading after the WP is not useful (if not outright stupid), regardless of
>> where the last usable sector in the zone is (at zone start + zone size or at
>> zone start + zone cap).
> 
> Of course but the with po2 you force useless LBA space even if you fill a zone.

And my point is: so what ? I do not see this as a problem given that accesses
must be zone based anyway.

>> And talking about "use a contiguous address space" is in my opinion nonsense in
>> the context of zoned storage since by definition, everything has to be managed
>> using zones as units. The only sensible range for a "contiguous address space"
>> is "zone start + min(zone cap, zone size)".
> 
> Definitely disagree with this given previous arguments. This is a construct 
> forced upon us because of zoned storage legacy.

What construct ? The zone is the unit. No matter its size, it *must* remain the
access management unit for the zoned software top be correct. Thinking that one
can correctly implement a zone compliant application, or any piece of software,
without managing zones and using them as the storage unit is in my opinion a bad
design bound to fail.

I may be wrong, of course, but I still have to be proven so by an actual use case.

> 
>>
>>>        Since chunk_sectors is no longer required to be a PO2, we have
>>>        started the work in removing this constraint. We are working in 2
>>>        phases:
>>>
>>>          1. Add an emulation layer in NVMe driver to simulate PO2 devices
>>> 	when the HW presents a zone_capacity = zone_size. This is a
>>> 	product of one of Damien's early concerns about supporting
>>> 	existing applications and FSs that work under the PO2
>>> 	assumption. We will post these patches in the next few days.
>>>
>>>          2. Remove the PO2 constraint from the block layer and add
>>> 	support for arbitrary zone support in btrfs. This will allow the
>>> 	raw block device to be present for arbitrary zone sizes (and
>>> 	capacities) and btrfs will be able to use it natively.
>>
>> Zone sizes cannot be arbitrary in btrfs since block groups must be a multiple of
>> 64K. So constraints remain and should be enforced, at least by btrfs.
> 
> I don't think we should base a lot of decisions on the work that has gone into 
> btrfs. I think it is very promising, but I don't think it is settled that it 
> is the only way people will consume ZNS SSDs.

Of course it is not. But not satisfying this constraint essentially disables
btrfs support. Ever heard of a regular block device that you cannot format with
ext4 or xfs ? It is the same here.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  6:29   ` Javier González
  2022-03-03  7:54     ` Pankaj Raghav
  2022-03-03  9:49     ` Damien Le Moal
@ 2022-03-03 16:12     ` Himanshu Madhani
  2 siblings, 0 replies; 59+ messages in thread
From: Himanshu Madhani @ 2022-03-03 16:12 UTC (permalink / raw)
  To: Javier González
  Cc: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty



> On Mar 2, 2022, at 10:29 PM, Javier González <javier@javigon.com> wrote:
> 
> On 03.03.2022 06:32, Javier González wrote:
>> 
>>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
>>> 
>>> Thinking proactively about LSFMM, regarding just Zone storage..
>>> 
>>> I'd like to propose a BoF for Zoned Storage. The point of it is
>>> to address the existing point points we have and take advantage of
>>> having folks in the room we can likely settle on things faster which
>>> otherwise would take years.
>>> 
>>> I'll throw at least one topic out:
>>> 
>>> * Raw access for zone append for microbenchmarks:
>>>     - are we really happy with the status quo?
>>>   - if not what outlets do we have?
>>> 
>>> I think the nvme passthrogh stuff deserves it's own shared
>>> discussion though and should not make it part of the BoF.
>>> 
>>> Luis
>> 
>> Thanks for proposing this, Luis.
>> 
>> I’d like to join this discussion too.
>> 
>> Thanks,
>> Javier
> 
> Let me expand a bit on this. There is one topic that I would like to
> cover in this session:
> 
>  - PO2 zone sizes
>      In the past weeks we have been talking to Damien and Matias around
>      the constraint that we currently have for PO2 zone sizes. While
>      this has not been an issue for SMR HDDs, the gap that ZNS
>      introduces between zone capacity and zone size causes holes in the
>      address space. This unmapped LBA space has been the topic of
>      discussion with several ZNS adopters.
> 
>      One of the things to note here is that even if the zone size is a
>      PO2, the zone capacity is typically not. This means that even when
>      we can use shifts to move around zones, the actual data placement
>      algorithms need to deal with arbitrary sizes. So at the end of the
>      day applications that use a contiguous address space - like in a
>      conventional block device -, will have to deal with this.
> 
>      Since chunk_sectors is no longer required to be a PO2, we have
>      started the work in removing this constraint. We are working in 2
>      phases:
> 
>        1. Add an emulation layer in NVMe driver to simulate PO2 devices
> 	when the HW presents a zone_capacity = zone_size. This is a
> 	product of one of Damien's early concerns about supporting
> 	existing applications and FSs that work under the PO2
> 	assumption. We will post these patches in the next few days.
> 
>        2. Remove the PO2 constraint from the block layer and add
> 	support for arbitrary zone support in btrfs. This will allow the
> 	raw block device to be present for arbitrary zone sizes (and
> 	capacities) and btrfs will be able to use it natively.
> 
> 	For completeness, F2FS works natively in PO2 zone sizes, so we
> 	will not do work here for now, as the changes will not bring any
> 	benefit. For F2FS, the emulation layer will help use devices
> 	that do not have PO2 zone sizes.
> 
>     We are working towards having at least a RFC of (2) before LSF/MM.
>     Since this is a topic that involves several parties across the
>     stack, I believe that a F2F conversation will help laying the path
>     forward.
> 
> Thanks,
> Javier
> 

I am working on Zoned storage for some time as well. I would like to be part of this discussion as well. 

Thanks! 

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 15:22         ` Damien Le Moal
@ 2022-03-03 17:10           ` Adam Manzanares
  2022-03-03 19:51             ` Matias Bjørling
  0 siblings, 1 reply; 59+ messages in thread
From: Adam Manzanares @ 2022-03-03 17:10 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Javier González, Luis Chamberlain, linux-block,
	linux-fsdevel, lsf-pc, Matias Bjørling, Bart Van Assche,
	Keith Busch, Johannes Thumshirn, Naohiro Aota, Pankaj Raghav,
	Kanchan Joshi, Nitesh Shetty

On Thu, Mar 03, 2022 at 03:22:52PM +0000, Damien Le Moal wrote:
> On 2022/03/03 16:55, Adam Manzanares wrote:
> > On Thu, Mar 03, 2022 at 09:49:06AM +0000, Damien Le Moal wrote:
> >> On 2022/03/03 8:29, Javier González wrote:
> >>> On 03.03.2022 06:32, Javier González wrote:
> >>>>
> >>>>> On 3 Mar 2022, at 04.24, Luis Chamberlain <mcgrof@kernel.org> wrote:
> >>>>>
> >>>>> Thinking proactively about LSFMM, regarding just Zone storage..
> >>>>>
> >>>>> I'd like to propose a BoF for Zoned Storage. The point of it is
> >>>>> to address the existing point points we have and take advantage of
> >>>>> having folks in the room we can likely settle on things faster which
> >>>>> otherwise would take years.
> >>>>>
> >>>>> I'll throw at least one topic out:
> >>>>>
> >>>>>  * Raw access for zone append for microbenchmarks:
> >>>>>      - are we really happy with the status quo?
> >>>>>    - if not what outlets do we have?
> >>>>>
> >>>>> I think the nvme passthrogh stuff deserves it's own shared
> >>>>> discussion though and should not make it part of the BoF.
> >>>>>
> >>>>>  Luis
> >>>>
> >>>> Thanks for proposing this, Luis.
> >>>>
> >>>> I’d like to join this discussion too.
> >>>>
> >>>> Thanks,
> >>>> Javier
> >>>
> >>> Let me expand a bit on this. There is one topic that I would like to
> >>> cover in this session:
> >>>
> >>>    - PO2 zone sizes
> >>>        In the past weeks we have been talking to Damien and Matias around
> >>>        the constraint that we currently have for PO2 zone sizes. While
> >>>        this has not been an issue for SMR HDDs, the gap that ZNS
> >>>        introduces between zone capacity and zone size causes holes in the
> >>>        address space. This unmapped LBA space has been the topic of
> >>>        discussion with several ZNS adopters.
> >>>
> >>>        One of the things to note here is that even if the zone size is a
> >>>        PO2, the zone capacity is typically not. This means that even when
> >>>        we can use shifts to move around zones, the actual data placement
> >>>        algorithms need to deal with arbitrary sizes. So at the end of the
> >>>        day applications that use a contiguous address space - like in a
> >>>        conventional block device -, will have to deal with this.
> >>
> >> "the actual data placement algorithms need to deal with arbitrary sizes"
> >>
> >> ???
> >>
> >> No it does not. With zone cap < zone size, the amount of sectors that can be
> >> used within a zone may be smaller than the zone size, but:
> >> 1) Writes still must be issued at the WP location so choosing a zone for writing
> >> data has the same constraint regardless of the zone capacity: Do I have enough
> >> usable sectors left in the zone ?
> > 
> > Are you saying holes are irrelevant because an application has to know the 
> > status of a zone by querying the device for the zone status before using a zone
> > and at that point it should know a start LBA? I see your point here but we have
> > to assume things to arrive at this conclusion.
> 
> Of course holes are relevant. But their presence does not complicate anything
> because the basic management of zones already has to deal with 2 sector ranges
> in any zone: sectors that have already been written and the one that have not.
> The "hole" for zone capacity != zone size case is simply another range to be
> ignored.
> 
> And the only thing I am assuming here is that the software has a decent design,
> that is, it is indeed zone aware and manages zones (their state and wp
> position). That does not mean that one needs to do a report zones before every
> IO (well, dumb application can do that if they want). Zone management is
> initialized using a report zone command information but can be then be cached on
> the host dram in any form suitable for the application.
> 
> > 
> > Let's think of another scenario where the drive is managed by a user space 
> > application that knows the status of zones and picks a zone because it knows 
> > it is free. To calculate the start offset in terms of LBAs the application has 
> > to use the difference in zone_size and zone_cap to calculate the write offset
> > in terms of LBAs. 
> 
> What ? This does not make sense. The application simply needs to know the
> current "soft" wp position and issue writes at that position and increment it
> right away with the number of sectors written. Once that position reaches zone
> cap, the zone is full. The hole behind that can be ignored. What is difficult
> with this ? This is zone storage use 101.

Sounds like you voluntered to teach zoned storage use 101. Can you teach me how
to calculate an LBA offset given a zone number when zone capacity is not equal
to zone size?

The second thing I would like to know is what happens when an application wants
to map an object that spans multiple consecutive zones. Does the application 
have to be aware of the difference in zone capacity and zone size?

> 
> > My argument is that the zone_size is a construct conceived to make a ZNS zone
> > a power of 2 that creates a hole in the LBA space. Applications don't want
> > to deal with the power of 2 constraint and neither do devices. It seems like
> > the existing zoned kernel infrastructure, which made sense for SMR, pushed 
> > this constraint onto devices and onto users. Arguments can be made for where 
> > complexity should lie, but I don't think this decision made things easier for
> > someone to use a ZNS SSD as a block device.
> 
> "Applications don't want to deal with the power of 2 constraint"
> 
> Well, we definitely are not talking to the same users then. Because I heard the
> contrary from users who have actually deployed zoned storage at scale. And there
> is nothing to deal with power of 2. This is not a constraint in itself. A
> particular zone size is the constraint and for that, users are indeed never
> satisfied (some want smaller zones, other bigger zones). So far, power of 2 size
> has been mostly irrelevant or actually required because everybody understands
> the CPU load benefits of bit shift arithmetic as opposed to CPU cycle hungry
> multiplications and divisions.

You are thinking from a kernel perspective you are potentially pushing 
additional multiplications onto users. This should be clear if we learn more 
about zoned storage 101 in this thread.

> 
> > 
> >> 2) Reading after the WP is not useful (if not outright stupid), regardless of
> >> where the last usable sector in the zone is (at zone start + zone size or at
> >> zone start + zone cap).
> > 
> > Of course but the with po2 you force useless LBA space even if you fill a zone.
> 
> And my point is: so what ? I do not see this as a problem given that accesses
> must be zone based anyway.
> 
> >> And talking about "use a contiguous address space" is in my opinion nonsense in
> >> the context of zoned storage since by definition, everything has to be managed
> >> using zones as units. The only sensible range for a "contiguous address space"
> >> is "zone start + min(zone cap, zone size)".
> > 
> > Definitely disagree with this given previous arguments. This is a construct 
> > forced upon us because of zoned storage legacy.
> 
> What construct ? The zone is the unit. No matter its size, it *must* remain the
> access management unit for the zoned software top be correct. Thinking that one
> can correctly implement a zone compliant application, or any piece of software,
> without managing zones and using them as the storage unit is in my opinion a bad
> design bound to fail.
> 

Forcing a zone to be power of 2 size. For NAND it is something that it is 
not. Capacity vs size doesn't solve any real problem other than making ZNS fit
the zoned model that was conceived for HDDs.

> I may be wrong, of course, but I still have to be proven so by an actual use case.
> 
> > 
> >>
> >>>        Since chunk_sectors is no longer required to be a PO2, we have
> >>>        started the work in removing this constraint. We are working in 2
> >>>        phases:
> >>>
> >>>          1. Add an emulation layer in NVMe driver to simulate PO2 devices
> >>> 	when the HW presents a zone_capacity = zone_size. This is a
> >>> 	product of one of Damien's early concerns about supporting
> >>> 	existing applications and FSs that work under the PO2
> >>> 	assumption. We will post these patches in the next few days.
> >>>
> >>>          2. Remove the PO2 constraint from the block layer and add
> >>> 	support for arbitrary zone support in btrfs. This will allow the
> >>> 	raw block device to be present for arbitrary zone sizes (and
> >>> 	capacities) and btrfs will be able to use it natively.
> >>
> >> Zone sizes cannot be arbitrary in btrfs since block groups must be a multiple of
> >> 64K. So constraints remain and should be enforced, at least by btrfs.
> > 
> > I don't think we should base a lot of decisions on the work that has gone into 
> > btrfs. I think it is very promising, but I don't think it is settled that it 
> > is the only way people will consume ZNS SSDs.
> 
> Of course it is not. But not satisfying this constraint essentially disables
> btrfs support. Ever heard of a regular block device that you cannot format with
> ext4 or xfs ? It is the same here.
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
                   ` (5 preceding siblings ...)
  2022-03-03  8:43 ` Johannes Thumshirn
@ 2022-03-03 18:20 ` Viacheslav Dubeyko
  2022-03-04  0:10 ` Dave Chinner
  2022-03-15 18:08 ` [EXT] " Luca Porzio (lporzio)
  8 siblings, 0 replies; 59+ messages in thread
From: Viacheslav Dubeyko @ 2022-03-03 18:20 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: linux-block, Linux FS Devel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty,
	Viacheslav A. Dubeyko



> On Mar 2, 2022, at 4:56 PM, Luis Chamberlain <mcgrof@kernel.org> wrote:
> 
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>  * Raw access for zone append for microbenchmarks:
>  	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.
> 
>  Luis

I am working on zone-aware file system. So, I would be really happy to participate.

Thanks,
Slava.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 17:10           ` Adam Manzanares
@ 2022-03-03 19:51             ` Matias Bjørling
  2022-03-03 20:18               ` Adam Manzanares
  0 siblings, 1 reply; 59+ messages in thread
From: Matias Bjørling @ 2022-03-03 19:51 UTC (permalink / raw)
  To: Adam Manzanares, Damien Le Moal
  Cc: Javier González, Luis Chamberlain, linux-block,
	linux-fsdevel, lsf-pc, Bart Van Assche, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

> Sounds like you voluntered to teach zoned storage use 101. Can you teach me
> how to calculate an LBA offset given a zone number when zone capacity is not
> equal to zone size?

zonesize_pow = x; // e.g., x = 32 if 2GiB Zone size /w 512B block size
zone_id = y; // valid zone id

struct blk_zone zone = zones[zone_id]; // zones is a linear array of blk_zone structs that holds per zone information.

With that, one can do the following
1a) first_lba_of_zone =  zone_id << zonesize_pow;
1b) first_lba_of_zone = zone.start;
2a) next_writeable_lba = (zoneid << zonesize_pow) + zone.wp;
2b) next_writeable_lba = zone.start + zone.wp;
3)   writeable_lbas_left = zone.len - zone.wp;
4)   lbas_written = zone.wp - 1;

> The second thing I would like to know is what happens when an application
> wants to map an object that spans multiple consecutive zones. Does the
> application have to be aware of the difference in zone capacity and zone size?

The zoned namespace command set specification does not allow variable zone size. The zone size is fixed for all zones in a namespace. Only the zone capacity has the capability to be variable. Usually, the zone capacity is fixed, I have not yet seen implementations that have variable zone capacities.

An application that wants to place a single object across a set of zones would have to be explicitly handled by the application. E.g., as well as the application, should be aware of a zone's capacity, it should also be aware that it should reset the set of zones and not a single zone. I.e., the application must always be aware of the zones it uses.

However, an end-user application should not (in my opinion) have to deal with this. It should use helper functions from a library that provides the appropriate abstraction to the application, such that the applications don't have to care about either specific zone capacity/size, or multiple resets. This is similar to how file systems work with file system semantics. For example, a file can span multiple extents on disk, but all an application sees is the file semantics. 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 19:51             ` Matias Bjørling
@ 2022-03-03 20:18               ` Adam Manzanares
  2022-03-03 21:08                 ` Javier González
  2022-03-03 21:33                 ` Matias Bjørling
  0 siblings, 2 replies; 59+ messages in thread
From: Adam Manzanares @ 2022-03-03 20:18 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Damien Le Moal, Javier González, Luis Chamberlain,
	linux-block, linux-fsdevel, lsf-pc, Bart Van Assche, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On Thu, Mar 03, 2022 at 07:51:36PM +0000, Matias Bjørling wrote:
> > Sounds like you voluntered to teach zoned storage use 101. Can you teach me
> > how to calculate an LBA offset given a zone number when zone capacity is not
> > equal to zone size?
> 
> zonesize_pow = x; // e.g., x = 32 if 2GiB Zone size /w 512B block size
> zone_id = y; // valid zone id
> 
> struct blk_zone zone = zones[zone_id]; // zones is a linear array of blk_zone structs that holds per zone information.
> 
> With that, one can do the following
> 1a) first_lba_of_zone =  zone_id << zonesize_pow;
> 1b) first_lba_of_zone = zone.start;

1b is interesting. What happens if i don't have struct blk_zone and zone size 
is not equal to zone capacity?

> 2a) next_writeable_lba = (zoneid << zonesize_pow) + zone.wp;
> 2b) next_writeable_lba = zone.start + zone.wp;

Can we modify 2b to not use zone.start?

> 3)   writeable_lbas_left = zone.len - zone.wp;
> 4)   lbas_written = zone.wp - 1;
> 
> > The second thing I would like to know is what happens when an application
> > wants to map an object that spans multiple consecutive zones. Does the
> > application have to be aware of the difference in zone capacity and zone size?
> 
> The zoned namespace command set specification does not allow variable zone size. The zone size is fixed for all zones in a namespace. Only the zone capacity has the capability to be variable. Usually, the zone capacity is fixed, I have not yet seen implementations that have variable zone capacities.
> 

IDK where variable zone size came from. I am talking about the fact that the 
zone size does not have to equal zone capacity. 

> An application that wants to place a single object across a set of zones would have to be explicitly handled by the application. E.g., as well as the application, should be aware of a zone's capacity, it should also be aware that it should reset the set of zones and not a single zone. I.e., the application must always be aware of the zones it uses.
> 
> However, an end-user application should not (in my opinion) have to deal with this. It should use helper functions from a library that provides the appropriate abstraction to the application, such that the applications don't have to care about either specific zone capacity/size, or multiple resets. This is similar to how file systems work with file system semantics. For example, a file can span multiple extents on disk, but all an application sees is the file semantics. 
> 

I don't want to go so far as to say what the end user application should and 
should not do.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 20:18               ` Adam Manzanares
@ 2022-03-03 21:08                 ` Javier González
  2022-03-03 21:33                 ` Matias Bjørling
  1 sibling, 0 replies; 59+ messages in thread
From: Javier González @ 2022-03-03 21:08 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: Matias Bjørling, Damien Le Moal, Luis Chamberlain,
	linux-block, linux-fsdevel, lsf-pc, Bart Van Assche, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty


> On 3 Mar 2022, at 21.18, Adam Manzanares <a.manzanares@samsung.com> wrote:
> 
> On Thu, Mar 03, 2022 at 07:51:36PM +0000, Matias Bjørling wrote:
>>> Sounds like you voluntered to teach zoned storage use 101. Can you teach me
>>> how to calculate an LBA offset given a zone number when zone capacity is not
>>> equal to zone size?
>> 
>> zonesize_pow = x; // e.g., x = 32 if 2GiB Zone size /w 512B block size
>> zone_id = y; // valid zone id
>> 
>> struct blk_zone zone = zones[zone_id]; // zones is a linear array of blk_zone structs that holds per zone information.
>> 
>> With that, one can do the following
>> 1a) first_lba_of_zone =  zone_id << zonesize_pow;
>> 1b) first_lba_of_zone = zone.start;
> 
> 1b is interesting. What happens if i don't have struct blk_zone and zone size 
> is not equal to zone capacity?
> 
>> 2a) next_writeable_lba = (zoneid << zonesize_pow) + zone.wp;
>> 2b) next_writeable_lba = zone.start + zone.wp;
> 
> Can we modify 2b to not use zone.start?
> 
>> 3)   writeable_lbas_left = zone.len - zone.wp;
>> 4)   lbas_written = zone.wp - 1;
>> 
>>> The second thing I would like to know is what happens when an application
>>> wants to map an object that spans multiple consecutive zones. Does the
>>> application have to be aware of the difference in zone capacity and zone size?
>> 
>> The zoned namespace command set specification does not allow variable zone size. The zone size is fixed for all zones in a namespace. Only the zone capacity has the capability to be variable. Usually, the zone capacity is fixed, I have not yet seen implementations that have variable zone capacities.
>> 
> 
> IDK where variable zone size came from. I am talking about the fact that the 
> zone size does not have to equal zone capacity. 
> 
>> An application that wants to place a single object across a set of zones would have to be explicitly handled by the application. E.g., as well as the application, should be aware of a zone's capacity, it should also be aware that it should reset the set of zones and not a single zone. I.e., the application must always be aware of the zones it uses.
>> 
>> However, an end-user application should not (in my opinion) have to deal with this. It should use helper functions from a library that provides the appropriate abstraction to the application, such that the applications don't have to care about either specific zone capacity/size, or multiple resets. This is similar to how file systems work with file system semantics. For example, a file can span multiple extents on disk, but all an application sees is the file semantics. 
>> 
> 
> I don't want to go so far as to say what the end user application should and 
> should not do.

Adam, Matias, Damien,

Trying to bring us back to the original proposal. 

I believe we all can agree that applications and file-systems that work in objects / extents / segments of PO2 can benefit from defining the zone boundary at a PO2. Based on the code I have seen so far, these applications will still have to deal with the zone capacity. So if an application of FS needs to align to a certain size, it is the capacity that will have to be considered. Since there are plenty users, I am sure there are examples where this does not apply. 

In my view, the point to remove this constraint is that there are users that can deal with !PO2 zone sizes and imposing the unmapped LBAs for them is creating unnecessary hassle. This hurts the zoned ecosystem and therefore adoption. 

Even when we remove PO2 zone sizes, devices exposing PO2 zone sizes will of course be supported, and probably preferred for the use-cases that make sense. 

As we start to post patches, I hope these points become more clear. 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 20:18               ` Adam Manzanares
  2022-03-03 21:08                 ` Javier González
@ 2022-03-03 21:33                 ` Matias Bjørling
  2022-03-04 20:12                   ` Luis Chamberlain
  1 sibling, 1 reply; 59+ messages in thread
From: Matias Bjørling @ 2022-03-03 21:33 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: Damien Le Moal, Javier González, Luis Chamberlain,
	linux-block, linux-fsdevel, lsf-pc, Bart Van Assche, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

> -----Original Message-----
> From: Adam Manzanares <a.manzanares@samsung.com>
> Sent: Thursday, 3 March 2022 21.19
> To: Matias Bjørling <Matias.Bjorling@wdc.com>
> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Javier González
> <javier@javigon.com>; Luis Chamberlain <mcgrof@kernel.org>; linux-
> block@vger.kernel.org; linux-fsdevel@vger.kernel.org; lsf-pc@lists.linux-
> foundation.org; Bart Van Assche <bvanassche@acm.org>; Keith Busch
> <Keith.Busch@wdc.com>; Johannes Thumshirn
> <Johannes.Thumshirn@wdc.com>; Naohiro Aota <Naohiro.Aota@wdc.com>;
> Pankaj Raghav <pankydev8@gmail.com>; Kanchan Joshi
> <joshi.k@samsung.com>; Nitesh Shetty <nj.shetty@samsung.com>
> Subject: Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
> 
> On Thu, Mar 03, 2022 at 07:51:36PM +0000, Matias Bjørling wrote:
> > > Sounds like you voluntered to teach zoned storage use 101. Can you
> > > teach me how to calculate an LBA offset given a zone number when
> > > zone capacity is not equal to zone size?
> >
> > zonesize_pow = x; // e.g., x = 32 if 2GiB Zone size /w 512B block size
> > zone_id = y; // valid zone id
> >
> > struct blk_zone zone = zones[zone_id]; // zones is a linear array of blk_zone
> structs that holds per zone information.
> >
> > With that, one can do the following
> > 1a) first_lba_of_zone =  zone_id << zonesize_pow;
> > 1b) first_lba_of_zone = zone.start;
> 
> 1b is interesting. What happens if i don't have struct blk_zone and zone size is
> not equal to zone capacity?

struct blk_zone could be what one likes it to be. It is just a data structure that captures key information about a zone. A zone's start address is orthogonal to a zone's writeable capacity.

> 
> > 2a) next_writeable_lba = (zoneid << zonesize_pow) + zone.wp;
> > 2b) next_writeable_lba = zone.start + zone.wp;
> 
> Can we modify 2b to not use zone.start?

Yes - use 2a.

> 
> > 3)   writeable_lbas_left = zone.len - zone.wp;
> > 4)   lbas_written = zone.wp - 1;
> >
> > > The second thing I would like to know is what happens when an
> > > application wants to map an object that spans multiple consecutive
> > > zones. Does the application have to be aware of the difference in zone
> capacity and zone size?
> >
> > The zoned namespace command set specification does not allow variable
> zone size. The zone size is fixed for all zones in a namespace. Only the zone
> capacity has the capability to be variable. Usually, the zone capacity is fixed, I
> have not yet seen implementations that have variable zone capacities.
> >
> 
> IDK where variable zone size came from. I am talking about the fact that the
> zone size does not have to equal zone capacity.

Ok. Yes, an application should be aware how its managing a zone - similar to that it has to have logic that knows that a zone must be reset.

> 
> > An application that wants to place a single object across a set of zones would
> have to be explicitly handled by the application. E.g., as well as the application,
> should be aware of a zone's capacity, it should also be aware that it should
> reset the set of zones and not a single zone. I.e., the application must always be
> aware of the zones it uses.
> >
> > However, an end-user application should not (in my opinion) have to deal
> with this. It should use helper functions from a library that provides the
> appropriate abstraction to the application, such that the applications don't
> have to care about either specific zone capacity/size, or multiple resets. This is
> similar to how file systems work with file system semantics. For example, a file
> can span multiple extents on disk, but all an application sees is the file
> semantics.
> >
> 
> I don't want to go so far as to say what the end user application should and
> should not do.

Consider it as a best practice example. Another typical example is that one should avoid extensive flushes to disk if the application doesn't need persistence for each I/O it issues. 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
                   ` (6 preceding siblings ...)
  2022-03-03 18:20 ` Viacheslav Dubeyko
@ 2022-03-04  0:10 ` Dave Chinner
  2022-03-04 22:10   ` Luis Chamberlain
  2022-03-15 18:08 ` [EXT] " Luca Porzio (lporzio)
  8 siblings, 1 reply; 59+ messages in thread
From: Dave Chinner @ 2022-03-04  0:10 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is
> to address the existing point points we have and take advantage of
> having folks in the room we can likely settle on things faster which
> otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>   * Raw access for zone append for microbenchmarks:
>   	- are we really happy with the status quo?
> 	- if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared
> discussion though and should not make it part of the BoF.

Reading through the discussion on this thread, perhaps this session
should be used to educate application developers about how to use
ZoneFS so they never need to manage low level details of zone
storage such as enumerating zones, controlling write pointers
safely for concurrent IO, performing zone resets, etc.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03 21:33                 ` Matias Bjørling
@ 2022-03-04 20:12                   ` Luis Chamberlain
  2022-03-06 23:54                     ` Damien Le Moal
  0 siblings, 1 reply; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-04 20:12 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Adam Manzanares, Damien Le Moal, Javier González,
	linux-block, linux-fsdevel, lsf-pc, Bart Van Assche, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On Thu, Mar 03, 2022 at 09:33:06PM +0000, Matias Bjørling wrote:
> > -----Original Message-----
> > From: Adam Manzanares <a.manzanares@samsung.com>
> > However, an end-user application should not (in my opinion) have to deal
> > with this. It should use helper functions from a library that provides the
> > appropriate abstraction to the application, such that the applications don't
> > have to care about either specific zone capacity/size, or multiple resets. This is
> > similar to how file systems work with file system semantics. For example, a file
> > can span multiple extents on disk, but all an application sees is the file
> > semantics.
> > >
> > 
> > I don't want to go so far as to say what the end user application should and
> > should not do.
> 
> Consider it as a best practice example. Another typical example is
> that one should avoid extensive flushes to disk if the application
> doesn't need persistence for each I/O it issues. 

Although I was sad to see there was no raw access to a block zoned
storage device, the above makes me kind of happy that this is the case
today. Why? Because there is an implicit requirement on management of
data on zone storage devices outside of regular storage SSDs, and if
its not considered and *very well documented*, in agreement with us
all, we can end up with folks slightly surprised with these
requirements.

An application today can't directly manage these objects so that's not
even possible today. And in fact it's not even clear if / how we'll get
there.

So in the meantime the only way to access zones directly, if an application
wants anything close as possible to the block layer, the only way is
through the VFS through zonefs. I can hear people cringing even if you
are miles away. If we want an improvement upon this, whatever API we come
up with we *must* clearly embrace and document the requirements /
responsiblities above.

From what I read, the unmapped LBA problem can be observed as a
non-problem *iff* users are willing to deal with the above. We seem to
have disagreement on the expection from users.

Any way, there are two aspects to what Javier was mentioning and I think
it is *critial* to separate them:

 a) emulation should be possible given the nature of NAND
 b) The PO2 requirement exists, is / should it exist forever?

The discussion around these two throws drew in a third aspect:

c) Applications which want to deal with LBAs directly on
NVMe ZNS drives must be aware of the ZNS design and deal with
it diretly or indirectly in light of the unmapped LBAs which
are caused by the differences between zone sizes, zone capacity,
how objects can span multiple zones, zone resets, etc.

I think a) is easier to swallow and accept provided there is
no impact on existing users. b) and c) are things which I think
could be elaborated a bit more at LSFMM through community dialog.

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04  0:10 ` Dave Chinner
@ 2022-03-04 22:10   ` Luis Chamberlain
  2022-03-04 22:42     ` Dave Chinner
  2022-03-06 23:56     ` Damien Le Moal
  0 siblings, 2 replies; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-04 22:10 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
> On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
> > Thinking proactively about LSFMM, regarding just Zone storage..
> > 
> > I'd like to propose a BoF for Zoned Storage. The point of it is
> > to address the existing point points we have and take advantage of
> > having folks in the room we can likely settle on things faster which
> > otherwise would take years.
> > 
> > I'll throw at least one topic out:
> > 
> >   * Raw access for zone append for microbenchmarks:
> >   	- are we really happy with the status quo?
> > 	- if not what outlets do we have?
> > 
> > I think the nvme passthrogh stuff deserves it's own shared
> > discussion though and should not make it part of the BoF.
> 
> Reading through the discussion on this thread, perhaps this session
> should be used to educate application developers about how to use
> ZoneFS so they never need to manage low level details of zone
> storage such as enumerating zones, controlling write pointers
> safely for concurrent IO, performing zone resets, etc.

I'm not even sure users are really aware that given cap can be different
than zone size and btrfs uses zone size to compute size, the size is a
flat out lie.

modprobe null_blk nr_devices=0
mkdir /sys/kernel/config/nullb/nullb0
echo 0 > /sys/kernel/config/nullb/nullb0/completion_nsec
echo 0 > /sys/kernel/config/nullb/nullb0/irqmode
echo 2 > /sys/kernel/config/nullb/nullb0/queue_mode
echo 1024 > /sys/kernel/config/nullb/nullb0/hw_queue_depth
echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
echo 1 > /sys/kernel/config/nullb/nullb0/zoned

echo 128 > /sys/kernel/config/nullb/nullb0/zone_size
# 6 zones are implied, we are saying 768 for the full storage size..
# but...
echo 768 > /sys/kernel/config/nullb/nullb0/size

# If we force capacity to be way less than the zone sizes, btrfs still
# uses the zone size to do its data / metadata size computation...
echo 32 > /sys/kernel/config/nullb/nullb0/zone_capacity

# No conventional zones
echo 0 > /sys/kernel/config/nullb/nullb0/zone_nr_conv

echo 1 > /sys/kernel/config/nullb/nullb0/power
echo mq-deadline > /sys/block/nullb0/queue/scheduler

# mkfs.btrfs -f -d single -m single /dev/nullb0
Label:              (null)
UUID:               e725782a-d2d3-4c02-97fd-0501de117323
Node size:          16384
Sector size:        4096
Filesystem size:    768.00MiB
Block group profiles:
  Data:             single          128.00MiB
    Metadata:         single          128.00MiB
      System:           single          128.00MiB
      SSD detected:       yes
      Zoned device:       yes
        Zone size:        128.00MiB
	Incompat features:  extref, skinny-metadata, no-holes, zoned
	Runtime features:   free-space-tree
	Checksum:           crc32c
	Number of devices:  1
	Devices:
	   ID        SIZE  PATH
	       1   768.00MiB  /dev/nullb0

# mount /dev/nullb0 /mnt
# btrfs fi show
Label: none  uuid: e725782a-d2d3-4c02-97fd-0501de117323
        Total devices 1 FS bytes used 144.00KiB
	        devid    1 size 768.00MiB used 384.00MiB path
		/dev/nullb0

# btrfs fi df /mnt
Data, single: total=128.00MiB, used=0.00B
System, single: total=128.00MiB, used=16.00KiB
Metadata, single: total=128.00MiB, used=128.00KiB
GlobalReserve, single: total=3.50MiB, used=0.00B

Since btrfs already has "real size" problems this existing
design takes this a bit further without a fix either. I suspect
quite a bit of puzzled users will be unhappy that even though
ZNS claims to kill overprovisioning we're now somehow lying
about size. I'm not even sure this might be good for the
filesystem / metadata.

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04 22:10   ` Luis Chamberlain
@ 2022-03-04 22:42     ` Dave Chinner
  2022-03-04 22:55       ` Luis Chamberlain
  2022-03-06 23:56     ` Damien Le Moal
  1 sibling, 1 reply; 59+ messages in thread
From: Dave Chinner @ 2022-03-04 22:42 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Fri, Mar 04, 2022 at 02:10:08PM -0800, Luis Chamberlain wrote:
> On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
> > On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
> > > Thinking proactively about LSFMM, regarding just Zone storage..
> > > 
> > > I'd like to propose a BoF for Zoned Storage. The point of it is
> > > to address the existing point points we have and take advantage of
> > > having folks in the room we can likely settle on things faster which
> > > otherwise would take years.
> > > 
> > > I'll throw at least one topic out:
> > > 
> > >   * Raw access for zone append for microbenchmarks:
> > >   	- are we really happy with the status quo?
> > > 	- if not what outlets do we have?
> > > 
> > > I think the nvme passthrogh stuff deserves it's own shared
> > > discussion though and should not make it part of the BoF.
> > 
> > Reading through the discussion on this thread, perhaps this session
> > should be used to educate application developers about how to use
> > ZoneFS so they never need to manage low level details of zone
> > storage such as enumerating zones, controlling write pointers
> > safely for concurrent IO, performing zone resets, etc.
> 
> I'm not even sure users are really aware that given cap can be different
> than zone size and btrfs uses zone size to compute size, the size is a
> flat out lie.

Sorry, I don't get what btrfs does with zone management has anything
to do with using Zonefs to get direct, raw IO access to individual
zones. Direct IO on open zone fds is likely more efficient than
doing IO through the standard LBA based block device because ZoneFS
uses iomap_dio_rw() so it only needs to do one mapping operation per
IO instead of one per page in the IO. Nor does it have to manage
buffer heads or other "generic blockdev" functionality that direct
IO access to zoned storage doesn't require.

So whatever you're complaining about that btrfs lies about, does or
doesn't do is irrelevant - Zonefs was written with the express
purpose of getting user applications away from needing to directly
manage zone storage. SO if you have special zone IO management
requirements, work out how they can be supported by zonefs - we
don't need yet another special purpose direct hardware access API
for zone storage when we already have a solid solution to the
problem already.

> modprobe null_blk nr_devices=0
> mkdir /sys/kernel/config/nullb/nullb0
> echo 0 > /sys/kernel/config/nullb/nullb0/completion_nsec
> echo 0 > /sys/kernel/config/nullb/nullb0/irqmode
> echo 2 > /sys/kernel/config/nullb/nullb0/queue_mode
> echo 1024 > /sys/kernel/config/nullb/nullb0/hw_queue_depth
> echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
> echo 1 > /sys/kernel/config/nullb/nullb0/zoned
> 
> echo 128 > /sys/kernel/config/nullb/nullb0/zone_size
> # 6 zones are implied, we are saying 768 for the full storage size..
> # but...
> echo 768 > /sys/kernel/config/nullb/nullb0/size
> 
> # If we force capacity to be way less than the zone sizes, btrfs still
> # uses the zone size to do its data / metadata size computation...
> echo 32 > /sys/kernel/config/nullb/nullb0/zone_capacity

Then that's just a btrfs zone support bug where it's used the
wrong information to size it's zones. Why not just send a patch to
fix it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04 22:42     ` Dave Chinner
@ 2022-03-04 22:55       ` Luis Chamberlain
  2022-03-05  7:33         ` Javier González
  2022-03-07  0:07         ` Damien Le Moal
  0 siblings, 2 replies; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-04 22:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Sat, Mar 05, 2022 at 09:42:57AM +1100, Dave Chinner wrote:
> On Fri, Mar 04, 2022 at 02:10:08PM -0800, Luis Chamberlain wrote:
> > On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
> > > On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
> > > > Thinking proactively about LSFMM, regarding just Zone storage..
> > > > 
> > > > I'd like to propose a BoF for Zoned Storage. The point of it is
> > > > to address the existing point points we have and take advantage of
> > > > having folks in the room we can likely settle on things faster which
> > > > otherwise would take years.
> > > > 
> > > > I'll throw at least one topic out:
> > > > 
> > > >   * Raw access for zone append for microbenchmarks:
> > > >   	- are we really happy with the status quo?
> > > > 	- if not what outlets do we have?
> > > > 
> > > > I think the nvme passthrogh stuff deserves it's own shared
> > > > discussion though and should not make it part of the BoF.
> > > 
> > > Reading through the discussion on this thread, perhaps this session
> > > should be used to educate application developers about how to use
> > > ZoneFS so they never need to manage low level details of zone
> > > storage such as enumerating zones, controlling write pointers
> > > safely for concurrent IO, performing zone resets, etc.
> > 
> > I'm not even sure users are really aware that given cap can be different
> > than zone size and btrfs uses zone size to compute size, the size is a
> > flat out lie.
> 
> Sorry, I don't get what btrfs does with zone management has anything
> to do with using Zonefs to get direct, raw IO access to individual
> zones.

You are right for direct raw access. My point was that even for
filesystem use design I don't think the communication is clear on
expectations. Similar computation need to be managed by fileystem
design, for instance.

> Direct IO on open zone fds is likely more efficient than
> doing IO through the standard LBA based block device because ZoneFS
> uses iomap_dio_rw() so it only needs to do one mapping operation per
> IO instead of one per page in the IO. Nor does it have to manage
> buffer heads or other "generic blockdev" functionality that direct
> IO access to zoned storage doesn't require.
>
> So whatever you're complaining about that btrfs lies about, does or
> doesn't do is irrelevant - Zonefs was written with the express
> purpose of getting user applications away from needing to directly
> manage zone storage.

I think it ended that way, I can't say it was the goal from the start.
Seems the raw block patches had some support and in the end zonefs
was presented as a possible outlet.

> SO if you have special zone IO management
> requirements, work out how they can be supported by zonefs - we
> don't need yet another special purpose direct hardware access API
> for zone storage when we already have a solid solution to the
> problem already.

If this is fairly decided. Then that's that.

Calling zonefs solid though is a stretch.

> > modprobe null_blk nr_devices=0
> > mkdir /sys/kernel/config/nullb/nullb0
> > echo 0 > /sys/kernel/config/nullb/nullb0/completion_nsec
> > echo 0 > /sys/kernel/config/nullb/nullb0/irqmode
> > echo 2 > /sys/kernel/config/nullb/nullb0/queue_mode
> > echo 1024 > /sys/kernel/config/nullb/nullb0/hw_queue_depth
> > echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
> > echo 1 > /sys/kernel/config/nullb/nullb0/zoned
> > 
> > echo 128 > /sys/kernel/config/nullb/nullb0/zone_size
> > # 6 zones are implied, we are saying 768 for the full storage size..
> > # but...
> > echo 768 > /sys/kernel/config/nullb/nullb0/size
> > 
> > # If we force capacity to be way less than the zone sizes, btrfs still
> > # uses the zone size to do its data / metadata size computation...
> > echo 32 > /sys/kernel/config/nullb/nullb0/zone_capacity
> 
> Then that's just a btrfs zone support bug where it's used the
> wrong information to size it's zones. Why not just send a patch to
> fix it?

This can change the format of existing created filesystems. And so
if this change is welcomed I think we would need to be explicit
about its support.

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04 22:55       ` Luis Chamberlain
@ 2022-03-05  7:33         ` Javier González
  2022-03-07  7:12           ` Dave Chinner
  2022-03-07 13:55           ` James Bottomley
  2022-03-07  0:07         ` Damien Le Moal
  1 sibling, 2 replies; 59+ messages in thread
From: Javier González @ 2022-03-05  7:33 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 04.03.2022 14:55, Luis Chamberlain wrote:
>On Sat, Mar 05, 2022 at 09:42:57AM +1100, Dave Chinner wrote:
>> On Fri, Mar 04, 2022 at 02:10:08PM -0800, Luis Chamberlain wrote:
>> > On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
>> > > On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
>> > > > Thinking proactively about LSFMM, regarding just Zone storage..
>> > > >
>> > > > I'd like to propose a BoF for Zoned Storage. The point of it is
>> > > > to address the existing point points we have and take advantage of
>> > > > having folks in the room we can likely settle on things faster which
>> > > > otherwise would take years.
>> > > >
>> > > > I'll throw at least one topic out:
>> > > >
>> > > >   * Raw access for zone append for microbenchmarks:
>> > > >   	- are we really happy with the status quo?
>> > > > 	- if not what outlets do we have?
>> > > >
>> > > > I think the nvme passthrogh stuff deserves it's own shared
>> > > > discussion though and should not make it part of the BoF.
>> > >
>> > > Reading through the discussion on this thread, perhaps this session
>> > > should be used to educate application developers about how to use
>> > > ZoneFS so they never need to manage low level details of zone
>> > > storage such as enumerating zones, controlling write pointers
>> > > safely for concurrent IO, performing zone resets, etc.
>> >
>> > I'm not even sure users are really aware that given cap can be different
>> > than zone size and btrfs uses zone size to compute size, the size is a
>> > flat out lie.
>>
>> Sorry, I don't get what btrfs does with zone management has anything
>> to do with using Zonefs to get direct, raw IO access to individual
>> zones.
>
>You are right for direct raw access. My point was that even for
>filesystem use design I don't think the communication is clear on
>expectations. Similar computation need to be managed by fileystem
>design, for instance.

Dave,

I understand that you point to ZoneFS for this. It is true that it was
presented at the moment as the way to do raw zone access from
user-space.

However, there is no users of ZoneFS for ZNS devices that I am aware of
(maybe for SMR this is a different story).  The main open-source
implementations out there for RocksDB that are being used in production
(ZenFS and xZTL) rely on either raw zone block access or the generic
char device in NVMe (/dev/ngXnY). This is because having the capability
to do zone management from applications that already work with objects
fits much better.

My point is that there is space for both ZoneFS and raw zoned block
device. And regarding !PO2 zone sizes, my point is that this can be
leveraged both by btrfs and this raw zone block device.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04 20:12                   ` Luis Chamberlain
@ 2022-03-06 23:54                     ` Damien Le Moal
  0 siblings, 0 replies; 59+ messages in thread
From: Damien Le Moal @ 2022-03-06 23:54 UTC (permalink / raw)
  To: Luis Chamberlain, Matias Bjørling
  Cc: Adam Manzanares, Damien Le Moal, Javier González,
	linux-block, linux-fsdevel, lsf-pc, Bart Van Assche, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 3/5/22 05:12, Luis Chamberlain wrote:
> On Thu, Mar 03, 2022 at 09:33:06PM +0000, Matias Bjørling wrote:
>>> -----Original Message-----
>>> From: Adam Manzanares <a.manzanares@samsung.com>
>>> However, an end-user application should not (in my opinion) have to deal
>>> with this. It should use helper functions from a library that provides the
>>> appropriate abstraction to the application, such that the applications don't
>>> have to care about either specific zone capacity/size, or multiple resets. This is
>>> similar to how file systems work with file system semantics. For example, a file
>>> can span multiple extents on disk, but all an application sees is the file
>>> semantics.
>>>>
>>>
>>> I don't want to go so far as to say what the end user application should and
>>> should not do.
>>
>> Consider it as a best practice example. Another typical example is
>> that one should avoid extensive flushes to disk if the application
>> doesn't need persistence for each I/O it issues. 
> 
> Although I was sad to see there was no raw access to a block zoned
> storage device, the above makes me kind of happy that this is the case
> today. Why? Because there is an implicit requirement on management of
> data on zone storage devices outside of regular storage SSDs, and if
> its not considered and *very well documented*, in agreement with us
> all, we can end up with folks slightly surprised with these
> requirements.
> 
> An application today can't directly manage these objects so that's not
> even possible today. And in fact it's not even clear if / how we'll get
> there.

See include/uapi/linux/blkzoned.h. I really do not understand what you
are talking about.

And yes, there is not much in terms of documentation under
Documentation. Patches welcome. We do have documented things here though:

https://zonedstorage.io/docs/linux/zbd-api

> 
> So in the meantime the only way to access zones directly, if an application
> wants anything close as possible to the block layer, the only way is
> through the VFS through zonefs. I can hear people cringing even if you
> are miles away. If we want an improvement upon this, whatever API we come
> up with we *must* clearly embrace and document the requirements /
> responsiblities above.
> 
> From what I read, the unmapped LBA problem can be observed as a
> non-problem *iff* users are willing to deal with the above. We seem to
> have disagreement on the expection from users.

Again, how can one implement an application doing raw zoned block device
accesses without managing zones correctly is unknown to me. It seems to
me that you are thinking of an application design model that I do not
see/understand. Care to elaborate ?

> Any way, there are two aspects to what Javier was mentioning and I think
> it is *critial* to separate them:
> 
>  a) emulation should be possible given the nature of NAND

Emulation need has nothing to do with the media type. Specifications
*never* talk about a specific media type. ZBC/ZAC, similarly to ZNS, do
not mandate any requirement on zone size.

>  b) The PO2 requirement exists, is / should it exist forever?

Not necessarily. But since it is that right now, any change must ensure
that existing user-space does not break nor regress (performance).

> 
> The discussion around these two throws drew in a third aspect:
> 
> c) Applications which want to deal with LBAs directly on
> NVMe ZNS drives must be aware of the ZNS design and deal with
> it diretly or indirectly in light of the unmapped LBAs which
> are caused by the differences between zone sizes, zone capacity,
> how objects can span multiple zones, zone resets, etc.

That is not really special to ZNS. ZBC/ZAC SMR HDDs also need that
management since zones can go offline or read-only too (in ZNS too).
That is actually the main reason why applications *must* manage accesses
per zones. Otherwise, correct IO error recovery is impossible.

> 
> I think a) is easier to swallow and accept provided there is
> no impact on existing users. b) and c) are things which I think
> could be elaborated a bit more at LSFMM through community dialog.
> 
>   Luis


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04 22:10   ` Luis Chamberlain
  2022-03-04 22:42     ` Dave Chinner
@ 2022-03-06 23:56     ` Damien Le Moal
  2022-03-07 15:44       ` Luis Chamberlain
  1 sibling, 1 reply; 59+ messages in thread
From: Damien Le Moal @ 2022-03-06 23:56 UTC (permalink / raw)
  To: Luis Chamberlain, Dave Chinner
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/5/22 07:10, Luis Chamberlain wrote:
> On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
>> On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
>>> Thinking proactively about LSFMM, regarding just Zone storage..
>>>
>>> I'd like to propose a BoF for Zoned Storage. The point of it is
>>> to address the existing point points we have and take advantage of
>>> having folks in the room we can likely settle on things faster which
>>> otherwise would take years.
>>>
>>> I'll throw at least one topic out:
>>>
>>>   * Raw access for zone append for microbenchmarks:
>>>   	- are we really happy with the status quo?
>>> 	- if not what outlets do we have?
>>>
>>> I think the nvme passthrogh stuff deserves it's own shared
>>> discussion though and should not make it part of the BoF.
>>
>> Reading through the discussion on this thread, perhaps this session
>> should be used to educate application developers about how to use
>> ZoneFS so they never need to manage low level details of zone
>> storage such as enumerating zones, controlling write pointers
>> safely for concurrent IO, performing zone resets, etc.
> 
> I'm not even sure users are really aware that given cap can be different
> than zone size and btrfs uses zone size to compute size, the size is a
> flat out lie.
> 
> modprobe null_blk nr_devices=0
> mkdir /sys/kernel/config/nullb/nullb0
> echo 0 > /sys/kernel/config/nullb/nullb0/completion_nsec
> echo 0 > /sys/kernel/config/nullb/nullb0/irqmode
> echo 2 > /sys/kernel/config/nullb/nullb0/queue_mode
> echo 1024 > /sys/kernel/config/nullb/nullb0/hw_queue_depth
> echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
> echo 1 > /sys/kernel/config/nullb/nullb0/zoned
> 
> echo 128 > /sys/kernel/config/nullb/nullb0/zone_size
> # 6 zones are implied, we are saying 768 for the full storage size..
> # but...
> echo 768 > /sys/kernel/config/nullb/nullb0/size
> 
> # If we force capacity to be way less than the zone sizes, btrfs still
> # uses the zone size to do its data / metadata size computation...
> echo 32 > /sys/kernel/config/nullb/nullb0/zone_capacity
> 
> # No conventional zones
> echo 0 > /sys/kernel/config/nullb/nullb0/zone_nr_conv
> 
> echo 1 > /sys/kernel/config/nullb/nullb0/power
> echo mq-deadline > /sys/block/nullb0/queue/scheduler
> 
> # mkfs.btrfs -f -d single -m single /dev/nullb0
> Label:              (null)
> UUID:               e725782a-d2d3-4c02-97fd-0501de117323
> Node size:          16384
> Sector size:        4096
> Filesystem size:    768.00MiB
> Block group profiles:
>   Data:             single          128.00MiB
>     Metadata:         single          128.00MiB
>       System:           single          128.00MiB
>       SSD detected:       yes
>       Zoned device:       yes
>         Zone size:        128.00MiB
> 	Incompat features:  extref, skinny-metadata, no-holes, zoned
> 	Runtime features:   free-space-tree
> 	Checksum:           crc32c
> 	Number of devices:  1
> 	Devices:
> 	   ID        SIZE  PATH
> 	       1   768.00MiB  /dev/nullb0
> 
> # mount /dev/nullb0 /mnt
> # btrfs fi show
> Label: none  uuid: e725782a-d2d3-4c02-97fd-0501de117323
>         Total devices 1 FS bytes used 144.00KiB
> 	        devid    1 size 768.00MiB used 384.00MiB path
> 		/dev/nullb0
> 
> # btrfs fi df /mnt
> Data, single: total=128.00MiB, used=0.00B
> System, single: total=128.00MiB, used=16.00KiB
> Metadata, single: total=128.00MiB, used=128.00KiB
> GlobalReserve, single: total=3.50MiB, used=0.00B
> 
> Since btrfs already has "real size" problems this existing
> design takes this a bit further without a fix either. I suspect
> quite a bit of puzzled users will be unhappy that even though
> ZNS claims to kill overprovisioning we're now somehow lying
> about size. I'm not even sure this might be good for the
> filesystem / metadata.

btrfs maps zones to block groups and the sectors between zone capacity
and zone size are marked as unusable. The report above is not showing
that. The coding is correct though. The block allocation will not be
attempted beyond zone capacity.

> 
>   Luis


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-04 22:55       ` Luis Chamberlain
  2022-03-05  7:33         ` Javier González
@ 2022-03-07  0:07         ` Damien Le Moal
  1 sibling, 0 replies; 59+ messages in thread
From: Damien Le Moal @ 2022-03-07  0:07 UTC (permalink / raw)
  To: Luis Chamberlain, Dave Chinner
  Cc: linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Javier González, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/5/22 07:55, Luis Chamberlain wrote:
> On Sat, Mar 05, 2022 at 09:42:57AM +1100, Dave Chinner wrote:
>> On Fri, Mar 04, 2022 at 02:10:08PM -0800, Luis Chamberlain wrote:
>>> On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
>>>> On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
>>>>> Thinking proactively about LSFMM, regarding just Zone storage..
>>>>>
>>>>> I'd like to propose a BoF for Zoned Storage. The point of it is
>>>>> to address the existing point points we have and take advantage of
>>>>> having folks in the room we can likely settle on things faster which
>>>>> otherwise would take years.
>>>>>
>>>>> I'll throw at least one topic out:
>>>>>
>>>>>   * Raw access for zone append for microbenchmarks:
>>>>>   	- are we really happy with the status quo?
>>>>> 	- if not what outlets do we have?
>>>>>
>>>>> I think the nvme passthrogh stuff deserves it's own shared
>>>>> discussion though and should not make it part of the BoF.
>>>>
>>>> Reading through the discussion on this thread, perhaps this session
>>>> should be used to educate application developers about how to use
>>>> ZoneFS so they never need to manage low level details of zone
>>>> storage such as enumerating zones, controlling write pointers
>>>> safely for concurrent IO, performing zone resets, etc.
>>>
>>> I'm not even sure users are really aware that given cap can be different
>>> than zone size and btrfs uses zone size to compute size, the size is a
>>> flat out lie.
>>
>> Sorry, I don't get what btrfs does with zone management has anything
>> to do with using Zonefs to get direct, raw IO access to individual
>> zones.
> 
> You are right for direct raw access. My point was that even for
> filesystem use design I don't think the communication is clear on
> expectations. Similar computation need to be managed by fileystem
> design, for instance.
> 
>> Direct IO on open zone fds is likely more efficient than
>> doing IO through the standard LBA based block device because ZoneFS
>> uses iomap_dio_rw() so it only needs to do one mapping operation per
>> IO instead of one per page in the IO. Nor does it have to manage
>> buffer heads or other "generic blockdev" functionality that direct
>> IO access to zoned storage doesn't require.
>>
>> So whatever you're complaining about that btrfs lies about, does or
>> doesn't do is irrelevant - Zonefs was written with the express
>> purpose of getting user applications away from needing to directly
>> manage zone storage.
> 
> I think it ended that way, I can't say it was the goal from the start.
> Seems the raw block patches had some support and in the end zonefs
> was presented as a possible outlet.

zonefs *was* design from the start as a file-based raw access method so
that zoned block devices can be used from applications coded in
languages such as Java, which do not really have a direct equivalent of
ioctl(), as far as I know.

So no, it is not an accident and did not "end up that way". See:

Documentation/filesystems/zonefs.rst

If anything, where zonefs currently falls short is the need to do direct
IO for writes to sequential zones. That does not play well with
languages like Java which do not have O_DIRECT and also have the super
annoying property of *not* aligning IO memory buffers to sectors/pages
(e.g. Java always has that crazy 16B offset because it adds its own
buffer management struct at the beginning of a buffer). But I have a
couple of ideas to solve this.

> 
>> SO if you have special zone IO management
>> requirements, work out how they can be supported by zonefs - we
>> don't need yet another special purpose direct hardware access API
>> for zone storage when we already have a solid solution to the
>> problem already.
> 
> If this is fairly decided. Then that's that.
> 
> Calling zonefs solid though is a stretch.

If you see problems with it, please report them. We have Hadoop/HDFS
running with it and it works great.With zonefs, any application
chuncking its data using files over a regular FS can be more easily
converted to using zoned storage with a low overhead FS. Think Ceph as
another potential candidate.

And yes, it is not a magical solution, since in the end, it exposes the
device as-is.

> 
>>> modprobe null_blk nr_devices=0
>>> mkdir /sys/kernel/config/nullb/nullb0
>>> echo 0 > /sys/kernel/config/nullb/nullb0/completion_nsec
>>> echo 0 > /sys/kernel/config/nullb/nullb0/irqmode
>>> echo 2 > /sys/kernel/config/nullb/nullb0/queue_mode
>>> echo 1024 > /sys/kernel/config/nullb/nullb0/hw_queue_depth
>>> echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
>>> echo 1 > /sys/kernel/config/nullb/nullb0/zoned
>>>
>>> echo 128 > /sys/kernel/config/nullb/nullb0/zone_size
>>> # 6 zones are implied, we are saying 768 for the full storage size..
>>> # but...
>>> echo 768 > /sys/kernel/config/nullb/nullb0/size
>>>
>>> # If we force capacity to be way less than the zone sizes, btrfs still
>>> # uses the zone size to do its data / metadata size computation...
>>> echo 32 > /sys/kernel/config/nullb/nullb0/zone_capacity
>>
>> Then that's just a btrfs zone support bug where it's used the
>> wrong information to size it's zones. Why not just send a patch to
>> fix it?
> 
> This can change the format of existing created filesystems. And so
> if this change is welcomed I think we would need to be explicit
> about its support.

No. btrfs already has provision for unavailable blocks in a block group.
See my previous email.

> 
>   Luis


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-05  7:33         ` Javier González
@ 2022-03-07  7:12           ` Dave Chinner
  2022-03-07 10:27             ` Matias Bjørling
  2022-03-11  0:49             ` Luis Chamberlain
  2022-03-07 13:55           ` James Bottomley
  1 sibling, 2 replies; 59+ messages in thread
From: Dave Chinner @ 2022-03-07  7:12 UTC (permalink / raw)
  To: Javier González
  Cc: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Sat, Mar 05, 2022 at 08:33:21AM +0100, Javier González wrote:
> On 04.03.2022 14:55, Luis Chamberlain wrote:
> > On Sat, Mar 05, 2022 at 09:42:57AM +1100, Dave Chinner wrote:
> > > On Fri, Mar 04, 2022 at 02:10:08PM -0800, Luis Chamberlain wrote:
> > > > On Fri, Mar 04, 2022 at 11:10:22AM +1100, Dave Chinner wrote:
> > > > > On Wed, Mar 02, 2022 at 04:56:54PM -0800, Luis Chamberlain wrote:
> > > > > > Thinking proactively about LSFMM, regarding just Zone storage..
> > > > > >
> > > > > > I'd like to propose a BoF for Zoned Storage. The point of it is
> > > > > > to address the existing point points we have and take advantage of
> > > > > > having folks in the room we can likely settle on things faster which
> > > > > > otherwise would take years.
> > > > > >
> > > > > > I'll throw at least one topic out:
> > > > > >
> > > > > >   * Raw access for zone append for microbenchmarks:
> > > > > >   	- are we really happy with the status quo?
> > > > > > 	- if not what outlets do we have?
> > > > > >
> > > > > > I think the nvme passthrogh stuff deserves it's own shared
> > > > > > discussion though and should not make it part of the BoF.
> > > > >
> > > > > Reading through the discussion on this thread, perhaps this session
> > > > > should be used to educate application developers about how to use
> > > > > ZoneFS so they never need to manage low level details of zone
> > > > > storage such as enumerating zones, controlling write pointers
> > > > > safely for concurrent IO, performing zone resets, etc.
> > > >
> > > > I'm not even sure users are really aware that given cap can be different
> > > > than zone size and btrfs uses zone size to compute size, the size is a
> > > > flat out lie.
> > > 
> > > Sorry, I don't get what btrfs does with zone management has anything
> > > to do with using Zonefs to get direct, raw IO access to individual
> > > zones.
> > 
> > You are right for direct raw access. My point was that even for
> > filesystem use design I don't think the communication is clear on
> > expectations. Similar computation need to be managed by fileystem
> > design, for instance.
> 
> Dave,
> 
> I understand that you point to ZoneFS for this. It is true that it was
> presented at the moment as the way to do raw zone access from
> user-space.
> 
> However, there is no users of ZoneFS for ZNS devices that I am aware of
> (maybe for SMR this is a different story).  The main open-source
> implementations out there for RocksDB that are being used in production
> (ZenFS and xZTL) rely on either raw zone block access or the generic
> char device in NVMe (/dev/ngXnY).

That's exactly the situation we want to avoid.

You're talking about accessing Zoned storage by knowing directly
about how the hardware works and interfacing directly with hardware
specific device commands.

This is exactly what is wrong with this whole conversation - direct
access to hardware is fragile and very limiting, and the whole
purpose of having an operating system is to abstract the hardware
functionality into a generally usable API. That way when something
new gets added to the hardware or something gets removed, the
applications don't because they weren't written with that sort of
hardware functionality extension in mind.

I understand that RocksDB probably went direct to the hardware
because, at the time, it was the only choice the developers had to
make use of ZNS based storage. I understand that.

However, I also understand that there are *better options now* that
allow applications to target zone storage in a way that doesn't
expose them to the foibles of hardware support and storage protocol
specifications and characteristics.

The generic interface that the kernel provides for zoned storage is
called ZoneFS. Forget about the fact it is a filesystem, all it
does is provide userspace with a named zone abstraction for a zoned
device: every zone is an append-only file.

That's what I'm trying to get across here - this whole discussion
about zone capacity not matching zone size is a hardware/
specification detail that applications *do not need to know about*
to use zone storage. That's something taht Zonefs can/does hide from
applications completely - the zone files behave exactly the same
from the user perspective regardless of whether the hardware zone
capacity is the same or less than the zone size.

Expanding access the hardware and/or raw block devices to ensure
userspace applications can directly manage zone write pointers, zone
capacity/space limits, etc is the wrong architectural direction to
be taking. The sort of *hardware quirks* being discussed in this
thread need to be managed by the kernel and hidden from userspace;
userspace shouldn't need to care about such wierd and esoteric
hardware and storage protocol/specification/implementation
differences.

IMO, while RocksDB is the technology leader for ZNS, it is not the
model that new applications should be trying to emulate. They should
be designed from the ground up to use ZoneFS instead of directly
accessing nvme devices or trying to use the raw block devices for
zoned storage. Use the generic kernel abstraction for the hardware
like applications do for all other things!

> This is because having the capability to do zone management from
> applications that already work with objects fits much better.

ZoneFS doesn't absolve applications from having to perform zone
management to pack it's objects and garbage collect stale storage
space.  ZoneFS merely provides a generic, file based, hardware
independent API for performing these zone management tasks.

> My point is that there is space for both ZoneFS and raw zoned block
> device. And regarding !PO2 zone sizes, my point is that this can be
> leveraged both by btrfs and this raw zone block device.

On that I disagree - any argument that starts with "we need raw
zoned block device access to ...." is starting from an invalid
premise. We should be hiding hardware quirks from userspace, not
exposing them further.

IMO, we want writing zone storage native applications to be simple
and approachable by anyone who knows how to write to append-only
files.  We do not want such applications to be limited to people who
have deep and rare expertise in the dark details of, say, largely
undocumented niche NVMe ZNS specification and protocol quirks.

ZoneFS provides us with a path to the former, what you are
advocating is the latter....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07  7:12           ` Dave Chinner
@ 2022-03-07 10:27             ` Matias Bjørling
  2022-03-07 11:29               ` Javier González
  2022-03-11  0:49             ` Luis Chamberlain
  1 sibling, 1 reply; 59+ messages in thread
From: Matias Bjørling @ 2022-03-07 10:27 UTC (permalink / raw)
  To: Dave Chinner, Javier González
  Cc: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc,
	Damien Le Moal, Bart Van Assche, Hans Holmberg, Adam Manzanares,
	Keith Busch, Johannes Thumshirn, Naohiro Aota, Pankaj Raghav,
	Kanchan Joshi, Nitesh Shetty

> > I understand that you point to ZoneFS for this. It is true that it was
> > presented at the moment as the way to do raw zone access from
> > user-space.
> >
> > However, there is no users of ZoneFS for ZNS devices that I am aware
> > of (maybe for SMR this is a different story).  The main open-source
> > implementations out there for RocksDB that are being used in
> > production (ZenFS and xZTL) rely on either raw zone block access or
> > the generic char device in NVMe (/dev/ngXnY).
> 
> That's exactly the situation we want to avoid.
> 
> You're talking about accessing Zoned storage by knowing directly about how
> the hardware works and interfacing directly with hardware specific device
> commands.
> 
> This is exactly what is wrong with this whole conversation - direct access to
> hardware is fragile and very limiting, and the whole purpose of having an
> operating system is to abstract the hardware functionality into a generally
> usable API. That way when something new gets added to the hardware or
> something gets removed, the applications don't because they weren't written
> with that sort of hardware functionality extension in mind.
> 
> I understand that RocksDB probably went direct to the hardware because, at
> the time, it was the only choice the developers had to make use of ZNS based
> storage. I understand that.
> 
> However, I also understand that there are *better options now* that allow
> applications to target zone storage in a way that doesn't expose them to the
> foibles of hardware support and storage protocol specifications and
> characteristics.
> 
> The generic interface that the kernel provides for zoned storage is called
> ZoneFS. Forget about the fact it is a filesystem, all it does is provide userspace
> with a named zone abstraction for a zoned
> device: every zone is an append-only file.
> 
> That's what I'm trying to get across here - this whole discussion about zone
> capacity not matching zone size is a hardware/ specification detail that
> applications *do not need to know about* to use zone storage. That's
> something taht Zonefs can/does hide from applications completely - the zone
> files behave exactly the same from the user perspective regardless of whether
> the hardware zone capacity is the same or less than the zone size.
> 
> Expanding access the hardware and/or raw block devices to ensure userspace
> applications can directly manage zone write pointers, zone capacity/space
> limits, etc is the wrong architectural direction to be taking. The sort of
> *hardware quirks* being discussed in this thread need to be managed by the
> kernel and hidden from userspace; userspace shouldn't need to care about
> such wierd and esoteric hardware and storage
> protocol/specification/implementation
> differences.
> 
> IMO, while RocksDB is the technology leader for ZNS, it is not the model that
> new applications should be trying to emulate. They should be designed from
> the ground up to use ZoneFS instead of directly accessing nvme devices or
> trying to use the raw block devices for zoned storage. Use the generic kernel
> abstraction for the hardware like applications do for all other things!
> 
> > This is because having the capability to do zone management from
> > applications that already work with objects fits much better.
> 
> ZoneFS doesn't absolve applications from having to perform zone management
> to pack it's objects and garbage collect stale storage space.  ZoneFS merely
> provides a generic, file based, hardware independent API for performing these
> zone management tasks.
> 
> > My point is that there is space for both ZoneFS and raw zoned block
> > device. And regarding !PO2 zone sizes, my point is that this can be
> > leveraged both by btrfs and this raw zone block device.
> 
> On that I disagree - any argument that starts with "we need raw zoned block
> device access to ...." is starting from an invalid premise. We should be hiding
> hardware quirks from userspace, not exposing them further.
> 
> IMO, we want writing zone storage native applications to be simple and
> approachable by anyone who knows how to write to append-only files.  We do
> not want such applications to be limited to people who have deep and rare
> expertise in the dark details of, say, largely undocumented niche NVMe ZNS
> specification and protocol quirks.
> 
> ZoneFS provides us with a path to the former, what you are advocating is the
> latter....
> 

+ Hans (zenfs/rocksdb author)

Dave, thank you for your great insight. It is a great argument for why zonefs makes sense. I must admit that Damien has been telling me this multiple times, but I didn't fully grok the benefits until seeing it in the light of this thread.

Wrt to RocksDB support using ZenFS - while raw block access was the initial approach, it is very easy to change to use the zonefs API. Hans has already whipped up a plan for how to do it.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 10:27             ` Matias Bjørling
@ 2022-03-07 11:29               ` Javier González
  0 siblings, 0 replies; 59+ messages in thread
From: Javier González @ 2022-03-07 11:29 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Dave Chinner, Luis Chamberlain, linux-block, linux-fsdevel,
	lsf-pc, Damien Le Moal, Bart Van Assche, Hans Holmberg,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 07.03.2022 10:27, Matias Bjørling wrote:
>> > I understand that you point to ZoneFS for this. It is true that it was
>> > presented at the moment as the way to do raw zone access from
>> > user-space.
>> >
>> > However, there is no users of ZoneFS for ZNS devices that I am aware
>> > of (maybe for SMR this is a different story).  The main open-source
>> > implementations out there for RocksDB that are being used in
>> > production (ZenFS and xZTL) rely on either raw zone block access or
>> > the generic char device in NVMe (/dev/ngXnY).
>>
>> That's exactly the situation we want to avoid.
>>
>> You're talking about accessing Zoned storage by knowing directly about how
>> the hardware works and interfacing directly with hardware specific device
>> commands.
>>
>> This is exactly what is wrong with this whole conversation - direct access to
>> hardware is fragile and very limiting, and the whole purpose of having an
>> operating system is to abstract the hardware functionality into a generally
>> usable API. That way when something new gets added to the hardware or
>> something gets removed, the applications don't because they weren't written
>> with that sort of hardware functionality extension in mind.
>>
>> I understand that RocksDB probably went direct to the hardware because, at
>> the time, it was the only choice the developers had to make use of ZNS based
>> storage. I understand that.
>>
>> However, I also understand that there are *better options now* that allow
>> applications to target zone storage in a way that doesn't expose them to the
>> foibles of hardware support and storage protocol specifications and
>> characteristics.
>>
>> The generic interface that the kernel provides for zoned storage is called
>> ZoneFS. Forget about the fact it is a filesystem, all it does is provide userspace
>> with a named zone abstraction for a zoned
>> device: every zone is an append-only file.
>>
>> That's what I'm trying to get across here - this whole discussion about zone
>> capacity not matching zone size is a hardware/ specification detail that
>> applications *do not need to know about* to use zone storage. That's
>> something taht Zonefs can/does hide from applications completely - the zone
>> files behave exactly the same from the user perspective regardless of whether
>> the hardware zone capacity is the same or less than the zone size.
>>
>> Expanding access the hardware and/or raw block devices to ensure userspace
>> applications can directly manage zone write pointers, zone capacity/space
>> limits, etc is the wrong architectural direction to be taking. The sort of
>> *hardware quirks* being discussed in this thread need to be managed by the
>> kernel and hidden from userspace; userspace shouldn't need to care about
>> such wierd and esoteric hardware and storage
>> protocol/specification/implementation
>> differences.
>>
>> IMO, while RocksDB is the technology leader for ZNS, it is not the model that
>> new applications should be trying to emulate. They should be designed from
>> the ground up to use ZoneFS instead of directly accessing nvme devices or
>> trying to use the raw block devices for zoned storage. Use the generic kernel
>> abstraction for the hardware like applications do for all other things!
>>
>> > This is because having the capability to do zone management from
>> > applications that already work with objects fits much better.
>>
>> ZoneFS doesn't absolve applications from having to perform zone management
>> to pack it's objects and garbage collect stale storage space.  ZoneFS merely
>> provides a generic, file based, hardware independent API for performing these
>> zone management tasks.
>>
>> > My point is that there is space for both ZoneFS and raw zoned block
>> > device. And regarding !PO2 zone sizes, my point is that this can be
>> > leveraged both by btrfs and this raw zone block device.
>>
>> On that I disagree - any argument that starts with "we need raw zoned block
>> device access to ...." is starting from an invalid premise. We should be hiding
>> hardware quirks from userspace, not exposing them further.
>>
>> IMO, we want writing zone storage native applications to be simple and
>> approachable by anyone who knows how to write to append-only files.  We do
>> not want such applications to be limited to people who have deep and rare
>> expertise in the dark details of, say, largely undocumented niche NVMe ZNS
>> specification and protocol quirks.
>>
>> ZoneFS provides us with a path to the former, what you are advocating is the
>> latter....

I agree with all you say. I can see ZoneFS becoming a generic zone API,
but we are not there yet. Rather than advocating for using raw devices,
I am describing how zone devices are being consumed today. So to me
there are 2 things we need to consider: Support current customers and
improve the way future customers consume these devices.

Coming back to the original topic of the LSF/MM discussion, what I would
like to propose is that we support existing, deployed devices that are
running in Linux and do not have PO2 zone sizes. These can then be
consumed by btrfs or presented to applications through ZoneFS. And for
existing customers, this will mean less headaches.

Note here that if we use ZoneFS and all we care is zone capacities, then
the whole PO2 argument to make applications more efficient does not
apply anymore, as applications would be using the real capacity of the
zone. I very much like this approach.

>+ Hans (zenfs/rocksdb author)
>
>Dave, thank you for your great insight. It is a great argument for why zonefs makes sense. I must admit that Damien has been telling me this multiple times, but I didn't fully grok the benefits until seeing it in the light of this thread.
>
>Wrt to RocksDB support using ZenFS - while raw block access was the initial approach, it is very easy to change to use the zonefs API. Hans has already whipped up a plan for how to do it.

This is great. We have been thinking for some time about aligning with
ZenFS for the in-kernel path. This might be the right time to take
action on this.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-05  7:33         ` Javier González
  2022-03-07  7:12           ` Dave Chinner
@ 2022-03-07 13:55           ` James Bottomley
  2022-03-07 14:35             ` Javier González
  1 sibling, 1 reply; 59+ messages in thread
From: James Bottomley @ 2022-03-07 13:55 UTC (permalink / raw)
  To: Javier González, Luis Chamberlain
  Cc: Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Sat, 2022-03-05 at 08:33 +0100, Javier González wrote:
[...]
> However, there is no users of ZoneFS for ZNS devices that I am aware
> of (maybe for SMR this is a different story).  The main open-source
> implementations out there for RocksDB that are being used in
> production (ZenFS and xZTL) rely on either raw zone block access or
> the generic char device in NVMe (/dev/ngXnY). This is because having
> the capability to do zone management from applications that already
> work with objects fits much better.
> 
> My point is that there is space for both ZoneFS and raw zoned block
> device. And regarding !PO2 zone sizes, my point is that this can be
> leveraged both by btrfs and this raw zone block device.

This is basically history repeating itself, though.  It's precisely the
reason why Linux acquired the raw character device: Oracle decided they
didn't want the OS abstractions in the way of fast performing direct
database access and raw devices was the way it had been done on UNIX,
so they decided it should be done on Linux as well.  There was some
legacy to this as well: because Oracle already had a raw handler they
figured it would be easy to port to Linux.

The problem Oracle had with /dev/raw is that they then have to manage
device discovery and partitioning as well.  It sort of worked on UNIX
when you didn't have too many disks and the discover order was
deterministic.  It began to fail as disks became storage networks.  In
the end, when O_DIRECT was proposed, Oracle eventually saw that using
it on files allowed for much better managed access and the raw driver
fell into disuse and was (finally) removed last year.

What you're proposing above is to repeat the /dev/raw experiment for
equivalent input reasons but expecting different outcomes ... Einstein
has already ruled on that one.

James




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 13:55           ` James Bottomley
@ 2022-03-07 14:35             ` Javier González
  2022-03-07 15:15               ` Keith Busch
  0 siblings, 1 reply; 59+ messages in thread
From: Javier González @ 2022-03-07 14:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Luis Chamberlain, Dave Chinner, linux-block, linux-fsdevel,
	lsf-pc, Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty



> On 7 Mar 2022, at 14.55, James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
> 
> On Sat, 2022-03-05 at 08:33 +0100, Javier González wrote:
> [...]
>> However, there is no users of ZoneFS for ZNS devices that I am aware
>> of (maybe for SMR this is a different story).  The main open-source
>> implementations out there for RocksDB that are being used in
>> production (ZenFS and xZTL) rely on either raw zone block access or
>> the generic char device in NVMe (/dev/ngXnY). This is because having
>> the capability to do zone management from applications that already
>> work with objects fits much better.
>> 
>> My point is that there is space for both ZoneFS and raw zoned block
>> device. And regarding !PO2 zone sizes, my point is that this can be
>> leveraged both by btrfs and this raw zone block device.
> 
> This is basically history repeating itself, though.  It's precisely the
> reason why Linux acquired the raw character device: Oracle decided they
> didn't want the OS abstractions in the way of fast performing direct
> database access and raw devices was the way it had been done on UNIX,
> so they decided it should be done on Linux as well.  There was some
> legacy to this as well: because Oracle already had a raw handler they
> figured it would be easy to port to Linux.
> 
> The problem Oracle had with /dev/raw is that they then have to manage
> device discovery and partitioning as well.  It sort of worked on UNIX
> when you didn't have too many disks and the discover order was
> deterministic.  It began to fail as disks became storage networks.  In
> the end, when O_DIRECT was proposed, Oracle eventually saw that using
> it on files allowed for much better managed access and the raw driver
> fell into disuse and was (finally) removed last year.
> 
> What you're proposing above is to repeat the /dev/raw experiment for
> equivalent input reasons but expecting different outcomes ... Einstein
> has already ruled on that one.

Thanks for the history on the raw device. It’s good to the perspective on history repeating itself. 

I believe that the raw block device is different than the raw character device and we see tons of applications that don’t want FS semantics relying on them. But I get your point.

If we agree to get ZoneFS up to speed and use it as the general API for zone devices, then I think we can refocus there. 

As I mentioned in the last reply to to Dave, the main concern for me at the moment is supporting arbitrary zone sizes in the kernel. If we can agree on a path towards that, we can definitely commit to focus on ZoneFS and implement support for it on the different places we maintain in user-space. 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 14:35             ` Javier González
@ 2022-03-07 15:15               ` Keith Busch
  2022-03-07 15:28                 ` Javier González
  2022-03-07 20:42                 ` Damien Le Moal
  0 siblings, 2 replies; 59+ messages in thread
From: Keith Busch @ 2022-03-07 15:15 UTC (permalink / raw)
  To: Javier González
  Cc: James Bottomley, Luis Chamberlain, Dave Chinner, linux-block,
	linux-fsdevel, lsf-pc, Matias Bjørling, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On Mon, Mar 07, 2022 at 03:35:12PM +0100, Javier González wrote:
> As I mentioned in the last reply to to Dave, the main concern for me
> at the moment is supporting arbitrary zone sizes in the kernel. If we
> can agree on a path towards that, we can definitely commit to focus on
> ZoneFS and implement support for it on the different places we
> maintain in user-space. 

FWIW, the block layer doesn't require pow2 chunk_sectors anymore, so it
looks like that requirement for zone sizes can be relaxed, too.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 15:15               ` Keith Busch
@ 2022-03-07 15:28                 ` Javier González
  2022-03-07 20:42                 ` Damien Le Moal
  1 sibling, 0 replies; 59+ messages in thread
From: Javier González @ 2022-03-07 15:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: James Bottomley, Luis Chamberlain, Dave Chinner, linux-block,
	linux-fsdevel, lsf-pc, Matias Bjørling, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty


> On 7 Mar 2022, at 16.16, Keith Busch <kbusch@kernel.org> wrote:
> 
> On Mon, Mar 07, 2022 at 03:35:12PM +0100, Javier González wrote:
>> As I mentioned in the last reply to to Dave, the main concern for me
>> at the moment is supporting arbitrary zone sizes in the kernel. If we
>> can agree on a path towards that, we can definitely commit to focus on
>> ZoneFS and implement support for it on the different places we
>> maintain in user-space. 
> 
> FWIW, the block layer doesn't require pow2 chunk_sectors anymore, so it
> looks like that requirement for zone sizes can be relaxed, too.

Exactly. This is the core of the proposal. 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-06 23:56     ` Damien Le Moal
@ 2022-03-07 15:44       ` Luis Chamberlain
  2022-03-07 16:23         ` Johannes Thumshirn
  0 siblings, 1 reply; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-07 15:44 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On Mon, Mar 07, 2022 at 08:56:30AM +0900, Damien Le Moal wrote:
> btrfs maps zones to block groups and the sectors between zone capacity
> and zone size are marked as unusable. The report above is not showing
> that. The coding is correct though. The block allocation will not be
> attempted beyond zone capacity.

That does not explain or justify why zone size was used instead of zone
capacity. Using the zones size gives an incorrect inflated sense of actual
capacity, and users / userspace applications can easily missuse that.

Should other filesystems follow this logic as well? If so why?

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 15:44       ` Luis Chamberlain
@ 2022-03-07 16:23         ` Johannes Thumshirn
  2022-03-07 16:36           ` Luis Chamberlain
  0 siblings, 1 reply; 59+ messages in thread
From: Johannes Thumshirn @ 2022-03-07 16:23 UTC (permalink / raw)
  To: Luis Chamberlain, Damien Le Moal
  Cc: Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 07/03/2022 16:44, Luis Chamberlain wrote:
> On Mon, Mar 07, 2022 at 08:56:30AM +0900, Damien Le Moal wrote:
>> btrfs maps zones to block groups and the sectors between zone capacity
>> and zone size are marked as unusable. The report above is not showing
>> that. The coding is correct though. The block allocation will not be
>> attempted beyond zone capacity.
> 
> That does not explain or justify why zone size was used instead of zone
> capacity. Using the zones size gives an incorrect inflated sense of actual
> capacity, and users / userspace applications can easily missuse that.
> 
> Should other filesystems follow this logic as well? If so why?
>

The justification is, when btrfs zoned support was implemented there was no 
zone capacity. This started with zns and thus btrfs' knowledge of 
zone_capacity came with it's zns support. So instead of playing the blame
game for whatever reason I don't want to know, you could have reported the
bug or fixed it yourself.

It's not that Naohiro, Damien or I aren't following bug reports of zoned btrfs
on the mailing lists.

Byte,
	Johannes

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 16:23         ` Johannes Thumshirn
@ 2022-03-07 16:36           ` Luis Chamberlain
  0 siblings, 0 replies; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-07 16:36 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Damien Le Moal, Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Mon, Mar 07, 2022 at 04:23:46PM +0000, Johannes Thumshirn wrote:
> On 07/03/2022 16:44, Luis Chamberlain wrote:
> > On Mon, Mar 07, 2022 at 08:56:30AM +0900, Damien Le Moal wrote:
> >> btrfs maps zones to block groups and the sectors between zone capacity
> >> and zone size are marked as unusable. The report above is not showing
> >> that. The coding is correct though. The block allocation will not be
> >> attempted beyond zone capacity.
> > 
> > That does not explain or justify why zone size was used instead of zone
> > capacity. Using the zones size gives an incorrect inflated sense of actual
> > capacity, and users / userspace applications can easily missuse that.
> > 
> > Should other filesystems follow this logic as well? If so why?
> >
> 
> The justification is, when btrfs zoned support was implemented there was no 
> zone capacity. This started with zns and thus btrfs' knowledge of 
> zone_capacity came with it's zns support. So instead of playing the blame
> game for whatever reason I don't want to know, you could have reported the
> bug or fixed it yourself.

I didn't realize it would be acknowledged as a bug, now that it is I'll
just go send a fix, thanks!

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 15:15               ` Keith Busch
  2022-03-07 15:28                 ` Javier González
@ 2022-03-07 20:42                 ` Damien Le Moal
  2022-03-11  7:21                   ` Javier González
  1 sibling, 1 reply; 59+ messages in thread
From: Damien Le Moal @ 2022-03-07 20:42 UTC (permalink / raw)
  To: Keith Busch, Javier González
  Cc: James Bottomley, Luis Chamberlain, Dave Chinner, linux-block,
	linux-fsdevel, lsf-pc, Matias Bjørling, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 3/8/22 00:15, Keith Busch wrote:
> On Mon, Mar 07, 2022 at 03:35:12PM +0100, Javier González wrote:
>> As I mentioned in the last reply to to Dave, the main concern for me
>> at the moment is supporting arbitrary zone sizes in the kernel. If we
>> can agree on a path towards that, we can definitely commit to focus on
>> ZoneFS and implement support for it on the different places we
>> maintain in user-space. 
> 
> FWIW, the block layer doesn't require pow2 chunk_sectors anymore, so it
> looks like that requirement for zone sizes can be relaxed, too.

As long as:
1) Userspace does not break (really not sure about that one...)
2) No performance regression: the overhead of using multiplications &
divisions for sector to zone conversions must be acceptable for ZNS (it
will not matter for SMR HDDs)

All in kernel users of zoned devices will need some patching (zonefs,
btrfs, f2fs). Some will not work anymore (e.g. f2fs) and others will
need different constraints (btrfs needs 64K aligned zones). Not all
zoned devices will be usable anymore, and I am not sure if this
degradation in the support provided is acceptable.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07  7:12           ` Dave Chinner
  2022-03-07 10:27             ` Matias Bjørling
@ 2022-03-11  0:49             ` Luis Chamberlain
  2022-03-11  6:07               ` Christoph Hellwig
  1 sibling, 1 reply; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-11  0:49 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Javier González, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty, Jaegeuk Kim

On Mon, Mar 07, 2022 at 06:12:29PM +1100, Dave Chinner wrote:
> The generic interface that the kernel provides for zoned storage is
> called ZoneFS. Forget about the fact it is a filesystem, all it
> does is provide userspace with a named zone abstraction for a zoned
> device: every zone is an append-only file.

We seem to be reaching consensus on a path forward to use ZoneFS for
raw access.

> > My point is that there is space for both ZoneFS and raw zoned block
> > device. And regarding !PO2 zone sizes, my point is that this can be
> > leveraged both by btrfs and this raw zone block device.
> 
> On that I disagree - any argument that starts with "we need raw
> zoned block device access to ...." is starting from an invalid
> premise.

This seems reasonable given the possibility to bring folks forward
with ZoneFS.

> We should be hiding hardware quirks from userspace, not
> exposing them further.

ZoneFS requires a block device and such block device cannot be exposed
if the zone size != PO2. So sadly ZoneFS cannot be used by !PO2 ZNS
drives.

> IMO, we want writing zone storage native applications to be simple
> and approachable by anyone who knows how to write to append-only
> files.  We do not want such applications to be limited to people who
> have deep and rare expertise in the dark details of, say, largely
> undocumented niche NVMe ZNS specification and protocol quirks.
>
> ZoneFS provides us with a path to the former, what you are
> advocating is the latter....

That surely simplifies things if we can use ZoneFS!

Some filesystems who want to support zone storage natively have been
extended to do things to help with these quirks. My concerns were the
divergence on approaches to how filesystems use ZNS as well. Do you have
any plans to consider such efforts for XFS or would you rather build on
ZoneFS somehow?

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  0:49             ` Luis Chamberlain
@ 2022-03-11  6:07               ` Christoph Hellwig
  2022-03-11 20:31                 ` Luis Chamberlain
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-03-11  6:07 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Dave Chinner, Javier González, linux-block, linux-fsdevel,
	lsf-pc, Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty, Jaegeuk Kim

On Thu, Mar 10, 2022 at 04:49:06PM -0800, Luis Chamberlain wrote:
> Some filesystems who want to support zone storage natively have been
> extended to do things to help with these quirks. My concerns were the
> divergence on approaches to how filesystems use ZNS as well. Do you have
> any plans to consider such efforts for XFS or would you rather build on
> ZoneFS somehow?

XFS will always require a random writable area for metadata.  I have
an old early draft with a fully zone aware allocator essentially
replacing the realtime subvolume.  But it's been catching dust so far,
maybe I'll have a chance to resurrect it if I don't have too fight too
many stupid patchseries all at once.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-07 20:42                 ` Damien Le Moal
@ 2022-03-11  7:21                   ` Javier González
  2022-03-11  7:39                     ` Damien Le Moal
  0 siblings, 1 reply; 59+ messages in thread
From: Javier González @ 2022-03-11  7:21 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Keith Busch, James Bottomley, Luis Chamberlain, Dave Chinner,
	linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Damien Le Moal, Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 08.03.2022 05:42, Damien Le Moal wrote:
>On 3/8/22 00:15, Keith Busch wrote:
>> On Mon, Mar 07, 2022 at 03:35:12PM +0100, Javier González wrote:
>>> As I mentioned in the last reply to to Dave, the main concern for me
>>> at the moment is supporting arbitrary zone sizes in the kernel. If we
>>> can agree on a path towards that, we can definitely commit to focus on
>>> ZoneFS and implement support for it on the different places we
>>> maintain in user-space.
>>
>> FWIW, the block layer doesn't require pow2 chunk_sectors anymore, so it
>> looks like that requirement for zone sizes can be relaxed, too.
>
>As long as:
>1) Userspace does not break (really not sure about that one...)
>2) No performance regression: the overhead of using multiplications &
>divisions for sector to zone conversions must be acceptable for ZNS (it
>will not matter for SMR HDDs)

Good. The emulation patches we sent should cover this.

>All in kernel users of zoned devices will need some patching (zonefs,
>btrfs, f2fs). Some will not work anymore (e.g. f2fs) and others will
>need different constraints (btrfs needs 64K aligned zones). Not all
>zoned devices will be usable anymore, and I am not sure if this
>degradation in the support provided is acceptable.

We will do the work for btrfs (already have a prototype) and for zonefs
(we need to look into it). F2FS will use the emulation layer for now;
only !PO2 devices will pay the price. We will add a knob in the block
layer so that F2FS can force enable the emulation.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  7:21                   ` Javier González
@ 2022-03-11  7:39                     ` Damien Le Moal
  2022-03-11  7:42                       ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Damien Le Moal @ 2022-03-11  7:39 UTC (permalink / raw)
  To: Javier González
  Cc: Keith Busch, James Bottomley, Luis Chamberlain, Dave Chinner,
	linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Damien Le Moal, Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 3/11/22 16:21, Javier González wrote:
> On 08.03.2022 05:42, Damien Le Moal wrote:
>> On 3/8/22 00:15, Keith Busch wrote:
>>> On Mon, Mar 07, 2022 at 03:35:12PM +0100, Javier González wrote:
>>>> As I mentioned in the last reply to to Dave, the main concern for me
>>>> at the moment is supporting arbitrary zone sizes in the kernel. If we
>>>> can agree on a path towards that, we can definitely commit to focus on
>>>> ZoneFS and implement support for it on the different places we
>>>> maintain in user-space.
>>>
>>> FWIW, the block layer doesn't require pow2 chunk_sectors anymore, so it
>>> looks like that requirement for zone sizes can be relaxed, too.
>>
>> As long as:
>> 1) Userspace does not break (really not sure about that one...)
>> 2) No performance regression: the overhead of using multiplications &
>> divisions for sector to zone conversions must be acceptable for ZNS (it
>> will not matter for SMR HDDs)
> 
> Good. The emulation patches we sent should cover this.
> 
>> All in kernel users of zoned devices will need some patching (zonefs,
>> btrfs, f2fs). Some will not work anymore (e.g. f2fs) and others will
>> need different constraints (btrfs needs 64K aligned zones). Not all
>> zoned devices will be usable anymore, and I am not sure if this
>> degradation in the support provided is acceptable.
> 
> We will do the work for btrfs (already have a prototype) and for zonefs
> (we need to look into it). F2FS will use the emulation layer for now;
> only !PO2 devices will pay the price. We will add a knob in the block
> layer so that F2FS can force enable the emulation.

No. The FS has no business changing the device.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  7:39                     ` Damien Le Moal
@ 2022-03-11  7:42                       ` Christoph Hellwig
  2022-03-11  7:53                         ` Javier González
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-03-11  7:42 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Javier González, Keith Busch, James Bottomley,
	Luis Chamberlain, Dave Chinner, linux-block, linux-fsdevel,
	lsf-pc, Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Fri, Mar 11, 2022 at 04:39:12PM +0900, Damien Le Moal wrote:
> > (we need to look into it). F2FS will use the emulation layer for now;
> > only !PO2 devices will pay the price. We will add a knob in the block
> > layer so that F2FS can force enable the emulation.
> 
> No. The FS has no business changing the device.

And nvme will not support any kind of emulation if that wasn't clear.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  7:42                       ` Christoph Hellwig
@ 2022-03-11  7:53                         ` Javier González
  2022-03-11  8:46                           ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Javier González @ 2022-03-11  7:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Damien Le Moal, Keith Busch, James Bottomley, Luis Chamberlain,
	Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 10.03.2022 23:42, Christoph Hellwig wrote:
>On Fri, Mar 11, 2022 at 04:39:12PM +0900, Damien Le Moal wrote:
>> > (we need to look into it). F2FS will use the emulation layer for now;
>> > only !PO2 devices will pay the price. We will add a knob in the block
>> > layer so that F2FS can force enable the emulation.
>>
>> No. The FS has no business changing the device.

Ok. Then it can be something the user can set.

>
>And nvme will not support any kind of emulation if that wasn't clear.

How do you propose we meed the request from Damien to support _all_
existing users if we remove the PO2 constraint from the block layer?

The emulation is not the goal, but it seems to be a requirement


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  7:53                         ` Javier González
@ 2022-03-11  8:46                           ` Christoph Hellwig
  2022-03-11  8:59                             ` Javier González
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2022-03-11  8:46 UTC (permalink / raw)
  To: Javier González
  Cc: Christoph Hellwig, Damien Le Moal, Keith Busch, James Bottomley,
	Luis Chamberlain, Dave Chinner, linux-block, linux-fsdevel,
	lsf-pc, Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On Fri, Mar 11, 2022 at 08:53:17AM +0100, Javier González wrote:
> How do you propose we meed the request from Damien to support _all_
> existing users if we remove the PO2 constraint from the block layer?

By actually making the users support it.  Not by adding crap to
block drivers to pretend that they are exposing something totally
different than what they actually are.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  8:46                           ` Christoph Hellwig
@ 2022-03-11  8:59                             ` Javier González
  2022-03-12  8:03                               ` Damien Le Moal
  0 siblings, 1 reply; 59+ messages in thread
From: Javier González @ 2022-03-11  8:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Damien Le Moal, Keith Busch, James Bottomley, Luis Chamberlain,
	Dave Chinner, linux-block, linux-fsdevel, lsf-pc,
	Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 11.03.2022 00:46, Christoph Hellwig wrote:
>On Fri, Mar 11, 2022 at 08:53:17AM +0100, Javier González wrote:
>> How do you propose we meed the request from Damien to support _all_
>> existing users if we remove the PO2 constraint from the block layer?
>
>By actually making the users support it.  Not by adding crap to
>block drivers to pretend that they are exposing something totally
>different than what they actually are.

Ok. Is it reasonable for you that we start removing the PO2 check in the
block layer and then add btrfs support? This will mean that some
applications that assume PO2 will not work.

	Damien: Are you OK with this?

We can then work on other parts as needed (e.g., ZoneFS)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  6:07               ` Christoph Hellwig
@ 2022-03-11 20:31                 ` Luis Chamberlain
  0 siblings, 0 replies; 59+ messages in thread
From: Luis Chamberlain @ 2022-03-11 20:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Javier González, linux-block, linux-fsdevel,
	lsf-pc, Matias Bjørling, Damien Le Moal, Bart Van Assche,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty, Jaegeuk Kim

On Thu, Mar 10, 2022 at 10:07:30PM -0800, Christoph Hellwig wrote:
> On Thu, Mar 10, 2022 at 04:49:06PM -0800, Luis Chamberlain wrote:
> > Some filesystems who want to support zone storage natively have been
> > extended to do things to help with these quirks. My concerns were the
> > divergence on approaches to how filesystems use ZNS as well. Do you have
> > any plans to consider such efforts for XFS or would you rather build on
> > ZoneFS somehow?
> 
> XFS will always require a random writable area for metadata.

XFS also supports an external journal, so could that go through
a conventional zone?

> I have
> an old early draft with a fully zone aware allocator essentially
> replacing the realtime subvolume.  But it's been catching dust so far,
> maybe I'll have a chance to resurrect it if I don't have too fight too
> many stupid patchseries all at once.

Good to know thanks!

I was wondering weather or not Chinner's subvolume concept could be
applied to ZoneFS for the data area. But given the file would be on
another filesystem made me think this would probably not be possible.
Then there is the append only requirement as well...

  Luis

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-11  8:59                             ` Javier González
@ 2022-03-12  8:03                               ` Damien Le Moal
  0 siblings, 0 replies; 59+ messages in thread
From: Damien Le Moal @ 2022-03-12  8:03 UTC (permalink / raw)
  To: Javier González, Christoph Hellwig
  Cc: Keith Busch, James Bottomley, Luis Chamberlain, Dave Chinner,
	linux-block, linux-fsdevel, lsf-pc, Matias Bjørling,
	Damien Le Moal, Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

On 3/11/22 17:59, Javier González wrote:
> On 11.03.2022 00:46, Christoph Hellwig wrote:
>> On Fri, Mar 11, 2022 at 08:53:17AM +0100, Javier González wrote:
>>> How do you propose we meed the request from Damien to support _all_
>>> existing users if we remove the PO2 constraint from the block layer?
>>
>> By actually making the users support it.  Not by adding crap to
>> block drivers to pretend that they are exposing something totally
>> different than what they actually are.
> 
> Ok. Is it reasonable for you that we start removing the PO2 check in the
> block layer and then add btrfs support? This will mean that some
> applications that assume PO2 will not work.
> 
> 	Damien: Are you OK with this?

See my answer to Luis's email.

> 
> We can then work on other parts as needed (e.g., ZoneFS)


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
                   ` (7 preceding siblings ...)
  2022-03-04  0:10 ` Dave Chinner
@ 2022-03-15 18:08 ` Luca Porzio (lporzio)
  2022-03-15 18:39   ` Bart Van Assche
  8 siblings, 1 reply; 59+ messages in thread
From: Luca Porzio (lporzio) @ 2022-03-15 18:08 UTC (permalink / raw)
  To: Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Bart Van Assche, Adam Manzanares, Keith Busch,
	Johannes Thumshirn, Naohiro Aota, Pankaj Raghav, Kanchan Joshi,
	Nitesh Shetty

> Thinking proactively about LSFMM, regarding just Zone storage..
> 
> I'd like to propose a BoF for Zoned Storage. The point of it is to address the
> existing point points we have and take advantage of having folks in the room
> we can likely settle on things faster which otherwise would take years.
> 
> I'll throw at least one topic out:
> 
>   * Raw access for zone append for microbenchmarks:
>         - are we really happy with the status quo?
>         - if not what outlets do we have?
> 
> I think the nvme passthrogh stuff deserves it's own shared discussion though
> and should not make it part of the BoF.
> 
>   Luis

Hi,

I'm doing some study on how ZNS may be introduced in UFS+embedded (android) 
platforms and share this study with you to discuss changes which might be required 
in Linux to support ZNS for embedded devices.

Can I get invitation?

Cheers,
   Luca


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-15 18:08 ` [EXT] " Luca Porzio (lporzio)
@ 2022-03-15 18:39   ` Bart Van Assche
  2022-03-15 18:47     ` Bean Huo (beanhuo)
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2022-03-15 18:39 UTC (permalink / raw)
  To: Luca Porzio (lporzio),
	Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/15/22 11:08, Luca Porzio (lporzio) wrote:
> Can I get invitation?

Hi Luca,

A link to the form to request attendance is available at 
https://lore.kernel.org/all/YherWymi1E%2FhP%2FsS@localhost.localdomain/

Best regards,

Bart.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-15 18:39   ` Bart Van Assche
@ 2022-03-15 18:47     ` Bean Huo (beanhuo)
  2022-03-15 18:49       ` Jens Axboe
  0 siblings, 1 reply; 59+ messages in thread
From: Bean Huo (beanhuo) @ 2022-03-15 18:47 UTC (permalink / raw)
  To: Bart Van Assche, Luca Porzio (lporzio),
	Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

Micron Confidential

> Subject: Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
> 
> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless you
> recognize the sender and were expecting this message.
> 
> 
> On 3/15/22 11:08, Luca Porzio (lporzio) wrote:
> > Can I get invitation?
> 
> Hi Luca,
> 
> A link to the form to request attendance is available at
> https://urldefense.com/v3/__https://lore.kernel.org/all/YherWymi1E*2FhP*2FsS
> @localhost.localdomain/__;JSU!!KZTdOCjhgt4hgw!tFRrqHl5Gjfz2VyrwS6W0b7bSiJ
> TfqnxSK8-zPikIlJlFQmpizYjABJZLTnyOg$
> 

Bart,

We had submitted the request via Google Forms and I regret it appears to be after the requested
deadline of March 1st.

kind regards,
Bean

> Best regards,
> 
> Bart.


Micron Confidential

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-15 18:47     ` Bean Huo (beanhuo)
@ 2022-03-15 18:49       ` Jens Axboe
  2022-03-15 19:04         ` Bean Huo (beanhuo)
  0 siblings, 1 reply; 59+ messages in thread
From: Jens Axboe @ 2022-03-15 18:49 UTC (permalink / raw)
  To: Bean Huo (beanhuo), Bart Van Assche, Luca Porzio (lporzio),
	Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/15/22 12:47 PM, Bean Huo (beanhuo) wrote:
> Micron Confidential
 
> 
> Micron Confidential

Must be very confidential if it needs two?

Please get rid of these useless disclaimers in public emails, they make
ZERO sense.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-15 18:49       ` Jens Axboe
@ 2022-03-15 19:04         ` Bean Huo (beanhuo)
  2022-03-15 19:16           ` Jens Axboe
  2022-03-15 19:59           ` Bart Van Assche
  0 siblings, 2 replies; 59+ messages in thread
From: Bean Huo (beanhuo) @ 2022-03-15 19:04 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Luca Porzio (lporzio),
	Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

> -----Original Message-----
> From: Jens Axboe <axboe@kernel.dk>
> Sent: Tuesday, March 15, 2022 7:49 PM
> To: Bean Huo (beanhuo) <beanhuo@micron.com>; Bart Van Assche
> <bvanassche@acm.org>; Luca Porzio (lporzio) <lporzio@micron.com>; Luis
> Chamberlain <mcgrof@kernel.org>; linux-block@vger.kernel.org; linux-
> fsdevel@vger.kernel.org; lsf-pc@lists.linux-foundation.org
> Cc: Matias Bjørling <Matias.Bjorling@wdc.com>; Javier González
> <javier.gonz@samsung.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> Adam Manzanares <a.manzanares@samsung.com>; Keith Busch
> <Keith.Busch@wdc.com>; Johannes Thumshirn <Johannes.Thumshirn@wdc.com>;
> Naohiro Aota <Naohiro.Aota@wdc.com>; Pankaj Raghav
> <pankydev8@gmail.com>; Kanchan Joshi <joshi.k@samsung.com>; Nitesh Shetty
> <nj.shetty@samsung.com>
> Subject: Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
> 
> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless you
> recognize the sender and were expecting this message.
> 
> 
> On 3/15/22 12:47 PM, Bean Huo (beanhuo) wrote:
> > Micron Confidential
> 
> >
> > Micron Confidential
> 
> Must be very confidential if it needs two?
> 
> Please get rid of these useless disclaimers in public emails, they make ZERO sense.
> 

Sorry for that. They are added by outlook automatically, seems I can turn it off, let me see if this email has this message.

Kind regards,
Bean

> --
> Jens Axboe


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-15 19:04         ` Bean Huo (beanhuo)
@ 2022-03-15 19:16           ` Jens Axboe
  2022-03-15 19:59           ` Bart Van Assche
  1 sibling, 0 replies; 59+ messages in thread
From: Jens Axboe @ 2022-03-15 19:16 UTC (permalink / raw)
  To: Bean Huo (beanhuo), Bart Van Assche, Luca Porzio (lporzio),
	Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/15/22 1:04 PM, Bean Huo (beanhuo) wrote:
>> -----Original Message-----
>> From: Jens Axboe <axboe@kernel.dk>
>> Sent: Tuesday, March 15, 2022 7:49 PM
>> To: Bean Huo (beanhuo) <beanhuo@micron.com>; Bart Van Assche
>> <bvanassche@acm.org>; Luca Porzio (lporzio) <lporzio@micron.com>; Luis
>> Chamberlain <mcgrof@kernel.org>; linux-block@vger.kernel.org; linux-
>> fsdevel@vger.kernel.org; lsf-pc@lists.linux-foundation.org
>> Cc: Matias Bj?rling <Matias.Bjorling@wdc.com>; Javier Gonz?lez
>> <javier.gonz@samsung.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
>> Adam Manzanares <a.manzanares@samsung.com>; Keith Busch
>> <Keith.Busch@wdc.com>; Johannes Thumshirn <Johannes.Thumshirn@wdc.com>;
>> Naohiro Aota <Naohiro.Aota@wdc.com>; Pankaj Raghav
>> <pankydev8@gmail.com>; Kanchan Joshi <joshi.k@samsung.com>; Nitesh Shetty
>> <nj.shetty@samsung.com>
>> Subject: Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage

All of this is duplicated info too, it just makes your emails have poor
signal to noise ratio...

>> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless you
>> recognize the sender and were expecting this message.

Same

>>> Micron Confidential
>>
>>>
>>> Micron Confidential
>>
>> Must be very confidential if it needs two?
>>
>> Please get rid of these useless disclaimers in public emails, they make ZERO sense.
>>
> 
> Sorry for that. They are added by outlook automatically, seems I can
> turn it off, let me see if this email has this message.

In general, advise for open source or open list emails:

- Don't include any "Foo company confidential", which by definition is
  nonsensical because the email is sent out publically.

- Wrap lines in emails. Wrapping them at 74 chars or something like that
  makes them a LOT easier to read, and means I don't have to wrap your
  replies when I reply.

- Don't include huge headers of who got the email. That part is in the
  email headers already, and it's just noise in the body of the email.

- Trim replies! Nothing worse than browsing page after page of just
  quoted text to get to the meat of it.

That's just the basics, but it goes a long way towards making email a
more useful medium. And making the sender/company look like they
understand how open source collaborations and communication works.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [EXT] [LSF/MM/BPF BoF] BoF for Zoned Storage
  2022-03-15 19:04         ` Bean Huo (beanhuo)
  2022-03-15 19:16           ` Jens Axboe
@ 2022-03-15 19:59           ` Bart Van Assche
  1 sibling, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2022-03-15 19:59 UTC (permalink / raw)
  To: Bean Huo (beanhuo), Jens Axboe, Luca Porzio (lporzio),
	Luis Chamberlain, linux-block, linux-fsdevel, lsf-pc
  Cc: Matias Bjørling, Javier González, Damien Le Moal,
	Adam Manzanares, Keith Busch, Johannes Thumshirn, Naohiro Aota,
	Pankaj Raghav, Kanchan Joshi, Nitesh Shetty

On 3/15/22 12:04, Bean Huo (beanhuo) wrote:
> Sorry for that. They are added by outlook automatically, seems I can
> turn it off, let me see if this email has this message.
When I was working for WDC I used the Evolution mail client to connect 
to their Office365 email infrastructure. Evolution is much better suited 
to reply to open source emails than Outlook.

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2022-03-15 19:59 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-03  0:56 [LSF/MM/BPF BoF] BoF for Zoned Storage Luis Chamberlain
2022-03-03  1:03 ` Luis Chamberlain
2022-03-03  1:33 ` Bart Van Assche
2022-03-03  4:31   ` Matias Bjørling
2022-03-03  5:21     ` Adam Manzanares
2022-03-03  5:32 ` Javier González
2022-03-03  6:29   ` Javier González
2022-03-03  7:54     ` Pankaj Raghav
2022-03-03  9:49     ` Damien Le Moal
2022-03-03 14:55       ` Adam Manzanares
2022-03-03 15:22         ` Damien Le Moal
2022-03-03 17:10           ` Adam Manzanares
2022-03-03 19:51             ` Matias Bjørling
2022-03-03 20:18               ` Adam Manzanares
2022-03-03 21:08                 ` Javier González
2022-03-03 21:33                 ` Matias Bjørling
2022-03-04 20:12                   ` Luis Chamberlain
2022-03-06 23:54                     ` Damien Le Moal
2022-03-03 16:12     ` Himanshu Madhani
2022-03-03  7:21 ` Hannes Reinecke
2022-03-03  8:55   ` Damien Le Moal
2022-03-03  7:38 ` Kanchan Joshi
2022-03-03  8:43 ` Johannes Thumshirn
2022-03-03 18:20 ` Viacheslav Dubeyko
2022-03-04  0:10 ` Dave Chinner
2022-03-04 22:10   ` Luis Chamberlain
2022-03-04 22:42     ` Dave Chinner
2022-03-04 22:55       ` Luis Chamberlain
2022-03-05  7:33         ` Javier González
2022-03-07  7:12           ` Dave Chinner
2022-03-07 10:27             ` Matias Bjørling
2022-03-07 11:29               ` Javier González
2022-03-11  0:49             ` Luis Chamberlain
2022-03-11  6:07               ` Christoph Hellwig
2022-03-11 20:31                 ` Luis Chamberlain
2022-03-07 13:55           ` James Bottomley
2022-03-07 14:35             ` Javier González
2022-03-07 15:15               ` Keith Busch
2022-03-07 15:28                 ` Javier González
2022-03-07 20:42                 ` Damien Le Moal
2022-03-11  7:21                   ` Javier González
2022-03-11  7:39                     ` Damien Le Moal
2022-03-11  7:42                       ` Christoph Hellwig
2022-03-11  7:53                         ` Javier González
2022-03-11  8:46                           ` Christoph Hellwig
2022-03-11  8:59                             ` Javier González
2022-03-12  8:03                               ` Damien Le Moal
2022-03-07  0:07         ` Damien Le Moal
2022-03-06 23:56     ` Damien Le Moal
2022-03-07 15:44       ` Luis Chamberlain
2022-03-07 16:23         ` Johannes Thumshirn
2022-03-07 16:36           ` Luis Chamberlain
2022-03-15 18:08 ` [EXT] " Luca Porzio (lporzio)
2022-03-15 18:39   ` Bart Van Assche
2022-03-15 18:47     ` Bean Huo (beanhuo)
2022-03-15 18:49       ` Jens Axboe
2022-03-15 19:04         ` Bean Huo (beanhuo)
2022-03-15 19:16           ` Jens Axboe
2022-03-15 19:59           ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.