* [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Hans Holmberg @ 2023-02-06 10:00 UTC
  To: linux-block
  Cc: ming.lei, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, hans, mcgrof, guokuankuan,
	viacheslav.dubeyko, hch

I think we're missing a flexible way of routing random-ish
write workloads onto zoned storage devices. Implementing a UBLK
target for this would be a great way to provide zoned storage
benefits to a range of use cases. Creating a UBLK target would
let us experiment and move fast, and when we arrive at a common,
reasonably stable solution we could move it into the kernel.

We do have dm-zoned [3] in the kernel, but it bounces non-sequential
writes through conventional zones, resulting in a write amplification
of 2x (which is not optimal for flash).

Fully random workloads make little sense to store on ZBDs, as a
host FTL could not be expected to do better than what conventional block
devices do today. Fully sequential writes are also well taken care of
by conventional block devices.

The interesting stuff is what lies in between those extremes.

I would like to discuss how we could use UBLK to implement a
common FTL with the right knobs to cater to a wide range of workloads
that use raw block devices. We had some knobs in the now-dead pblk,
an FTL for open-channel devices, but I think we could do way better than that.

Pblk did not require bouncing writes, and its knobs for over-provisioning
and workload isolation could be implemented here as well. We could also
add options for different garbage collection policies. In userspace it
would also be easy to support configurable block indirection sizes,
reducing logical-to-physical translation table memory overhead.
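
To make those knobs concrete, here is a rough sketch of what the
configuration could look like (all names are hypothetical; nothing here
is an existing interface):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knob set for such a userspace FTL -- illustration only. */
struct ftl_config {
	uint32_t indirection_bytes;	/* L2P mapping granularity */
	uint32_t op_percent;		/* over-provisioning */
	enum { GC_GREEDY, GC_COST_BENEFIT } gc_policy;
	bool	 isolate_streams;	/* per-workload zone separation */
};

/*
 * L2P table size for a 4 TiB device with 4-byte entries:
 *   4 KiB units:  (4 TiB / 4 KiB)  * 4 B = 4 GiB of mapping state
 *   64 KiB units: (4 TiB / 64 KiB) * 4 B = 256 MiB
 * so raising the indirection size directly shrinks the table.
 */
static size_t l2p_table_bytes(uint64_t dev_bytes,
			      const struct ftl_config *c)
{
	return dev_bytes / c->indirection_bytes * sizeof(uint32_t);
}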

Use cases for such an FTL include SSD caching stores such as Apache
Traffic Server [1] and CacheLib [2]. CacheLib's block cache and the Apache
Traffic Server storage workloads are *almost* zoned-block-device compatible
and would need little translation overhead to perform very well on e.g.
ZNS SSDs.

There are probably more use cases that would benefit.

It would also be a great research vehicle for academia. We've used dm-zap
for this purpose [4] over the last couple of years, but it is not
production-ready and is cumbersome to improve and maintain, as it is
implemented as an out-of-tree device mapper target.

ublk adds a bit of latency overhead, but I think this is acceptable, at
least until we have a great, proven solution that could be turned into
an in-kernel FTL.

If there is interest in the community for a project like this, let's talk!

cc:ing the folks who participated in the discussions on this subject at
ALPSS 2021 and at last year's Plumbers.

Thanks,
Hans

[1] https://trafficserver.apache.org/
[2] https://cachelib.org/
[3] https://docs.kernel.org/admin-guide/device-mapper/dm-zoned.html
[4] https://github.com/westerndigitalcorporation/dm-zap


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Ming Lei @ 2023-02-06 12:49 UTC
  To: Hans Holmberg
  Cc: linux-block, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, hans, mcgrof, guokuankuan,
	viacheslav.dubeyko, hch, ming.lei

On Mon, Feb 06, 2023 at 10:00:20AM +0000, Hans Holmberg wrote:
> I think we're missing a flexible way of routing random-ish
> write workloads onto zoned storage devices. Implementing a UBLK
> target for this would be a great way to provide zoned storage
> benefits to a range of use cases. Creating a UBLK target would
> let us experiment and move fast, and when we arrive at a common,
> reasonably stable solution we could move it into the kernel.

Yeah, UBLK provides an easy way to build fast prototypes.

> 
> [...]
> 
> It would also be a great research vehicle for academia. We've used dm-zap
> for this purpose [4] over the last couple of years, but it is not
> production-ready and is cumbersome to improve and maintain, as it is
> implemented as an out-of-tree device mapper target.

Maybe it is the beginning of a generic open-source userspace SSD FTL,
which could be useful for people curious about SSD internals. I have
googled several times for such a toolkit to see whether it could be ported
to UBLK easily. An SSD simulator isn't great for this: it isn't a real
disk and can't handle real data & workloads. With such a project, SSD
simulators could become less useful, IMO.

> 
> ublk adds a bit of latency overhead, but I think this is acceptable, at
> least until we have a great, proven solution that could be turned into
> an in-kernel FTL.

We will keep improving the ublk IO path, and I am working on ublk
copy. Once that is done, the latency of big-chunk IO could be reduced a lot.

 
Thanks,
Ming



* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Ming Lei @ 2023-02-06 12:54 UTC
  To: Hans Holmberg
  Cc: linux-block, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, hans, mcgrof, guokuankuan,
	viacheslav.dubeyko, hch

On Mon, Feb 06, 2023 at 08:49:15PM +0800, Ming Lei wrote:
> > ublk adds a bit of latency overhead, but I think this is acceptable, at
> > least until we have a great, proven solution that could be turned into
> > an in-kernel FTL.
> 
> We will keep improving the ublk IO path, and I am working on ublk
> copy. Once that is done, the latency of big-chunk IO could be reduced a lot.

s/copy/zero copy


Thanks,
Ming



* RE: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Matias Bjørling @ 2023-02-06 14:34 UTC
  To: Ming Lei, Hans Holmberg
  Cc: linux-block, Damien Le Moal, Dennis Maisenbacher, Ajay Joshi,
	Jørgen Hansen, andreas, javier, slava, kbusch, hans, mcgrof,
	guokuankuan, viacheslav.dubeyko, hch

> Maybe it is the beginning of a generic open-source userspace SSD FTL,
> which could be useful for people curious about SSD internals. I have
> googled several times for such a toolkit to see whether it could be ported
> to UBLK easily. An SSD simulator isn't great for this: it isn't a real
> disk and can't handle real data & workloads. With such a project, SSD
> simulators could become less useful, IMO.

Another possible avenue could be the FTL module that's part of SPDK; it
might be worth checking out as well. It has been battle-tested for a couple
of years and is used in production
(https://www.youtube.com/watch?v=qeNBSjGq0dA).

The module itself could be extracted from SPDK into its own project, or
SPDK's ublk extension could be used to instantiate it. In any case, I think
it could provide a solid foundation for a host-side FTL implementation.

Best, Matias


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Ming Lei @ 2023-02-06 15:32 UTC
  To: Matias Bjørling
  Cc: Hans Holmberg, linux-block, Damien Le Moal, Dennis Maisenbacher,
	Ajay Joshi, Jørgen Hansen, andreas, javier, slava, kbusch,
	hans, mcgrof, guokuankuan, viacheslav.dubeyko, hch

Hi Matias,

On Mon, Feb 06, 2023 at 02:34:51PM +0000, Matias Bjørling wrote:
> > [...]
> 
> Another possible avenue could be the FTL module that's part of SPDK; it
> might be worth checking out as well. It has been battle-tested for a couple
> of years and is used in production
> (https://www.youtube.com/watch?v=qeNBSjGq0dA).
> 
> The module itself could be extracted from SPDK into its own project, or
> SPDK's ublk extension could be used to instantiate it. In any case, I think
> it could provide a solid foundation for a host-side FTL implementation.

Great, I will take a look. Thanks for sharing!

Thanks,
Ming



* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Bart Van Assche @ 2023-02-06 18:31 UTC
  To: Matias Bjørling, Ming Lei, Hans Holmberg
  Cc: linux-block, Damien Le Moal, Dennis Maisenbacher, Ajay Joshi,
	Jørgen Hansen, andreas, javier, slava, kbusch, hans, mcgrof,
	guokuankuan, viacheslav.dubeyko, hch

On 2/6/23 06:34, Matias Bjørling wrote:
>> [...]
> 
> Another possible avenue could be the FTL module that's part of SPDK;
> it might be worth checking out as well. It has been battle-tested for
> a couple of years and is used in production
> (https://www.youtube.com/watch?v=qeNBSjGq0dA).
> 
> The module itself could be extracted from SPDK into its own project,
> or SPDK's ublk extension could be used to instantiate it. In any case,
> I think it could provide a solid foundation for a host-side FTL
> implementation.

Thanks Matias for the link. I had not yet heard about this project.
Although I have not yet had the time to watch the video, on
https://spdk.io/doc/ftl.html I found the following: "The Flash
Translation Layer library provides efficient 4K block device access on
top of devices with >4K write unit size (eg. raid5f bdev) or devices
with large indirection units (some capacity-focused NAND drives), which
don't handle 4K writes well. It handles the logical to physical address
mapping and manages the garbage collection process." To me that sounds
like an effort with goals very similar to those of ZNS and ZBC. Does the
following advice apply to that project: "Don't stack your log on my
log"? (Yang, Jingpei, Ned Plasson, Greg Gillis, Nisha Talagala, and
Swaminathan Sundararaman. "Don't stack your log on my log." In 2nd
Workshop on Interactions of NVM/Flash with Operating Systems and
Workloads (INFLOW 14). 2014.)

Thanks,

Bart.


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Bart Van Assche @ 2023-02-06 18:58 UTC
  To: Hans Holmberg, linux-block
  Cc: ming.lei, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, hans, mcgrof, guokuankuan,
	viacheslav.dubeyko, hch

On 2/6/23 02:00, Hans Holmberg wrote:
> I think we're missing a flexible way of routing random-ish
> write workloads onto zoned storage devices. Implementing a UBLK
> target for this would be a great way to provide zoned storage
> benefits to a range of use cases. Creating a UBLK target would
> let us experiment and move fast, and when we arrive at a common,
> reasonably stable solution we could move it into the kernel.
> 
> [...]

Hi Hans,

Which functionality would such a user space target provide that is not 
yet provided by BTRFS, F2FS or any other log-structured filesystem that 
supports zoned block devices?

Thanks,

Bart.


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Hans Holmberg @ 2023-02-07  9:32 UTC
  To: Matias Bjørling
  Cc: Ming Lei, Hans Holmberg, linux-block, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, mcgrof, guokuankuan, viacheslav.dubeyko,
	hch

On Mon, Feb 6, 2023 at 3:35 PM Matias Bjørling <Matias.Bjorling@wdc.com> wrote:
>
> > [...]
>
> Another possible avenue could be the FTL module that's part of SPDK; it
> might be worth checking out as well. It has been battle-tested for a
> couple of years and is used in production
> (https://www.youtube.com/watch?v=qeNBSjGq0dA).
>
> The module itself could be extracted from SPDK into its own project, or
> SPDK's ublk extension could be used to instantiate it. In any case, I
> think it could provide a solid foundation for a host-side FTL
> implementation.

Thanks for bringing up SPDK's CSAL, I think it's a great example of a
well-implemented host FTL.

It does require a fast caching device with persistence guarantees
(like Optane) though, not entirely unlike dm-zoned.
It also lives in the SPDK universe, which makes it a bit harder to
work with than a standalone FTL.

While a cache in front of the backing storage gives the FTL some time
to organize writes in a device-friendly manner before flushing, it adds
cost (write amplification, or having to add a fast persistent cache
device).

I've seen that SPDK already has the required plumbing for UBLK:
https://spdk.io/doc/ublk.html
I don't know if IO can be routed to CSAL yet, though.

That said, it would be great to support the CSAL use case in a common
FTL. Not all workloads require a cache, so I think caching should be
optional. RAID and serving multiple tenants from a combined pool of
storage would be super nice.

Cheers,
Hans


* RE: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Matias Bjørling @ 2023-02-07  9:40 UTC
  To: Bart Van Assche, Ming Lei, Hans Holmberg
  Cc: linux-block, Damien Le Moal, Dennis Maisenbacher, Ajay Joshi,
	Jørgen Hansen, andreas, javier, slava, kbusch, hans, mcgrof,
	guokuankuan, viacheslav.dubeyko, hch, Barczak, Mariusz,
	Malikowski, Wojciech

> > [...]
> 
> [...] To me that sounds like an effort with goals very similar to those
> of ZNS and ZBC. Does the following advice apply to that project: "Don't
> stack your log on my log"? (Yang, Jingpei, Ned Plasson, Greg Gillis,
> Nisha Talagala, and Swaminathan Sundararaman. "Don't stack your log on
> my log." In 2nd Workshop on Interactions of NVM/Flash with Operating
> Systems and Workloads (INFLOW 14). 2014.)
> 

Hi Bart,

Yep, it does. The early incarnation of the FTL module was targeted as an
OCSSD-compatible host-side FTL. It was later extended to support large
writes and caching devices (e.g., Optane). Mariusz and Wojciech have had
the pleasure of building it, and have also enabled ZNS support that will
soon be upstream.

Regards, Matias


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Nitesh Shetty @ 2023-02-07 10:31 UTC
  To: Ming Lei
  Cc: Hans Holmberg, linux-block, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, hans, mcgrof, guokuankuan,
	viacheslav.dubeyko, hch


On Mon, Feb 06, 2023 at 08:49:15PM +0800, Ming Lei wrote:
> 
> [...]
> 
> > ublk adds a bit of latency overhead, but I think this is acceptable, at
> > least until we have a great, proven solution that could be turned into
> > an in-kernel FTL.
> 
> We will keep improving the ublk IO path, and I am working on ublk
> copy. Once that is done, the latency of big-chunk IO could be reduced a lot.
> 

Just curious: will this also involve running do_splice_direct*() in an
async style, like normal async reads/writes, instead of offloading to the
io-wq context?

Regards,
Nitesh Shetty


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Hans Holmberg @ 2023-02-07 12:11 UTC
  To: Bart Van Assche
  Cc: Hans Holmberg, linux-block, ming.lei, Matias Bjørling,
	Damien Le Moal, Dennis Maisenbacher, Ajay Joshi,
	Jørgen Hansen, andreas, javier, slava, kbusch, mcgrof,
	guokuankuan, viacheslav.dubeyko, hch

On Mon, Feb 6, 2023 at 7:58 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> [...]
>
> Hi Hans,
>
> Which functionality would such a user space target provide that is not
> yet provided by BTRFS, F2FS or any other log-structured filesystem that
> supports zoned block devices?
>

Hi Bart,

The use cases I'm primarily thinking of are applications and services
that work on top of raw block interfaces, like the Apache Traffic Server
and CacheLib mentioned in my proposal.

These workloads benefit from not using a file system. The file system
overhead is just too big for storing millions of larger (> 2 KiB)
objects and billions of tiny (< 2 KiB) objects.

For the larger objects, the write pattern is log-structured and almost
fully sequential. Zoned storage would provide a benefit if multiple
instances of these caches were co-located on the same media, mixing
those streams, or if a large-object cache were mixed with other, more
random workloads, like the CacheLib store for small objects.
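
As a sketch of the kind of stream separation a host FTL could do here
(zbd_zone_append() and alloc_open_zone() are hypothetical helpers
standing in for real zone-append plumbing):

#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS 16

/* One open zone per cache instance, so each individually-sequential
 * log stays sequential on media instead of interleaving. */
struct stream {
	uint32_t open_zone;	/* zone currently being appended to */
	uint64_t wp;		/* bytes written into that zone */
};

static struct stream streams[MAX_STREAMS];

extern uint32_t alloc_open_zone(void);
extern uint64_t zbd_zone_append(uint32_t zone, const void *buf, size_t len);

static uint64_t ftl_write(int stream_id, const void *buf, size_t len,
			  uint64_t zone_cap)
{
	struct stream *s = &streams[stream_id];

	if (s->wp + len > zone_cap) {	/* zone full: grab a fresh one */
		s->open_zone = alloc_open_zone();
		s->wp = 0;
	}
	uint64_t lba = zbd_zone_append(s->open_zone, buf, len);
	s->wp += len;
	return lba;			/* new location for the L2P table */
}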

Cache workloads have relaxed persistence requirements; it's not the end
of the world if an object disappears.

I can recommend [1] and [2] as an introduction to these workloads.

In my plumbers talk [3] from last year I sketched out how zoned storage
could benefit object caching on flash.

[1] https://www.usenix.org/conference/osdi20/presentation/berg
[2] https://engineering.fb.com/2021/10/26/core-data/kangaroo/
[3] https://lpc.events/event/16/contributions/1232/attachments/1066/2095/LPC%202022%20Zoned%20MC%20Improving%20object%20caches%20using%20ZNS%20V2.pdf


Cheers,
Hans


* Re: [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
From: Ming Lei @ 2023-02-07 12:49 UTC
  To: Nitesh Shetty
  Cc: Hans Holmberg, linux-block, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen, andreas,
	javier, slava, kbusch, hans, mcgrof, guokuankuan,
	viacheslav.dubeyko, hch, ming.lei

On Tue, Feb 07, 2023 at 04:01:41PM +0530, Nitesh Shetty wrote:
> On Mon, Feb 06, 2023 at 08:49:15PM +0800, Ming Lei wrote:
> > 
> > [...]
> > 
> > We will keep improving the ublk IO path, and I am working on ublk
> > copy. Once that is done, the latency of big-chunk IO could be reduced a lot.
> 
> Just curious: will this also involve running do_splice_direct*() in an
> async style, like normal async reads/writes, instead of offloading to
> the io-wq context?

It follows this idea:

- Add a new type of buffer (a splice buffer) to io_uring; this buffer
is populated into a bvec table (reusing io_mapped_ubuf) by passing
(splice_fd, offset, len) from the SQE.

- The buffer is filled from ublk's ->read_splice() with the help of
splice_direct_to_actor() over a direct pipe; we can probably add one
private splice flag so that ublk's ->read_splice() is only available
in-kernel (io_uring) and over a direct pipe.

- It requires that pipe buffer ownership is not transferred, so a
nop_pipe_buf_ops is needed for such usage, and this works fine for
ublk & fuse.

- The buffer can be allocated & populated from the ->prep() of io_uring
rw/net requests, then handled just like READ[WRITE]_FIXED.

So it is like a normal async read/write, but the pages are not pinned
twice and one copy of the IO data is saved.

This approach is also flexible enough to allow reads/writes over any
part of the buffer.
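
For illustration, a rough userspace-side sketch of what a ublk server's
submission could look like with such a splice buffer (IORING_OP_SPLICE_BUF
and its value are made up for this sketch; only the WRITE_FIXED-style
second step exists today):

#include <liburing.h>

#define IORING_OP_SPLICE_BUF	250	/* hypothetical opcode, not upstream */

static void queue_zero_copy_write(struct io_uring *ring, int ublkc_fd,
				  __u64 io_offset, unsigned int len,
				  int backing_fd, __u64 backing_off)
{
	struct io_uring_sqe *sqe;

	/* 1) Populate the splice buffer: the kernel walks the ublk request
	 * pages via ->read_splice() over a direct pipe and fills a bvec
	 * table (reusing io_mapped_ubuf) -- no copy, no extra pinning.
	 */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_rw(IORING_OP_SPLICE_BUF, sqe, ublkc_fd, NULL, len,
			 io_offset);
	sqe->flags |= IOSQE_IO_LINK;

	/* 2) Write that buffer to the backing store, handled just like a
	 * WRITE_FIXED against registered buffer index 0.
	 */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_write_fixed(sqe, backing_fd, NULL, len, backing_off, 0);
}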

Thanks,
Ming



Thread overview:
2023-02-06 10:00 [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK Hans Holmberg
2023-02-06 12:49 ` Ming Lei
2023-02-06 12:54   ` Ming Lei
2023-02-06 14:34   ` Matias Bjørling
2023-02-06 15:32     ` Ming Lei
2023-02-06 18:31     ` Bart Van Assche
2023-02-07  9:40       ` Matias Bjørling
2023-02-07  9:32     ` Hans Holmberg
     [not found]   ` <CGME20230207103212epcas5p41c50a45f9d892a53915e04b604a40149@epcas5p4.samsung.com>
2023-02-07 10:31     ` Nitesh Shetty
2023-02-07 12:49       ` Ming Lei
2023-02-06 18:58 ` Bart Van Assche
2023-02-07 12:11   ` Hans Holmberg
