Linux-Fsdevel Archive on lore.kernel.org
* [LSF/MM TOPIC] BPF for Block Devices
@ 2019-02-07 17:12 Stephen  Bates
From: Stephen  Bates @ 2019-02-07 17:12 UTC (permalink / raw)
  To: Jens Axboe, linux-fsdevel, linux-mm, linux-block,
	IDE/ATA development list, linux-scsi, linux-nvme,
	Logan Gunthorpe
  Cc: linux-kernel, lsf-pc, bpf, ast

Hi All

> A BPF track will join the annual LSF/MM Summit this year! Please read the updated description and CFP information below.

Well if we are adding BPF to LSF/MM I have to submit a request to discuss BPF for block devices please!

There has been quite a bit of activity around the concept of Computational Storage in the past 12 months. SNIA recently formed a Technical Working Group (TWG), and this TWG is expected to make proposals to standards such as NVM Express to add APIs for computation elements that reside on or near block devices.

While some of these Computational Storage accelerators will provide fixed functions (e.g. RAID, encryption or compression), others will be more flexible. Some of these flexible accelerators will be capable of running BPF code (something that certain Linux drivers for SmartNICs support today [1]). I would like to discuss what such a framework could look like for the storage layer and the file-system layer. I'd like to discuss how devices could advertise this capability (a special type of NVMe namespace or SCSI LUN, perhaps?) and how the BPF engine could be programmed and then used against block I/O. Ideally I'd like to do this in a vendor-neutral way and develop ideas I can take back to NVMe and the SNIA TWG to help shape how these standards evolve.

As an example use case, consider a BPF-capable accelerator that performs a filtering function: it uses p2pdma to scan data on a number of adjacent NVMe SSDs, filters that data, and then provides only the filter-matched LBAs to the host. Many other applications are possible.
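
To make the host-visible contract of that filtering use case concrete, here is a minimal user-space model in Python. It assumes a hypothetical 4 KiB logical block size and an arbitrary byte-pattern predicate; `BLOCK_SIZE`, `scan_and_filter` and the device names are illustrative only. The real work would run as BPF on the accelerator with p2pdma moving the data, none of which this sketch models: it only shows that the host receives (device, LBA) pairs rather than the blocks themselves.

```python
from typing import Callable, Dict, List, Tuple

BLOCK_SIZE = 4096  # hypothetical logical block size in bytes


def scan_and_filter(
    devices: Dict[str, bytes],
    predicate: Callable[[bytes], bool],
) -> List[Tuple[str, int]]:
    """Return (device, LBA) pairs whose block satisfies the predicate.

    Models an accelerator scanning several SSDs and reporting only the
    matching LBAs to the host instead of shipping every block to it.
    """
    matches: List[Tuple[str, int]] = []
    for name, data in sorted(devices.items()):
        # Walk the device one whole logical block at a time; a trailing
        # partial block (if any) is ignored.
        for lba in range(len(data) // BLOCK_SIZE):
            block = data[lba * BLOCK_SIZE:(lba + 1) * BLOCK_SIZE]
            if predicate(block):
                matches.append((name, lba))
    return matches
```

For example, with a predicate such as `lambda b: b"ERROR" in b`, only the LBAs containing that pattern are reported, so the traffic to the host scales with the number of matches rather than with the size of the scanned data.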

Also, I am interested in the "The end of the DAX Experiment" topic proposed by Dan and the "Zoned Block Devices" topic from Matias and Damien.

Cheers
 
Stephen

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/netronome/nfp/bpf/offload.c?h=v5.0-rc5
 
    



* Re: [LSF/MM TOPIC] BPF for Block Devices
@ 2019-02-07 21:12 Javier González
From: Javier González @ 2019-02-07 21:12 UTC (permalink / raw)
  To: Stephen Bates
  Cc: Jens Axboe, linux-fsdevel, linux-mm, linux-block,
	IDE/ATA development list, linux-scsi, linux-nvme,
	Logan Gunthorpe, linux-kernel, lsf-pc, bpf, ast

+ Mailing lists

> On 7 Feb 2019, at 18.48, Javier González <javier@javigon.com> wrote:
> 
> Definitely interested in this too, and pleasantly surprised to see a BPF track!
> 
> I would like to extend Stephen’s discussion to eBPF running in the block layer directly, both in the kernel VM and offloaded to the accelerator of choice. This would be like XDP for the storage stack, possibly with different entry points. I have been doing some experiments building a dedup engine for pblk over the last couple of weeks, and a number of interesting questions have arisen.
> 
> Also, if there is a discussion on offloading eBPF to an accelerator, I would like to discuss how we can efficiently support data modifications without double transfers over the PCIe bus (or worse, over the network): one for the computation and modification, and another for the actual data transfer. Something like p2pmem comes to mind here, but for this to integrate nicely, we would need to overcome the current limitations of PCIe and talk about p2pmem over fabrics.
> 
> Javier
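
The pblk dedup engine is not described in this thread. As a hedged illustration of the general technique only (fingerprint-based inline deduplication, where each written block is hashed and a duplicate is remapped to an already stored physical block), here is a minimal Python model. `DedupMap`, the SHA-256 fingerprint and the in-memory tables are assumptions for illustration, not pblk code; garbage collection and reference counting of overwritten blocks are deliberately left out.

```python
import hashlib
from typing import Dict, List


class DedupMap:
    """Toy inline dedup: map logical blocks onto shared physical blocks."""

    def __init__(self) -> None:
        self.fp_to_pba: Dict[bytes, int] = {}  # fingerprint -> physical block
        self.l2p: Dict[int, int] = {}           # logical -> physical mapping
        self.media: List[bytes] = []            # stored physical blocks

    def write(self, lba: int, block: bytes) -> int:
        """Store a block for `lba` and return the physical block it maps to."""
        fp = hashlib.sha256(block).digest()
        pba = self.fp_to_pba.get(fp)
        # On a fingerprint hit, still compare contents so that a hash
        # collision can never alias two different blocks.
        if pba is None or self.media[pba] != block:
            pba = len(self.media)
            self.media.append(block)
            self.fp_to_pba[fp] = pba
        self.l2p[lba] = pba
        return pba

    def read(self, lba: int) -> bytes:
        return self.media[self.l2p[lba]]
```

Writing the same 4 KiB pattern to two different LBAs allocates only one physical block; both logical addresses then read back the same data.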



* Re: [LSF/MM TOPIC] BPF for Block Devices
@ 2019-02-08  8:31 Matias Bjørling
From: Matias Bjørling @ 2019-02-08  8:31 UTC (permalink / raw)
  To: Stephen Bates, Jens Axboe, linux-fsdevel, linux-mm, linux-block,
	IDE/ATA development list, linux-scsi, linux-nvme,
	Logan Gunthorpe
  Cc: linux-kernel, lsf-pc, bpf, ast

On 2/7/19 6:12 PM, Stephen  Bates wrote:
> Well if we are adding BPF to LSF/MM I have to submit a request to discuss BPF for block devices please!

If we're going down that road, we can also look at the block I/O path 
itself.

Now that Jens has shown that io_uring can beat SPDK, let's take it a
step further and create an API that lets us bypass the boilerplate
checking in the kernel block I/O path and go straight to issuing the I/O
in the block layer.

For example, we could provide an API that allows applications to
register a fast path through the kernel, one where checks such as
generic_make_request_checks() have already been validated.

The user-space application registers a BPF program with the kernel, the
kernel prechecks the possible I/O patterns, and then it green-lights all
I/Os that go through that unit. That way, the checks only have to be
done once instead of on every I/O. This approach could work beautifully
with direct I/O and raw devices, and with a bit more work we could
support more complex use cases as well.
