linux-fsdevel.vger.kernel.org archive mirror
* [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
@ 2018-02-01 13:51 Boaz Harrosh
  2018-02-01 18:34 ` Chuck Lever
From: Boaz Harrosh @ 2018-02-01 13:51 UTC (permalink / raw)
  To: lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara, Steven Whitehouse,
	coughlan, Jeff Moyer, Sage Weil, Miklos Szeredi, Andy Rudof,
	Anna Schumaker, Amir Goldstein, Stephen Bates, Amit Golander,
	Sagi Manole, Shachar Sharon, Josef Bacik

Sorry, I'm resending because this mail never appeared on the linux-fsdevel list, presumably because it had an attachment. So here is a URL for the slides instead:
http://linuxplumbersconf.org/2017/ocw//system/presentations/4703/original/ZUFS_for_LinuxPlumbers_LPC.pptx
~~~~~

Hi fellow coders,

At last year's Plumbers I introduced a new project, ZUFS.
(See the slides linked above for what was presented then, especially the POC benchmarks.)

ZUFS stands for Zero-copy User-mode FS.
- It is geared towards true end-to-end zero copy of both data and metadata.
- It is geared towards very *low latency*, very high CPU locality, and lock-less parallelism.
- Synchronous operations
- NUMA awareness

Short description:
  ZUFS is a from-scratch implementation of a filesystem-in-user-space which tries to address the above goals. From the get-go it is aimed at pmem-based FSs, but it can easily support other types of FSs that can utilize a ~10x improvement in latency and parallelism. The novelty of this project is that the interface is designed, down to the ABI, with a modern multi-core NUMA machine in mind, so as to reach these goals.

Not only FSs need apply: any kind of user-mode server can set up a pseudo filesystem and communicate with applications via virtual files. These can then benefit from zero-copy, low-latency communication directly to/from application buffers, or the application can mmap server resources directly. As long as it looks like a file system to the kernel, it works.
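
To make the application side concrete, below is a minimal sketch of a client of such a server (the mount path is a made-up example). Nothing here is ZUFS-specific, which is exactly the point: the application uses plain POSIX calls, while the server on the other side is handed the application's buffers with zero copies.

/* Sketch: an application talking to a user-mode server through a
 * ZUFS-style mount. The path below is a made-up example.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("/mnt/zufs/volume/file", O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return 1;

	/* A regular pwrite(); the server sees the application buffer
	 * directly, with no intermediate kernel copy.
	 */
	memset(buf, 'z', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		perror("pwrite");

	/* Or mmap(): the server maps its own (e.g. pmem) pages
	 * straight into the application's address space.
	 */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, 0);
	if (p != MAP_FAILED) {
		((char *)p)[0] = 'Z';
		munmap(p, 4096);
	}

	close(fd);
	return 0;
}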

During Plumbers a few people and companies showed deep interest in this project, mainly because it introduces a new kind of synchronous, low-latency communication between many applications and a user-mode server, as in the FS example above. (Perhaps this pattern could be extended to other areas, but that is out of scope for now.)

Since then I have been banging on a real implementation, and very soon (once NetApp legal approves the code, which should be done next week) I will release a first draft RFC for review. (I will be sending some design notes soon.)

Current status is that we have a couple of trivial filesystem implementations, and together with the kernel module, the user-mode server and the FS user-mode plugins can actually pass a good chunk of an xfstests quick run. (Still working on stability.)

I would like to present the implementation and its status as of LSF time. But mainly I would like to consult with the kernel gurus at LSF/MM on how to HARDEN and secure this very ambitious project, especially as there are a couple of mm and scheduler patches that will need to be submitted along with it.

CC'ed are people who stated interest in ZUFS in the past. Sorry if I forgot some. Please comment if you are interested in talking about this at LSF/MM.

Just to get some points across: as I said, this project is all about performance and low latency. Below are POC results I ran (repeated from the slides linked above).

          In-kernel FS           ZUFS                  FUSE
Threads   Op/s      Lat [us]     Op/s      Lat [us]    Op/s     Lat [us]
      1    388361    2.271589     200799    4.6         71820    13.5
      2    635115    2.604376     314321    5.9        148083    13.1
      4   1260307    2.626361     565574    6.6        212133    18.3
      8   2744963    2.485292    1113138    6.6        209799    37.6
     12   2126945    5.020506    1598451    6.8        201689    58.7
     18   4350995    3.386433    1648689    7.8        174823   101.8
     24   4211180    4.784997    1702285    8.0        149413   159.0
     36   3057166    9.291997    1783346   13.4        148276   240.7
     48   3148972   10.382461    1741873   17.4        145296   327.3

I used an average server machine in our lab, with two NUMA nodes and a total of 40 cores (I can't remember all the details), running fio with 4k random writes. The IO is then just a memcpy_nt() to pmem simulated by DRAM. fio was run with more and more threads (see the Threads column; the slides have nice graphs).
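
For reference, by memcpy_nt() I mean a non-temporal (streaming) copy whose stores bypass the CPU caches. Here is a simplified sketch of such a copy, assuming 16-byte-aligned buffers and a length that is a multiple of 16 (the real code of course handles unaligned heads and tails):

/* Non-temporal copy sketch using SSE2 streaming stores. */
#include <emmintrin.h>
#include <stddef.h>

static void memcpy_nt(void *dst, const void *src, size_t len)
{
	__m128i *d = dst;
	const __m128i *s = src;

	for (size_t i = 0; i < len / sizeof(__m128i); i++)
		_mm_stream_si128(&d[i], _mm_load_si128(&s[i]));

	/* Make the streaming stores globally visible. */
	_mm_sfence();
}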

We can see that we are still more than 2x slower than an in-kernel FS. But I believe I can shave off another 1us by optimizing the app-to-server thread switch, perhaps by utilizing the "Binder" scheduler object, or by devising another way to avoid going through the scheduler (and its locks) when switching VMs (address spaces).

Thank you
Boaz


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-01 13:51 [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System Boaz Harrosh
@ 2018-02-01 18:34 ` Chuck Lever
  2018-02-01 18:59   ` Boaz Harrosh
From: Chuck Lever @ 2018-02-01 18:34 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara, Steven Whitehouse,
	coughlan, Jeff Moyer, Sage Weil, Miklos Szeredi, Andy Rudof,
	Anna Schumaker, Amir Goldstein, Stephen Bates, Amit Golander,
	Sagi Manole, Shachar Sharon, Josef Bacik



> On Feb 1, 2018, at 8:51 AM, Boaz Harrosh <boazh@netapp.com> wrote:
<>

This work was also presented at the SNIA Persistent Memory Summit last week.
The use case of course is providing a user space platform for the development
and deployment of memory-based file systems. The value-add of this kind of
file system is ultra-low latency, which is a challenge for the current most
popular such framework, FUSE.

To start, I can think of three areas where specific questions might be
entertained by LSF/MM attendees:

- Spectre mitigations make this whole "user space filesystem" arrangement even
slower, thanks to additional context switches between user space and the kernel.
What can be done for FUSE and ZUFS to reduce the impact of Spectre mitigations?

- The fundamental innovation of ZUFS is porting Solaris "doors" to Linux. A
"door" is a local RPC mechanism that stays on the same thread to reduce
scheduling overhead during calls to services provided by daemons on the local
system (see the sketch of the doors API after this list). Is there interest
in building a generic "doors" facility in Linux?

- ZUFS currently supports only synchronous calls to the file system, as the
assumption is the services (filesystems) will be I/O-less, typically. Is there
a need to support asynchronicity in the ZUFS API?
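
For anyone who never used doors, here is a rough sketch of the Solaris API, from memory; consult the Solaris door_create(3C) and door_call(3C) man pages for the exact semantics:

/* Solaris doors sketch (compiles only on Solaris). The calling
 * thread "enters" the server's door procedure without a full
 * scheduler round trip, which is the property of interest here.
 */
#include <door.h>
#include <string.h>
#include <sys/types.h>

/* Server side: invoked directly in the context of a door call. */
static void service(void *cookie, char *argp, size_t arg_size,
		    door_desc_t *dp, uint_t n_desc)
{
	char reply[] = "pong";

	door_return(reply, sizeof(reply), NULL, 0);	/* never returns */
}

int server_setup(void)
{
	/* Typically fattach()ed to a filesystem name for clients. */
	return door_create(service, NULL, 0);
}

int client_call(int door_fd)
{
	char req[] = "ping", rbuf[64];
	door_arg_t arg;

	memset(&arg, 0, sizeof(arg));
	arg.data_ptr = req;
	arg.data_size = sizeof(req);
	arg.rbuf = rbuf;
	arg.rsize = sizeof(rbuf);

	return door_call(door_fd, &arg);
}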


--
Chuck Lever

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-01 18:34 ` Chuck Lever
@ 2018-02-01 18:59   ` Boaz Harrosh
  2018-02-02  9:36     ` Miklos Szeredi
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Boaz Harrosh @ 2018-02-01 18:59 UTC (permalink / raw)
  To: Chuck Lever
  Cc: lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara, Steven Whitehouse,
	coughlan, Jeff Moyer, Sage Weil, Miklos Szeredi, Andy Rudof,
	Anna Schumaker, Amir Goldstein, Stephen Bates, Amit Golander,
	Sagi Manole, Shachar Sharon, Josef Bacik

On 01/02/18 20:34, Chuck Lever wrote:
<>
> This work was also presented at the SNIA Persistent Memory Summit last week.
> The use case of course is providing a user space platform for the development
> and deployment of memory-based file systems. The value-add of this kind of
> file system is ultra-low latency, which is a challenge for the current most
> popular such framework, FUSE.
> 
> To start, I can think of three areas where specific questions might be
> entertained by LSF/MM attendees:
> 
> - Spectre mitigations make this whole "user space filesystem" arrangement even
> slower, thanks to additional context switches between user space and the kernel.
> What can be done for FUSE and ZUFS to reduce the impact of Spectre mitigations?
> 

Sigh !!

What about a different interface for a "trusted" binary with "Spectre
mitigation" off?
I know the Red Hat guys have a project where they want the kernel to sign
and verify all systemd /sbin/* binaries. If these binaries have such
hardened trust, could we make them faster (i.e. back to regular speed)?

> - The fundamental innovation of ZUFS is porting Solaris "doors" to Linux. A
> "door" is a local RPC mechanism that stays on the same thread to reduce
> scheduling overhead during calls to services provided by daemons on the local
> system. Is there interest in building a generic "doors" facility in Linux?
> 

People said I should look into the Binder project from Google. It is already
in the kernel and is used by Android. As I understand it, they have exactly
the kind of object you describe above. Combined with my thread array, it
sounds like what I want.

> - ZUFS currently supports only synchronous calls to the file system, as the
> assumption is the services (filesystems) will be I/O-less, typically. Is there
> a need to support asynchronicity in the ZUFS API?
> 

There is a plan for async ops: the server returns "PENDING" and the
operation is completed later by the server on some async back-channel,
with some kind of cookie system. But I have not gotten to implementing it
yet. This will cost, because it will need a double mapping of application
pages; but I assume async means it is slower anyway, and it is still
cheaper than a copy. So yes, it is on the roadmap.

In general, locks are OK in a sync operation, but anything that needs to
re-enter the kernel, say to read from disk, will have to be completed
async and the zt-channel freed for other operations.
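
Very roughly, I imagine the back-channel looking something like the sketch below. Nothing here is implemented, and all the names are invented just to illustrate the cookie idea:

/* Hypothetical sketch of the planned async back-channel; none of
 * these names exist in the code (yet).
 */
#include <stdint.h>

enum zu_status {
	ZUS_OK = 0,	/* completed synchronously, zt-channel reusable */
	ZUS_PENDING,	/* server took ownership, completion comes later */
};

struct zu_async_completion {
	uint64_t cookie;	/* echoed from the original operation */
	int32_t  error;		/* 0 or -errno */
	uint32_t bytes_done;	/* e.g. bytes read/written */
};

/* Sync path: server returns ZUS_OK and the zt-channel is reused at
 * once. Async path: server returns ZUS_PENDING, keeps the (double-
 * mapped) application pages, and later queues a zu_async_completion
 * with the matching cookie on the back-channel.
 */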


Thanks
Boaz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-01 18:59   ` Boaz Harrosh
@ 2018-02-02  9:36     ` Miklos Szeredi
  2018-02-05 13:04       ` Boaz Harrosh
  2018-02-02 15:49     ` J. Bruce Fields
  2018-03-05 12:18     ` Greg KH
From: Miklos Szeredi @ 2018-02-02  9:36 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Chuck Lever, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, Tom Coughlan, Jeff Moyer, Sage Weil,
	Andy Rudof, Anna Schumaker, Amir Goldstein, Stephen Bates,
	Amit Golander, Sagi Manole, Shachar Sharon, Josef Bacik

On Thu, Feb 1, 2018 at 7:59 PM, Boaz Harrosh <boazh@netapp.com> wrote:
> On 01/02/18 20:34, Chuck Lever wrote:
> <>
>> This work was also presented at the SNIA Persistent Memory Summit last week.
>> The use case of course is providing a user space platform for the development
>> and deployment of memory-based file systems. The value-add of this kind of
>> file system is ultra-low latency, which is a challenge for the current most
>> popular such framework, FUSE.

I can see the numbers are very impressive, and I'm very happy to see
progress being made in this field.

What I'd also really be interested to see is where those speedups come from.

As of linux-4.2/libfuse-3.0 there is a scalability improvement that can be
enabled with the "-o clone_fd" option. This option creates per-thread
queues, which are a prerequisite to achieving full CPU/NUMA affinity for
request processing. Even just turning on "clone_fd"
might improve the latency numbers for FUSE.
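
For reference, with libfuse >= 3.2 a daemon can also request this from code (a minimal sketch; it is equivalent to passing "-o clone_fd" on the command line):

/* Sketch (assuming libfuse >= 3.2): run the multi-threaded session
 * loop with clone_fd enabled, so every worker thread gets its own
 * cloned /dev/fuse fd and thus its own request queue.
 */
#define FUSE_USE_VERSION 32
#include <fuse_lowlevel.h>

int run_session_mt(struct fuse_session *se)
{
	struct fuse_loop_config config = {
		.clone_fd = 1,
		.max_idle_threads = 10,
	};

	return fuse_session_loop_mt(se, &config);
}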

Thanks,
Miklos


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-01 18:59   ` Boaz Harrosh
  2018-02-02  9:36     ` Miklos Szeredi
@ 2018-02-02 15:49     ` J. Bruce Fields
  2018-02-02 16:09       ` Chuck Lever
  2018-02-05 12:53       ` Boaz Harrosh
  2018-03-05 12:18     ` Greg KH
From: J. Bruce Fields @ 2018-02-02 15:49 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Chuck Lever, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, coughlan, Jeff Moyer, Sage Weil,
	Miklos Szeredi, Andy Rudof, Anna Schumaker, Amir Goldstein,
	Stephen Bates, Amit Golander, Sagi Manole, Shachar Sharon,
	Josef Bacik

On Thu, Feb 01, 2018 at 08:59:18PM +0200, Boaz Harrosh wrote:
> On 01/02/18 20:34, Chuck Lever wrote: <>
> > This work was also presented at the SNIA Persistent Memory Summit
> > last week.  The use case of course is providing a user space
> > platform for the development and deployment of memory-based file
> > systems. The value-add of this kind of file system is ultra-low
> > latency, which is a challenge for the current most popular such
> > framework, FUSE.
> > 
> > To start, I can think of three areas where specific questions might
> > be entertained by LSF/MM attendees:
> > 
> > - Spectre mitigations make this whole "user space filesystem"
> > arrangement even slower, thanks to additional context switches
> > between user space and the kernel.

I think you're referring to the KPTI patches, which address Meltdown,
not Spectre.

> What about a different interface for a "trusted" binary with "Spectre
> mitigation" off.  I know Redhat guys have a project where they want to
> sign and verify by Kernel all systemd /sbin/* binaries. If these
> binaries have such an hardened trust could we make them faster? (ie
> back to regular speed)

I don't think that helps.

--b.


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-02 15:49     ` J. Bruce Fields
@ 2018-02-02 16:09       ` Chuck Lever
  2018-02-02 16:13         ` Bruce Fields
  2018-02-09 17:47         ` Steve French
  2018-02-05 12:53       ` Boaz Harrosh
From: Chuck Lever @ 2018-02-02 16:09 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Boaz Harrosh, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, coughlan, Jeff Moyer, Sage Weil,
	Miklos Szeredi, Andy Rudof, Anna Schumaker, Amir Goldstein,
	Stephen Bates, Amit Golander, Sagi Manole, Shachar Sharon,
	Josef Bacik



> On Feb 2, 2018, at 10:49 AM, bfields@fieldses.org wrote:
> 
> <>
> 
> I think you're referring to the KPTI patches, which address Meltdown,
> not Spectre.

I enabled KPTI on my NFS client and server systems in early
v4.15-rc, and didn't measure a change in latency or throughput.

But with v4.15 final, which includes some Spectre mitigations,
write(2) on NFS files, for example, takes about 15us longer.
Since the RPC round-trip times did not increase, I presume this
extra latency is incurred on the client, where the user-kernel
boundary transitions occur.

<shrug>

Anyway there's more latency in the user space-kernel transition
now. Thus any stack that adds more such transitions will need
attention. That would include FUSE, user-space file servers,
ZUFS, any activity that requires upcalls, and so on.


--
Chuck Lever


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-02 16:09       ` Chuck Lever
@ 2018-02-02 16:13         ` Bruce Fields
  2018-02-09 17:47         ` Steve French
From: Bruce Fields @ 2018-02-02 16:13 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Boaz Harrosh, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, coughlan, Jeff Moyer, Sage Weil,
	Miklos Szeredi, Andy Rudof, Anna Schumaker, Amir Goldstein,
	Stephen Bates, Amit Golander, Sagi Manole, Shachar Sharon,
	Josef Bacik

On Fri, Feb 02, 2018 at 11:09:03AM -0500, Chuck Lever wrote:
> <>
> > I think you're referring to the KPTI patches, which address Meltdown,
> > not Spectre.
> 
> I enabled KPTI on my NFS client and server systems in early
> v4.15-rc, and didn't measure a change in latency or throughput.
> 
> But with v4.15 final, which includes some Spectre mitigations,
> write(2) on NFS files, for example, takes about 15us longer.
> Since the RPC round-trip times did not increase, I presume this
> extra latency is incurred on the client, where the user-kernel
> boundary transitions occur.

OK, that's interesting, thanks.

--b.



* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-02 15:49     ` J. Bruce Fields
  2018-02-02 16:09       ` Chuck Lever
@ 2018-02-05 12:53       ` Boaz Harrosh
From: Boaz Harrosh @ 2018-02-05 12:53 UTC (permalink / raw)
  To: J. Bruce Fields, Boaz Harrosh
  Cc: Chuck Lever, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, coughlan, Jeff Moyer, Sage Weil,
	Miklos Szeredi, Andy Rudof, Anna Schumaker, Amir Goldstein,
	Stephen Bates, Amit Golander, Sagi Manole, Shachar Sharon,
	Josef Bacik

On 02/02/18 17:49, J. Bruce Fields wrote:
<>
>> What about a different interface for a "trusted" binary with "Spectre
>> mitigation" off?  I know the Red Hat guys have a project where they want
>> the kernel to sign and verify all systemd /sbin/* binaries. If these
>> binaries have such hardened trust, could we make them faster (i.e. back
>> to regular speed)?
> 
> I don't think that helps.
> 

If that does not help then I'm clueless. I understood that the slowdown is
because some CPU pipelines need to be stalled (flushed), because on return
from a kernel call, user mode can (theoretically) inspect the other side
of the un-taken speculated branch. So if I trust the user-mode app, can I
trust that it will not misuse that info?

But again, I'm completely clueless. What else besides "app trust" can
there be?


Thanks
Boaz


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-02  9:36     ` Miklos Szeredi
@ 2018-02-05 13:04       ` Boaz Harrosh
  2018-02-05 15:48         ` Miklos Szeredi
From: Boaz Harrosh @ 2018-02-05 13:04 UTC (permalink / raw)
  To: Miklos Szeredi, Boaz Harrosh
  Cc: Chuck Lever, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, Tom Coughlan, Jeff Moyer, Sage Weil,
	Andy Rudof, Anna Schumaker, Amir Goldstein, Stephen Bates,
	Amit Golander, Sagi Manole, Shachar Sharon, Josef Bacik

On 02/02/18 11:36, Miklos Szeredi wrote:
> <>
> 
> I can see the numbers are very impressive, and I'm very happy to see
> progress being made in this field.
> 
> What I'd also really be interested to see is where those speedups come from.
> 
> As of linux-4.2/libfuse-3.0 there is a scalability improvement that can be
> enabled with the "-o clone_fd" option. This option creates per-thread
> queues, which are a prerequisite to achieving full CPU/NUMA affinity for
> request processing.

Is it not on by default? Is there a plan to make it the default?

> Even just turning on "clone_fd"
> might improve the latency numbers for FUSE.
> 

Thank you Miklos.

Yes, I suspect my FUSE foo is not so strong and I have not tuned FUSE for
its strongest numbers. Actually, the only optimization I tried was
increasing the number of direct IO threads to the number of CPUs (but that
did not help much). Once the dust settles I will conduct these tests again
(and so could you; I hope to release the code this week) and we should see.

I have one question: if I understand correctly, you expect scalability
and CPU locality to improve greatly with the new version and the proper
configuration. Do you also expect single-thread latency to improve as
well?


Thanks
Boaz


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-05 13:04       ` Boaz Harrosh
@ 2018-02-05 15:48         ` Miklos Szeredi
From: Miklos Szeredi @ 2018-02-05 15:48 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Chuck Lever, lsf-pc, linux-fsdevel, Ric Wheeler,
	Jan Kara, Steven Whitehouse, Tom Coughlan, Jeff Moyer, Sage Weil,
	Andy Rudof, Anna Schumaker, Amir Goldstein, Stephen Bates,
	Amit Golander, Sagi Manole, Shachar Sharon, Josef Bacik

On Mon, Feb 5, 2018 at 2:04 PM, Boaz Harrosh <openosd@gmail.com> wrote:

> I have one question: if I understand correctly, you expect scalability
> and CPU locality to improve greatly with the new version and the proper
> configuration. Do you also expect single-thread latency to improve as
> well?

I don't expect a big improvement. I'm just wondering where fuse is now
and how much improvement can be had with various tweaks.

I expect CPU/NUMA locality can help, but that's not yet what the fuse
code does, so we cannot try it out. What "clone_fd" does is improve
scalability, but even with it enabled there are some bottlenecks in the
kernel part that could be improved.

Also, I'm interested in how much performance the zufs API brings compared
to fuse. I understand the NIH pressure, but when inventing new interfaces
you should understand the tradeoffs. Even if you design your interface
perfectly the first time (unlikely), you can miss out on a lot of the
benefits of an existing, well-proven and widely used interface, such as
the existing apps that use it and the user (and tester) base that comes
with them.

So I'd definitely like to have some conversation about interfaces.

Thanks,
Miklos


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-02 16:09       ` Chuck Lever
  2018-02-02 16:13         ` Bruce Fields
@ 2018-02-09 17:47         ` Steve French
From: Steve French @ 2018-02-09 17:47 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Bruce Fields, Boaz Harrosh, lsf-pc, linux-fsdevel, Ric Wheeler,
	Jan Kara, Steven Whitehouse, coughlan, Jeff Moyer, Sage Weil,
	Miklos Szeredi, Andy Rudof, Anna Schumaker, Amir Goldstein,
	Stephen Bates, Amit Golander, Sagi Manole, Shachar Sharon,
	Josef Bacik

On Fri, Feb 2, 2018 at 10:09 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> <>
>
> I enabled KPTI on my NFS client and server systems in early
> v4.15-rc, and didn't measure a change in latency or throughput.
>
> But with v4.15 final, which includes some Spectre mitigations,
> write(2) on NFS files, for example, takes about 15us longer.
> Since the RPC round-trip times did not increase, I presume this
> extra latency is incurred on the client, where the user-kernel
> boundary transitions occur.
>
> <shrug>

That is interesting data.

A loosely related question is whether ZUFS would be helpful in a typical
NFS or SMB3 scenario (under the server), especially with a low-latency
RDMA (SMB3 Direct) connection to the server. (In the case of SMB3, we
would want to consider what this would look like with I/O from the same
client, potentially to the same file, coming in on multiple RDMA cards.)

-- 
Thanks,

Steve


* Re: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
  2018-02-01 18:59   ` Boaz Harrosh
  2018-02-02  9:36     ` Miklos Szeredi
  2018-02-02 15:49     ` J. Bruce Fields
@ 2018-03-05 12:18     ` Greg KH
From: Greg KH @ 2018-03-05 12:18 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Chuck Lever, lsf-pc, linux-fsdevel, Ric Wheeler, Jan Kara,
	Steven Whitehouse, coughlan, Jeff Moyer, Sage Weil,
	Miklos Szeredi, Andy Rudof, Anna Schumaker, Amir Goldstein,
	Stephen Bates, Amit Golander, Sagi Manole, Shachar Sharon,
	Josef Bacik

On Thu, Feb 01, 2018 at 08:59:18PM +0200, Boaz Harrosh wrote:
> > - The fundamental innovation of ZUFS is porting Solaris "doors" to Linux. A
> > "door" is a local RPC mechanism that stays on the same thread to reduce
> > scheduling overhead during calls to services provided by daemons on the local
> > system. Is there interest in building a generic "doors" facility in Linux?
> 
> People said I should look into the Binder project from Google. It is already
> in the kernel and is used by Android. As I understand it, they have exactly
> the kind of object you describe above. Combined with my thread array, it
> sounds like what I want.

There was a "libdoor" userspace library around a long time ago (I was
the GSoC mentor for it) to enable Solaris apps to be able to use the
"door" api on Linux.  Turns out no one really cared about it as the
number of Solaris-only applications seems to have dropped to almost 0.

And yes, if you want IPC that hands the CPU directly to the receiver,
bypassing the scheduler, use binder. It's much faster, and all of the old
problems it used to have (scalability, security, etc.) seem to now be
resolved in the later kernel releases.

good luck!

greg k-h

