From: Boaz Harrosh <boazh@netapp.com>
To: <lsf-pc@lists.linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ric Wheeler <rwheeler@redhat.com>, Jan Kara <jack@suse.cz>,
	Steven Whitehouse <swhiteho@redhat.com>, <coughlan@redhat.com>,
	Jeff Moyer <jmoyer@redhat.com>, Sage Weil <sweil@redhat.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Andy Rudoff <andy.rudoff@intel.com>,
	Anna Schumaker <Anna.Schumaker@netapp.com>,
	"Amir Goldstein" <amir73il@gmail.com>,
	Stephen Bates <stephen@eideticom.com>,
	"Amit Golander" <Amit.Golander@netapp.com>,
	Sagi Manole <sagim@netapp.com>,
	"Shachar Sharon" <Shachar.Sharon@netapp.com>,
	Josef Bacik <josef@redhat.com>
Subject: [LSF/MM TOPIC ATTEND][RFD] ZUFS - Zero-copy User-mode File System
Date: Thu, 1 Feb 2018 15:51:56 +0200
Message-ID: <8d119597-4543-c6a4-917f-14f4f4a6a855@netapp.com>

Sorry, I'm resending because this mail never appeared on linux-fsdevel@; I assume because it had an attachment. So here is a URL for the slides that were attached:
http://linuxplumbersconf.org/2017/ocw//system/presentations/4703/original/ZUFS_for_LinuxPlumbers_LPC.pptx
~~~~~

Hi Fellow coders

At the last Plumbers conference I introduced a new project, ZUFS.
(See the slides linked above for what was presented then, especially the POC benchmarks.)

ZUFS stands for Zero-copy User-mode FS.
- It is geared towards true zero-copy, end to end, of both data and metadata.
- It is geared towards very *low latency*, very high CPU locality, and lock-less parallelism.
- Synchronous operations
- NUMA awareness

Short description:
  ZUFS is a from-scratch implementation of a filesystem-in-userspace which tries to address the above goals. From the get-go it is aimed at pmem-based FSs, but it can easily support any other type of FS that can take advantage of an order-of-magnitude (~10x) improvement in latency and parallelism. The novelty of this project is that the interface is designed, down to the ABI, with a modern multi-core NUMA machine in mind, so as to reach these goals.
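
To make the "designed down to the ABI" point a bit more concrete, here is a minimal sketch of the kind of per-CPU, fixed-layout operation descriptor such an interface might use. This is purely illustrative; every name and field below is a hypothetical stand-in, not the actual ZUFS ABI.

/*
 * Hypothetical sketch only -- NOT the real ZUFS ABI.
 * Idea: each CPU owns one fixed descriptor in a page that is mapped
 * both into the kernel and into the user-mode server, so an operation
 * can be dispatched on the local NUMA node without copying its
 * arguments or its payload.
 */
#include <stdint.h>

enum zu_hypo_op {
	ZU_HYPO_READ  = 1,
	ZU_HYPO_WRITE = 2,
};

struct zu_hypo_op_desc {
	uint32_t opcode;	/* enum zu_hypo_op */
	uint32_t flags;
	uint64_t fh;		/* server-side file handle */
	uint64_t offset;	/* file offset of the IO */
	uint64_t length;	/* number of bytes */
	/*
	 * Application pages are mapped at a fixed per-CPU window in the
	 * server; only this offset into that window is passed, so the
	 * payload itself is never copied.
	 */
	uint64_t payload_off;
	int64_t  result;	/* filled in by the server */
};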

Not only FSs need apply; any kind of user-mode server can set up a pseudo filesystem and communicate with applications via virtual files. These can then benefit from zero-copy, low-latency communication directly to/from application buffers, or applications can mmap server resources directly, as long as it looks like a filesystem to the kernel.
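
As a toy illustration of the application side of that pattern, the snippet below opens a virtual file exported by such a user-mode server and maps it directly into the application's address space. The mount point and file name are made up for the example; the only point is that a plain open()+mmap() is all an application needs.

/* Toy example: map a server-exported virtual file directly.
 * "/mnt/zufs-example/stats" is a made-up path, not a real mount. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/zufs-example/stats", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Map one page of the server's resource; reads below touch the
	 * server's memory directly, with no intermediate copy. */
	void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	printf("first bytes: %.16s\n", (const char *)p);

	munmap(p, 4096);
	close(fd);
	return 0;
}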

During Plumbers a few people and companies showed deep interest in this project, mainly because it introduces a new kind of synchronous, low-latency communication between many applications and a user-mode server, as in the FS example above. (Perhaps this pattern could be extended to other areas, but that is out of scope for now.)

Since then I have been banging away on a real implementation, and very soon (once NetApp legal approves the code, which should be done next week) I will release a first-draft RFC for review. (I will be sending some design notes soon.)

Current status is that we have a couple of trivial filesystem implementations, and together with the kernel module, the user-mode server and the FS user-mode plugin can actually pass a good chunk of an xfstests quick run. (Still working on stability.)

I would like to present the implementation and status as of LSF time, but mainly to consult with the kernel gurus at LSF/MM on how to HARDEN and secure this very ambitious project, especially as there are a couple of mm and scheduler patches that will need to be submitted along with it.

CC'ed are people who have stated interest in ZUFS in the past. Sorry if I forgot anyone. Please comment if you are interested in talking about this at LSF/MM.

Just to get some points across: as I said, this project is all about performance and low latency. Below are POC results I have run (repeated from the slides linked above).

            In-kernel FS            ZUFS                 FUSE
Threads    Op/s      Lat [us]      Op/s      Lat [us]   Op/s     Lat [us]
  1         388361    2.271589      200799    4.6         71820   13.5
  2         635115    2.604376      314321    5.9        148083   13.1
  4        1260307    2.626361      565574    6.6        212133   18.3
  8        2744963    2.485292     1113138    6.6        209799   37.6
 12        2126945    5.020506     1598451    6.8        201689   58.7
 18        4350995    3.386433     1648689    7.8        174823  101.8
 24        4211180    4.784997     1702285    8.0        149413  159.0
 36        3057166    9.291997     1783346   13.4        148276  240.7
 48        3148972   10.382461     1741873   17.4        145296  327.3

I used an average server machine in our lab with two NUMA nodes and a total of 40 cores (I can't remember all the details), running fio with 4k random writes. The IO is then just a memcpy_nt() to pmem simulated in DRAM. fio was run with more and more threads (see the Threads column). (See the nice graphs in the slides.)
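
For reference, memcpy_nt() above is just my shorthand for a non-temporal (streaming) copy. A minimal userspace sketch of such a copy, using SSE2 intrinsics and assuming 16-byte-aligned buffers whose size is a multiple of 16, could look like the following; this is not the benchmark's actual implementation.

/* Sketch of a non-temporal copy; assumes 16-byte aligned src/dst and
 * a size that is a multiple of 16 bytes. */
#include <emmintrin.h>
#include <stddef.h>

static void memcpy_nt_sketch(void *dst, const void *src, size_t size)
{
	__m128i *d = (__m128i *)dst;
	const __m128i *s = (const __m128i *)src;

	for (size_t i = 0; i < size / sizeof(__m128i); i++)
		/* Streaming stores bypass the CPU caches, which is the
		 * point of a non-temporal copy towards pmem. */
		_mm_stream_si128(&d[i], _mm_load_si128(&s[i]));

	_mm_sfence();	/* make the streaming stores globally visible */
}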

We can see that we are still more than 2x slower than the in-kernel FS, but I believe I can shave off another 1 us by optimizing the app-to-server thread switch, perhaps by utilizing the "Binder" scheduler object or by devising another way to avoid going through the scheduler (and its locks) when switching VMs.

Thank you
Boaz
