linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mikulas Patocka <mpatocka@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Eric Sandeen <esandeen@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	"Kani, Toshi" <toshi.kani@hpe.com>,
	"Norton, Scott J" <scott.norton@hpe.com>,
	"Tadakamadla,
	Rajesh (DCIG/CDI/HPS Perf)"  <rajesh.tadakamadla@hpe.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)
Date: Mon, 21 Sep 2020 12:20:42 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LRH.2.02.2009191336380.3478@file01.intranet.prod.int.rdu2.redhat.com> (raw)
In-Reply-To: <alpine.LRH.2.02.2009161359540.20710@file01.intranet.prod.int.rdu2.redhat.com>



On Wed, 16 Sep 2020, Mikulas Patocka wrote:

> 
> 
> On Wed, 16 Sep 2020, Dan Williams wrote:
> 
> > On Wed, Sep 16, 2020 at 10:24 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > > My first question about nvfs is how it compares to a daxfs with
> > > > executables and other binaries configured to use page cache with the
> > > > new per-file dax facility?
> > >
> > > nvfs is faster than dax-based filesystems on metadata-heavy operations
> > > because it doesn't have the overhead of the buffer cache and bios. See
> > > this: http://people.redhat.com/~mpatocka/nvfs/BENCHMARKS
> > 
> > ...and that metadata problem is intractable upstream? Christoph poked
> > at bypassing the block layer for xfs metadata operations [1], I just
> > have not had time to carry that further.
> > 
> > [1]: "xfs: use dax_direct_access for log writes", although it seems
> > he's dropped that branch from his xfs.git
> 
> XFS is very big. I wanted to create something small.

And the another difference is that XFS metadata are optimized for disks 
and SSDs.

On disks and SSDs, reading one byte is as costly as reading a full block. 
So we must put as much information to a block as possible. XFS uses 
b+trees for file block mapping and for directories - it is reasonable 
decision because b+trees minimize the number of disk accesses.

On persistent memory, each access has its own cost, so NVFS uses metadata 
structures that minimize the number of cache lines accessed (rather than 
the number of blocks accessed). For block mapping, NVFS uses the classic 
unix dierct/indirect blocks - if a file block is mapped by a 3-rd level 
indirect block, we do just three memory accesses and we are done. If we 
used b+trees, the number of accesses would be much larger than 3 (we would 
have to do binary search in the b+tree nodes).

The same for directories - NVFS hashes the file name and uses radix-tree 
to locate a directory page where the directory entry is located. XFS 
b+trees would result in much more accesses than the radix-tree.

Regarding journaling - NVFS doesn't do it because persistent memory is so 
fast that we can just check it in the case of crash. NVFS has a 
multithreaded fsck that can do 3 million inodes per second. XFS does 
journaling (it was reasonable decision for disks where fsck took hours) 
and it will cause overhead for all the filesystem operations.

Mikulas


  reply	other threads:[~2020-09-21 16:20 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15 12:34 [RFC] nvfs: a filesystem for persistent memory Mikulas Patocka
2020-09-15 13:00 ` Matthew Wilcox
2020-09-15 13:24   ` Mikulas Patocka
2020-09-22 10:04   ` Ritesh Harjani
2020-09-15 15:16 ` Dan Williams
2020-09-15 16:58   ` Mikulas Patocka
2020-09-15 17:38     ` Mikulas Patocka
2020-09-16 10:57       ` [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache Mikulas Patocka
2020-09-16 16:21         ` Dan Williams
2020-09-16 17:24           ` Mikulas Patocka
2020-09-16 17:40             ` Dan Williams
2020-09-16 18:06               ` Mikulas Patocka
2020-09-21 16:20                 ` Mikulas Patocka [this message]
2020-09-22  5:03                   ` NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) Dave Chinner
2020-09-22 16:46                     ` Mikulas Patocka
2020-09-22 17:25                       ` Matthew Wilcox
2020-09-24 15:00                         ` Mikulas Patocka
2020-09-28 15:22                           ` Mikulas Patocka
2020-09-23  2:45                       ` Dave Chinner
2020-09-23  9:20                         ` A bug in ext4 with big directories (was: NVFS XFS metadata) Mikulas Patocka
2020-09-23  9:44                           ` Jan Kara
2020-09-23 12:46                             ` Mikulas Patocka
2020-09-23 17:19                         ` NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) Mikulas Patocka
2020-09-23  9:57                       ` Jan Kara
2020-09-23 13:11                         ` Mikulas Patocka
2020-09-23 15:04                           ` Matthew Wilcox
2020-09-22 12:28                   ` Matthew Wilcox
2020-09-22 12:39                     ` Mikulas Patocka
2020-09-16 18:56               ` [PATCH] pmem: fix __copy_user_flushcache Mikulas Patocka
2020-09-18  1:53                 ` Dan Williams
2020-09-18 12:25                   ` the "read" syscall sees partial effects of the "write" syscall Mikulas Patocka
2020-09-18 13:13                     ` Jan Kara
2020-09-18 18:02                       ` Linus Torvalds
2020-09-20 23:41                       ` Dave Chinner
2020-09-17  6:50               ` [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache Christoph Hellwig
2020-09-21 16:19   ` [RFC] nvfs: a filesystem for persistent memory Mikulas Patocka
2020-09-21 16:29     ` Dan Williams
2020-09-22 15:43     ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LRH.2.02.2009191336380.3478@file01.intranet.prod.int.rdu2.redhat.com \
    --to=mpatocka@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dchinner@redhat.com \
    --cc=esandeen@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=rajesh.tadakamadla@hpe.com \
    --cc=scott.norton@hpe.com \
    --cc=torvalds@linux-foundation.org \
    --cc=toshi.kani@hpe.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).