From: Mikulas Patocka <mpatocka@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
Dan Williams <dan.j.williams@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Ira Weiny <ira.weiny@intel.com>, Jan Kara <jack@suse.cz>,
Eric Sandeen <esandeen@redhat.com>,
Dave Chinner <dchinner@redhat.com>,
"Kani, Toshi" <toshi.kani@hpe.com>,
"Norton, Scott J" <scott.norton@hpe.com>,
"Tadakamadla,
Rajesh (DCIG/CDI/HPS Perf)" <rajesh.tadakamadla@hpe.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)
Date: Thu, 24 Sep 2020 11:00:20 -0400 (EDT)
Message-ID: <alpine.LRH.2.02.2009240853200.3485@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <20200922172553.GL32101@casper.infradead.org>
On Tue, 22 Sep 2020, Matthew Wilcox wrote:
> > > The NVFS indirect block tree has a fan-out of 16,
> >
> > No. The top level in the inode contains 16 blocks (11 direct and 5
> > indirect). And each indirect block can have 512 pointers (4096/8). You can
> > format the device with a larger block size and this increases the fanout
> > (the NVFS block size must be greater than or equal to the system page size).
> >
> > 2 levels can map 1 GiB (4096*512^2), 3 levels can map 512 GiB, 4 levels can
> > map 256 TiB and 5 levels can map 128 PiB.
>
> But compare to an unfragmented file ... you can map the entire thing with
> a single entry. Even if you have to use a leaf node, you can get four
> extents in a single cacheline (and that's a fairly naive leaf node layout;
> I don't know exactly what XFS uses).
But the benchmarks show that NVFS's performance is comparable to that of the extent-based filesystems.
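For reference, the per-level capacity figures quoted above fall out of a
simple calculation. This is a stand-alone sketch, not NVFS code; it assumes
the default 4096-byte block with 8-byte block pointers and ignores the 11
direct blocks held in the inode:

#include <stdio.h>

/* How much data an N-level indirect tree can map: each indirect block
 * holds 4096/8 = 512 pointers, so every extra level multiplies the
 * reachable size by 512. */
int main(void)
{
        unsigned long long block = 4096, ptrs_per_block = block / 8;
        unsigned long long reach = block;       /* one data block */
        int level;

        for (level = 1; level <= 5; level++) {
                reach *= ptrs_per_block;
                printf("%d level(s): %llu bytes\n", level, reach);
        }
        return 0;
}

It prints 2 MiB, 1 GiB, 512 GiB, 256 TiB and 128 PiB for 1 through 5 levels
of indirection, matching the numbers above.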
> > > Rename is another operation that has specific "operation has atomic
> > > behaviour" expectations. I haven't looked at how you've
> > > implemented that yet, but I suspect it also is extremely difficult
> > > to implement in an atomic manner using direct pmem updates to the
> > > directory structures.
> >
> > There is a small window when the renamed inode is in neither the source nor
> > the target directory. Fsck will reclaim such an inode and add it to
> > lost+found - just like on EXT2.
>
> ... ouch. If you have to choose, it'd be better to link it to the second
> directory and then unlink it from the first one. Then your fsck can detect
> it has the wrong count and fix up the count (i.e. link it into both
> directories rather than neither).
I admit that this is lame and I'll fix it. Rename is not so
performance-critical, so I can add a small journal for this.
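For reference, the link-first ordering suggested above would look roughly
like this. The types and helpers (dir_add_entry, dir_del_entry) are
hypothetical placeholders, not the actual NVFS directory code:

struct pmem_dir;
struct pmem_inode;

int dir_add_entry(struct pmem_dir *dir, const char *name, struct pmem_inode *inode);
int dir_del_entry(struct pmem_dir *dir, const char *name, struct pmem_inode *inode);

/* Link into the target directory first, then unlink from the source.
 * A crash between the two steps leaves the inode reachable from both
 * directories with a stale link count, which fsck can simply repair,
 * instead of reachable from neither, which would send it to lost+found. */
int rename_link_first(struct pmem_dir *src, struct pmem_dir *dst,
                      const char *name, struct pmem_inode *inode)
{
        int err;

        err = dir_add_entry(dst, name, inode);  /* step 1: link into the target */
        if (err)
                return err;
        /* a crash here is recoverable: fsck only fixes the link count */
        return dir_del_entry(src, name, inode); /* step 2: unlink from the source */
}

Until the small journal mentioned above is in place, this ordering at least
keeps the inode reachable across a crash.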
> > If you think that the lack of journaling is a show-stopper, I can implement
> > it. But then, I'll have something that has the complexity of EXT4 and the
> > performance of EXT4, so there will no longer be any reason to use
> > NVFS over EXT4. Without journaling, it will be faster than EXT4 and it may
> > attract some users who want good performance and who don't care about GID
> > and UID being updated atomically, etc.
>
> Well, what's your intent with nvfs? Do you already have customers in mind
> who want to use this in production, or is this somewhere to play with and
> develop concepts that might make it into one of the longer-established
> filesystems?
I'm developing it just because I thought it might be interesting. So far, it
doesn't have any serious users (the physical format is still changing). I
hope that it could be usable as a general-purpose root filesystem when
Optane DIMMs become common.
Mikulas
Thread overview: 38+ messages
2020-09-15 12:34 [RFC] nvfs: a filesystem for persistent memory Mikulas Patocka
2020-09-15 13:00 ` Matthew Wilcox
2020-09-15 13:24 ` Mikulas Patocka
2020-09-22 10:04 ` Ritesh Harjani
2020-09-15 15:16 ` Dan Williams
2020-09-15 16:58 ` Mikulas Patocka
2020-09-15 17:38 ` Mikulas Patocka
2020-09-16 10:57 ` [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache Mikulas Patocka
2020-09-16 16:21 ` Dan Williams
2020-09-16 17:24 ` Mikulas Patocka
2020-09-16 17:40 ` Dan Williams
2020-09-16 18:06 ` Mikulas Patocka
2020-09-21 16:20 ` NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) Mikulas Patocka
2020-09-22 5:03 ` Dave Chinner
2020-09-22 16:46 ` Mikulas Patocka
2020-09-22 17:25 ` Matthew Wilcox
2020-09-24 15:00 ` Mikulas Patocka [this message]
2020-09-28 15:22 ` Mikulas Patocka
2020-09-23 2:45 ` Dave Chinner
2020-09-23 9:20 ` A bug in ext4 with big directories (was: NVFS XFS metadata) Mikulas Patocka
2020-09-23 9:44 ` Jan Kara
2020-09-23 12:46 ` Mikulas Patocka
2020-09-23 17:19 ` NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) Mikulas Patocka
2020-09-23 9:57 ` Jan Kara
2020-09-23 13:11 ` Mikulas Patocka
2020-09-23 15:04 ` Matthew Wilcox
2020-09-22 12:28 ` Matthew Wilcox
2020-09-22 12:39 ` Mikulas Patocka
2020-09-16 18:56 ` [PATCH] pmem: fix __copy_user_flushcache Mikulas Patocka
2020-09-18 1:53 ` Dan Williams
2020-09-18 12:25 ` the "read" syscall sees partial effects of the "write" syscall Mikulas Patocka
2020-09-18 13:13 ` Jan Kara
2020-09-18 18:02 ` Linus Torvalds
2020-09-20 23:41 ` Dave Chinner
2020-09-17 6:50 ` [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache Christoph Hellwig
2020-09-21 16:19 ` [RFC] nvfs: a filesystem for persistent memory Mikulas Patocka
2020-09-21 16:29 ` Dan Williams
2020-09-22 15:43 ` Ira Weiny