linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Eric Sandeen <esandeen@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	"Kani, Toshi" <toshi.kani@hpe.com>,
	"Norton, Scott J" <scott.norton@hpe.com>,
	"Tadakamadla,
	Rajesh (DCIG/CDI/HPS Perf)"  <rajesh.tadakamadla@hpe.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [RFC] nvfs: a filesystem for persistent memory
Date: Tue, 15 Sep 2020 08:16:11 -0700	[thread overview]
Message-ID: <CAPcyv4gh=QaDB61_9_QTgtt-pZuTFdR6td0orE0VMH6=6SA2vw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LRH.2.02.2009140852030.22422@file01.intranet.prod.int.rdu2.redhat.com>

On Tue, Sep 15, 2020 at 5:35 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> I am developing a new filesystem suitable for persistent memory - nvfs.

Nice!

> The goal is to have a small and fast filesystem that can be used on
> DAX-based devices. Nvfs maps the whole device into linear address space
> and it completely bypasses the overhead of the block layer and buffer
> cache.

So does device-dax, but device-dax lacks read(2)/write(2).

> In the past, there was nova filesystem for pmem, but it was abandoned a
> year ago (the last version is for the kernel 5.1 -
> https://github.com/NVSL/linux-nova ). Nvfs is smaller and performs better.
>
> The design of nvfs is similar to ext2/ext4, so that it fits into the VFS
> layer naturally, without too much glue code.
>
> I'd like to ask you to review it.
>
>
> tarballs:
>         http://people.redhat.com/~mpatocka/nvfs/
> git:
>         git://leontynka.twibright.com/nvfs.git
> the description of filesystem internals:
>         http://people.redhat.com/~mpatocka/nvfs/INTERNALS
> benchmarks:
>         http://people.redhat.com/~mpatocka/nvfs/BENCHMARKS
>
>
> TODO:
>
> - programs run approximately 4% slower when running from Optane-based
> persistent memory. Therefore, programs and libraries should use page cache
> and not DAX mapping.

This needs to be based on platform firmware data f(ACPI HMAT) for the
relative performance of a PMEM range vs DRAM. For example, this
tradeoff should not exist with battery backed DRAM, or virtio-pmem.

>
> - when the fsck.nvfs tool mmaps the device /dev/pmem0, the kernel uses
> buffer cache for the mapping. The buffer cache slows does fsck by a factor
> of 5 to 10. Could it be possible to change the kernel so that it maps DAX
> based block devices directly?

We've been down this path before.

5a023cdba50c block: enable dax for raw block devices
9f4736fe7ca8 block: revert runtime dax control of the raw block device
acc93d30d7d4 Revert "block: enable dax for raw block devices"

EXT2/4 metadata buffer management depends on the page cache and we
eliminated a class of bugs by removing that support. The problems are
likely tractable, but there was not a straightforward fix visible at
the time.

> - __copy_from_user_inatomic_nocache doesn't flush cache for leading and
> trailing bytes.

You want copy_user_flushcache(). See how fs/dax.c arranges for
dax_copy_from_iter() to route to pmem_copy_from_iter().

  parent reply	other threads:[~2020-09-15 15:56 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15 12:34 [RFC] nvfs: a filesystem for persistent memory Mikulas Patocka
2020-09-15 13:00 ` Matthew Wilcox
2020-09-15 13:24   ` Mikulas Patocka
2020-09-22 10:04   ` Ritesh Harjani
2020-09-15 15:16 ` Dan Williams [this message]
2020-09-15 16:58   ` Mikulas Patocka
2020-09-15 17:38     ` Mikulas Patocka
2020-09-16 10:57       ` [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache Mikulas Patocka
2020-09-16 16:21         ` Dan Williams
2020-09-16 17:24           ` Mikulas Patocka
2020-09-16 17:40             ` Dan Williams
2020-09-16 18:06               ` Mikulas Patocka
2020-09-21 16:20                 ` NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) Mikulas Patocka
2020-09-22  5:03                   ` Dave Chinner
2020-09-22 16:46                     ` Mikulas Patocka
2020-09-22 17:25                       ` Matthew Wilcox
2020-09-24 15:00                         ` Mikulas Patocka
2020-09-28 15:22                           ` Mikulas Patocka
2020-09-23  2:45                       ` Dave Chinner
2020-09-23  9:20                         ` A bug in ext4 with big directories (was: NVFS XFS metadata) Mikulas Patocka
2020-09-23  9:44                           ` Jan Kara
2020-09-23 12:46                             ` Mikulas Patocka
2020-09-23 17:19                         ` NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) Mikulas Patocka
2020-09-23  9:57                       ` Jan Kara
2020-09-23 13:11                         ` Mikulas Patocka
2020-09-23 15:04                           ` Matthew Wilcox
2020-09-22 12:28                   ` Matthew Wilcox
2020-09-22 12:39                     ` Mikulas Patocka
2020-09-16 18:56               ` [PATCH] pmem: fix __copy_user_flushcache Mikulas Patocka
2020-09-18  1:53                 ` Dan Williams
2020-09-18 12:25                   ` the "read" syscall sees partial effects of the "write" syscall Mikulas Patocka
2020-09-18 13:13                     ` Jan Kara
2020-09-18 18:02                       ` Linus Torvalds
2020-09-20 23:41                       ` Dave Chinner
2020-09-17  6:50               ` [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache Christoph Hellwig
2020-09-21 16:19   ` [RFC] nvfs: a filesystem for persistent memory Mikulas Patocka
2020-09-21 16:29     ` Dan Williams
2020-09-22 15:43     ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4gh=QaDB61_9_QTgtt-pZuTFdR6td0orE0VMH6=6SA2vw@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.jiang@intel.com \
    --cc=dchinner@redhat.com \
    --cc=esandeen@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mpatocka@redhat.com \
    --cc=rajesh.tadakamadla@hpe.com \
    --cc=scott.norton@hpe.com \
    --cc=torvalds@linux-foundation.org \
    --cc=toshi.kani@hpe.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).