[RFC v2] nvfs: a filesystem for persistent memory

* [RFC v2] nvfs: a filesystem for persistent memory
@ 2021-01-07 13:15 Mikulas Patocka
  2021-01-07 15:11 ` Expense of read_iter Matthew Wilcox
  2021-01-10 16:20 ` [RFC v2] nvfs: a filesystem for persistent memory Al Viro
  0 siblings, 2 replies; 27+ messages in thread
From: Mikulas Patocka @ 2021-01-07 13:15 UTC (permalink / raw)
  To: Andrew Morton, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Matthew Wilcox, Jan Kara, Steven Whitehouse, Eric Sandeen,
	Dave Chinner, Theodore Ts'o, Wang Jianchao, Kani, Toshi,
	Norton, Scott J, Tadakamadla, Rajesh
  Cc: linux-kernel, linux-fsdevel, linux-nvdimm

Hi

I'd like to announce a new version of NVFS - a filesystem for persistent memory.
	http://people.redhat.com/~mpatocka/nvfs/
	git://leontynka.twibright.com/nvfs.git

Changes since the last release:

* I added a microjournal to the filesystem. It can hold up to 16 entries.
  Each CPU has its own journal, so there is no lock contention. The
  journal is used to provide atomicity of rename() and extended attribute
  replace.
  (note that file creation or deletion doesn't use the journal, because 
  these operations can be deterministically cleaned up by fsck)

* I created a framework that can be used to verify the filesystem driver.
  It logs all writes and memory barriers to a file. The entries in the
  file are then randomly reordered (to simulate reordering in the CPU
  write-combining buffers), the sequence is cut at a random point (to
  simulate a system crash), and the result is replayed on a filesystem
  image.
  With this framework we can, for example, check that if a crash happens
  during rename(), either the old file or the new file will be present in
  the directory.
  This framework helped to find a few bugs in sequencing the writes.

* If we map an executable image, we turn off the DAX flag on the inode 
  (because executables run 4% slower from persistent memory). There is 
  also a switch that can turn DAX always off or always on.
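
The crash-simulation idea in the verification framework above can be sketched in user-space C. This is only an illustration, not the NVFS framework itself: the names (struct log_entry, replay_with_crash, next_rand) are hypothetical, the log holds single-byte writes, and it assumes that reordering happens only among writes between two barriers, never across a barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum log_kind { LOG_WRITE, LOG_BARRIER };

/* One logged event (hypothetical layout, single-byte writes only). */
struct log_entry {
	enum log_kind kind;
	size_t offset;		/* byte offset into the image (writes only) */
	uint8_t value;		/* byte that was stored (writes only) */
};

/* Small deterministic PRNG so that a failing seed can be reproduced. */
static uint32_t next_rand(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return *state >> 16;
}

/* Fisher-Yates shuffle of the entries in [first, last). */
static void shuffle_range(struct log_entry *log, size_t first, size_t last,
			  uint32_t *rng)
{
	for (size_t i = last; i > first + 1; i--) {
		size_t j = first + next_rand(rng) % (i - first);
		struct log_entry tmp = log[i - 1];

		log[i - 1] = log[j];
		log[j] = tmp;
	}
}

/*
 * Shuffle the writes inside each barrier-delimited epoch (assumption:
 * a barrier is never crossed), cut the log at a random point to model
 * a crash, and replay the surviving prefix onto the image.
 * Returns the number of log entries that were replayed.
 */
size_t replay_with_crash(struct log_entry *log, size_t n, uint8_t *image,
			 size_t image_size, uint32_t seed)
{
	uint32_t rng = seed;
	size_t start = 0;

	assert(log != NULL && image != NULL);

	for (size_t i = 0; i <= n; i++) {
		if (i == n || log[i].kind == LOG_BARRIER) {
			shuffle_range(log, start, i, &rng);
			start = i + 1;
		}
	}

	size_t cut = next_rand(&rng) % (n + 1);

	for (size_t i = 0; i < cut; i++) {
		if (log[i].kind == LOG_WRITE && log[i].offset < image_size)
			image[log[i].offset] = log[i].value;
	}
	return cut;
}
```

A caller would run this over many seeds and check an invariant on the resulting image each time, for example that a commit byte written after a barrier is never visible unless the data written before the barrier is visible too.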

I'd like to ask about this piece of code in __kernel_read:
	if (unlikely(!file->f_op->read_iter || file->f_op->read))
		return warn_unsupported...
and __kernel_write:
	if (unlikely(!file->f_op->write_iter || file->f_op->write))
		return warn_unsupported...

- These checks return an error if both read_iter and read (or both
write_iter and write) are present.

I found out that on NVFS, reading a file with the read method has 10% 
better performance than the read_iter method. The benchmark just reads the 
same 4k page over and over again - and the cost of creating and parsing 
the kiocb and iov_iter structures is just that high.

So, I'd like to have both read and read_iter methods. Could the above 
conditions be changed, so that they don't fail with an error if the "read" 
or "write" method is present?
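
To make the question concrete, here is a minimal user-space model of the read-side check. It is not kernel code: struct fops_model and both function names are hypothetical stand-ins, and the "relaxed" variant is only one possible reading of the change being asked for, not a tested patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct file_operations, reduced to flags
 * saying which of the two read methods a filesystem provides.
 */
struct fops_model {
	bool has_read;
	bool has_read_iter;
};

/*
 * Models the condition quoted above: the call is rejected when
 * read_iter is missing or when read is present.
 */
bool kernel_read_rejects_current(const struct fops_model *f)
{
	assert(f != NULL);
	return !f->has_read_iter || f->has_read;
}

/*
 * One possible relaxation (an assumption, not a proposed patch):
 * only require that read_iter exists and ignore whether read is set.
 */
bool kernel_read_rejects_relaxed(const struct fops_model *f)
{
	assert(f != NULL);
	return !f->has_read_iter;
}
```

With both methods present, the current form rejects the call and the relaxed form accepts it. The two forms agree whenever only one of the methods is provided.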

Mikulas