All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] fscache/cachefiles versus btrfs
@ 2015-04-09  7:49 NeilBrown
  2015-04-09  9:23 ` David Howells
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2015-04-09  7:49 UTC (permalink / raw)
  To: David Howells; +Cc: linux-btrfs, linux-cachefs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 7242 bytes --]


hi,
 fscache cannot currently be used with btrfs as the backing store for the
 cache (managed by cachefilesd).
 This is because cachefiles needs the ->bmap address_space_operation, and
 btrfs doesn't provide it.

 cachefiles only uses this to find out if a particular page is a 'hole' or
 not.  For btrfs, this can be done with 'SEEK_DATA'.

 Unfortunately it doesn't seem to be possible to query a filesystem or a file
 to see if SEEK_DATA is reliable or not, so we cannot simply use SEEK_DATA
 when reliable, else ->bmap if available.

 The following patch make fscache work for me on btrfs.  It explicitly checks
 for BTRFS_SUPER_MAGIC.  Not really a nice solution, but all I could think of.

 Is there a better way?  Could a better way be created?  Maybe
 SEEK_DATA_RELIABLE ??

 Comments, suggestions welcome.


Also, if you do try to use fscache on btrfs with 3.19, then nothing gets
cached (as expected) and with a heavy load you can lose a race and get an
asserting fail in fscache_enqueue_operation

	ASSERT(fscache_object_is_available(op->object));

It looks like the object is being killed before it is available...

[  859.700765] kernel BUG at ../fs/fscache/operation.c:38!
...
[  859.703124] Call Trace:
[  859.703193]  [<ffffffffa0448044>] fscache_run_op.isra.4+0x34/0x80 [fscache]
[  859.703260]  [<ffffffffa0448160>] fscache_start_operations+0xa0/0xf0 [fscache]
[  859.703388]  [<ffffffffa0446cd8>] fscache_kill_object+0x98/0xc0 [fscache]
[  859.703455]  [<ffffffffa04475c1>] fscache_object_work_func+0x151/0x210 [fscache]
[  859.703578]  [<ffffffff81078b07>] process_one_work+0x147/0x3c0
[  859.703642]  [<ffffffff8107929c>] worker_thread+0x20c/0x470

I haven't figured out the cause of that yet.


Thanks,
NeilBrown




diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 1e51714eb33e..1389d8483d5d 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -20,6 +20,7 @@
 #include <linux/namei.h>
 #include <linux/security.h>
 #include <linux/slab.h>
+#include <linux/magic.h>
 #include "internal.h"
 
 #define CACHEFILES_KEYBUF_SIZE 512
@@ -647,7 +648,8 @@ lookup_again:
 
 			ret = -EPERM;
 			aops = object->dentry->d_inode->i_mapping->a_ops;
-			if (!aops->bmap)
+			if (!aops->bmap &&
+			    object->dentry->d_sb->s_magic != BTRFS_SUPER_MAGIC)
 				goto check_error;
 
 			object->backer = object->dentry;
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index c6cd8d7a4eef..49fb330c0ab8 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -410,11 +410,11 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
 
 	inode = object->backer->d_inode;
 	ASSERT(S_ISREG(inode->i_mode));
-	ASSERT(inode->i_mapping->a_ops->bmap);
 	ASSERT(inode->i_mapping->a_ops->readpages);
 
 	/* calculate the shift required to use bmap */
-	if (inode->i_sb->s_blocksize > PAGE_SIZE)
+	if (inode->i_mapping->a_ops->bmap &&
+	    inode->i_sb->s_blocksize > PAGE_SIZE)
 		goto enobufs;
 
 	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
@@ -423,20 +423,36 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
 	op->op.flags |= FSCACHE_OP_ASYNC;
 	op->op.processor = cachefiles_read_copier;
 
-	/* we assume the absence or presence of the first block is a good
-	 * enough indication for the page as a whole
-	 * - TODO: don't use bmap() for this as it is _not_ actually good
-	 *   enough for this as it doesn't indicate errors, but it's all we've
-	 *   got for the moment
-	 */
-	block0 = page->index;
-	block0 <<= shift;
-
-	block = inode->i_mapping->a_ops->bmap(inode->i_mapping, block0);
-	_debug("%llx -> %llx",
-	       (unsigned long long) block0,
-	       (unsigned long long) block);
+	if (inode->i_mapping->a_ops->bmap) {
+		/* we assume the absence or presence of the first block is a good
+		 * enough indication for the page as a whole
+		 * - TODO: don't use bmap() for this as it is _not_ actually good
+		 *   enough for this as it doesn't indicate errors, but it's all we've
+		 *   got for the moment
+		 */
+		block0 = page->index;
+		block0 <<= shift;
 
+		block = inode->i_mapping->a_ops->bmap(inode->i_mapping, block0);
+		_debug("%llx -> %llx",
+		       (unsigned long long) block0,
+		       (unsigned long long) block);
+	} else {
+		/* Use llseek */
+		struct path path;
+		struct file *file;
+		path.mnt = cache->mnt;
+		path.dentry = object->backer;
+		file = dentry_open(&path, O_RDONLY, cache->cache_cred);
+		if (IS_ERR(file))
+			goto enobufs;
+		block = vfs_llseek(file, page->index << PAGE_SHIFT, SEEK_DATA);
+		filp_close(file, NULL);
+		if (block != page->index << PAGE_SHIFT)
+			block = 0;
+		else
+			block = 1;
+	}
 	if (block) {
 		/* submit the apparently valid page to the backing fs to be
 		 * read from disk */
@@ -707,11 +723,11 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
 
 	inode = object->backer->d_inode;
 	ASSERT(S_ISREG(inode->i_mode));
-	ASSERT(inode->i_mapping->a_ops->bmap);
 	ASSERT(inode->i_mapping->a_ops->readpages);
 
 	/* calculate the shift required to use bmap */
-	if (inode->i_sb->s_blocksize > PAGE_SIZE)
+	if (inode->i_mapping->a_ops->bmap &&
+	    inode->i_sb->s_blocksize > PAGE_SIZE)
 		goto all_enobufs;
 
 	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
@@ -729,21 +745,38 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
 	list_for_each_entry_safe(page, _n, pages, lru) {
 		sector_t block0, block;
 
-		/* we assume the absence or presence of the first block is a
-		 * good enough indication for the page as a whole
-		 * - TODO: don't use bmap() for this as it is _not_ actually
-		 *   good enough for this as it doesn't indicate errors, but
-		 *   it's all we've got for the moment
-		 */
-		block0 = page->index;
-		block0 <<= shift;
-
-		block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
-						      block0);
-		_debug("%llx -> %llx",
-		       (unsigned long long) block0,
-		       (unsigned long long) block);
+		if (inode->i_mapping->a_ops->bmap) {
+			/* we assume the absence or presence of the first block is a
+			 * good enough indication for the page as a whole
+			 * - TODO: don't use bmap() for this as it is _not_ actually
+			 *   good enough for this as it doesn't indicate errors, but
+			 *   it's all we've got for the moment
+			 */
+			block0 = page->index;
+			block0 <<= shift;
+
+			block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
+							      block0);
+			_debug("%llx -> %llx",
+			       (unsigned long long) block0,
+			       (unsigned long long) block);
 
+		} else {
+			/* Use llseek */
+			struct path path;
+			struct file *file;
+			path.mnt = cache->mnt;
+			path.dentry = object->backer;
+			file = dentry_open(&path, O_RDONLY, cache->cache_cred);
+			if (IS_ERR(file))
+				goto all_enobufs;
+			block = vfs_llseek(file, page->index << PAGE_SHIFT, SEEK_DATA);
+			filp_close(file, NULL);
+			if (block != page->index << PAGE_SHIFT)
+				block = 0;
+			else
+				block = 1;
+		}
 		if (block) {
 			/* we have data - add it to the list to give to the
 			 * backing fs */

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-09  7:49 [PATCH/RFC] fscache/cachefiles versus btrfs NeilBrown
@ 2015-04-09  9:23 ` David Howells
  2015-04-09 23:52   ` Dave Chinner
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: David Howells @ 2015-04-09  9:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: dhowells, linux-btrfs, linux-cachefs, linux-fsdevel

NeilBrown <neilb@suse.de> wrote:

>  Is there a better way?  Could a better way be created?  Maybe
>  SEEK_DATA_RELIABLE ??

fiemap() maybe?

> Also, if you do try to use fscache on btrfs with 3.19, then nothing gets
> cached (as expected) and with a heavy load you can lose a race and get an
> asserting fail in fscache_enqueue_operation

Do you have the patches here applied?

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-fixes

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-09  9:23 ` David Howells
@ 2015-04-09 23:52   ` Dave Chinner
  2015-04-10  0:42     ` NeilBrown
  2015-04-10  1:24   ` NeilBrown
  2015-04-10 13:28   ` David Howells
  2 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2015-04-09 23:52 UTC (permalink / raw)
  To: David Howells; +Cc: NeilBrown, linux-btrfs, linux-cachefs, linux-fsdevel

On Thu, Apr 09, 2015 at 10:23:08AM +0100, David Howells wrote:
> NeilBrown <neilb@suse.de> wrote:
> 
> >  Is there a better way?  Could a better way be created?  Maybe
> >  SEEK_DATA_RELIABLE ??
> 
> fiemap() maybe?

fiemap is not reliable for mapping holes - it returns extent info,
not whether there is data in a range. i.e. there can be data over a
hole (e.g. delayed allocation) and fiemap will return it as a hole.
cp made this mistake back when fiemap was first introduced,
resulting in corrupt file copies.

SEEK_HOLE/SEEK_DATA is what you want, as they are page cache
coherent, not extent based operations. And, really if you need it to
really be able to find real holes, then a superblock flag might be a
better way of marking filesystems with the required capability.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-09 23:52   ` Dave Chinner
@ 2015-04-10  0:42     ` NeilBrown
  2015-04-10  1:08       ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2015-04-10  0:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: David Howells, linux-btrfs, linux-cachefs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1223 bytes --]

On Fri, 10 Apr 2015 09:52:34 +1000 Dave Chinner <david@fromorbit.com> wrote:

> On Thu, Apr 09, 2015 at 10:23:08AM +0100, David Howells wrote:
> > NeilBrown <neilb@suse.de> wrote:
> > 
> > >  Is there a better way?  Could a better way be created?  Maybe
> > >  SEEK_DATA_RELIABLE ??
> > 
> > fiemap() maybe?
> 
> fiemap is not reliable for mapping holes - it returns extent info,
> not whether there is data in a range. i.e. there can be data over a
> hole (e.g. delayed allocation) and fiemap will return it as a hole.
> cp made this mistake back when fiemap was first introduced,
> resulting in corrupt file copies.

I assumed from the doco that FIEMAP_EXTENT_DELALLOC would get returned in
this case.  I guess not?

> 
> SEEK_HOLE/SEEK_DATA is what you want, as they are page cache
> coherent, not extent based operations. And, really if you need it to
> really be able to find real holes, then a superblock flag might be a
> better way of marking filesystems with the required capability.

btrfs seems to use the same underlying functionality for both fiemap and
SEEK_HOLE/SEEK_DATA...

But yes: a new fs_flags flag is probably the go.

Thanks,
NeilBrown

> 
> Cheers,
> 
> Dave.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-10  0:42     ` NeilBrown
@ 2015-04-10  1:08       ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2015-04-10  1:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Howells, linux-btrfs, linux-cachefs, linux-fsdevel

On Fri, Apr 10, 2015 at 10:42:21AM +1000, NeilBrown wrote:
> On Fri, 10 Apr 2015 09:52:34 +1000 Dave Chinner <david@fromorbit.com> wrote:
> 
> > On Thu, Apr 09, 2015 at 10:23:08AM +0100, David Howells wrote:
> > > NeilBrown <neilb@suse.de> wrote:
> > > 
> > > >  Is there a better way?  Could a better way be created?  Maybe
> > > >  SEEK_DATA_RELIABLE ??
> > > 
> > > fiemap() maybe?
> > 
> > fiemap is not reliable for mapping holes - it returns extent info,
> > not whether there is data in a range. i.e. there can be data over a
> > hole (e.g. delayed allocation) and fiemap will return it as a hole.
> > cp made this mistake back when fiemap was first introduced,
> > resulting in corrupt file copies.
> 
> I assumed from the doco that FIEMAP_EXTENT_DELALLOC would get returned in
> this case.  I guess not?

FIEMAP is advisory and implementing features in it are optional.
It's also not guaranteed to be an accurate representation of the
layout of the file because it can change even before hte FIEMAP
ioctl returns to userspace. it's also a representation of the
physical layout of the data in the file, but it does not need to be
coherent with the current data in the file.  i.e. FIEMAP is not
required to report the current state of data in the files to the
caller, just the working physical layout sitting in memory...

> > SEEK_HOLE/SEEK_DATA is what you want, as they are page cache
> > coherent, not extent based operations. And, really if you need it to
> > really be able to find real holes, then a superblock flag might be a
> > better way of marking filesystems with the required capability.
> 
> btrfs seems to use the same underlying functionality for both fiemap and
> SEEK_HOLE/SEEK_DATA...

That's just an internal implementation detail - it does not
define the behaviour and constraints of the information
FIEMAP or SEEK_HOLE/SEEK_DATA APIs are supposed to provide.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-09  9:23 ` David Howells
  2015-04-09 23:52   ` Dave Chinner
@ 2015-04-10  1:24   ` NeilBrown
  2015-04-20  4:49     ` NeilBrown
  2015-04-20  9:27     ` David Howells
  2015-04-10 13:28   ` David Howells
  2 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2015-04-10  1:24 UTC (permalink / raw)
  To: David Howells; +Cc: linux-btrfs, linux-cachefs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

On Thu, 09 Apr 2015 10:23:08 +0100 David Howells <dhowells@redhat.com> wrote:

> NeilBrown <neilb@suse.de> wrote:
> 
> >  Is there a better way?  Could a better way be created?  Maybe
> >  SEEK_DATA_RELIABLE ??
> 
> fiemap() maybe?
> 
> > Also, if you do try to use fscache on btrfs with 3.19, then nothing gets
> > cached (as expected) and with a heavy load you can lose a race and get an
> > asserting fail in fscache_enqueue_operation
> 
> Do you have the patches here applied?
> 
> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-fixes
> 

Do I don't.  I had looked through them and while they did seem to be
addressing similar sorts of races, nothing seems like an obvious match.
I haven't been able to reproduce the BUG_ON myself.  I only have a report
of it repeatedly affecting someone else:

  https://bugzilla.opensuse.org/show_bug.cgi?id=908706

I'll probably have to be happy with fixing usage on btrfs, and hope the other
bug is fixed already or doesn't become a problem.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-09  9:23 ` David Howells
  2015-04-09 23:52   ` Dave Chinner
  2015-04-10  1:24   ` NeilBrown
@ 2015-04-10 13:28   ` David Howells
  2015-04-13 17:30     ` Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2015-04-10 13:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: dhowells, NeilBrown, linux-btrfs, linux-cachefs, linux-fsdevel

Dave Chinner <david@fromorbit.com> wrote:

> SEEK_HOLE/SEEK_DATA is what you want, as they are page cache
> coherent, not extent based operations. And, really if you need it to
> really be able to find real holes, then a superblock flag might be a
> better way of marking filesystems with the required capability.

Actually, I wonder if what I want is a kernel_read() that returns ENODATA upon
encountering a hole at the beginning of the area to be read.

Of course, what I really, really want is asynchronous, direct read and write
within the kernel, where the read will notify you of any holes, but will read
all the data it can around those holes.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-10 13:28   ` David Howells
@ 2015-04-13 17:30     ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2015-04-13 17:30 UTC (permalink / raw)
  To: David Howells
  Cc: Dave Chinner, NeilBrown, linux-btrfs, linux-cachefs,
	linux-fsdevel, linux-nfs

On Fri, Apr 10, 2015 at 02:28:16PM +0100, David Howells wrote:
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > SEEK_HOLE/SEEK_DATA is what you want, as they are page cache
> > coherent, not extent based operations. And, really if you need it to
> > really be able to find real holes, then a superblock flag might be a
> > better way of marking filesystems with the required capability.
> 
> Actually, I wonder if what I want is a kernel_read() that returns ENODATA upon
> encountering a hole at the beginning of the area to be read.

NFS READ_PLUS could also make use of this, but someone needs to actually
implement it.

Until we have that lseek SEEK_HOLE/DATA is the way to go, and the
horrible ->bmap hack needs to die ASAP, I can't believe you managed to
sneak that in in the not too distant past.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-10  1:24   ` NeilBrown
@ 2015-04-20  4:49     ` NeilBrown
  2015-04-20  9:27     ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: NeilBrown @ 2015-04-20  4:49 UTC (permalink / raw)
  To: David Howells; +Cc: linux-btrfs, linux-cachefs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1438 bytes --]

On Fri, 10 Apr 2015 11:24:31 +1000 NeilBrown <neilb@suse.de> wrote:

> On Thu, 09 Apr 2015 10:23:08 +0100 David Howells <dhowells@redhat.com> wrote:
> 
> > NeilBrown <neilb@suse.de> wrote:
> > 
> > >  Is there a better way?  Could a better way be created?  Maybe
> > >  SEEK_DATA_RELIABLE ??
> > 
> > fiemap() maybe?
> > 
> > > Also, if you do try to use fscache on btrfs with 3.19, then nothing gets
> > > cached (as expected) and with a heavy load you can lose a race and get an
> > > asserting fail in fscache_enqueue_operation
> > 
> > Do you have the patches here applied?
> > 
> > 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-fixes
> > 
> 
> Do I don't.  I had looked through them and while they did seem to be
> addressing similar sorts of races, nothing seems like an obvious match.
> I haven't been able to reproduce the BUG_ON myself.  I only have a report
> of it repeatedly affecting someone else:
> 
>   https://bugzilla.opensuse.org/show_bug.cgi?id=908706
> 
> I'll probably have to be happy with fixing usage on btrfs, and hope the other
> bug is fixed already or doesn't become a problem.

I managed to reproduce the bug, and when I applied your patches I cannot any
more.  So it looks like you've fixed it - thanks.

That just leave the bmap issue.  I'll post a patch which causes lseek to be
used when the fs says that is OK.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] fscache/cachefiles versus btrfs
  2015-04-10  1:24   ` NeilBrown
  2015-04-20  4:49     ` NeilBrown
@ 2015-04-20  9:27     ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: David Howells @ 2015-04-20  9:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: dhowells, linux-btrfs, linux-cachefs, linux-fsdevel

NeilBrown <neilb@suse.de> wrote:

> I managed to reproduce the bug, and when I applied your patches I cannot any
> more.  So it looks like you've fixed it - thanks.

I hope so too.  Now I just hope Linus takes the patches.

> That just leave the bmap issue.  I'll post a patch which causes lseek to be
> used when the fs says that is OK.

Okay, thanks.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-04-20  9:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-09  7:49 [PATCH/RFC] fscache/cachefiles versus btrfs NeilBrown
2015-04-09  9:23 ` David Howells
2015-04-09 23:52   ` Dave Chinner
2015-04-10  0:42     ` NeilBrown
2015-04-10  1:08       ` Dave Chinner
2015-04-10  1:24   ` NeilBrown
2015-04-20  4:49     ` NeilBrown
2015-04-20  9:27     ` David Howells
2015-04-10 13:28   ` David Howells
2015-04-13 17:30     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.