[PATCH][resend] fs: Add hooks for get_hole_size to generic_block

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap
       [not found] <1623792099.11913714.1409074745372.JavaMail.zimbra@redhat.com>
@ 2014-09-03 15:52 ` Bob Peterson
  2014-09-03 20:26   ` Dave Chinner
  2014-09-05  0:04   ` [PATCH][try4] " Bob Peterson
  0 siblings, 2 replies; 5+ messages in thread
From: Bob Peterson @ 2014-09-03 15:52 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel

Hi,

I sent this patch a few weeks back and it didn't receive any comments.
So I assume it's okay to add upstream?

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole.

Preamble:
In cases where the fs-specific block_map() function finds a hole, it
can return the hole size in b_size. This is efficient because the file
system doesn't need to figure out block mapping a second time to
determine the hole size. The patch repurposes the buffer_meta flag
to tell when the fs-specific block_map() is passing back the hole_size:
If the fs-specific block_map() doesn't set the buffer_meta bit,
function __generic_block_fiemap() assumes a hole size of 1 as before.
Other file systems that want to take advantage of the new "hole size"
functionality need only write their own function to determine the
hole size, call it from their respective block_map() function, and
set_buffer_meta to put it into use. I've written a simple patch to
GFS2 that does just that, as a follow-on.

Patch description:

This patch changes function __generic_block_fiemap so that if the
fs-specific block_map sets the buffer_meta flag corresponding to a
hole, it takes the returned b_size to be the size of the hole, in
bytes. This is much faster than trying each block individually when
large holes are encountered.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
---
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..954d1c3 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;

+		clear_buffer_meta(&map_bh);
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;

 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_meta(&map_bh))
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);
+			else
+				start_blk++;

 			/*
 			 * We want to handle the case where there is an
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap
  2014-09-03 15:52 ` [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap Bob Peterson
@ 2014-09-03 20:26   ` Dave Chinner
  2014-09-05  0:04   ` [PATCH][try4] " Bob Peterson
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2014-09-03 20:26 UTC (permalink / raw)
  To: Bob Peterson; +Cc: Alexander Viro, linux-fsdevel

On Wed, Sep 03, 2014 at 11:52:25AM -0400, Bob Peterson wrote:
> Hi,
> 
> I sent this patch a few weeks back and it didn't receive any comments.
> So I assume it's okay to add upstream?
> 
> The problem:
> If you do a fiemap operation on a very large sparse file, it can take
> an extremely long amount of time (we're talking days here) because
> function __generic_block_fiemap does a block-for-block search when it
> encounters a hole.
> 
> The solution:
> Allow the underlying file system to return the hole size so that function
> __generic_block_fiemap can quickly skip the hole.
> 
> Preamble:
> In cases where the fs-specific block_map() function finds a hole, it
> can return the hole size in b_size. This is efficient because the file
> system doesn't need to figure out block mapping a second time to
> determine the hole size. The patch repurposes the buffer_meta flag
> to tell when the fs-specific block_map() is passing back the hole_size:
> If the fs-specific block_map() doesn't set the buffer_meta bit,

Use of BH_Meta is a bit wierd. Mapping the data blocks of a file and
expecting the return to have the BH_Meta flag set seems, well,
A Little Bit Wrong. We're not short of buffer flags, perhaps adding
a purpose sepcific flag for this?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH][try4] fs: Add hooks for get_hole_size to generic_block_fiemap
  2014-09-03 15:52 ` [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap Bob Peterson
  2014-09-03 20:26   ` Dave Chinner
@ 2014-09-05  0:04   ` Bob Peterson
  2014-09-11 15:29     ` [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size Bob Peterson
  1 sibling, 1 reply; 5+ messages in thread
From: Bob Peterson @ 2014-09-05  0:04 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel

Hi,

This version uses a new buffer flag, holesize, as Dave Chinner
suggested.

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole.

Preamble:
In cases where the fs-specific block_map() function finds a hole, it
can return the hole size in b_size. This is efficient because the file
system doesn't need to figure out block mapping a second time to
determine the hole size. The patch uses a new buffer_holesize flag
to tell when the fs-specific block_map() is passing back the hole_size:
If the fs-specific block_map() doesn't set the buffer_holesize bit,
function __generic_block_fiemap() assumes a hole size of 1 as before.
Other file systems that want to take advantage of the new "hole size"
functionality need only write their own function to determine the
hole size, call it from their respective block_map() function, and
set_buffer_holesize to put it into use. I've written a simple patch to
GFS2 that does just that, as a follow-on.

Patch description:

This patch changes function __generic_block_fiemap so that if the
fs-specific block_map sets the buffer_holesize flag corresponding to a
hole, it takes the returned b_size to be the size of the hole, in
bytes. This is much faster than trying each block individually when
large holes are encountered.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
---
 fs/ioctl.c                  | 7 ++++++-
 include/linux/buffer_head.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..121ba6f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;

+		clear_buffer_holesize(&map_bh);
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;

 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_holesize(&map_bh))
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);
+			else
+				start_blk++;

 			/*
 			 * We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..39ed1f1 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Holesize,    /* Hole encountered, hole size returned */

 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -128,6 +129,7 @@ BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Holesize, holesize)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size
  2014-09-05  0:04   ` [PATCH][try4] " Bob Peterson
@ 2014-09-11 15:29     ` Bob Peterson
  2014-09-16 12:59       ` Bob Peterson
  0 siblings, 1 reply; 5+ messages in thread
From: Bob Peterson @ 2014-09-11 15:29 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel

Hi,

This version uses a new buffer flag, holesize, as Dave Chinner
suggested. It also incorporates a suggestion from Steve Whitehouse.

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole.

Patch description:

This patch changes function __generic_block_fiemap so that it sets a new
buffer_holesize bit. The new bit signals to the underlying file system
to return a hole size from its block_map function (if possible) in the
event that a hole is encountered at the requested block. If the block_map
function encounters a hole, and clears buffer_holesize, fiemap takes the
returned b_size to be the size of the hole, in bytes. It then skips the
hole and moves to the next block. This may be repeated several times
in a row, especially for large holes, due to possible limitations of the
fs-specific block_map function. This is still much faster than trying
each block individually when large holes are encountered. If the
block_map function does not clear buffer_holesize, the request for
holesize has been ignored, and it falls back to today's method of doing a
block-by-block search for the next valid block.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
---
 fs/ioctl.c                  | 7 ++++++-
 include/linux/buffer_head.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..ae63b1f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;

+		set_buffer_holesize(&map_bh); /* return hole size if able */
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;

 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_holesize(&map_bh)) /* holesize ignored */
+				start_blk++;
+			else
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);

 			/*
 			 * We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..b8ce396 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Holesize,    /* Return hole size (and clear) if possible */

 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -128,6 +129,7 @@ BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Holesize, holesize)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size
  2014-09-11 15:29     ` [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size Bob Peterson
@ 2014-09-16 12:59       ` Bob Peterson
  0 siblings, 0 replies; 5+ messages in thread
From: Bob Peterson @ 2014-09-16 12:59 UTC (permalink / raw)
  To: linux-kernel

Hi,

I've previously sent this patch to linux-fsdevel (and Viro) and gotten
little to no response, so I thought I'd send it here.

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole. This will be followed
by another patch to GFS2 that takes advantage of this new flag to speed
up its fiemap on sparse files. Other file systems can do the same as they
see fit. For GFS2, the time it takes to skip a 1PB hole in a sparse file
goes from several days to milliseconds.

Patch description:

This patch changes function __generic_block_fiemap so that it sets a new
buffer_holesize bit. The new bit signals to the underlying file system
to return a hole size from its block_map function (if possible) in the
event that a hole is encountered at the requested block. If the block_map
function encounters a hole, and clears buffer_holesize, fiemap takes the
returned b_size to be the size of the hole, in bytes. It then skips the
hole and moves to the next block. This may be repeated several times
in a row, especially for large holes, due to possible limitations of the
fs-specific block_map function. This is still much faster than trying
each block individually when large holes are encountered. If the
block_map function does not clear buffer_holesize, the request for
holesize has been ignored, and it falls back to today's method of doing a
block-by-block search for the next valid block.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
---
 fs/ioctl.c                  | 7 ++++++-
 include/linux/buffer_head.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..ae63b1f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;

+		set_buffer_holesize(&map_bh); /* return hole size if able */
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;

 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_holesize(&map_bh)) /* holesize ignored */
+				start_blk++;
+			else
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);

 			/*
 			 * We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..b8ce396 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Holesize,    /* Return hole size (and clear) if possible */

 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -128,6 +129,7 @@ BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Holesize, holesize)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-09-16 12:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1623792099.11913714.1409074745372.JavaMail.zimbra@redhat.com>
2014-09-03 15:52 ` [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap Bob Peterson
2014-09-03 20:26   ` Dave Chinner
2014-09-05  0:04   ` [PATCH][try4] " Bob Peterson
2014-09-11 15:29     ` [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size Bob Peterson
2014-09-16 12:59       ` Bob Peterson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.