All of lore.kernel.org
 help / color / mirror / Atom feed
* ext4 bug: getdents uninterruptible for 117 seconds
@ 2016-03-02 17:15 Benjamin LaHaise
  2016-03-02 21:43 ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin LaHaise @ 2016-03-02 17:15 UTC (permalink / raw)
  To: linux-ext4

Hi folks,

While working on a bug involving write starvation, the test I was running 
managed to trigger some pretty horrific worst case behaviour in ext4.  The 
filesystem I'm working on is about 4TB in size, and is used for storing a 
number of spool files across 100 subdirectories in the filesystem.  One of 
these subdirectories ended up growing to ~497MB in size.  Once all of the 
files were removed from these directories, the filesystem was unmounted.  
On subsequent mounts of the filesystem, it became apparent that whenever 
a specific directory was accessed using ls or find, the kernel would block 
in getdents() for north of 117 seconds.  It is clear that ext4 is slowly 
reading the entire contents of the directory into memory during this time 
at a rate of ~4MB/s.  This filesystem is being stored on an external 8Gbps 
FC SAN comprised of about 8 x 10Krpm spindles.

I've placed a copy of the e2image for the filesystem at 
http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz .  The problematic 
directory is broken/1.  The relevant snippet of strace output is below.  
Thoughts?

		-ben

write(1, "/mnt/broken/"..., 34) = 34 <0.000039>
openat(5, "1", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 6 <0.000007>
fcntl(6, F_GETFD)                       = 0 <0.000004>
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0 <0.000004>
fstat(6, {st_mode=S_IFDIR|0755, st_size=520896512, ...}) = 0 <0.000004>
fcntl(6, F_GETFL)                       = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW) <0.000004>
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0 <0.000004>
newfstatat(5, "1", {st_mode=S_IFDIR|0755, st_size=520896512, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
fcntl(6, F_DUPFD, 3)                    = 7 <0.000005>
fcntl(7, F_GETFD)                       = 0 <0.000005>
fcntl(7, F_SETFD, FD_CLOEXEC)           = 0 <0.000004>
getdents(6, /* 2 entries */, 32768)     = 48 <117.122463>
getdents(6, /* 0 entries */, 32768)     = 0 <0.000005>
close(6)                                = 0 <0.000004>
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ext4 bug: getdents uninterruptible for 117 seconds
  2016-03-02 17:15 ext4 bug: getdents uninterruptible for 117 seconds Benjamin LaHaise
@ 2016-03-02 21:43 ` Theodore Ts'o
  2016-03-14 17:57   ` [PATCH] ext4: make it possible to interrupt initial readdir call Benjamin LaHaise
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2016-03-02 21:43 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-ext4

On Wed, Mar 02, 2016 at 12:15:11PM -0500, Benjamin LaHaise wrote:
> Hi folks,
> 
> While working on a bug involving write starvation, the test I was running 
> managed to trigger some pretty horrific worst case behaviour in ext4.  The 
> filesystem I'm working on is about 4TB in size, and is used for storing a 
> number of spool files across 100 subdirectories in the filesystem.  One of 
> these subdirectories ended up growing to ~497MB in size.  Once all of the 
> files were removed from these directories, the filesystem was unmounted.  
> On subsequent mounts of the filesystem, it became apparent that whenever 
> a specific directory was accessed using ls or find, the kernel would block 
> in getdents() for north of 117 seconds.  It is clear that ext4 is slowly 
> reading the entire contents of the directory into memory during this time 
> at a rate of ~4MB/s.  This filesystem is being stored on an external 8Gbps 
> FC SAN comprised of about 8 x 10Krpm spindles.
> 
> I've placed a copy of the e2image for the filesystem at 
> http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz .  The problematic 
> directory is broken/1.  The relevant snippet of strace output is below.  
> Thoughts?

Yes, this is a known problem.  Right now we don't have a way of
removing empty directory blocks from a directory.  This can be fixed
up by running "e2fsck -fD /dev/sdXX" off-line, but it's not terribly
satisfying.

There are things we could do in theory try to make things better, but
they haven't been implemented yet.  In practice they tend to happen
with pathological workloads, but they do happen occasionally in real
life.  It's just not something we've had time to address up until now.

       	    	     	       	     - Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] ext4: make it possible to interrupt initial readdir call
  2016-03-02 21:43 ` Theodore Ts'o
@ 2016-03-14 17:57   ` Benjamin LaHaise
  2016-04-08 18:18     ` Benjamin LaHaise
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin LaHaise @ 2016-03-14 17:57 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

This patch is a follow up to the email "ext4 bug: getdents uninterruptible 
for 117 seconds".  When doing the initial readdir on an ext4 filesystem 
that grew a directory to 497MB (the filesystem image can be downloaded at
http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz), the first getdents64() 
system call blocked for 117 seconds.  Begin to address this issue by making 
the ext4 readdir() operation interruptible by fatal signals so that it is 
possible to abort a process that is stuck on this operation.

 fs/ext4/dir.c   |    3 +++
 fs/ext4/namei.c |    4 ++++
 2 files changed, 7 insertions(+)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 33f5e2a..a3e32e8 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -553,6 +553,9 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
 		info->curr_node = rb_first(&info->root);
 
 	while (1) {
+		if (fatal_signal_pending(current))
+			return -ERESTARTSYS;
+
 		/*
 		 * Fill the rbtree if we have no more entries,
 		 * or the inode has changed since we last read in the
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 48e4b89..8097cd1 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -959,6 +959,10 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 	bh = ext4_read_dirblock(dir, block, DIRENT);
 	if (IS_ERR(bh))
 		return PTR_ERR(bh);
+	if (fatal_signal_pending(current)) {
+		brelse(bh);
+		return -ERESTARTSYS;
+	}
 
 	de = (struct ext4_dir_entry_2 *) bh->b_data;
 	top = (struct ext4_dir_entry_2 *) ((char *) de +
-- 
"Thought is the essence of where you are now."

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] ext4: make it possible to interrupt initial readdir call
  2016-03-14 17:57   ` [PATCH] ext4: make it possible to interrupt initial readdir call Benjamin LaHaise
@ 2016-04-08 18:18     ` Benjamin LaHaise
  2016-04-12  1:35       ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin LaHaise @ 2016-04-08 18:18 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

No response on this patch yet....  Given that it's clearly a bug, and we've 
got a test case that reproduces it, why isn't this being applied?

		-ben

On Mon, Mar 14, 2016 at 01:57:32PM -0400, Benjamin LaHaise wrote:
> This patch is a follow up to the email "ext4 bug: getdents uninterruptible 
> for 117 seconds".  When doing the initial readdir on an ext4 filesystem 
> that grew a directory to 497MB (the filesystem image can be downloaded at
> http://www.kvack.org/~bcrl/ext4/ext4-readdir.img.xz), the first getdents64() 
> system call blocked for 117 seconds.  Begin to address this issue by making 
> the ext4 readdir() operation interruptible by fatal signals so that it is 
> possible to abort a process that is stuck on this operation.
> 
>  fs/ext4/dir.c   |    3 +++
>  fs/ext4/namei.c |    4 ++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 33f5e2a..a3e32e8 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -553,6 +553,9 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
>  		info->curr_node = rb_first(&info->root);
>  
>  	while (1) {
> +		if (fatal_signal_pending(current))
> +			return -ERESTARTSYS;
> +
>  		/*
>  		 * Fill the rbtree if we have no more entries,
>  		 * or the inode has changed since we last read in the
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 48e4b89..8097cd1 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -959,6 +959,10 @@ static int htree_dirblock_to_tree(struct file *dir_file,
>  	bh = ext4_read_dirblock(dir, block, DIRENT);
>  	if (IS_ERR(bh))
>  		return PTR_ERR(bh);
> +	if (fatal_signal_pending(current)) {
> +		brelse(bh);
> +		return -ERESTARTSYS;
> +	}
>  
>  	de = (struct ext4_dir_entry_2 *) bh->b_data;
>  	top = (struct ext4_dir_entry_2 *) ((char *) de +
> -- 
> "Thought is the essence of where you are now."

-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ext4: make it possible to interrupt initial readdir call
  2016-04-08 18:18     ` Benjamin LaHaise
@ 2016-04-12  1:35       ` Theodore Ts'o
  0 siblings, 0 replies; 5+ messages in thread
From: Theodore Ts'o @ 2016-04-12  1:35 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-ext4

On Fri, Apr 08, 2016 at 02:18:41PM -0400, Benjamin LaHaise wrote:
> No response on this patch yet....  Given that it's clearly a bug, and we've 
> got a test case that reproduces it, why isn't this being applied?

Sorry, I forgot to ack the your patch.  I did apply it and it went to
Linus during the merge window.  Unfortunately, it casued a regression,
and Linus reverted it.  What went to Linus was changed slightly,
because I was trying to fix another potential case where we could end
up looping for a very long time.  Unfortunately I screwed up the
modification patch and the regression tests didn't catch the problem.

	     	       	   	      	    	   - Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-12  1:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-02 17:15 ext4 bug: getdents uninterruptible for 117 seconds Benjamin LaHaise
2016-03-02 21:43 ` Theodore Ts'o
2016-03-14 17:57   ` [PATCH] ext4: make it possible to interrupt initial readdir call Benjamin LaHaise
2016-04-08 18:18     ` Benjamin LaHaise
2016-04-12  1:35       ` Theodore Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.