* Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Jouni Siren @ 2012-04-12 14:47 UTC (permalink / raw)
  To: linux-ext4

Hi,

I recently ran into problems when writing large blocks of data (more than about 2 GB) with a single call if there is already some data in the write buffer. The problem seems to be specific to ext4, or at least it does not happen when writing to nfs on the same system. Also, the problem does not happen if the write buffer is flushed before the large write.

The following C++ program should write a total of 4294967304 bytes, but I end up with a file of size 2147483664.

#include <fstream>

int
main(int argc, char** argv)
{
  // 2 GB; std::streamsize is 64-bit here, so the shift does not overflow.
  std::streamsize data_size = (std::streamsize)1 << 31;
  char* data = new char[data_size];

  std::ofstream output("test.dat", std::ios_base::binary);
  output.write(data, 8);          // 8 bytes remain in the stream buffer
  output.write(data, data_size);  // flushed together with them as one writev
  output.write(data, data_size);
  output.close();

  delete[] data;
  return 0;
}


The relevant part of strace is the following:

open("test.dat", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
writev(3, [{"\0\0\0\0\0\0\0\0", 8}, {"", 2147483648}], 2) = -2147483640
writev(3, [{0xffffffff80c6d258, 2147483648}, {"", 2147483648}], 2) = -1 EFAULT (Bad address)
write(3, "\0\0\0\0\0\0\0\0", 8)         = 8
close(3)                                = 0


The first two writes are combined into a single writev call that reports having written -2147483640 bytes. This is the same as 8 + 2147483648 interpreted as a signed 32-bit integer. After the first call, everything more or less fails. This happens on a Linux system where uname -a returns

Linux alm01 2.6.32-220.7.1.el6.x86_64 #1 SMP Tue Mar 6 15:45:33 CST 2012 x86_64 x86_64 x86_64 GNU/Linux


I believe that the bug can be found in file.c, function ext4_file_write, where variable ret has type int. Function generic_file_aio_write returns the number of bytes written as a ssize_t, and the returned value is stored in ret and eventually returned by ext4_file_write. If the number of bytes written is more than INT_MAX, the value returned by ext4_file_write will be incorrect.
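
The following minimal userspace sketch shows the same narrowing in isolation (this assumes a 64-bit ssize_t, i.e. an LP64 machine; the variable names are illustrative and only loosely mirror the kernel code):

#include <cstdio>
#include <sys/types.h>

int main()
{
    // The byte count generic_file_aio_write reports for the combined
    // 8-byte + 2 GB writev: a perfectly valid 64-bit value.
    ssize_t bytes_written = 8 + (1LL << 31);  // 2147483656

    // Storing it in an int keeps only the low 32 bits; on the usual
    // two's-complement machines the value wraps negative.
    int ret = (int)bytes_written;

    std::printf("ssize_t: %zd, after int truncation: %d\n",
                bytes_written, ret);  // -2147483640, as in the strace above
    return 0;
}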

If you need more information on the problem, I will be happy to provide it.

-- 
Jouni Siren - jouni.siren@iki.fi - http://iki.fi/jouni.siren/


* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Zheng Liu @ 2012-04-12 16:06 UTC (permalink / raw)
  To: Jouni Siren; +Cc: linux-ext4

On Thu, Apr 12, 2012 at 05:47:41PM +0300, Jouni Siren wrote:
> Hi,
> 
> I recently ran into problems when writing large blocks of data (more than about 2 GB) with a single call if there is already some data in the write buffer.
> 
> [...]
> 
> I believe that the bug can be found in file.c, function ext4_file_write, where variable ret has type int. Function generic_file_aio_write returns the number of bytes written as a ssize_t, and the returned value is stored in ret and eventually returned by ext4_file_write. If the number of bytes written is more than INT_MAX, the value returned by ext4_file_write will be incorrect.
> 
> If you need more information on the problem, I will be happy to provide it.

Hi Jouni,

Indeed, I think it is a bug, and the solution is straightforward.
Could you please try this patch?  Thank you.

Regards,
Zheng

From: Zheng Liu <wenqing.lz@taobao.com>
Subject: [PATCH] ext4: change return value from int to ssize_t in ext4_file_write

On 32-bit platforms, when we do a write operation with a huge amount of data,
the ret variable overflows.  So it is replaced with ssize_t.

Reported-by: Jouni Siren <jouni.siren@iki.fi>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index cb70f18..8c7642a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -95,7 +95,7 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int unaligned_aio = 0;
-	int ret;
+	ssize_t ret;
 
 	/*
 	 * If we have encountered a bitmap-format file, the size limit
-- 
1.7.4.1



* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Jan Kara @ 2012-04-12 20:20 UTC (permalink / raw)
  To: Zheng Liu; +Cc: Jouni Siren, linux-ext4

On Fri 13-04-12 00:06:59, Zheng Liu wrote:
> On Thu, Apr 12, 2012 at 05:47:41PM +0300, Jouni Siren wrote:
> > [...]
> 
> [...]
> 
> From: Zheng Liu <wenqing.lz@taobao.com>
> Subject: [PATCH] ext4: change return value from int to ssize_t in ext4_file_write
> 
> On 32-bit platforms, when we do a write operation with a huge amount of data,
> the ret variable overflows.  So it is replaced with ssize_t.
  Actually, the problem happens on 64-bit platforms. On 32-bit platforms,
ssize_t is 32-bit and thus we would do only a short write that fits into 2^31.
But on 64-bit, ssize_t is 64-bit, so there we can end up writing more than
2^31. So please change the changelog. The patch itself is good, so you can
add:
  Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
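
A quick way to see the difference (a sketch assuming the common ILP32 and LP64 data models):

#include <cstdio>
#include <climits>
#include <sys/types.h>

int main()
{
    // On ILP32, ssize_t is 32-bit: a single call can never legally report
    // more than ~2^31 bytes, so an int holds any possible return value.
    // On LP64, ssize_t is 64-bit: the combined writev above may report
    // 8 + 2^31 bytes, which exceeds INT_MAX.
    std::printf("sizeof(int)=%zu sizeof(ssize_t)=%zu INT_MAX=%d\n",
                sizeof(int), sizeof(ssize_t), INT_MAX);
    return 0;
}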
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Dave Chinner @ 2012-04-13  0:10 UTC (permalink / raw)
  To: Jouni Siren; +Cc: linux-ext4

On Thu, Apr 12, 2012 at 05:47:41PM +0300, Jouni Siren wrote:
> [...]
> 
> The relevant part of strace is the following:
> 
> open("test.dat", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
> writev(3, [{"\0\0\0\0\0\0\0\0", 8}, {"", 2147483648}], 2) = -2147483640
> writev(3, [{0xffffffff80c6d258, 2147483648}, {"", 2147483648}], 2) = -1 EFAULT (Bad address)

EFAULT - your user buffer is too large.

IOWs, you can't do IO in chunks of 2GB or greater in a single buffer
or iovec. This limit is imposed by the VFS to prevent overflows in
badly implemented filesystem code.

Just do multiple smaller IOs - it will be just as fast as doing a single
2GB IO....
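
For example, something like this sketch (the 1 GB chunk size and the helper are illustrative, not a prescribed API):

#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Write len bytes from buf to fd in chunks that stay below the VFS limit.
bool write_all(int fd, const char* buf, size_t len)
{
    const size_t chunk_max = (size_t)1 << 30;  // 1 GB, well under 2 GB
    while (len > 0) {
        size_t chunk = len < chunk_max ? len : chunk_max;
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;   // interrupted before writing: retry the chunk
            return false;   // genuine error, errno is set
        }
        buf += n;           // short write is fine: advance and loop
        len -= (size_t)n;
    }
    return true;
}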

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* [PATCH RESEND] ext4: change return value from int to ssize_t in ext4_file_write
From: Zheng Liu @ 2012-04-13  1:22 UTC (permalink / raw)
  To: jack; +Cc: jouni.siren, linux-ext4, Zheng Liu

On 64-bit platforms, when we do a write operation with a huge amount of data,
the ret variable overflows.  So it is replaced with ssize_t.

Reported-by: Jouni Siren <jouni.siren@iki.fi>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
Hi Jan,

You are right.  I have changed the commit log.  ;-)

Regards,
Zheng

 fs/ext4/file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index cb70f18..8c7642a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -95,7 +95,7 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int unaligned_aio = 0;
-	int ret;
+	ssize_t ret;
 
 	/*
 	 * If we have encountered a bitmap-format file, the size limit
-- 
1.7.4.1



* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Jouko Orava @ 2012-04-19 13:10 UTC (permalink / raw)
  To: linux-ext4; +Cc: Zheng Liu

Hi Zheng,

I can confirm the bug exists on latest RHEL 6 x86_64 kernels (2.6.32-220).

On current mainline kernels all writes are limited to one page under 2GB,
which masks the problem. I have not checked if mainline 2.6.32 has this
limit or not. It does not matter: the limit is just a band-aid to paper
over filesystem bugs, and should not mean you don't fix filesystem bugs.

I can confirm the one line patch to fs/ext4/file.c does fix the problem.
I have test kernels based on 2.6.32-220.7.1.el6.x86_64 with only
the patch applied, at
	http://www.helsinki.fi/~joorava/kernel/
if anyone else is interested in testing.

I did some (limited) testing on ext4 with the patch. It fixes the problem:
large writes work correctly too. No problems popped up in testing.

Tested-by: Jouko Orava <jouko.orava@helsinki.fi>


I'd also like to point at the real bug here, in Jouni's original strace:

	writev(3, [{"\0\0\0\0\0\0\0\0", 8}, {"", 2147483648}], 2) = -2147483640

The syscall returns a negative value, which is not the actual number of
bytes written (since it is 32-bit wrapped), and errno has not been
changed. There is no way for userspace to handle this result correctly!

There is no way anyone sane should just gloss over this saying
"programmer fault, you're doing it wrong".
This is a real bug, and deserves fixing sooner rather than later.
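
To make that concrete, here is a sketch of everything POSIX allows a caller to check - writev is documented to return either a byte count or -1 with errno set:

#include <sys/uio.h>
#include <cerrno>

// The -2147483640 return above fits neither documented outcome: it is
// negative but not -1, and errno is untouched, so neither branch below
// handles it correctly.
ssize_t checked_writev(int fd, const struct iovec* iov, int iovcnt)
{
    ssize_t n = writev(fd, iov, iovcnt);
    if (n == -1)
        return -1;  // documented error path: caller inspects errno
    return n;       // documented success path: n bytes written (possibly short)
}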

Thanks,
   Jouko Orava


* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Eric Sandeen @ 2012-04-19 14:15 UTC (permalink / raw)
  To: Jouko Orava; +Cc: linux-ext4, Zheng Liu

On 4/19/12 8:10 AM, Jouko Orava wrote:
> Hi Zheng,
> 
> I can confirm the bug exists on latest RHEL 6 x86_64 kernels (2.6.32-220).
> 
> On current mainline kernels all writes are limited to one page under 2GB,
> which masks the problem. I have not checked if mainline 2.6.32 has this
> limit or not. It does not matter: the limit is just a band-aid to paper
> over filesystem bugs, and should not mean you don't fix filesystem bugs.

FWIW, we tried fairly hard to get the limit lifted in the vfs, to no avail.

> [...]
> 
> I'd also like to point at the real bug here, in Jouni's original strace:
> 
> 	writev(3, [{"\0\0\0\0\0\0\0\0", 8}, {"", 2147483648}], 2) = -2147483640
> 
> The syscall returns a negative value, which is not the actual number of
> bytes written (since it is 32-bit wrapped), and errno has not been
> changed. There is no way for userspace to handle this result correctly!
> 
> There is no way anyone sane should just gloss over this saying
> "programmer fault, you're doing it wrong".
> This is a real bug, and deserves fixing sooner rather than later.

Agreed, I think Dave was a little too quick on the draw in his reply.  :)

-Eric

> Thanks,
>    Jouko Orava



* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Jouko Orava @ 2012-04-19 14:38 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Jouko Orava, linux-ext4, Zheng Liu

> FWIW, we tried fairly hard to get the limit lifted in the vfs, to no avail.

I understand. The downsides of the limit are very small, after all.

Has anyone tested the older stable kernel releases?
When was the VFS limit added, or is it something RHEL kernels patch out?

> Agreed, I think Dave was a little too quick on the draw in his reply.

It is easy to miss; the EFAULT on the syscall that follows is so obvious.

I've filed a terse report to the Red Hat Bugzilla:
	https://bugzilla.redhat.com/show_bug.cgi?id=814296
Let me know if you wish me to expand on that report.

Best regards,
    Jouko Orava


* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Eric Sandeen @ 2012-04-19 14:45 UTC (permalink / raw)
  To: Jouko Orava; +Cc: Jouko Orava, linux-ext4, Zheng Liu

On 4/19/12 9:38 AM, Jouko Orava wrote:
>> FWIW, we tried fairly hard to get the limit lifted in the vfs, to no avail.
> 
> I understand. The downsides of the limit are very small, after all.
> 
> Has anyone tested the older stable kernel releases?
> When was the VFS limit added, or is it something RHEL kernels patch out?

No, RHEL kernels have the same limits.

>> Agreed, I think Dave was a little too quick on the draw in his reply.
> 
> It is easy to miss; the EFAULT on the syscall that follows is so obvious.
> 
> I've filed a terse report to the Red Hat Bugzilla:
> 	https://bugzilla.redhat.com/show_bug.cgi?id=814296

Whoops, I just filed one as well.  I'll dup yours to mine since I've already
started the process there.

Thanks,
-Eric

> Let me know if you wish me to expand on that report.
> 
> Best regards,
>     Jouko Orava



* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Zheng Liu @ 2012-04-19 14:56 UTC (permalink / raw)
  To: Jouko Orava; +Cc: linux-ext4

On Thu, Apr 19, 2012 at 9:10 PM, Jouko Orava <jouko.orava@helsinki.fi> wrote:
> Hi Zheng,
>
> I can confirm the bug exists on latest RHEL 6 x86_64 kernels (2.6.32-220).
>
> [...]
>
> Tested-by: Jouko Orava <jouko.orava@helsinki.fi>

Hi Jouko,

Thanks for testing.  I haven't tested this bug on the Red Hat kernel.  As
Eric said, he will fix it if it exists there.  I think it might also be
present in the stable kernels, so I will dig into those.

Regards,
Zheng


* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Jouko Orava @ 2012-04-19 15:09 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-ext4, Zheng Liu

> No, RHEL kernels have the same limits.

That means at least the 2.6.32.x stable series will suffer from the same bug.
I'll see if I can test those mainline kernels too this weekend.

> Whoops, I just filed one as well.  I'll dup yours to mine since I've already
> started the process there.

That's excellent, thank you. Mine was unacceptably terse anyway :)

Thanks,
  Jouko


* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Zheng Liu @ 2012-04-19 15:28 UTC (permalink / raw)
  To: Jouko Orava; +Cc: Eric Sandeen, linux-ext4

On Thu, Apr 19, 2012 at 11:09 PM, Jouko Orava <jouko.orava@helsinki.fi> wrote:
>> No, RHEL kernels have the same limits.
>
> That means at least the 2.6.32.x stable series will suffer from the same bug.
> I'll see if I can test those mainline kernels too this weekend.

Great!  Please test it in the stable kernels, because I am afraid I cannot
do this work this weekend. :-)

Regards,
Zheng



* Re: Bug: Large writes can fail on ext4 if the write buffer is not empty
From: Dave Chinner @ 2012-04-20  2:12 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Jouko Orava, linux-ext4, Zheng Liu

On Thu, Apr 19, 2012 at 09:15:40AM -0500, Eric Sandeen wrote:
> On 4/19/12 8:10 AM, Jouko Orava wrote:
> > I'd also like to point at the real bug here, in Jouni's original strace:
> > 
> > 	writev(3, [{"\0\0\0\0\0\0\0\0", 8}, {"", 2147483648}], 2) = -2147483640
> > 
> > The syscall returns a negative value, which is not the actual number of
> > bytes written (since it is 32-bit wrapped), and errno has not been
> > changed. There is no way for userspace to handle this result correctly!
> > 
> > There is no way anyone sane should just gloss over this saying
> > "programmer fault, you're doing it wrong".
> > This is a real bug, and deserves fixing sooner rather than later.
> 
> Agreed, I think Dave was a little too quick on the draw in his reply.  :)

I was pointing at the -EFAULT error in the second syscall, which was just
being passed the 2GB-sized buffer, not the above error from the first
syscall.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH RESEND] ext4: change return value from int to ssize_t in ext4_file_write
From: Eric Sandeen @ 2012-05-22 19:44 UTC (permalink / raw)
  To: Zheng Liu; +Cc: jack, jouni.siren, linux-ext4, Zheng Liu

On 4/12/12 8:22 PM, Zheng Liu wrote:
> On 64-bit platforms, when we do a write operation with a huge amount of data,
> the ret variable overflows.  So it is replaced with ssize_t.
> 
> Reported-by: Jouni Siren <jouni.siren@iki.fi>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>

Ted, is this one queued anywhere?  Seems lost so far.

Thanks,
-Eric

> [...]



* Re: [PATCH RESEND] ext4: change return value from int to ssize_t in ext4_file_write
From: Ted Ts'o @ 2012-05-28 22:08 UTC (permalink / raw)
  To: Zheng Liu; +Cc: jack, jouni.siren, linux-ext4, Zheng Liu

On Fri, Apr 13, 2012 at 09:22:37AM +0800, Zheng Liu wrote:
> On 64-bit platforms, when we do a write operation with a huge amount of data,
> the ret variable overflows.  So it is replaced with ssize_t.
> 
> Reported-by: Jouni Siren <jouni.siren@iki.fi>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>

Applied, with the following commit description:

   ext4: use consistent ssize_t type in ext4_file_write()

   The generic_file_aio_write() function returns ssize_t, and
   ext4_file_write() returns a ssize_t, so use a ssize_t to collect the
   return value from generic_file_aio_write().  It shouldn't matter since
   the VFS read/write paths shouldn't allow a read greater than INT_MAX,
   but there was previously a bug in the AIO code paths, and it's best if
   we use a consistent type so that the return value from
   generic_file_aio_write() can't get truncated.

Regards,

					- Ted

