All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
       [not found] <F9014F50-D1F6-4B64-8535-7452CC64B18A@qualexsystems.com>
@ 2012-05-20  8:01 ` manish honap
  2012-05-20 18:33   ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: manish honap @ 2012-05-20  8:01 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: torvalds, linux-fsdevel



From: Manish Honap <manish_honap_vit@yahoo.co.in>

The direct-io.c::do_direct_io() returns int and  this causes the results to
overflow for sizes>=2g; the following patch removes this bug.
Signed-off-by: Manish Honap <manish_honap_vit@yahoo.co.in>


---

Kernel version - linux-3.3.4
linux-3.3.4/fs/direct-io.c  |    6 +++---
linux-3.3.4/fs/ext4/file.c  |    2 +-
linux-3.3.4/fs/ext4/inode.c |    2 +-
3 files changed, 5 insertions(+), 5 deletions(-)

diff -uprN -X linux-3.3.4-vanilla/Documentation/dontdiff linux-3.3.4-vanilla/fs/direct-io.c ../linux-3.3.4/fs/direct-io.c
--- linux-3.3.4-vanilla/fs/direct-io.c    2012-04-27 22:47:35.000000000 +0530
+++ ../linux-3.3.4/fs/direct-io.c    2012-05-20 10:34:47.885599682 +0530
@@ -892,14 +892,14 @@ static inline void dio_zero_block(struct
  * it should set b_size to PAGE_SIZE or more inside get_block().  This gives
  * fine alignment but still allows this function to work in PAGE_SIZE units.
  */
-static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
-            struct buffer_head *map_bh)
+static ssize_t do_direct_IO(struct dio *dio, struct dio_submit *sdio,
+                struct buffer_head *map_bh)
{
    const unsigned blkbits = sdio->blkbits;
    const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
    struct page *page;
    unsigned block_in_page;
-    int ret = 0;
+    ssize_t ret = 0;

    /* The I/O can start at any block offset within the first page */
    block_in_page = sdio->first_block_in_page;
diff -uprN -X linux-3.3.4-vanilla/Documentation/dontdiff linux-3.3.4-vanilla/fs/ext4/file.c ../linux-3.3.4/fs/ext4/file.c
--- linux-3.3.4-vanilla/fs/ext4/file.c    2012-04-27 22:47:35.000000000 +0530
+++ ../linux-3.3.4/fs/ext4/file.c    2012-05-20 10:29:02.714603023 +0530
@@ -95,7 +95,7 @@ ext4_file_write(struct kiocb *iocb, cons
{
    struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
    int unaligned_aio = 0;
-    int ret;
+    ssize_t ret;

    /*
     * If we have encountered a bitmap-format file, the size limit
diff -uprN -X linux-3.3.4-vanilla/Documentation/dontdiff linux-3.3.4-vanilla/fs/ext4/inode.c ../linux-3.3.4/fs/ext4/inode.c
--- linux-3.3.4-vanilla/fs/ext4/inode.c    2012-04-27 22:47:35.000000000 +0530
+++ ../linux-3.3.4/fs/ext4/inode.c    2012-05-20 10:30:15.444622201 +0530
@@ -2750,7 +2750,7 @@ static int ext4_get_block_write(struct i
}

static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-                ssize_t size, void *private, int ret,
+                ssize_t size, void *private, ssize_t ret,
                bool is_async)
{
    struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;


Thanks & Regards,
- Manish 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-20  8:01 ` [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code manish honap
@ 2012-05-20 18:33   ` Linus Torvalds
  2012-05-21  3:28     ` manish honap
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-05-20 18:33 UTC (permalink / raw)
  To: manish honap; +Cc: tytso, adilger.kernel, linux-fsdevel

On Sun, May 20, 2012 at 1:01 AM, manish honap
<manish_honap_vit@yahoo.co.in> wrote:
>
> From: Manish Honap <manish_honap_vit@yahoo.co.in>
>
> The direct-io.c::do_direct_io() returns int and  this causes the results to
> overflow for sizes>=2g; the following patch removes this bug.
>
> Signed-off-by: Manish Honap <manish_honap_vit@yahoo.co.in>

It should not be possible to do a write bigger than 2GB - the generic
VFS layer should stop it. Exactly because of overflow avoidance issues
(and because a single 2GB+ write would be insane and has serious DoS
issues anyway).

Can you actually *trigger* this issue some way? If so, we should fix
that instead.

                            Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-20 18:33   ` Linus Torvalds
@ 2012-05-21  3:28     ` manish honap
  2012-05-21  4:50       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: manish honap @ 2012-05-21  3:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tytso, adilger.kernel, linux-fsdevel

Hello Linus,

The overflow issue was seen during async dio path

Please consider following code,
<code>
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <libaio.h>
#include <err.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define NR_BYTES 2*1024*1024*1024L

int main()
{
        int              fd;
        int              ret;
        struct iocb      iocb;
        struct iocb     *piocb  = &iocb;
        io_context_t     io_ctx = NULL;
        void            *ptr;
        struct io_event  evout;
        struct timespec  ioq_timeout;

        ioq_timeout.tv_sec = 0;
        ioq_timeout.tv_nsec = 1000000000;
        
        ret = posix_memalign(&ptr, 1<<12, NR_BYTES);
        if (ret != 0) {
                errx(1, "Allocating Memory.");
        }
        memset(ptr, 'a', NR_BYTES);
        fd = open("temp.txt", O_CREAT|O_RDWR|O_DIRECT, 0744);
        if (fd < 0) {
            fprintf(stderr, "Error opening file\n");
            goto out;
        }
        ret = io_setup(10, &io_ctx);
        if (ret != 0) {
            fprintf(stderr, "During io_setup\n");
            goto out;
        }
        piocb->aio_fildes     = fd;
        piocb->u.c.buf        = ptr;
        piocb->u.c.nbytes     = NR_BYTES;
        piocb->u.c.offset     = 0;
        piocb->aio_lio_opcode = IO_CMD_PWRITE;
        ret = io_submit(io_ctx, 1, &piocb);
        if (ret != 1) {
                fprintf(stderr, "During io_submit\n");
        goto out;
        }
        ret = io_getevents(io_ctx, 1, 1, &evout, &ioq_timeout);
        if (ret == 1) {
                printf("%ld (0x%lx) bytes transferred\n", evout.res);
        }
        io_destroy(io_ctx);
out:
        free(ptr);
        return 0;
}
</code>

-2147483648 (0xffffffff80000000) bytes transferred

Well, in this case we can see the overflow.

Thanks and Regards 
- Manish


----- Original Message -----
From: Linus Torvalds <torvalds@linux-foundation.org>
To: manish honap <manish_honap_vit@yahoo.co.in>
Cc: "tytso@mit.edu" <tytso@mit.edu>; "adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>; "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Sent: Monday, 21 May 2012 12:03 AM
Subject: Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.

On Sun, May 20, 2012 at 1:01 AM, manish honap
<manish_honap_vit@yahoo.co.in> wrote:
>
> From: Manish Honap <manish_honap_vit@yahoo.co.in>
>
> The direct-io.c::do_direct_io() returns int and  this causes the results to
> overflow for sizes>=2g; the following patch removes this bug.
>
> Signed-off-by: Manish Honap <manish_honap_vit@yahoo.co.in>

It should not be possible to do a write bigger than 2GB - the generic
VFS layer should stop it. Exactly because of overflow avoidance issues
(and because a single 2GB+ write would be insane and has serious DoS
issues anyway).

Can you actually *trigger* this issue some way? If so, we should fix
that instead.

                            Linus

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-21  3:28     ` manish honap
@ 2012-05-21  4:50       ` Linus Torvalds
  2012-05-21 22:22         ` Linus Torvalds
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Linus Torvalds @ 2012-05-21  4:50 UTC (permalink / raw)
  To: manish honap; +Cc: tytso, adilger.kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

On Sun, May 20, 2012 at 8:28 PM, manish honap
<manish_honap_vit@yahoo.co.in> wrote:
> Hello Linus,
>
> The overflow issue was seen during async dio path

Christ. fs/aio.c doesn't do the proper rw_verify_area().

As a result, it doesn't check file locks, and it doesn't seem to check
offset overflows either.

The vector versions kind of get the size limit by mistake (because
they at least use rw_copy_check_uvector(), which does limit things to
MAX_RW_COUNT), but they don't do the offset overflow check either.

Does this patch work for you? What it *should* do is the same that the
other read/write paths do (and the vector path for aio already do),
namely truncate reads or writes to MAX_RW_COUNT (which is INT_MAX
aligned down to a page).

This patch is entirely untested,

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 2545 bytes --]

 fs/aio.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 67a6db3e1b6f..e7f2fad7b4ce 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1456,6 +1456,10 @@ static ssize_t aio_setup_vectored_rw(int type, struct kiocb *kiocb, bool compat)
 	if (ret < 0)
 		goto out;
 
+	ret = rw_verify_area(type, kiocb->ki_filp, &kiocb->ki_pos, ret);
+	if (ret < 0)
+		goto out;
+
 	kiocb->ki_nr_segs = kiocb->ki_nbytes;
 	kiocb->ki_cur_seg = 0;
 	/* ki_nbytes/left now reflect bytes instead of segs */
@@ -1467,11 +1471,17 @@ out:
 	return ret;
 }
 
-static ssize_t aio_setup_single_vector(struct kiocb *kiocb)
+static ssize_t aio_setup_single_vector(int type, struct file * file, struct kiocb *kiocb)
 {
+	int bytes;
+
+	bytes = rw_verify_area(type, file, &kiocb->ki_pos, kiocb->ki_left);
+	if (bytes < 0)
+		return bytes;
+
 	kiocb->ki_iovec = &kiocb->ki_inline_vec;
 	kiocb->ki_iovec->iov_base = kiocb->ki_buf;
-	kiocb->ki_iovec->iov_len = kiocb->ki_left;
+	kiocb->ki_iovec->iov_len = bytes;
 	kiocb->ki_nr_segs = 1;
 	kiocb->ki_cur_seg = 0;
 	return 0;
@@ -1496,10 +1506,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 		if (unlikely(!access_ok(VERIFY_WRITE, kiocb->ki_buf,
 			kiocb->ki_left)))
 			break;
-		ret = security_file_permission(file, MAY_READ);
-		if (unlikely(ret))
-			break;
-		ret = aio_setup_single_vector(kiocb);
+		ret = aio_setup_single_vector(READ, file, kiocb);
 		if (ret)
 			break;
 		ret = -EINVAL;
@@ -1514,10 +1521,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 		if (unlikely(!access_ok(VERIFY_READ, kiocb->ki_buf,
 			kiocb->ki_left)))
 			break;
-		ret = security_file_permission(file, MAY_WRITE);
-		if (unlikely(ret))
-			break;
-		ret = aio_setup_single_vector(kiocb);
+		ret = aio_setup_single_vector(WRITE, file, kiocb);
 		if (ret)
 			break;
 		ret = -EINVAL;
@@ -1528,9 +1532,6 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 		ret = -EBADF;
 		if (unlikely(!(file->f_mode & FMODE_READ)))
 			break;
-		ret = security_file_permission(file, MAY_READ);
-		if (unlikely(ret))
-			break;
 		ret = aio_setup_vectored_rw(READ, kiocb, compat);
 		if (ret)
 			break;
@@ -1542,9 +1543,6 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 		ret = -EBADF;
 		if (unlikely(!(file->f_mode & FMODE_WRITE)))
 			break;
-		ret = security_file_permission(file, MAY_WRITE);
-		if (unlikely(ret))
-			break;
 		ret = aio_setup_vectored_rw(WRITE, kiocb, compat);
 		if (ret)
 			break;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-21  4:50       ` Linus Torvalds
@ 2012-05-21 22:22         ` Linus Torvalds
  2012-05-21 23:31           ` Ted Ts'o
  2012-05-22 19:26         ` [PATCH 1/1] xfstests 286: test for 2G overflows in AIO Eric Sandeen
  2012-05-22 20:41         ` [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code Jeff Moyer
  2 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-05-21 22:22 UTC (permalink / raw)
  To: manish honap; +Cc: tytso, adilger.kernel, linux-fsdevel

On Sun, May 20, 2012 at 9:50 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Does this patch work for you? What it *should* do is the same that the
> other read/write paths do (and the vector path for aio already do),
> namely truncate reads or writes to MAX_RW_COUNT (which is INT_MAX
> aligned down to a page).
>
> This patch is entirely untested,

Ok, so with that patch, your program gives the expected

  2147479552 (0x7ffff000) bytes transferred

which is indeed MAX_INT aligned down to a page boundary (ie MAX_RW_COUNT).

So I'm pretty sure this patch is what we want, and rw_verify_area()
really is required to protect low-level filesystems from these kinds
of issues. Not just ext4.

At the same time, I would *really* want somebody who actually uses
anything AIO to test it out too. Because I want to not only commit it,
but also mark it for stable - and it would be nice to have some more
testing than me saying "ok, it passes the one test-case sent to me"
and "hey, the code looks sane".

Anybody?

                              Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-21 22:22         ` Linus Torvalds
@ 2012-05-21 23:31           ` Ted Ts'o
  2012-05-22 16:11             ` Eric Sandeen
  2012-05-22 16:13             ` manish honap
  0 siblings, 2 replies; 11+ messages in thread
From: Ted Ts'o @ 2012-05-21 23:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: manish honap, adilger.kernel, linux-fsdevel, linux-ext4

On Mon, May 21, 2012 at 03:22:28PM -0700, Linus Torvalds wrote:
> 
> So I'm pretty sure this patch is what we want, and rw_verify_area()
> really is required to protect low-level filesystems from these kinds
> of issues. Not just ext4.
> 
> At the same time, I would *really* want somebody who actually uses
> anything AIO to test it out too. Because I want to not only commit it,
> but also mark it for stable - and it would be nice to have some more
> testing than me saying "ok, it passes the one test-case sent to me"
> and "hey, the code looks sane".

Hi Linus,

We do use AIO at Google, and I primarily use fio (usually as part of
xfstests) to test AIO functionality.

If you like I can carry the patch in the ext4 tree and test it as part
of the ext4 commits for the merge window.

I probably won't be able to get to it until a bit later in the week,
though.  Things have been really crazy for me this past two months...

						- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-21 23:31           ` Ted Ts'o
@ 2012-05-22 16:11             ` Eric Sandeen
  2012-05-22 19:02               ` Eric Sandeen
  2012-05-22 16:13             ` manish honap
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2012-05-22 16:11 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Linus Torvalds, manish honap, adilger.kernel, linux-fsdevel, linux-ext4

On 5/21/12 6:31 PM, Ted Ts'o wrote:
> On Mon, May 21, 2012 at 03:22:28PM -0700, Linus Torvalds wrote:
>>
>> So I'm pretty sure this patch is what we want, and rw_verify_area()
>> really is required to protect low-level filesystems from these kinds
>> of issues. Not just ext4.
>>
>> At the same time, I would *really* want somebody who actually uses
>> anything AIO to test it out too. Because I want to not only commit it,
>> but also mark it for stable - and it would be nice to have some more
>> testing than me saying "ok, it passes the one test-case sent to me"
>> and "hey, the code looks sane".

Just FWIW, the ext4_file_write() part of this fix was sent by Zheng Liu on 4/12/12:

[PATCH RESEND] ext4: change return value from int to ssize_t in ext4_file_write

I'll see about writing an xfstest for this stuff too so it doesn't regress again.

-Eric

> Hi Linus,
> 
> We do use AIO at Google, and I primarily use fio (usually as part of
> xfstests) to test AIO functionality.
> 
> If you like I can carry the patch in the ext4 tree and test it as part
> of the ext4 commits for the merge window.
> 
> I probably won't be able to get to it until a bit later in the week,
> though.  Things have been really crazy for me this past two months...
> 
> 						- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-21 23:31           ` Ted Ts'o
  2012-05-22 16:11             ` Eric Sandeen
@ 2012-05-22 16:13             ` manish honap
  1 sibling, 0 replies; 11+ messages in thread
From: manish honap @ 2012-05-22 16:13 UTC (permalink / raw)
  To: Ted Ts'o, Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

Hello Linus, Ted



On Mon, May 21, 2012 at 03:22:28PM -0700, Linus Torvalds wrote:
> 
> So I'm pretty sure this patch is what we want, and rw_verify_area()
> really is required to protect low-level filesystems from these kinds
> of issues. Not just ext4.

Patch works on my machine also. 
> 
> At the same time, I would *really* want somebody who actually uses
> anything AIO to test it out too. Because I want to not only commit it,
> but also mark it for stable - and it would be nice to have some more
> testing than me saying "ok, it passes the one test-case sent to me"
> and "hey, the code looks sane".

Used Chris Mason's aio-stress.c

- Manish


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-22 16:11             ` Eric Sandeen
@ 2012-05-22 19:02               ` Eric Sandeen
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-05-22 19:02 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Ted Ts'o, Linus Torvalds, manish honap, adilger.kernel,
	linux-fsdevel, linux-ext4

On 5/22/12 11:11 AM, Eric Sandeen wrote:
> On 5/21/12 6:31 PM, Ted Ts'o wrote:
>> On Mon, May 21, 2012 at 03:22:28PM -0700, Linus Torvalds wrote:
>>>
>>> So I'm pretty sure this patch is what we want, and rw_verify_area()
>>> really is required to protect low-level filesystems from these kinds
>>> of issues. Not just ext4.
>>>
>>> At the same time, I would *really* want somebody who actually uses
>>> anything AIO to test it out too. Because I want to not only commit it,
>>> but also mark it for stable - and it would be nice to have some more
>>> testing than me saying "ok, it passes the one test-case sent to me"
>>> and "hey, the code looks sane".
> 
> Just FWIW, the ext4_file_write() part of this fix was sent by Zheng Liu on 4/12/12:
> 
> [PATCH RESEND] ext4: change return value from int to ssize_t in ext4_file_write

Sorry, I was talking about Manish's original patch, not Linus's followup.

-Eric

> I'll see about writing an xfstest for this stuff too so it doesn't regress again.
> 
> -Eric
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/1] xfstests 286: test for 2G overflows in AIO
  2012-05-21  4:50       ` Linus Torvalds
  2012-05-21 22:22         ` Linus Torvalds
@ 2012-05-22 19:26         ` Eric Sandeen
  2012-05-22 20:41         ` [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code Jeff Moyer
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-05-22 19:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: manish honap, tytso, adilger.kernel, linux-fsdevel, Jeff Moyer

On 5/20/12 11:50 PM, Linus Torvalds wrote:
> On Sun, May 20, 2012 at 8:28 PM, manish honap
> <manish_honap_vit@yahoo.co.in> wrote:
>> Hello Linus,
>>
>> The overflow issue was seen during async dio path
> 
> Christ. fs/aio.c doesn't do the proper rw_verify_area().
> 
> As a result, it doesn't check file locks, and it doesn't seem to check
> offset overflows either.
> 
> The vector versions kind of get the size limit by mistake (because
> they at least use rw_copy_check_uvector(), which does limit things to
> MAX_RW_COUNT), but they don't do the offset overflow check either.
> 
> Does this patch work for you? What it *should* do is the same that the
> other read/write paths do (and the vector path for aio already do),
> namely truncate reads or writes to MAX_RW_COUNT (which is INT_MAX
> aligned down to a page).
> 
> This patch is entirely untested,
> 
>                      Linus

Here's a testcase for xfstests.
---

Add new testcase looking for overflows in AIO code when 2G write
requests are issued.

Also fix up ltp/aio-stress.c to not overflow before the request
ever gets to the kernel...

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/286 b/286
new file mode 100755
index 0000000..f5daa96
--- /dev/null
+++ b/286
@@ -0,0 +1,75 @@
+#! /bin/bash
+# FS QA Test No. 286
+#
+# Check for 2G overflows in AIO
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2012 Red Hat, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+# creator
+owner=sandeen@sandeen.net
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# real QA test starts here
+rm -f $seq.full
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os IRIX Linux
+# Because we will be writing some big files
+_require_scratch
+[ -x $here/ltp/aio-stress ] || _notrun "aio-stress not built for this platform"
+
+_scratch_mkfs > $seq.full 2>&1
+_scratch_mount
+# A little over 4G
+_require_fs_space $SCRATCH_MNT 4194500
+
+expected=4294967296
+
+# 4x 1G IOs, should pass
+$here/ltp/aio-stress -d 1 -b 1 -i 1 -O -I 4 -s 4096 -r 1048576 -v $SCRATCH_MNT/aiofile >> $seq.full 2>&1
+size=$(ls -l $SCRATCH_MNT/aiofile | $AWK_PROG '{print $5}')
+[ "$size" -ne $expected ] && _fail "2 x 1G IOs: filesize $size not $expected"
+
+rm -f $SCRATCH_MNT/aiofile
+
+# 2x 2G IOs, has failed in past
+$here/ltp/aio-stress -d 1 -b 1 -i 1 -O -I 2 -s 4096 -r 2097152 -v $SCRATCH_MNT/aiofile >> $seq.full 2>&1
+size=$(ls -l $SCRATCH_MNT/aiofile | $AWK_PROG '{print $5}')
+[ "$size" -ne $expected ] && _fail "1 x 2G IOs: filesize $size not $expected"
+
+# success, all done
+status=0
+exit
diff --git a/286.out b/286.out
new file mode 100644
index 0000000..6415ad8
--- /dev/null
+++ b/286.out
@@ -0,0 +1 @@
+QA output created by 286
diff --git a/group b/group
index 17afdcd..e91abd6 100644
--- a/group
+++ b/group
@@ -404,3 +404,4 @@ deprecated
 283 dump ioctl auto quick
 284 auto
 285 repair
+286 aio auto
diff --git a/ltp/aio-stress.c b/ltp/aio-stress.c
index 57a2158..40651b4 100644
--- a/ltp/aio-stress.c
+++ b/ltp/aio-stress.c
@@ -92,7 +92,7 @@ int completion_latency_stats = 0;
 int io_iter = 8;
 int iterations = RUN_FOREVER;
 int max_io_submit = 0;
-long rec_len = 64 * 1024;
+size_t rec_len = 64 * 1024;
 int depth = 64;
 int num_threads = 1;
 int num_contexts = 1;
@@ -102,7 +102,7 @@ int use_shm = 0;
 int shm_id;
 char *unaligned_buffer = NULL;
 char *aligned_buffer = NULL;
-int padded_reclen = 0;
+size_t padded_reclen = 0;
 int stonewall = 1;
 int verify = 0;
 char *verify_buf = NULL;
@@ -661,7 +661,7 @@ finish_oper(struct thread_info *t, struct io_oper *oper)
  * null on error
  */
 static struct io_oper * 
-create_oper(int fd, int rw, off_t start, off_t end, int reclen, int depth,
+create_oper(int fd, int rw, off_t start, off_t end, size_t reclen, int depth,
             int iter, char *file_name)
 {
     struct io_oper *oper;
@@ -925,7 +925,7 @@ void aio_setup(io_context_t *io_ctx, int n)
  */
 int setup_ious(struct thread_info *t, 
               int num_files, int depth, 
-	      int reclen, int max_io_submit) {
+	      size_t reclen, int max_io_submit) {
     int i;
     size_t bytes = num_files * depth * sizeof(*t->ios);
 
@@ -989,7 +989,7 @@ free_buffers:
  * buffers to
  */
 int setup_shared_mem(int num_threads, int num_files, int depth, 
-                     int reclen, int max_io_submit) 
+                     size_t reclen, int max_io_submit) 
 {
     char *p = NULL;
     size_t total_ram;



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
  2012-05-21  4:50       ` Linus Torvalds
  2012-05-21 22:22         ` Linus Torvalds
  2012-05-22 19:26         ` [PATCH 1/1] xfstests 286: test for 2G overflows in AIO Eric Sandeen
@ 2012-05-22 20:41         ` Jeff Moyer
  2 siblings, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2012-05-22 20:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: manish honap, tytso, adilger.kernel, linux-fsdevel

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, May 20, 2012 at 8:28 PM, manish honap
> <manish_honap_vit@yahoo.co.in> wrote:
>> Hello Linus,
>>
>> The overflow issue was seen during async dio path
>
> Christ. fs/aio.c doesn't do the proper rw_verify_area().
>
> As a result, it doesn't check file locks, and it doesn't seem to check
> offset overflows either.
>
> The vector versions kind of get the size limit by mistake (because
> they at least use rw_copy_check_uvector(), which does limit things to
> MAX_RW_COUNT), but they don't do the offset overflow check either.
>
> Does this patch work for you? What it *should* do is the same that the
> other read/write paths do (and the vector path for aio already do),
> namely truncate reads or writes to MAX_RW_COUNT (which is INT_MAX
> aligned down to a page).

OK, this took me a while to wrap my head around, mostly due to confusion
with ki_nbytes and ki_left.  When doing p{read|write}v, aio_nbytes is
used to store the number of vectors.  Thus, we need to set both ki_left
and ki_nbytes to the result of the rw_verify_area call, which means that
when ki_left goes to 0, we'll exit out of this loop:

	do {
		ret = rw_op(iocb, &iocb->ki_iovec[iocb->ki_cur_seg],
			    iocb->ki_nr_segs - iocb->ki_cur_seg,
			    iocb->ki_pos);
		if (ret > 0)
			aio_advance_iovec(iocb, ret);

	/* retry all partial writes.  retry partial reads as long as its a
	 * regular file. */
	} while (ret > 0 && iocb->ki_left > 0 &&
		 (opcode == IOCB_CMD_PWRITEV ||
		  (!S_ISFIFO(inode->i_mode) && !S_ISSOCK(inode->i_mode))));

and here:

	/* This means we must have transferred all that we could */
	/* No need to retry anymore */
	if ((ret == 0) || (iocb->ki_left == 0))
		ret = iocb->ki_nbytes - iocb->ki_left;

ki_nbytes will remain at whatever rw_verify_area returned, and ki_left
should be zero.  Meaning, if we passed in 2G or more, we're capped at
the value Linus mentioned earlier.

In the case of a single vector, we leave ki_left alone:

> -	kiocb->ki_iovec->iov_len = kiocb->ki_left;
> +	kiocb->ki_iovec->iov_len = bytes;

Thus, we will exit the while loop due to ret == 0, but this time,
ki_nbytes will be the initial value, and ki_left will be non-zero.  The
end result is that the amount of data transferred is returned, so all is
well.

Of course, none of that matters for O_DIRECT I/O, since the return value
is -EIOCBQUEUED.  And, since buffered AIO isn't even asynchronous,
nobody /should/ be doing that anyway, so even if the above was broken,
nobody /should/ notice.  Nevertheless, I did some targeted testing of
both the O_DIRECT and the buffered cases, and it seems to be working
fine for the single and multiple vector cases.

I do have one question, though: why do we limit the total bytes in a
vectored operation to 2GB?  So long as each segment is below 2G, we
shouldn't trip up any code paths in the kernel, right?  So that just
leaves the "it would take a long time and burn kernel resources"
argument.  So I guess that's the reason?

Anyway, consider this a long-winded:

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-05-22 20:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <F9014F50-D1F6-4B64-8535-7452CC64B18A@qualexsystems.com>
2012-05-20  8:01 ` [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code manish honap
2012-05-20 18:33   ` Linus Torvalds
2012-05-21  3:28     ` manish honap
2012-05-21  4:50       ` Linus Torvalds
2012-05-21 22:22         ` Linus Torvalds
2012-05-21 23:31           ` Ted Ts'o
2012-05-22 16:11             ` Eric Sandeen
2012-05-22 19:02               ` Eric Sandeen
2012-05-22 16:13             ` manish honap
2012-05-22 19:26         ` [PATCH 1/1] xfstests 286: test for 2G overflows in AIO Eric Sandeen
2012-05-22 20:41         ` [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code Jeff Moyer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.