linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: fix guard_bio_eod to check for real EOD errors
@ 2019-02-22 14:13 Carlos Maiolino
  2019-02-22 14:47 ` Carlos Maiolino
  2019-02-22 23:33 ` Ming Lei
  0 siblings, 2 replies; 7+ messages in thread
From: Carlos Maiolino @ 2019-02-22 14:13 UTC (permalink / raw)
  To: linux-fsdevel

guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.

It already checks if the IO starts past EOD, but it does not consider
the possibility of an IO request starting within device boundaries can
contain more than one segment past EOD.

In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.

Fix this by checking if truncated_bytes is lower than PAGE_SIZE.

This situation has been found on filesystems such as isofs and vfat,
which doesn't check the device size before mount, if the device is
smaller than the filesystem itself, a readahead on such filesystem,
which spans EOD, can trigger this situation, leading a call to
zero_user() with a wrong size possibly corrupting memory.

I didn't see any crash, or didn't let the system run long enough to
check if memory corruption will be hit somewhere, but adding
instrumentation to guard_bio_end() to check truncated_bytes size, was
enough to see the error.

The following script can trigger the error.

MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0

mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT

losetup -D

losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT

find $MNT -type f -exec cat {} + >/dev/null

Kudos to Eric Sandeen for coming up with the reproducer above

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---

P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the
right fix, so comments are much appreciated.
Thanks

 fs/buffer.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 784de3dbcf28..32dc5cd2f6ba 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio)
 	/* Uhhuh. We've got a bio that straddles the device size! */
 	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
 
+	/*
+	 * The bio contains more than one segment which spans EOD, just return
+	 * and let IO layer turn it into an EIO
+	 */
+	if (truncated_bytes > PAGE_SIZE)
+		return;
+
 	/* Truncate the bio.. */
 	bio->bi_iter.bi_size -= truncated_bytes;
 	bvec->bv_len -= truncated_bytes;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors
  2019-02-22 14:13 [PATCH] fs: fix guard_bio_eod to check for real EOD errors Carlos Maiolino
@ 2019-02-22 14:47 ` Carlos Maiolino
  2019-02-22 23:33 ` Ming Lei
  1 sibling, 0 replies; 7+ messages in thread
From: Carlos Maiolino @ 2019-02-22 14:47 UTC (permalink / raw)
  To: linux-fsdevel

> MNT=/mnt
> IMG=./DISK.img
> DEV=/dev/loop0
> 

My apologies. I forgot to mention I had ./DISK.img already created in my working
tree, it's a simple

`dd if=/dev/zero of=./DISK.img bs=1m count=20`.

The image should be slighter larger than 16247280, so the FS attempts to step
out of the device.


> mkfs.vfat $IMG
> mount $IMG $MNT
> cp -R /etc $MNT &> /dev/null
> umount $MNT
> 
> losetup -D
> 
> losetup --find --show --sizelimit 16247280 $IMG
> mount $DEV $MNT
> 
> find $MNT -type f -exec cat {} + >/dev/null
> 
> Kudos to Eric Sandeen for coming up with the reproducer above
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> 
> P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the
> right fix, so comments are much appreciated.
> Thanks
> 
>  fs/buffer.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 784de3dbcf28..32dc5cd2f6ba 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio)
>  	/* Uhhuh. We've got a bio that straddles the device size! */
>  	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
>  
> +	/*
> +	 * The bio contains more than one segment which spans EOD, just return
> +	 * and let IO layer turn it into an EIO
> +	 */
> +	if (truncated_bytes > PAGE_SIZE)
> +		return;
> +
>  	/* Truncate the bio.. */
>  	bio->bi_iter.bi_size -= truncated_bytes;
>  	bvec->bv_len -= truncated_bytes;
> -- 
> 2.17.2
> 

-- 
Carlos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors
  2019-02-22 14:13 [PATCH] fs: fix guard_bio_eod to check for real EOD errors Carlos Maiolino
  2019-02-22 14:47 ` Carlos Maiolino
@ 2019-02-22 23:33 ` Ming Lei
  2019-02-24 21:36   ` Dave Chinner
  2019-02-25 13:26   ` Carlos Maiolino
  1 sibling, 2 replies; 7+ messages in thread
From: Ming Lei @ 2019-02-22 23:33 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Linux FS Devel, linux-block

Hi Carlos,

Cc block list given it is related with interface between fs and block layer.

On Fri, Feb 22, 2019 at 10:14 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
>
> guard_bio_eod() can truncate a segment in bio to allow it to do IO on
> odd last sectors of a device.
>
> It already checks if the IO starts past EOD, but it does not consider
> the possibility of an IO request starting within device boundaries can
> contain more than one segment past EOD.
>
> In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
> underflow bvec->bv_len.

It can cause memory corruption even for < PAGE_SIZE, also it can be correct
to see > PAGE_SIZE truncated_bytes:

- xfs is going to support big block size which may be 64k
- multi-page bvec patches have been in block tree, and it is planned to land
v5.1, so > PAGE_SIZE truncating may come because .bv_len can be much
bigger than PAGE_SIZE

- suppose fs block size is 4k, bio sector is 1022 and size is 4k, and
disk size is
1024, last bvec is (0, 1024), then 'truncated_bytes' will be 3k, so
the check can't
capture this case, and memory corruption still may be caused.

>
> Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
>
> This situation has been found on filesystems such as isofs and vfat,
> which doesn't check the device size before mount, if the device is
> smaller than the filesystem itself, a readahead on such filesystem,
> which spans EOD, can trigger this situation, leading a call to
> zero_user() with a wrong size possibly corrupting memory.
>
> I didn't see any crash, or didn't let the system run long enough to
> check if memory corruption will be hit somewhere, but adding
> instrumentation to guard_bio_end() to check truncated_bytes size, was
> enough to see the error.
>
> The following script can trigger the error.
>
> MNT=/mnt
> IMG=./DISK.img
> DEV=/dev/loop0
>
> mkfs.vfat $IMG
> mount $IMG $MNT
> cp -R /etc $MNT &> /dev/null
> umount $MNT
>
> losetup -D
>
> losetup --find --show --sizelimit 16247280 $IMG
> mount $DEV $MNT
>
> find $MNT -type f -exec cat {} + >/dev/null

BTW, which kernel is  for this issue? Did you investigate why the bad bio
is made?

>
> Kudos to Eric Sandeen for coming up with the reproducer above
>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>
> P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the
> right fix, so comments are much appreciated.
> Thanks
>
>  fs/buffer.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 784de3dbcf28..32dc5cd2f6ba 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio)
>         /* Uhhuh. We've got a bio that straddles the device size! */
>         truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
>
> +       /*
> +        * The bio contains more than one segment which spans EOD, just return
> +        * and let IO layer turn it into an EIO
> +        */
> +       if (truncated_bytes > PAGE_SIZE)
> +               return;
> +

The correct check should be:

+       if (truncated_bytes > bvec->bv_len)
+               return;

Also it should be helpful to warn on this bad bio, but it is fine to not do it
here cause block layer logs this bad bio too.

BTW, this is just a workaround, not one real fix on the reported problem.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors
  2019-02-22 23:33 ` Ming Lei
@ 2019-02-24 21:36   ` Dave Chinner
  2019-02-25 13:26   ` Carlos Maiolino
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2019-02-24 21:36 UTC (permalink / raw)
  To: Ming Lei; +Cc: Carlos Maiolino, Linux FS Devel, linux-block

On Sat, Feb 23, 2019 at 07:33:21AM +0800, Ming Lei wrote:
> Hi Carlos,
> 
> Cc block list given it is related with interface between fs and block layer.
> 
> On Fri, Feb 22, 2019 at 10:14 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
> >
> > guard_bio_eod() can truncate a segment in bio to allow it to do IO on
> > odd last sectors of a device.
> >
> > It already checks if the IO starts past EOD, but it does not consider
> > the possibility of an IO request starting within device boundaries can
> > contain more than one segment past EOD.
> >
> > In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
> > underflow bvec->bv_len.
> 
> It can cause memory corruption even for < PAGE_SIZE, also it can be correct
> to see > PAGE_SIZE truncated_bytes:
> 
> - xfs is going to support big block size which may be 64k

FYI, this isn't an XFS problem and never will be - XFS doesn't use
bufferheads and mpage_readpages() anymore, it goes down the
iomap_readpages() path which does not need this whacky
guard_bio_eod() thingy.

> - suppose fs block size is 4k, bio sector is 1022 and size is 4k, and
> disk size is
> 1024,

XFS won't do that, either - it checks at mount time if it can read
the very last sector of the filesystem via uncached IO (see
xfs_check_sizes() and xfs_rtmount_init()). If any of the EOD reads
fail, it won't mount.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors
  2019-02-22 23:33 ` Ming Lei
  2019-02-24 21:36   ` Dave Chinner
@ 2019-02-25 13:26   ` Carlos Maiolino
  2019-02-26  2:03     ` Ming Lei
  1 sibling, 1 reply; 7+ messages in thread
From: Carlos Maiolino @ 2019-02-25 13:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Linux FS Devel, linux-block

Hi,

On Sat, Feb 23, 2019 at 07:33:21AM +0800, Ming Lei wrote:
> Hi Carlos,
> 
> Cc block list given it is related with interface between fs and block layer.
> 
> On Fri, Feb 22, 2019 at 10:14 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
> >
> > guard_bio_eod() can truncate a segment in bio to allow it to do IO on
> > odd last sectors of a device.
> >
> > It already checks if the IO starts past EOD, but it does not consider
> > the possibility of an IO request starting within device boundaries can
> > contain more than one segment past EOD.
> >
> > In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
> > underflow bvec->bv_len.
> 
> It can cause memory corruption even for < PAGE_SIZE, also it can be correct
> to see > PAGE_SIZE truncated_bytes:
> 
> - xfs is going to support big block size which may be 64k
> - multi-page bvec patches have been in block tree, and it is planned to land
> v5.1, so > PAGE_SIZE truncating may come because .bv_len can be much
> bigger than PAGE_SIZE
> 
> - suppose fs block size is 4k, bio sector is 1022 and size is 4k, and
> disk size is
> 1024, last bvec is (0, 1024), then 'truncated_bytes' will be 3k, so
> the check can't
> capture this case, and memory corruption still may be caused.
> 
> >
> > Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
> >
> > This situation has been found on filesystems such as isofs and vfat,
> > which doesn't check the device size before mount, if the device is
> > smaller than the filesystem itself, a readahead on such filesystem,
> > which spans EOD, can trigger this situation, leading a call to
> > zero_user() with a wrong size possibly corrupting memory.
> >
> > I didn't see any crash, or didn't let the system run long enough to
> > check if memory corruption will be hit somewhere, but adding
> > instrumentation to guard_bio_end() to check truncated_bytes size, was
> > enough to see the error.
> >
> > The following script can trigger the error.
> >
> > MNT=/mnt
> > IMG=./DISK.img
> > DEV=/dev/loop0
> >
> > mkfs.vfat $IMG
> > mount $IMG $MNT
> > cp -R /etc $MNT &> /dev/null
> > umount $MNT
> >
> > losetup -D
> >
> > losetup --find --show --sizelimit 16247280 $IMG
> > mount $DEV $MNT
> >
> > find $MNT -type f -exec cat {} + >/dev/null
> 
> BTW, which kernel is  for this issue? Did you investigate why the bad bio
> is made?

The problem was reproduced, and the patch also wrote based on linux-next tree,
tag next-20190207.

The bad bio is made while attempting to read the end of a filesystem, which goes
beyond EOD, as described in the patch description itself, it may happen for
filesystems which doesn't check the device size before mounting.

Despite the lack of validation of the block size from the filesystem part, the
block layer should avoid and deny such invalid IO requests. The fact is, the
block layer already validates such IO request via
generic_make_request_checks()->bio_check_eod(), but, before the IO layer could
reach there, guard_bio_eod() has already corrupted the bio.

> 
> >
> > Kudos to Eric Sandeen for coming up with the reproducer above
> >
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---
> >
> > P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the
> > right fix, so comments are much appreciated.
> > Thanks
> >
> >  fs/buffer.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 784de3dbcf28..32dc5cd2f6ba 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio)
> >         /* Uhhuh. We've got a bio that straddles the device size! */
> >         truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
> >
> > +       /*
> > +        * The bio contains more than one segment which spans EOD, just return
> > +        * and let IO layer turn it into an EIO
> > +        */
> > +       if (truncated_bytes > PAGE_SIZE)
> > +               return;
> > +
> 
> The correct check should be:
> 
> +       if (truncated_bytes > bvec->bv_len)
> +               return;

Thanks, I'll test it.

> 
> Also it should be helpful to warn on this bad bio, but it is fine to not do it
> here cause block layer logs this bad bio too.
> 
> BTW, this is just a workaround, not one real fix on the reported problem.

A suggestion of what would be the real fix is much appreciated then.

Cheers

> 
> Thanks,
> Ming Lei

-- 
Carlos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors
  2019-02-25 13:26   ` Carlos Maiolino
@ 2019-02-26  2:03     ` Ming Lei
  2019-02-26  9:37       ` Carlos Maiolino
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2019-02-26  2:03 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Linux FS Devel, linux-block

On Mon, Feb 25, 2019 at 9:26 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
>
> Hi,
>
> On Sat, Feb 23, 2019 at 07:33:21AM +0800, Ming Lei wrote:
> > Hi Carlos,
> >
> > Cc block list given it is related with interface between fs and block layer.
> >
> > On Fri, Feb 22, 2019 at 10:14 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
> > >
> > > guard_bio_eod() can truncate a segment in bio to allow it to do IO on
> > > odd last sectors of a device.
> > >
> > > It already checks if the IO starts past EOD, but it does not consider
> > > the possibility of an IO request starting within device boundaries can
> > > contain more than one segment past EOD.
> > >
> > > In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
> > > underflow bvec->bv_len.
> >
> > It can cause memory corruption even for < PAGE_SIZE, also it can be correct
> > to see > PAGE_SIZE truncated_bytes:
> >
> > - xfs is going to support big block size which may be 64k
> > - multi-page bvec patches have been in block tree, and it is planned to land
> > v5.1, so > PAGE_SIZE truncating may come because .bv_len can be much
> > bigger than PAGE_SIZE
> >
> > - suppose fs block size is 4k, bio sector is 1022 and size is 4k, and
> > disk size is
> > 1024, last bvec is (0, 1024), then 'truncated_bytes' will be 3k, so
> > the check can't
> > capture this case, and memory corruption still may be caused.
> >
> > >
> > > Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
> > >
> > > This situation has been found on filesystems such as isofs and vfat,
> > > which doesn't check the device size before mount, if the device is
> > > smaller than the filesystem itself, a readahead on such filesystem,
> > > which spans EOD, can trigger this situation, leading a call to
> > > zero_user() with a wrong size possibly corrupting memory.
> > >
> > > I didn't see any crash, or didn't let the system run long enough to
> > > check if memory corruption will be hit somewhere, but adding
> > > instrumentation to guard_bio_end() to check truncated_bytes size, was
> > > enough to see the error.
> > >
> > > The following script can trigger the error.
> > >
> > > MNT=/mnt
> > > IMG=./DISK.img
> > > DEV=/dev/loop0
> > >
> > > mkfs.vfat $IMG
> > > mount $IMG $MNT
> > > cp -R /etc $MNT &> /dev/null
> > > umount $MNT
> > >
> > > losetup -D
> > >
> > > losetup --find --show --sizelimit 16247280 $IMG
> > > mount $DEV $MNT
> > >
> > > find $MNT -type f -exec cat {} + >/dev/null
> >
> > BTW, which kernel is  for this issue? Did you investigate why the bad bio
> > is made?
>
> The problem was reproduced, and the patch also wrote based on linux-next tree,
> tag next-20190207.
>
> The bad bio is made while attempting to read the end of a filesystem, which goes
> beyond EOD, as described in the patch description itself, it may happen for
> filesystems which doesn't check the device size before mounting.

OK.

>
> Despite the lack of validation of the block size from the filesystem part, the
> block layer should avoid and deny such invalid IO requests. The fact is, the
> block layer already validates such IO request via
> generic_make_request_checks()->bio_check_eod(), but, before the IO layer could
> reach there, guard_bio_eod() has already corrupted the bio.
>
> >
> > >
> > > Kudos to Eric Sandeen for coming up with the reproducer above
> > >
> > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > > ---
> > >
> > > P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the
> > > right fix, so comments are much appreciated.
> > > Thanks
> > >
> > >  fs/buffer.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > index 784de3dbcf28..32dc5cd2f6ba 100644
> > > --- a/fs/buffer.c
> > > +++ b/fs/buffer.c
> > > @@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio)
> > >         /* Uhhuh. We've got a bio that straddles the device size! */
> > >         truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
> > >
> > > +       /*
> > > +        * The bio contains more than one segment which spans EOD, just return
> > > +        * and let IO layer turn it into an EIO
> > > +        */
> > > +       if (truncated_bytes > PAGE_SIZE)
> > > +               return;
> > > +
> >
> > The correct check should be:
> >
> > +       if (truncated_bytes > bvec->bv_len)
> > +               return;
>
> Thanks, I'll test it.
>
> >
> > Also it should be helpful to warn on this bad bio, but it is fine to not do it
> > here cause block layer logs this bad bio too.
> >
> > BTW, this is just a workaround, not one real fix on the reported problem.
>
> A suggestion of what would be the real fix is much appreciated then.

Seems this issue can't be reproduced in ISO, maybe it is related with
the specific truncated size.

In theory, it is better to return failure during mounting because this fs image
is broken, or the fs code should check the bio during making the bio given
fs has the disk size info. However, maybe either one is hard to implement.

So I think this patch is good, if the check is fixed.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix guard_bio_eod to check for real EOD errors
  2019-02-26  2:03     ` Ming Lei
@ 2019-02-26  9:37       ` Carlos Maiolino
  0 siblings, 0 replies; 7+ messages in thread
From: Carlos Maiolino @ 2019-02-26  9:37 UTC (permalink / raw)
  To: Ming Lei; +Cc: Linux FS Devel, linux-block

On Tue, Feb 26, 2019 at 10:03:04AM +0800, Ming Lei wrote:
> On Mon, Feb 25, 2019 at 9:26 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
> >
> > Hi,
> >
> > On Sat, Feb 23, 2019 at 07:33:21AM +0800, Ming Lei wrote:
> > > Hi Carlos,
> > >
> > > Cc block list given it is related with interface between fs and block layer.
> > >
> > > On Fri, Feb 22, 2019 at 10:14 PM Carlos Maiolino <cmaiolino@redhat.com> wrote:
> > > >
> > > > guard_bio_eod() can truncate a segment in bio to allow it to do IO on
> > > > odd last sectors of a device.
> > > >
> > > > It already checks if the IO starts past EOD, but it does not consider
> > > > the possibility of an IO request starting within device boundaries can
> > > > contain more than one segment past EOD.
> > > >
> > > > In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
> > > > underflow bvec->bv_len.
> > >
> > > It can cause memory corruption even for < PAGE_SIZE, also it can be correct
> > > to see > PAGE_SIZE truncated_bytes:
> > >
> > > - xfs is going to support big block size which may be 64k
> > > - multi-page bvec patches have been in block tree, and it is planned to land
> > > v5.1, so > PAGE_SIZE truncating may come because .bv_len can be much
> > > bigger than PAGE_SIZE
> > >
> > > - suppose fs block size is 4k, bio sector is 1022 and size is 4k, and
> > > disk size is
> > > 1024, last bvec is (0, 1024), then 'truncated_bytes' will be 3k, so
> > > the check can't
> > > capture this case, and memory corruption still may be caused.
> > >
> > > >
> > > > Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
> > > >
> > > > This situation has been found on filesystems such as isofs and vfat,
> > > > which doesn't check the device size before mount, if the device is
> > > > smaller than the filesystem itself, a readahead on such filesystem,
> > > > which spans EOD, can trigger this situation, leading a call to
> > > > zero_user() with a wrong size possibly corrupting memory.
> > > >
> > > > I didn't see any crash, or didn't let the system run long enough to
> > > > check if memory corruption will be hit somewhere, but adding
> > > > instrumentation to guard_bio_end() to check truncated_bytes size, was
> > > > enough to see the error.
> > > >
> > > > The following script can trigger the error.
> > > >
> > > > MNT=/mnt
> > > > IMG=./DISK.img
> > > > DEV=/dev/loop0
> > > >
> > > > mkfs.vfat $IMG
> > > > mount $IMG $MNT
> > > > cp -R /etc $MNT &> /dev/null
> > > > umount $MNT
> > > >
> > > > losetup -D
> > > >
> > > > losetup --find --show --sizelimit 16247280 $IMG
> > > > mount $DEV $MNT
> > > >
> > > > find $MNT -type f -exec cat {} + >/dev/null
> > >
> > > BTW, which kernel is  for this issue? Did you investigate why the bad bio
> > > is made?
> >
> > The problem was reproduced, and the patch also wrote based on linux-next tree,
> > tag next-20190207.
> >
> > The bad bio is made while attempting to read the end of a filesystem, which goes
> > beyond EOD, as described in the patch description itself, it may happen for
> > filesystems which doesn't check the device size before mounting.
> 
> OK.
> 
> >
> > Despite the lack of validation of the block size from the filesystem part, the
> > block layer should avoid and deny such invalid IO requests. The fact is, the
> > block layer already validates such IO request via
> > generic_make_request_checks()->bio_check_eod(), but, before the IO layer could
> > reach there, guard_bio_eod() has already corrupted the bio.
> >
> > >
> > > >
> > > > Kudos to Eric Sandeen for coming up with the reproducer above
> > > >
> > > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > > > ---
> > > >
> > > > P.S. I'm not 100% proficient in bio internals, so I'm not sure if this is the
> > > > right fix, so comments are much appreciated.
> > > > Thanks
> > > >
> > > >  fs/buffer.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > > index 784de3dbcf28..32dc5cd2f6ba 100644
> > > > --- a/fs/buffer.c
> > > > +++ b/fs/buffer.c
> > > > @@ -3063,6 +3063,13 @@ void guard_bio_eod(int op, struct bio *bio)
> > > >         /* Uhhuh. We've got a bio that straddles the device size! */
> > > >         truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
> > > >
> > > > +       /*
> > > > +        * The bio contains more than one segment which spans EOD, just return
> > > > +        * and let IO layer turn it into an EIO
> > > > +        */
> > > > +       if (truncated_bytes > PAGE_SIZE)
> > > > +               return;
> > > > +
> > >
> > > The correct check should be:
> > >
> > > +       if (truncated_bytes > bvec->bv_len)
> > > +               return;
> >
> > Thanks, I'll test it.
> >
> > >
> > > Also it should be helpful to warn on this bad bio, but it is fine to not do it
> > > here cause block layer logs this bad bio too.
> > >
> > > BTW, this is just a workaround, not one real fix on the reported problem.
> >
> > A suggestion of what would be the real fix is much appreciated then.
> 
> Seems this issue can't be reproduced in ISO, maybe it is related with
> the specific truncated size.
> 
> In theory, it is better to return failure during mounting because this fs image
> is broken, or the fs code should check the bio during making the bio given
> fs has the disk size info. However, maybe either one is hard to implement.

I do agree the filesystem should the checking it, however, I do also believe the
bio layer should still validate it.
Notice though, this was not reproduced using any crafted filesystem image, all
it needs is a filesystem which does not validate the underlying block device
size, and the same block device was shrank. The filesystem does not necessarily
need to be corrupted.

> 
> So I think this patch is good, if the check is fixed.
> 

Ok, I'll resubmit the patch with your suggestion. Thanks

> Thanks,
> Ming Lei

-- 
Carlos

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-02-26  9:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-22 14:13 [PATCH] fs: fix guard_bio_eod to check for real EOD errors Carlos Maiolino
2019-02-22 14:47 ` Carlos Maiolino
2019-02-22 23:33 ` Ming Lei
2019-02-24 21:36   ` Dave Chinner
2019-02-25 13:26   ` Carlos Maiolino
2019-02-26  2:03     ` Ming Lei
2019-02-26  9:37       ` Carlos Maiolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).