linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
@ 2014-09-22  0:53 Anton Altaparmakov
  2014-09-22  4:43 ` Hugh Dickins
  2014-09-22 15:18 ` Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Altaparmakov @ 2014-09-22  0:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-fsdevel, hughd, stable

Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(), 
sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a 
32-bit arch (where "long" is 32-bit) causes an inifinite loop in 
__getblk_slow() with an infinite stream of errors logged to dmesg like 
this:

__find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
b_state=0x00000020, b_size=512
device sda1 blocksize: 512

Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the 
top 32-bits are missing (in this case the 0x1 at the top).

This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block: 
replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that 
it now has a 32-bit overflow due to shifting the block value to the right 
so it fits in 32-bits and storing the result in pgoff_t variable which is 
32-bit and then passing the pgoff_t variable left-shifted as the block 
number which causes the top bits to get lost as the pgoff_t is not type 
cast to sector_t / 64-bit before the shift.

This patch fixes this issue by type casting "index" to sector_t before 
doing the left shift.

Note this is not a theoretical bug but has been seen in the field on a 
4TiB hard drive with logical sector size 512 bytes.

This patch has been verified to fix the infinite loop problem on 3.17-rc5 
kernel using a 4TB disk image mounted using "-o loop".  Without this patch 
doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 
100% reproducibly whilst with the patch it works fine as expected.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: stable@vger.kernel.org # 3.6 3.8 3.10 3.12 3.14 3.16
---

Linus, can you please apply this?  Alternatively, Andrew, can you please 
pick this up and send it to Linus?

It would be good it it can be applied for 3.17 as well as to all stable 
kernels >= 3.6 as they are all broken!

Thanks a lot in advance!

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111..3588a80 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 		bh = page_buffers(page);
 		if (bh->b_size == size) {
 			end_block = init_page_buffers(page, bdev,
-						index << sizebits, size);
+						(sector_t)index << sizebits,
+						size);
 			goto done;
 		}
 		if (!try_to_free_buffers(page))
@@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 	 */
 	spin_lock(&inode->i_mapping->private_lock);
 	link_dev_buffers(page, bh);
-	end_block = init_page_buffers(page, bdev, index << sizebits, size);
+	end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
+			size);
 	spin_unlock(&inode->i_mapping->private_lock);
 done:
 	ret = (block < end_block) ? 1 : -ENXIO;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22  0:53 [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code Anton Altaparmakov
@ 2014-09-22  4:43 ` Hugh Dickins
  2014-09-22  9:30   ` Anton Altaparmakov
  2014-09-22 15:18 ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2014-09-22  4:43 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-fsdevel,
	hughd, stable

On Mon, 22 Sep 2014, Anton Altaparmakov wrote:

> Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(), 
> sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a 
> 32-bit arch (where "long" is 32-bit) causes an inifinite loop in 
> __getblk_slow() with an infinite stream of errors logged to dmesg like 
> this:
> 
> __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
> b_state=0x00000020, b_size=512
> device sda1 blocksize: 512
> 
> Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the 
> top 32-bits are missing (in this case the 0x1 at the top).
> 
> This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block: 
> replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that 
> it now has a 32-bit overflow due to shifting the block value to the right 
> so it fits in 32-bits and storing the result in pgoff_t variable which is 
> 32-bit and then passing the pgoff_t variable left-shifted as the block 
> number which causes the top bits to get lost as the pgoff_t is not type 
> cast to sector_t / 64-bit before the shift.
> 
> This patch fixes this issue by type casting "index" to sector_t before 
> doing the left shift.
> 
> Note this is not a theoretical bug but has been seen in the field on a 
> 4TiB hard drive with logical sector size 512 bytes.
> 
> This patch has been verified to fix the infinite loop problem on 3.17-rc5 
> kernel using a 4TB disk image mounted using "-o loop".  Without this patch 
> doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 
> 100% reproducibly whilst with the patch it works fine as expected.
> 
> Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
> Cc: stable@vger.kernel.org # 3.6 3.8 3.10 3.12 3.14 3.16

Acked-by: Hugh Dickins <hughd@google.com>

Yes indeed, that's bad, and entirely my fault (though it took me a while
to see how the "block = index << sizebits" which was there before was okay,
but my passing "index << sizebits" as arg to function very much not okay).
Thank you, Anton.

But I see my commit was marked for stable 3.0 3.2 3.4 3.5: so your fix
is needed in 3.2 and 3.4 longterm also (the others now beyond life).

Hugh

> ---
> 
> Linus, can you please apply this?  Alternatively, Andrew, can you please 
> pick this up and send it to Linus?
> 
> It would be good it it can be applied for 3.17 as well as to all stable 
> kernels >= 3.6 as they are all broken!
> 
> Thanks a lot in advance!
> 
> Best regards,
> 
> 	Anton
> -- 
> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
> Linux NTFS maintainer, http://www.linux-ntfs.org/
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 8f05111..3588a80 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>  		bh = page_buffers(page);
>  		if (bh->b_size == size) {
>  			end_block = init_page_buffers(page, bdev,
> -						index << sizebits, size);
> +						(sector_t)index << sizebits,
> +						size);
>  			goto done;
>  		}
>  		if (!try_to_free_buffers(page))
> @@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>  	 */
>  	spin_lock(&inode->i_mapping->private_lock);
>  	link_dev_buffers(page, bh);
> -	end_block = init_page_buffers(page, bdev, index << sizebits, size);
> +	end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
> +			size);
>  	spin_unlock(&inode->i_mapping->private_lock);
>  done:
>  	ret = (block < end_block) ? 1 : -ENXIO;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22  4:43 ` Hugh Dickins
@ 2014-09-22  9:30   ` Anton Altaparmakov
  2014-09-22 10:36     ` Hugh Dickins
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Altaparmakov @ 2014-09-22  9:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-fsdevel, stable

Hi Hugh,

On 22 Sep 2014, at 05:43, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 22 Sep 2014, Anton Altaparmakov wrote:
>> Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(), 
>> sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a 
>> 32-bit arch (where "long" is 32-bit) causes an inifinite loop in 
>> __getblk_slow() with an infinite stream of errors logged to dmesg like 
>> this:
>> 
>> __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
>> b_state=0x00000020, b_size=512
>> device sda1 blocksize: 512
>> 
>> Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the 
>> top 32-bits are missing (in this case the 0x1 at the top).
>> 
>> This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block: 
>> replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that 
>> it now has a 32-bit overflow due to shifting the block value to the right 
>> so it fits in 32-bits and storing the result in pgoff_t variable which is 
>> 32-bit and then passing the pgoff_t variable left-shifted as the block 
>> number which causes the top bits to get lost as the pgoff_t is not type 
>> cast to sector_t / 64-bit before the shift.
>> 
>> This patch fixes this issue by type casting "index" to sector_t before 
>> doing the left shift.
>> 
>> Note this is not a theoretical bug but has been seen in the field on a 
>> 4TiB hard drive with logical sector size 512 bytes.
>> 
>> This patch has been verified to fix the infinite loop problem on 3.17-rc5 
>> kernel using a 4TB disk image mounted using "-o loop".  Without this patch 
>> doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 
>> 100% reproducibly whilst with the patch it works fine as expected.
>> 
>> Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
>> Cc: stable@vger.kernel.org # 3.6 3.8 3.10 3.12 3.14 3.16
> 
> Acked-by: Hugh Dickins <hughd@google.com>
> 
> Yes indeed, that's bad, and entirely my fault (though it took me a while
> to see how the "block = index << sizebits" which was there before was okay,

Ouch.  I failed to see that (I admit I did not pay too much attention to the original code - I just used "git blame" to find out which commit put that code in place - my bad).  It was never ok!  I went back to 2.6.12-rc2 (original commit to git Linus' kernel repository) and that is also affected.  That implies this is the first time anyone has used a 4TiB disk with 512 byte sectors with NTFS where Windows had previously created a file/directory with an attribute list attribute in it.  -  That is the only metadata type we use sb_bread() for and the disk image from the customer does contain it and I verified that is where the inifinite loop comes from.

So it appears sb_bread() and friends have always been broken on 32-bit architectures when accessing blocks outside the range 2^32 - 1 and it appears google finds lots of occurrences of such infinite loops being reported but the fixes have always been to not make the calls in the first place.  No-one seems to have recognized that there is a genuine problem here before.

Still my patch stands correct and should be applied to all kernel versions that have your patch and older kernels should have an equivalent patch applied to fix them, too for people who run very old kernels.

> but my passing "index << sizebits" as arg to function very much not okay).
> Thank you, Anton.

You are welcome.

> But I see my commit was marked for stable 3.0 3.2 3.4 3.5: so your fix
> is needed in 3.2 and 3.4 longterm also (the others now beyond life).

You are quite right.  It needs to go back to all those kernels, too.  Thank you!

Best regards,

	Anton

> Hugh
> 
>> ---
>> 
>> Linus, can you please apply this?  Alternatively, Andrew, can you please 
>> pick this up and send it to Linus?
>> 
>> It would be good it it can be applied for 3.17 as well as to all stable 
>> kernels >= 3.6 as they are all broken!
>> 
>> Thanks a lot in advance!
>> 
>> Best regards,
>> 
>> 	Anton
>> -- 
>> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
>> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
>> Linux NTFS maintainer, http://www.linux-ntfs.org/
>> 
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 8f05111..3588a80 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>> 		bh = page_buffers(page);
>> 		if (bh->b_size == size) {
>> 			end_block = init_page_buffers(page, bdev,
>> -						index << sizebits, size);
>> +						(sector_t)index << sizebits,
>> +						size);
>> 			goto done;
>> 		}
>> 		if (!try_to_free_buffers(page))
>> @@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>> 	 */
>> 	spin_lock(&inode->i_mapping->private_lock);
>> 	link_dev_buffers(page, bh);
>> -	end_block = init_page_buffers(page, bdev, index << sizebits, size);
>> +	end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
>> +			size);
>> 	spin_unlock(&inode->i_mapping->private_lock);
>> done:
>> 	ret = (block < end_block) ? 1 : -ENXIO;

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22  9:30   ` Anton Altaparmakov
@ 2014-09-22 10:36     ` Hugh Dickins
  2014-09-22 11:01       ` Anton Altaparmakov
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2014-09-22 10:36 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-fsdevel, stable

On Mon, 22 Sep 2014, Anton Altaparmakov wrote:

> Hi Hugh,
> 
> On 22 Sep 2014, at 05:43, Hugh Dickins <hughd@google.com> wrote:
> > On Mon, 22 Sep 2014, Anton Altaparmakov wrote:
> >> Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(), 
> >> sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a 
> >> 32-bit arch (where "long" is 32-bit) causes an inifinite loop in 
> >> __getblk_slow() with an infinite stream of errors logged to dmesg like 
> >> this:
> >> 
> >> __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
> >> b_state=0x00000020, b_size=512
> >> device sda1 blocksize: 512
> >> 
> >> Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the 
> >> top 32-bits are missing (in this case the 0x1 at the top).
> >> 
> >> This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block: 
> >> replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that 
> >> it now has a 32-bit overflow due to shifting the block value to the right 
> >> so it fits in 32-bits and storing the result in pgoff_t variable which is 
> >> 32-bit and then passing the pgoff_t variable left-shifted as the block 
> >> number which causes the top bits to get lost as the pgoff_t is not type 
> >> cast to sector_t / 64-bit before the shift.
> >> 
> >> This patch fixes this issue by type casting "index" to sector_t before 
> >> doing the left shift.
> >> 
> >> Note this is not a theoretical bug but has been seen in the field on a 
> >> 4TiB hard drive with logical sector size 512 bytes.
> >> 
> >> This patch has been verified to fix the infinite loop problem on 3.17-rc5 
> >> kernel using a 4TB disk image mounted using "-o loop".  Without this patch 
> >> doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 
> >> 100% reproducibly whilst with the patch it works fine as expected.
> >> 
> >> Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
> >> Cc: stable@vger.kernel.org # 3.6 3.8 3.10 3.12 3.14 3.16
> > 
> > Acked-by: Hugh Dickins <hughd@google.com>
> > 
> > Yes indeed, that's bad, and entirely my fault (though it took me a while
> > to see how the "block = index << sizebits" which was there before was okay,
> 
> Ouch.  I failed to see that (I admit I did not pay too much attention to the original code - I just used "git blame" to find out which commit put that code in place - my bad).  It was never ok!  I went back to 2.6.12-rc2 (original commit to git Linus' kernel repository) and that is also affected.  That implies this is the first time anyone has used a 4TiB disk with 512 byte sectors with NTFS where Windows had previously created a file/directory with an attribute list attribute in it.  -  That is the only metadata type we use sb_bread() for and the disk image from the customer does contain it and I verified that is where the inifinite loop comes from.

Ow, that line needs some truncating itself :)

You generously interpret my words as seeing that for myself.
No, thank you for following up: I had persuaded myself when writing
before, that index would be promoted from pgoff_t to sector_t before
shifting in the original assignment, but not when I passed it as arg.

I've resorted to writing a proggy to check, and it looks like I was
earlier confused, and you are right, and that code was "always" wrong.

Not much use of 4TiB disks on 32-bit kernels I suppose.

> 
> So it appears sb_bread() and friends have always been broken on 32-bit architectures when accessing blocks outside the range 2^32 - 1 and it appears google finds lots of occurrences of such infinite loops being reported but the fixes have always been to not make the calls in the first place.  No-one seems to have recognized that there is a genuine problem here before.
> 
> Still my patch stands correct and should be applied to all kernel versions that have your patch and older kernels should have an equivalent patch applied to fix them, too for people who run very old kernels.

Yes, though I'm now uncertain whether your patch is a bugfix or an
enhancement.  Definitely a bugfix, given the infinite loops.  But the
oldest code ("index = block >> sizebits; block = index << sizebits;")
is so clearly trying to truncate block, that I wonder what departing
from that might be letting us in for.

I see Andrew did 2.6.19's intervening e5657933863f ("grow_buffers()
infinite loop fix"): let's hope that he has a clearer head in the
morning than I have now - there's a chance that he might think it
safer to extend that check to exclude your case, than take your patch.

I hope not, that would not please you; but right now I'm ruling
myself out of grasping the issues here!

Hugh

> 
> > but my passing "index << sizebits" as arg to function very much not okay).
> > Thank you, Anton.
> 
> You are welcome.
> 
> > But I see my commit was marked for stable 3.0 3.2 3.4 3.5: so your fix
> > is needed in 3.2 and 3.4 longterm also (the others now beyond life).
> 
> You are quite right.  It needs to go back to all those kernels, too.  Thank you!
> 
> Best regards,
> 
> 	Anton
> 
> > Hugh
> > 
> >> ---
> >> 
> >> Linus, can you please apply this?  Alternatively, Andrew, can you please 
> >> pick this up and send it to Linus?
> >> 
> >> It would be good it it can be applied for 3.17 as well as to all stable 
> >> kernels >= 3.6 as they are all broken!
> >> 
> >> Thanks a lot in advance!
> >> 
> >> Best regards,
> >> 
> >> 	Anton
> >> -- 
> >> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> >> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
> >> Linux NTFS maintainer, http://www.linux-ntfs.org/
> >> 
> >> diff --git a/fs/buffer.c b/fs/buffer.c
> >> index 8f05111..3588a80 100644
> >> --- a/fs/buffer.c
> >> +++ b/fs/buffer.c
> >> @@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
> >> 		bh = page_buffers(page);
> >> 		if (bh->b_size == size) {
> >> 			end_block = init_page_buffers(page, bdev,
> >> -						index << sizebits, size);
> >> +						(sector_t)index << sizebits,
> >> +						size);
> >> 			goto done;
> >> 		}
> >> 		if (!try_to_free_buffers(page))
> >> @@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
> >> 	 */
> >> 	spin_lock(&inode->i_mapping->private_lock);
> >> 	link_dev_buffers(page, bh);
> >> -	end_block = init_page_buffers(page, bdev, index << sizebits, size);
> >> +	end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
> >> +			size);
> >> 	spin_unlock(&inode->i_mapping->private_lock);
> >> done:
> >> 	ret = (block < end_block) ? 1 : -ENXIO;
> 
> -- 
> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> University of Cambridge Information Services, Roger Needham Building
> 7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22 10:36     ` Hugh Dickins
@ 2014-09-22 11:01       ` Anton Altaparmakov
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Altaparmakov @ 2014-09-22 11:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-fsdevel, stable

Hi,

On 22 Sep 2014, at 11:36, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 22 Sep 2014, Anton Altaparmakov wrote:
>> On 22 Sep 2014, at 05:43, Hugh Dickins <hughd@google.com> wrote:
>>> On Mon, 22 Sep 2014, Anton Altaparmakov wrote:
>>>> Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(), 
>>>> sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a 
>>>> 32-bit arch (where "long" is 32-bit) causes an inifinite loop in 
>>>> __getblk_slow() with an infinite stream of errors logged to dmesg like 
>>>> this:
>>>> 
>>>> __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
>>>> b_state=0x00000020, b_size=512
>>>> device sda1 blocksize: 512
>>>> 
>>>> Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the 
>>>> top 32-bits are missing (in this case the 0x1 at the top).
>>>> 
>>>> This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block: 
>>>> replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that 
>>>> it now has a 32-bit overflow due to shifting the block value to the right 
>>>> so it fits in 32-bits and storing the result in pgoff_t variable which is 
>>>> 32-bit and then passing the pgoff_t variable left-shifted as the block 
>>>> number which causes the top bits to get lost as the pgoff_t is not type 
>>>> cast to sector_t / 64-bit before the shift.
>>>> 
>>>> This patch fixes this issue by type casting "index" to sector_t before 
>>>> doing the left shift.
>>>> 
>>>> Note this is not a theoretical bug but has been seen in the field on a 
>>>> 4TiB hard drive with logical sector size 512 bytes.
>>>> 
>>>> This patch has been verified to fix the infinite loop problem on 3.17-rc5 
>>>> kernel using a 4TB disk image mounted using "-o loop".  Without this patch 
>>>> doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 
>>>> 100% reproducibly whilst with the patch it works fine as expected.
>>>> 
>>>> Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
>>>> Cc: stable@vger.kernel.org # 3.6 3.8 3.10 3.12 3.14 3.16
>>> 
>>> Acked-by: Hugh Dickins <hughd@google.com>
>>> 
>>> Yes indeed, that's bad, and entirely my fault (though it took me a while
>>> to see how the "block = index << sizebits" which was there before was okay,
>> 
>> Ouch.  I failed to see that (I admit I did not pay too much attention to the original code - I just used "git blame" to find out which commit put that code in place - my bad).  It was never ok!  I went back to 2.6.12-rc2 (original commit to git Linus' kernel repository) and that is also affected.  That implies this is the first time anyone has used a 4TiB disk with 512 byte sectors with NTFS where Windows had previously created a file/directory with an attribute list attribute in it.  -  That is the only metadata type we use sb_bread() for and the disk image from the customer does contain it and I verified that is where the inifinite loop comes from.
> 
> Ow, that line needs some truncating itself :)
> 
> You generously interpret my words as seeing that for myself.
> No, thank you for following up: I had persuaded myself when writing
> before, that index would be promoted from pgoff_t to sector_t before
> shifting in the original assignment, but not when I passed it as arg.
> 
> I've resorted to writing a proggy to check, and it looks like I was
> earlier confused, and you are right, and that code was "always" wrong.
> 
> Not much use of 4TiB disks on 32-bit kernels I suppose.

Actually on embedded devices like broadband routers/home network NAS boxes/PVR boxes, etc it is very common to have 32-bit (on ARM/MIPS or more recently also x86) and more and more people are plugging in larger disks to store their music/videos/movies, etc...  So this is something that is on the up and going to become an increasing problem as large disks get cheaper and cheaper.

>> So it appears sb_bread() and friends have always been broken on 32-bit architectures when accessing blocks outside the range 2^32 - 1 and it appears google finds lots of occurrences of such infinite loops being reported but the fixes have always been to not make the calls in the first place.  No-one seems to have recognized that there is a genuine problem here before.
>> 
>> Still my patch stands correct and should be applied to all kernel versions that have your patch and older kernels should have an equivalent patch applied to fix them, too for people who run very old kernels.
> 
> Yes, though I'm now uncertain whether your patch is a bugfix or an
> enhancement.  Definitely a bugfix, given the infinite loops.  But the
> oldest code ("index = block >> sizebits; block = index << sizebits;")
> is so clearly trying to truncate block, that I wonder what departing
> from that might be letting us in for.

Having looked at it, I think I understand the original code and intention - it is not truncating the top bits, it is truncating the bottom bits, i.e. it is rounding down so that "block" becomes the first block in the page as that is what is needed to be passed into grow_dev_page().  And the shift right followed by shift left makes those bottom bits "fall off".

A perfectly valid alternative way would be to do a logical "and" with appropriate mask to mask out the bottom bits which would have the same effect but given the shift is already done to get the page index it is probably the same amount of CPU cycles to do a shift left as to do a logical and of the original value and as the page index will be in current registers probably even more efficient than reloading the original value from the stack.

> I see Andrew did 2.6.19's intervening e5657933863f ("grow_buffers()
> infinite loop fix"): let's hope that he has a clearer head in the
> morning than I have now - there's a chance that he might think it
> safer to extend that check to exclude your case, than take your patch.

I really don't think so.  His commit added a check to ensure the block is addressable using the page cache which is good but completely unrelated to not being able to access blocks outside 32-bits.

The whole point of having CONFIG_LBDAF is to allow 64-bit values for "block" on 32-bit architectures so it cannot be the intention here to limit it to 32-bits.

This was either an oversight when CONFIG_LBDAF was added back in the day or was a bug that was introduced since when whoever wrote that code either forgot that sector_t can be 64-bit even when pgoff_t is 32-bit or they failed to notice that the compiler will not extend pgoff_t to 64-bit before doing the left shift without an explicit type cast being present.

> I hope not, that would not please you; but right now I'm ruling
> myself out of grasping the issues here!

Fair enough.  I can only hope that Linus/Andrew will agree with my understanding which I am confident is correct.  (-:

Tthe fact that reading the disk works fine with my patch applied is quite good indication that other code related paths are fine as the 64-bit value passes all the way down the stack and the correct metadata is returned from the disk sector (verified by comparing what I can read from the disk using simple "dd" command).

Best regards,

	Anton

> Hugh
> 
>> 
>>> but my passing "index << sizebits" as arg to function very much not okay).
>>> Thank you, Anton.
>> 
>> You are welcome.
>> 
>>> But I see my commit was marked for stable 3.0 3.2 3.4 3.5: so your fix
>>> is needed in 3.2 and 3.4 longterm also (the others now beyond life).
>> 
>> You are quite right.  It needs to go back to all those kernels, too.  Thank you!
>> 
>> Best regards,
>> 
>> 	Anton
>> 
>>> Hugh
>>> 
>>>> ---
>>>> 
>>>> Linus, can you please apply this?  Alternatively, Andrew, can you please 
>>>> pick this up and send it to Linus?
>>>> 
>>>> It would be good it it can be applied for 3.17 as well as to all stable 
>>>> kernels >= 3.6 as they are all broken!
>>>> 
>>>> Thanks a lot in advance!
>>>> 
>>>> Best regards,
>>>> 
>>>> 	Anton
>>>> -- 
>>>> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
>>>> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
>>>> Linux NTFS maintainer, http://www.linux-ntfs.org/
>>>> 
>>>> diff --git a/fs/buffer.c b/fs/buffer.c
>>>> index 8f05111..3588a80 100644
>>>> --- a/fs/buffer.c
>>>> +++ b/fs/buffer.c
>>>> @@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>>>> 		bh = page_buffers(page);
>>>> 		if (bh->b_size == size) {
>>>> 			end_block = init_page_buffers(page, bdev,
>>>> -						index << sizebits, size);
>>>> +						(sector_t)index << sizebits,
>>>> +						size);
>>>> 			goto done;
>>>> 		}
>>>> 		if (!try_to_free_buffers(page))
>>>> @@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>>>> 	 */
>>>> 	spin_lock(&inode->i_mapping->private_lock);
>>>> 	link_dev_buffers(page, bh);
>>>> -	end_block = init_page_buffers(page, bdev, index << sizebits, size);
>>>> +	end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
>>>> +			size);
>>>> 	spin_unlock(&inode->i_mapping->private_lock);
>>>> done:
>>>> 	ret = (block < end_block) ? 1 : -ENXIO;

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22  0:53 [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code Anton Altaparmakov
  2014-09-22  4:43 ` Hugh Dickins
@ 2014-09-22 15:18 ` Linus Torvalds
  2014-09-22 15:24   ` Linus Torvalds
  2014-09-22 15:29   ` Anton Altaparmakov
  1 sibling, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2014-09-22 15:18 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-fsdevel,
	Hugh Dickins, stable

[-- Attachment #1: Type: text/plain, Size: 270 bytes --]

On Sun, Sep 21, 2014 at 5:53 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>
> This patch fixes this issue by type casting "index" to sector_t before
> doing the left shift.

Ugh. Does the simpler patch to just pass in "block" work as well?

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 936 bytes --]

 fs/buffer.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111bbb8b..f32d6a3cff38 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1021,8 +1021,7 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 	if (page_has_buffers(page)) {
 		bh = page_buffers(page);
 		if (bh->b_size == size) {
-			end_block = init_page_buffers(page, bdev,
-						index << sizebits, size);
+			end_block = init_page_buffers(page, bdev, block, size);
 			goto done;
 		}
 		if (!try_to_free_buffers(page))
@@ -1043,7 +1042,7 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 	 */
 	spin_lock(&inode->i_mapping->private_lock);
 	link_dev_buffers(page, bh);
-	end_block = init_page_buffers(page, bdev, index << sizebits, size);
+	end_block = init_page_buffers(page, bdev, block, size);
 	spin_unlock(&inode->i_mapping->private_lock);
 done:
 	ret = (block < end_block) ? 1 : -ENXIO;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22 15:18 ` Linus Torvalds
@ 2014-09-22 15:24   ` Linus Torvalds
  2014-09-22 15:29   ` Anton Altaparmakov
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2014-09-22 15:24 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-fsdevel,
	Hugh Dickins, stable

On Mon, Sep 22, 2014 at 8:18 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Ugh. Does the simpler patch to just pass in "block" work as well?

Ugh, never mind, no it does not. It doesn't truncate the sectors to
the beginning of a page boundary.

Will take the patch as-is (will edit the commit message that seems a
bit inaccurate).

             Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22 15:18 ` Linus Torvalds
  2014-09-22 15:24   ` Linus Torvalds
@ 2014-09-22 15:29   ` Anton Altaparmakov
  2014-09-22 15:33     ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: Anton Altaparmakov @ 2014-09-22 15:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-fsdevel,
	Hugh Dickins, stable

Hi Linus,

On 22 Sep 2014, at 16:18, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sun, Sep 21, 2014 at 5:53 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>> 
>> This patch fixes this issue by type casting "index" to sector_t before
>> doing the left shift.
> 
> Ugh. Does the simpler patch to just pass in "block" work as well?

That doesn't work because nothing rounds down "block" to the first block in the page and init_page_buffers() requires "block" to be the first block in the page.

The shift right followed by shift left achieves the "rounding down" required.

You could do "block & ~(sector_t)(size - 1)" instead of "(sector_t)index << sizebits" if you prefer but not sure that is an improvement!

Best regards,

	Anton

>                    Linus
> <patch.diff>


-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22 15:29   ` Anton Altaparmakov
@ 2014-09-22 15:33     ` Linus Torvalds
  2014-09-22 15:46       ` Anton Altaparmakov
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2014-09-22 15:33 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-fsdevel,
	Hugh Dickins, stable

On Mon, Sep 22, 2014 at 8:29 AM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>
> You could do "block & ~(sector_t)(size - 1)" instead of "(sector_t)index << sizebits" if you prefer but not sure that is an improvement!

No, it would be even worse. Something like

  block & ~(sector_t)((size >> 9) - 1)

because block is the sector number (ie 512-byte) and size is in bytes.

           Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code.
  2014-09-22 15:33     ` Linus Torvalds
@ 2014-09-22 15:46       ` Anton Altaparmakov
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Altaparmakov @ 2014-09-22 15:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-fsdevel,
	Hugh Dickins, stable

Hi Linus,

On 22 Sep 2014, at 16:33, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Sep 22, 2014 at 8:29 AM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>> 
>> You could do "block & ~(sector_t)(size - 1)" instead of "(sector_t)index << sizebits" if you prefer but not sure that is an improvement!
> 
> No, it would be even worse. Something like
> 
>  block & ~(sector_t)((size >> 9) - 1)
> 
> because block is the sector number (ie 512-byte) and size is in bytes.

Oops, sorry.  But I think you got it wrong, too as you are ignoring the PAGE_SIZE - as was I but it is what we need to align to in addition to the problem of "size" being in bytes.  So I think the correct mask is actually based on sizebits which reflects the number of blocks per page thus:

	block & ~(sector_t)((1 << sizebits) - 1)

In any case the shift is the lesser evil I think as it is at least obviously correct whilst getting the right mask has taken us a few iterations of correcting each other! (-:

PS. Thank you for taking my patch and correcting the misleading description!

Best regards,

	Anton

>           Linus

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-09-22 15:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-22  0:53 [PATCH] Fix nasty 32-bit overflow bug in buffer i/o code Anton Altaparmakov
2014-09-22  4:43 ` Hugh Dickins
2014-09-22  9:30   ` Anton Altaparmakov
2014-09-22 10:36     ` Hugh Dickins
2014-09-22 11:01       ` Anton Altaparmakov
2014-09-22 15:18 ` Linus Torvalds
2014-09-22 15:24   ` Linus Torvalds
2014-09-22 15:29   ` Anton Altaparmakov
2014-09-22 15:33     ` Linus Torvalds
2014-09-22 15:46       ` Anton Altaparmakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).