All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: scheduling while atomic in blk_mq codepath?
@ 2014-06-19 15:35 Theodore Ts'o
  2014-06-19 15:59   ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-19 15:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-scsi

While trying to bisect some problems which were introduced sometime
between 3.15 and 3.16-rc1 (specifically, (1) reads to a block device
at offset 262144 * 4k are failing with a short read, and (2) block
device reads are sometimes causing the entire kernel to hang), the
following BUG got hit.

[    0.000000] Linux version 3.15.0-rc8-06047-gaaeb255 (tytso@closure) (gcc version 4.8.3 (Debian 4.8.3-2) ) #1902 SMP Thu Jun 19 11:16:10 EDT 2014

[....] Checking file systems...fsck from util-linux 2.20.1
/dev/vdg was not cleanly unmounted, check forced.
[    4.161703] BUG: scheduling while atomic: fsck.ext4/2072/0x0000000266.5%    
[    4.163673] no locks held by fsck.ext4/2072.
[    4.164318] Modules linked in:
[    4.164845] CPU: 0 PID: 2072 Comm: fsck.ext4 Not tainted 3.15.0-rc8-06047-gaaeb255 #1902
[    4.166047] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    4.166917]  00000000 00000000 f52c5ba0 c0832655 f5158610 f52c5bac c082f88a f6501e40
[    4.168188]  f52c5c20 c08362ca c0eb3e40 c0eb3e40 374d3933 00000001 0396a8da 00000000
[    4.169474]  f5158610 f51f1674 f4f46a00 f52c5be4 c015dd4b f4f46a00 f52c5bf0 c015dd5e
[    4.170781] Call Trace:
[    4.171159]  [<c0832655>] dump_stack+0x48/0x60
[    4.171838]  [<c082f88a>] __schedule_bug+0x5c/0x6d
[    4.172572]  [<c08362ca>] __schedule+0x61/0x65a
[    4.173228]  [<c015dd4b>] ? kvm_clock_read+0x1f/0x29
[    4.173977]  [<c015dd5e>] ? kvm_clock_get_cycles+0x9/0xc
[    4.174771]  [<c01b4cb9>] ? timekeeping_get_ns.constprop.14+0x10/0x56
[    4.175701]  [<c0836922>] schedule+0x5f/0x61
[    4.176345]  [<c0836aa2>] io_schedule+0x50/0x67
[    4.177060]  [<c0423b2d>] bt_get+0xaf/0xd1
[    4.177677]  [<c0198282>] ? wake_up_atomic_t+0x1f/0x1f
[    4.178444]  [<c0423bfd>] blk_mq_get_tag+0x26/0x82
[    4.179158]  [<c0420f14>] __blk_mq_alloc_request+0x2a/0x169
[    4.180022]  [<c04222b5>] blk_mq_map_request+0x137/0x1e3
[    4.180825]  [<c0422f89>] blk_sq_make_request+0x82/0x145
[    4.181630]  [<c041a687>] generic_make_request+0x82/0xb5
[    4.182430]  [<c041a7aa>] submit_bio+0xf0/0x109
[    4.183113]  [<c019e97c>] ? trace_hardirqs_on_caller+0x14e/0x169
[    4.184019]  [<c025de72>] _submit_bh+0x1ad/0x1ca
[    4.184661]  [<c025de9e>] submit_bh+0xf/0x11
[    4.185267]  [<c025f5c9>] block_read_full_page+0x1e2/0x1f2
[    4.186073]  [<c025f8cd>] ? I_BDEV+0xa/0xa
[    4.186695]  [<c020ad30>] ? __lru_cache_add+0x24/0x46
[    4.187452]  [<c020af13>] ? lru_cache_add+0xd/0xf
[    4.188130]  [<c025fc04>] blkdev_readpage+0x14/0x16
[    4.188832]  [<c0209adf>] __do_page_cache_readahead+0x1c0/0x1eb
[    4.189704]  [<c0209cb9>] ondemand_readahead+0x1af/0x1b9
[    4.190508]  [<c0209d22>] page_cache_async_readahead+0x5f/0x6a
[    4.191424]  [<c0202370>] generic_file_aio_read+0x226/0x4f4
[    4.192272]  [<c0260841>] blkdev_aio_read+0x90/0x9e
[    4.193017]  [<c02385cd>] do_sync_read+0x52/0x79
[    4.193731]  [<c023857b>] ? fdput_pos+0x25/0x25
[    4.194412]  [<c0238d27>] vfs_read+0x72/0xd1
[    4.195064]  [<c02391da>] SyS_read+0x49/0x7c
[    4.195700]  [<c083a0c9>] syscall_call+0x7/0xb
[    4.196385]  [<c0830000>] ? print_usage_bug+0xcd/0x18e

Is any of these known problems?  This is blocking me from doing any
kind of testing at the moment...  (these problems are showing up while
running KVM using virtio devices).

						- Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: scheduling while atomic in blk_mq codepath?
  2014-06-19 15:35 BUG: scheduling while atomic in blk_mq codepath? Theodore Ts'o
@ 2014-06-19 15:59   ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2014-06-19 15:59 UTC (permalink / raw)
  To: Theodore Ts'o, linux-kernel, linux-scsi

On 2014-06-19 08:35, Theodore Ts'o wrote:
> While trying to bisect some problems which were introduced sometime
> between 3.15 and 3.16-rc1 (specifically, (1) reads to a block device
> at offset 262144 * 4k are failing with a short read, and (2) block
> device reads are sometimes causing the entire kernel to hang), the
> following BUG got hit.
>
> [    0.000000] Linux version 3.15.0-rc8-06047-gaaeb255 (tytso@closure) (gcc version 4.8.3 (Debian 4.8.3-2) ) #1902 SMP Thu Jun 19 11:16:10 EDT 2014
>
> [....] Checking file systems...fsck from util-linux 2.20.1
> /dev/vdg was not cleanly unmounted, check forced.
> [    4.161703] BUG: scheduling while atomic: fsck.ext4/2072/0x0000000266.5%
> [    4.163673] no locks held by fsck.ext4/2072.
> [    4.164318] Modules linked in:
> [    4.164845] CPU: 0 PID: 2072 Comm: fsck.ext4 Not tainted 3.15.0-rc8-06047-gaaeb255 #1902
> [    4.166047] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    4.166917]  00000000 00000000 f52c5ba0 c0832655 f5158610 f52c5bac c082f88a f6501e40
> [    4.168188]  f52c5c20 c08362ca c0eb3e40 c0eb3e40 374d3933 00000001 0396a8da 00000000
> [    4.169474]  f5158610 f51f1674 f4f46a00 f52c5be4 c015dd4b f4f46a00 f52c5bf0 c015dd5e
> [    4.170781] Call Trace:
> [    4.171159]  [<c0832655>] dump_stack+0x48/0x60
> [    4.171838]  [<c082f88a>] __schedule_bug+0x5c/0x6d
> [    4.172572]  [<c08362ca>] __schedule+0x61/0x65a
> [    4.173228]  [<c015dd4b>] ? kvm_clock_read+0x1f/0x29
> [    4.173977]  [<c015dd5e>] ? kvm_clock_get_cycles+0x9/0xc
> [    4.174771]  [<c01b4cb9>] ? timekeeping_get_ns.constprop.14+0x10/0x56
> [    4.175701]  [<c0836922>] schedule+0x5f/0x61
> [    4.176345]  [<c0836aa2>] io_schedule+0x50/0x67
> [    4.177060]  [<c0423b2d>] bt_get+0xaf/0xd1
> [    4.177677]  [<c0198282>] ? wake_up_atomic_t+0x1f/0x1f
> [    4.178444]  [<c0423bfd>] blk_mq_get_tag+0x26/0x82
> [    4.179158]  [<c0420f14>] __blk_mq_alloc_request+0x2a/0x169
> [    4.180022]  [<c04222b5>] blk_mq_map_request+0x137/0x1e3
> [    4.180825]  [<c0422f89>] blk_sq_make_request+0x82/0x145
> [    4.181630]  [<c041a687>] generic_make_request+0x82/0xb5
> [    4.182430]  [<c041a7aa>] submit_bio+0xf0/0x109
> [    4.183113]  [<c019e97c>] ? trace_hardirqs_on_caller+0x14e/0x169
> [    4.184019]  [<c025de72>] _submit_bh+0x1ad/0x1ca
> [    4.184661]  [<c025de9e>] submit_bh+0xf/0x11
> [    4.185267]  [<c025f5c9>] block_read_full_page+0x1e2/0x1f2
> [    4.186073]  [<c025f8cd>] ? I_BDEV+0xa/0xa
> [    4.186695]  [<c020ad30>] ? __lru_cache_add+0x24/0x46
> [    4.187452]  [<c020af13>] ? lru_cache_add+0xd/0xf
> [    4.188130]  [<c025fc04>] blkdev_readpage+0x14/0x16
> [    4.188832]  [<c0209adf>] __do_page_cache_readahead+0x1c0/0x1eb
> [    4.189704]  [<c0209cb9>] ondemand_readahead+0x1af/0x1b9
> [    4.190508]  [<c0209d22>] page_cache_async_readahead+0x5f/0x6a
> [    4.191424]  [<c0202370>] generic_file_aio_read+0x226/0x4f4
> [    4.192272]  [<c0260841>] blkdev_aio_read+0x90/0x9e
> [    4.193017]  [<c02385cd>] do_sync_read+0x52/0x79
> [    4.193731]  [<c023857b>] ? fdput_pos+0x25/0x25
> [    4.194412]  [<c0238d27>] vfs_read+0x72/0xd1
> [    4.195064]  [<c02391da>] SyS_read+0x49/0x7c
> [    4.195700]  [<c083a0c9>] syscall_call+0x7/0xb
> [    4.196385]  [<c0830000>] ? print_usage_bug+0xcd/0x18e
>
> Is any of these known problems?  This is blocking me from doing any
> kind of testing at the moment...  (these problems are showing up while
> running KVM using virtio devices).

I believe you already reported this issue a while back, and it should be 
fixed by commit cb96a42c in the kernel.

The other issue, not sure, not a lot of detail. It may be fixed by the 
pull request I sent out yesterday. You can try pulling in:

git://git.kernel.dk/linux-block.git for-linus

and see if that fixes it for you.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: scheduling while atomic in blk_mq codepath?
@ 2014-06-19 15:59   ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2014-06-19 15:59 UTC (permalink / raw)
  To: Theodore Ts'o, linux-kernel, linux-scsi

On 2014-06-19 08:35, Theodore Ts'o wrote:
> While trying to bisect some problems which were introduced sometime
> between 3.15 and 3.16-rc1 (specifically, (1) reads to a block device
> at offset 262144 * 4k are failing with a short read, and (2) block
> device reads are sometimes causing the entire kernel to hang), the
> following BUG got hit.
>
> [    0.000000] Linux version 3.15.0-rc8-06047-gaaeb255 (tytso@closure) (gcc version 4.8.3 (Debian 4.8.3-2) ) #1902 SMP Thu Jun 19 11:16:10 EDT 2014
>
> [....] Checking file systems...fsck from util-linux 2.20.1
> /dev/vdg was not cleanly unmounted, check forced.
> [    4.161703] BUG: scheduling while atomic: fsck.ext4/2072/0x0000000266.5%
> [    4.163673] no locks held by fsck.ext4/2072.
> [    4.164318] Modules linked in:
> [    4.164845] CPU: 0 PID: 2072 Comm: fsck.ext4 Not tainted 3.15.0-rc8-06047-gaaeb255 #1902
> [    4.166047] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    4.166917]  00000000 00000000 f52c5ba0 c0832655 f5158610 f52c5bac c082f88a f6501e40
> [    4.168188]  f52c5c20 c08362ca c0eb3e40 c0eb3e40 374d3933 00000001 0396a8da 00000000
> [    4.169474]  f5158610 f51f1674 f4f46a00 f52c5be4 c015dd4b f4f46a00 f52c5bf0 c015dd5e
> [    4.170781] Call Trace:
> [    4.171159]  [<c0832655>] dump_stack+0x48/0x60
> [    4.171838]  [<c082f88a>] __schedule_bug+0x5c/0x6d
> [    4.172572]  [<c08362ca>] __schedule+0x61/0x65a
> [    4.173228]  [<c015dd4b>] ? kvm_clock_read+0x1f/0x29
> [    4.173977]  [<c015dd5e>] ? kvm_clock_get_cycles+0x9/0xc
> [    4.174771]  [<c01b4cb9>] ? timekeeping_get_ns.constprop.14+0x10/0x56
> [    4.175701]  [<c0836922>] schedule+0x5f/0x61
> [    4.176345]  [<c0836aa2>] io_schedule+0x50/0x67
> [    4.177060]  [<c0423b2d>] bt_get+0xaf/0xd1
> [    4.177677]  [<c0198282>] ? wake_up_atomic_t+0x1f/0x1f
> [    4.178444]  [<c0423bfd>] blk_mq_get_tag+0x26/0x82
> [    4.179158]  [<c0420f14>] __blk_mq_alloc_request+0x2a/0x169
> [    4.180022]  [<c04222b5>] blk_mq_map_request+0x137/0x1e3
> [    4.180825]  [<c0422f89>] blk_sq_make_request+0x82/0x145
> [    4.181630]  [<c041a687>] generic_make_request+0x82/0xb5
> [    4.182430]  [<c041a7aa>] submit_bio+0xf0/0x109
> [    4.183113]  [<c019e97c>] ? trace_hardirqs_on_caller+0x14e/0x169
> [    4.184019]  [<c025de72>] _submit_bh+0x1ad/0x1ca
> [    4.184661]  [<c025de9e>] submit_bh+0xf/0x11
> [    4.185267]  [<c025f5c9>] block_read_full_page+0x1e2/0x1f2
> [    4.186073]  [<c025f8cd>] ? I_BDEV+0xa/0xa
> [    4.186695]  [<c020ad30>] ? __lru_cache_add+0x24/0x46
> [    4.187452]  [<c020af13>] ? lru_cache_add+0xd/0xf
> [    4.188130]  [<c025fc04>] blkdev_readpage+0x14/0x16
> [    4.188832]  [<c0209adf>] __do_page_cache_readahead+0x1c0/0x1eb
> [    4.189704]  [<c0209cb9>] ondemand_readahead+0x1af/0x1b9
> [    4.190508]  [<c0209d22>] page_cache_async_readahead+0x5f/0x6a
> [    4.191424]  [<c0202370>] generic_file_aio_read+0x226/0x4f4
> [    4.192272]  [<c0260841>] blkdev_aio_read+0x90/0x9e
> [    4.193017]  [<c02385cd>] do_sync_read+0x52/0x79
> [    4.193731]  [<c023857b>] ? fdput_pos+0x25/0x25
> [    4.194412]  [<c0238d27>] vfs_read+0x72/0xd1
> [    4.195064]  [<c02391da>] SyS_read+0x49/0x7c
> [    4.195700]  [<c083a0c9>] syscall_call+0x7/0xb
> [    4.196385]  [<c0830000>] ? print_usage_bug+0xcd/0x18e
>
> Is any of these known problems?  This is blocking me from doing any
> kind of testing at the moment...  (these problems are showing up while
> running KVM using virtio devices).

I believe you already reported this issue a while back, and it should be 
fixed by commit cb96a42c in the kernel.

The other issue, not sure, not a lot of detail. It may be fixed by the 
pull request I sent out yesterday. You can try pulling in:

git://git.kernel.dk/linux-block.git for-linus

and see if that fixes it for you.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: scheduling while atomic in blk_mq codepath?
  2014-06-19 15:59   ` Jens Axboe
  (?)
@ 2014-06-19 16:08   ` Theodore Ts'o
  2014-06-19 16:21     ` Theodore Ts'o
  -1 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-19 16:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-scsi

On Thu, Jun 19, 2014 at 08:59:26AM -0700, Jens Axboe wrote:
> I believe you already reported this issue a while back, and it should be
> fixed by commit cb96a42c in the kernel.

Ah yes, I had forgotten.  Thanks for the reminder!

> The other issue, not sure, not a lot of detail. It may be fixed by the pull
> request I sent out yesterday. You can try pulling in:
> 
> git://git.kernel.dk/linux-block.git for-linus

Thanks, I'll give that a try.

				- Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: scheduling while atomic in blk_mq codepath?
  2014-06-19 16:08   ` Theodore Ts'o
@ 2014-06-19 16:21     ` Theodore Ts'o
  2014-06-19 22:38       ` Dave Chinner
  0 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-19 16:21 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-scsi

On Thu, Jun 19, 2014 at 12:08:01PM -0400, Theodore Ts'o wrote:
> > The other issue, not sure, not a lot of detail. It may be fixed by the pull
> > request I sent out yesterday. You can try pulling in:
> > 
> > git://git.kernel.dk/linux-block.git for-linus
> 
> Thanks, I'll give that a try.

I tried merging in your for-linus branch in v3.16-rc1, and I'm seeing
the following.  On a 32-bit x86 3.15 kernel, run: "mke2fs -t ext3
/dev/vdc" where /dev/vdc is a 5 gig virtio partition.

The boot 3.16-rc1 with linux-block.git's for_linus branch merged in.
(Or 3.16-rc1 by itself):

root@candygram:~# e2fsck -fy /dev/vdc
e2fsck 1.43-WIP (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
[   22.872217] random: nonblocking pool is initialized
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 262144 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  Ignore error? yes

Force rewrite? yes

Block bitmap differences:  +(262144--262657)
Fix? yes


/dev/vdc: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdc: 11/327680 files (0.0% non-contiguous), 55935/1310720 blocks

With the 3.15 kernel, this works fine.

						- Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: scheduling while atomic in blk_mq codepath?
  2014-06-19 16:21     ` Theodore Ts'o
@ 2014-06-19 22:38       ` Dave Chinner
  2014-06-21  3:51         ` 32-bit bug in iovec iterator changes Theodore Ts'o
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2014-06-19 22:38 UTC (permalink / raw)
  To: Theodore Ts'o, Jens Axboe, linux-kernel, linux-scsi

On Thu, Jun 19, 2014 at 12:21:44PM -0400, Theodore Ts'o wrote:
> On Thu, Jun 19, 2014 at 12:08:01PM -0400, Theodore Ts'o wrote:
> > > The other issue, not sure, not a lot of detail. It may be fixed by the pull
> > > request I sent out yesterday. You can try pulling in:
> > > 
> > > git://git.kernel.dk/linux-block.git for-linus
> > 
> > Thanks, I'll give that a try.
> 
> I tried merging in your for-linus branch in v3.16-rc1, and I'm seeing
> the following.  On a 32-bit x86 3.15 kernel, run: "mke2fs -t ext3
> /dev/vdc" where /dev/vdc is a 5 gig virtio partition.

Short reads are more likely a bug in all the iovec iterator stuff
that got merged in from the vfs tree. ISTR a 32 bit-only bug in that
stuff go past in to do with not being able to partition a 32GB block
dev on a 32 bit system due to a 32 bit size_t overflow somewhere....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* 32-bit bug in iovec iterator changes
  2014-06-19 22:38       ` Dave Chinner
@ 2014-06-21  3:51         ` Theodore Ts'o
  2014-06-21  5:53           ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-21  3:51 UTC (permalink / raw)
  To: Al Viro; +Cc: Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Fri, Jun 20, 2014 at 08:38:20AM +1000, Dave Chinner wrote:
> 
> Short reads are more likely a bug in all the iovec iterator stuff
> that got merged in from the vfs tree. ISTR a 32 bit-only bug in that
> stuff go past in to do with not being able to partition a 32GB block
> dev on a 32 bit system due to a 32 bit size_t overflow somewhere....

Dave Chinner called it.  

Al, I'm seeing a regression which shows up using a 32-bit x86 kernel.
The symptoms of the bug is when run under KVM, with a 5 GB /dev/vdc
virtual block device, a read at offset 2 ** 30 fails with a short
read:

# dd if=/dev/vdc of=/dev/null bs=4k skip=262144 count=1
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0164144 s, 0.0 kB/s

On a 3.15 kernel, this command works:

# dd if=/dev/vdc of=/dev/null bs=4k skip=262144 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0457984 s, 89.4 kB/s

I tried bisecting it, but unfortunately the iovec iterator changes are
not cleanly bisectable, since copy_page_from_iter() gets introduced
some two dozen patches before it gets defined.  :-(

However, the bisect leads quite squarely to to the iovec iterator
patches.

Al, I'd appreciate it if you could take a look?

Thanks!!

    		      	     	   	- Ted


% git bisect start
# good: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect good 1860e379875dfe7271c649058aeddffe5afd9d0d
# bad: [7171511eaec5bf23fb06078f59784a3a0626b38f] Linux 3.16-rc1
git bisect bad 7171511eaec5bf23fb06078f59784a3a0626b38f
# good: [aaeb2554337217dfa4eac2fcc90da7be540b9a73] Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media into next
git bisect good aaeb2554337217dfa4eac2fcc90da7be540b9a73
# bad: [16b9057804c02e2d351e9c8f606e909b43cbd9e7] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 16b9057804c02e2d351e9c8f606e909b43cbd9e7
# good: [82abb273d838318424644d8f02825db0fbbd400a] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect good 82abb273d838318424644d8f02825db0fbbd400a
# good: [d1e1cda862c16252087374ac75949b0e89a5717e] Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
git bisect good d1e1cda862c16252087374ac75949b0e89a5717e
# good: [23d4ed53b7342bf5999b3ea227d9f69e75e5a625] Merge branch 'for-linus' of git://git.kernel.dk/linux-block
git bisect good 23d4ed53b7342bf5999b3ea227d9f69e75e5a625
# good: [2840c566e95599cd60c7143762ca8b49d9395050] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect good 2840c566e95599cd60c7143762ca8b49d9395050
# good: [4251c2a67011801caecd63671f26dd8c9aedb24c] Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
git bisect good 4251c2a67011801caecd63671f26dd8c9aedb24c
# skip: [3dae8750c368f8ac11c3c8c2a28f56dcee865c01] cifs: switch to ->write_iter()
git bisect skip 3dae8750c368f8ac11c3c8c2a28f56dcee865c01
# good: [5c02c392cd2320e8d612376d6b72b6548a680923] Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
git bisect good 5c02c392cd2320e8d612376d6b72b6548a680923
# bad: [5f073850602084fbcbb987948ff3e70ae273f7d2] kill generic_file_splice_write()
git bisect bad 5f073850602084fbcbb987948ff3e70ae273f7d2
# good: [38583f095c5a8138ae2a1c9173d0fd8a9f10e8aa] Merge branch 'akpm' (incoming from Andrew)
git bisect good 38583f095c5a8138ae2a1c9173d0fd8a9f10e8aa
# good: [f5ccfe1ddbaf9d923a3ebdadcb1e5e32d83e9c28] ext4: fix locking for O_APPEND writes
git bisect good f5ccfe1ddbaf9d923a3ebdadcb1e5e32d83e9c28
# bad: [f0d1bec9d58d4c038d0ac958c9af82be6eb18045] new helper: copy_page_from_iter()
git bisect bad f0d1bec9d58d4c038d0ac958c9af82be6eb18045


% (unset DISPLAY; git bisect visualize)
commit f0d1bec9d58d4c038d0ac958c9af82be6eb18045
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 15:05:18 2014 -0400

    new helper: copy_page_from_iter()
    
    parallel to copy_page_to_iter().  pipe_write() switched to it (and became
    ->write_iter()).
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 84c3d55cc474f9c234c023c92e2769f940d5548c
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:33:23 2014 -0400

    fuse: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit b30ac0fc4109701fc122d41ee085c65b52dc44a3
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:29:04 2014 -0400

    btrfs: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 3ef045c3d8ae8550abbfd44074efce6ff642cc86
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:25:22 2014 -0400

    ocfs2: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit bf97f3bc0c32140c43fe5ca53d23514ea46a54ca
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:20:23 2014 -0400

    xfs: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 50b5551d1719c8bce60c6d4027b814cfc72c2307
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:13:46 2014 -0400

    afs: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit da56e45b6ee83b67a586c61819cd2b5cfd806eb8
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:11:01 2014 -0400

    gfs2: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit edaf43694898c5d7deb9a394335c60e888039100
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:07:25 2014 -0400

    nfs: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit f5674c31ee1f968606702e82c160d6ae11032ded
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 14:00:23 2014 -0400

    ubifs: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit a8f3550cd228b6edc5d17fce1a9af8cc7004f185
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 03:32:25 2014 -0400

    bury __generic_file_aio_write()
    
    all users converted to __generic_file_write_iter() now
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 3dae8750c368f8ac11c3c8c2a28f56dcee865c01
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 12:05:17 2014 -0400

    cifs: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit d4637bc18f3bf89bd63fe7b967043523aa3a6170
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 03:31:17 2014 -0400

    udf: switch to ->write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 9b884164d59707216840159d45f6be68073fac6e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 17 16:09:22 2014 -0400

    convert ext4 to ->write_iter()
    
    unfortunately, Ted's changes to ext4_file_write() are *still* an
    incomplete fix - playing with rlimits can let you smuggle an
    unaligned request past the checks.  So there almost certainly
    will be more merge PITA around that place...
    
    [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit a8324754889c0a491b216bc0502ef9ba557eeac7
Merge: 1456c0a f5ccfe1
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue May 6 17:38:41 2014 -0400

    Merge ext4 changes in ext4_file_write() into for-next
    
    From ext4.git#dev, needed for switch of ext4 to ->write_iter() ;-/

commit 1456c0a87c4241d3a801651019e66983c69ad17d
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 03:21:50 2014 -0400

    blkdev_aio_write() - turn into blkdev_write_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 8174202b34c30e0c07231bf63f18ab29af634f0b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 3 03:17:43 2014 -0400

    write_iter variants of {__,}generic_file_aio_write()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 3644424dc6309439c4c8d97590cdac4100376255
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 20:28:01 2014 -0400

    ceph: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 3aa2d199f8eb8149a88005e88736d583cbc39d31
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 20:14:12 2014 -0400

    nfs: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit a886038baa48a17f2cf77fb5dcc204fd28659d8f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 20:02:21 2014 -0400

    fs/block_dev.c: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 2ba5bbed0cd7429dbd567fa885ae3bc7a76de3d4
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 20:00:02 2014 -0400

    shmem: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit fb9096a344e2964c6a71520931c08abb1301248e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 19:56:54 2014 -0400

    pipe: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit e6a7bcb4c489e3e078ba3cc92ae6621b2b8bb9a7
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 19:53:36 2014 -0400

    cifs: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 37c20f16e7a73e5fe34815e785ca6c5a46e4d260
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 14:47:09 2014 -0400

    fuse_file_aio_read(): convert to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 3cd9ad5a303a0d503492002c4af95becfa99af03
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 14:44:18 2014 -0400

    ocfs2: switch to ->read_iter()
    
    tracepoints are evil, exhibit #6969...
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 027978295d455b2fdff0cb36570f83ada7385ea6
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 14:40:38 2014 -0400

    ecryptfs: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit b4f5d2c6d1f88c79e48f1296076b3a6a22f58c0f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 14:37:59 2014 -0400

    xfs: switch to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit aad4f8bb42af06371aa0e85bf0cd9d52c0494985
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 14:33:16 2014 -0400

    switch simple generic_file_aio_read() users to ->read_iter()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 293bc9822fa9b3c9d4b7893bcb241e085580771a
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Feb 11 18:37:41 2014 -0500

    new methods: ->read_iter() and ->write_iter()
    
    Beginning to introduce those.  Just the callers for now, and it's
    clumsier than it'll eventually become; once we finish converting
    aio_read and aio_write instances, the things will get nicer.
    
    For now, these guys are in parallel to ->aio_read() and ->aio_write();
    they take iocb and iov_iter, with everything in iov_iter already
    validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
    in iov_iter.
    
    Main concerns in that series are stack footprint and ability to
    split the damn thing cleanly.
    
    [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 7f7f25e82d54870df24d415a7007fbd327da027b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Feb 11 17:49:24 2014 -0500

    replace checking for ->read/->aio_read presence with check in ->f_mode
    
    Since we are about to introduce new methods (read_iter/write_iter), the
    tests in a bunch of places would have to grow inconveniently.  Check
    once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
    and FMODE_CAN_WRITE resp.  It might end up being a temporary measure -
    once everything switches from ->aio_{read,write} to ->{read,write}_iter
    it might make sense to return to open-coded checks.  We'll see...
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit b318891929c2750055a4002bee3e7636ca3684de
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 2 07:06:30 2014 -0400

    xfs: trim the argument lists of xfs_file_{dio,buffered}_aio_write()
    
    pos is redundant (it's iocb->ki_pos), and iov/nr_segs/count are taken
    care of by lifting iov_iter into the caller.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 37938463540b075e9166cf774c59274379f7a8ca
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Mar 22 06:57:37 2014 -0400

    blkdev_aio_read(): switch to generic_file_read_iter(), get rid of iov_shorten()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 0c949334a9e2581646c6ff0d1470a805b1e5be99
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Mar 22 06:51:37 2014 -0400

    iov_iter_truncate()
    
    Now It Can Be Done(tm) - we don't need to do iov_shorten() in
    generic_file_direct_write() anymore, now that all ->direct_IO()
    instances are converted to proper iov_iter methods and honour
    iter->count and iter->iov_offset properly.
    
    Get rid of count/ocount arguments of generic_file_direct_write(),
    while we are at it.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 28060d5d9b261da110afe48aae7a2aa6555f798f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Mar 22 05:15:17 2014 -0400

    btrfs: switch check_direct_IO() to iov_iter
    
    ... and don't open-code iov_iter_alignment() there
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 91f79c43d1b54d7154b118860d81b39bad07dfff
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Fri Mar 21 04:58:33 2014 -0400

    new helper: iov_iter_get_pages_alloc()
    
    same as iov_iter_get_pages(), except that pages array is allocated
    (kmalloc if possible, vmalloc if that fails) and left for caller to
    free.  Lustre and NFS ->direct_IO() switched to it.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit f67da30c1d5fc9e341bc8121708874bfd7b31e45
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 19 01:16:16 2014 -0400

    new helper: iov_iter_npages()
    
    counts the pages covered by iov_iter, up to given limit.
    do_block_direct_io() and fuse_iter_npages() switched to
    it.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 5b46f25ddc6edf4adff1a137d453da542c27a640
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Mar 16 18:07:34 2014 -0400

    f2fs: switch to iov_iter_alignment()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit c9c37e2e63786c595d704244cbb7d19dc5630493
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Mar 16 16:08:30 2014 -0400

    fuse: switch to iov_iter_get_pages()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit d22a943f44c79c98ac7a93653fdd330378581741
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Mar 16 15:50:47 2014 -0400

    fuse: pull iov_iter initializations up
    
    ... to fuse_direct_{read,write}().  ->direct_IO() path uses the
    iov_iter passed by the caller instead.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 7b2c99d15559e285384c742db52316802e24b0bd
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Mar 15 04:05:57 2014 -0400

    new helper: iov_iter_get_pages()
    
    iov_iter_get_pages(iter, pages, maxsize, &start) grabs references pinning
    the pages of up to maxsize of (contiguous) data from iter.  Returns the
    amount of memory grabbed or -error.  In case of success, the requested
    area begins at offset start in pages[0] and runs through pages[1], etc.
    Less than requested amount might be returned - either because the contiguous
    area in the beginning of iterator is smaller than requested, or because
    the kernel failed to pin that many pages.
    
    direct-io.c switched to using iov_iter_get_pages()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 3320c60b3a26d05666285c55ab08ee043c017ba3
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Mar 10 02:30:55 2014 -0400

    dio: take updating ->result into do_direct_IO()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 71d8e532b1549a478e6a6a8a44f309d050294d00
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 5 19:28:09 2014 -0500

    start adding the tag to iov_iter
    
    For now, just use the same thing we pass to ->direct_IO() - it's all
    iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
    account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
    uses.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit ed978a811ec528dbe40243605c3afab55892f722
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 5 22:53:04 2014 -0500

    new helper: generic_file_read_iter()
    
    iov_iter-using variant of generic_file_aio_read().  Some callers
    converted.  Note that it's still not quite there for use as ->read_iter() -
    we depend on having zero iter->iov_offset in O_DIRECT case.  Fortunately,
    that's true for all converted callers (and for generic_file_aio_read() itself).
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 23faa7b8db9be0be4f158cfc558460bb95d9b245
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 5 22:52:34 2014 -0500

    fuse_file_aio_write(): merge initializations of iov_iter
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 05bb2e0bc77cb005248be318d2b0ba369b8bbab3
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 5 19:22:23 2014 -0500

    ceph_aio_read(): keep iov_iter across retries
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 886a39115005ced8b15ab067c9c2a8d546b40a5e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 5 13:50:45 2014 -0500

    new primitive: iov_iter_alignment()
    
    returns the value aligned as badly as the worst remaining segment
    in iov_iter is.  Use instead of open-coded equivalents.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 26978b8b4d83c46f4310b253db70fa9e65149e7c
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Mar 10 14:08:45 2014 -0400

    give ->direct_IO() a copy of iov_iter
    
    the thing is, we want to advance what's given to ->direct_IO() as we
    are forming the request; however, the callers care about the amount
    of data actually transferred, not the amount we tried to transfer.
    It's more convenient to allow ->direct_IO() instances do use
    iov_iter_advance() on the copy of iov_iter, leaving the actual
    advancing of the original to caller.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 31b140398ce56ab41646eda7f02bcb78d6a4c916
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Mar 5 01:33:16 2014 -0500

    switch {__,}blockdev_direct_IO() to iov_iter
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit a6cbcd4a4a85e2fdb0b3344b88df2e8b3d526b9e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Mar 4 22:38:00 2014 -0500

    get rid of pointless iov_length() in ->direct_IO()
    
    all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter)
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 16b1f05d7f5ab4ce570963aca5f3b2b5d21822fa
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Mar 4 22:14:00 2014 -0500

    ext4: switch the guts of ->direct_IO() to iov_iter
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 619d30b4b8c488042b4a720ca79dccc346d1a516
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Mar 4 21:53:33 2014 -0500

    convert the guts of nfs_direct_IO() to iov_iter
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit d8d3d94b80aa1a1c0ca75c58b8abdc7356f38418
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Mar 4 21:27:34 2014 -0500

    pass iov_iter to ->direct_IO()
    
    unmodified, for now
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit cb66a7a1f149ff705fa37cad6d1252b046e0ad4f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Mar 4 15:24:06 2014 -0500

    kill generic_segment_checks()
    
    all callers of ->aio_read() and ->aio_write() have iov/nr_segs already
    checked - generic_segment_checks() done after that is just an odd way
    to spell iov_length().
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 0ae5e4d370599592eab845527b31708a4f3411be
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Mar 3 22:09:39 2014 -0500

    __btrfs_direct_write(): switch to iov_iter
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit f8579f8673b7ecdb7a81d5d5bb1d981093d9aa94
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Mar 3 22:03:20 2014 -0500

    generic_file_direct_write(): switch to iov_iter
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit e7c24607b5d68a4cdc56e09d70a3c8bae5f0519f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Apr 10 20:54:51 2014 -0400

    kill iov_iter_copy_from_user()
    
    all callers can use copy_page_from_iter() and it actually simplifies
    them.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit f6c0a1920e0180175bd5e8e4aff8ea5556f1895d
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 23 10:18:46 2014 -0400

    fs/file.c: don't open-code kvfree()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-21  3:51         ` 32-bit bug in iovec iterator changes Theodore Ts'o
@ 2014-06-21  5:53           ` Al Viro
  2014-06-21 23:09             ` Theodore Ts'o
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2014-06-21  5:53 UTC (permalink / raw)
  To: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Fri, Jun 20, 2014 at 11:51:44PM -0400, Theodore Ts'o wrote:
> On Fri, Jun 20, 2014 at 08:38:20AM +1000, Dave Chinner wrote:
> > 
> > Short reads are more likely a bug in all the iovec iterator stuff
> > that got merged in from the vfs tree. ISTR a 32 bit-only bug in that
> > stuff go past in to do with not being able to partition a 32GB block
> > dev on a 32 bit system due to a 32 bit size_t overflow somewhere....
> 
> Dave Chinner called it.  
> 
> Al, I'm seeing a regression which shows up using a 32-bit x86 kernel.
> The symptoms of the bug is when run under KVM, with a 5 GB /dev/vdc
> virtual block device, a read at offset 2 ** 30 fails with a short
> read:
> 
> # dd if=/dev/vdc of=/dev/null bs=4k skip=262144 count=1
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.0164144 s, 0.0 kB/s

Argh...

ed include/linux/uio.h <<EOF
/iov_iter_truncate/s/size_t/u64/
w
q
EOF

Could you check if that fixes the sucker?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-21  5:53           ` Al Viro
@ 2014-06-21 23:09             ` Theodore Ts'o
  2014-06-21 23:49               ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-21 23:09 UTC (permalink / raw)
  To: Al Viro; +Cc: Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sat, Jun 21, 2014 at 06:53:07AM +0100, Al Viro wrote:
> 
> ed include/linux/uio.h <<EOF
> /iov_iter_truncate/s/size_t/u64/
> w
> q
> EOF
> 
> Could you check if that fixes the sucker?

The following patch (attached at the end) appears to fix the problem,
but looking at uio.h, I'm completely confused about *why* it fixes the
problem.  In particular, iov_iter_iovec() makes no sense to me at all,
and I don't understand how the calculation of iov_len makes any sense:

		.iov_len = min(iter->count,
			       iter->iov->iov_len - iter->iov_offset),

It also looks like uio.h is mostly about offsets to memory pointers,
and so why this would make a difference when the issue is the block
device offset goes above 2**30?

There must be deep magic going on here, and so I don't know if your
s/size_t/u64/g substitation also extends to the various functions that
have size_t in them:

unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to);
size_t iov_iter_copy_from_user_atomic(struct page *page,
		struct iov_iter *i, unsigned long offset, size_t bytes);
void iov_iter_advance(struct iov_iter *i, size_t bytes);
int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
size_t iov_iter_single_seg_count(const struct iov_iter *i);
size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
			 struct iov_iter *i);
size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
			 struct iov_iter *i);
unsigned long iov_iter_alignment(const struct iov_iter *i);
void iov_iter_init(struct iov_iter *i, int direction, const struct iovec *iov,
			unsigned long nr_segs, size_t count);
ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
			size_t maxsize, size_t *start);
ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
			size_t maxsize, size_t *start);


Anyway, this patch does appear to make the problem go away, but given
that I don't understand what is going on here, please take it with a
huge grain of salt.  And might I suggest some comments to perhaps give
some context to someone who is trying to understand
include/linux/uio.h?

Thanks!!

						- Ted

diff --git a/include/linux/uio.h b/include/linux/uio.h
index e2231e4..bea7b7d 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -16,7 +16,7 @@ struct page;
 
 struct kvec {
 	void *iov_base; /* and that should *never* hold a userland pointer */
-	size_t iov_len;
+	u64 iov_len;
 };
 
 enum {
@@ -27,8 +27,8 @@ enum {
 
 struct iov_iter {
 	int type;
-	size_t iov_offset;
-	size_t count;
+	u64 iov_offset;
+	u64 count;
 	union {
 		const struct iovec *iov;
 		const struct bio_vec *bvec;
@@ -41,12 +41,12 @@ struct iov_iter {
  *
  * NOTE that it is not safe to use this function until all the iovec's
  * segment lengths have been validated.  Because the individual lengths can
- * overflow a size_t when added together.
+ * overflow a u64 when added together.
  */
-static inline size_t iov_length(const struct iovec *iov, unsigned long nr_segs)
+static inline u64 iov_length(const struct iovec *iov, unsigned long nr_segs)
 {
 	unsigned long seg;
-	size_t ret = 0;
+	u64 ret = 0;
 
 	for (seg = 0; seg < nr_segs; seg++)
 		ret += iov[seg].iov_len;
@@ -89,12 +89,12 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
 
-static inline size_t iov_iter_count(struct iov_iter *i)
+static inline u64 iov_iter_count(struct iov_iter *i)
 {
 	return i->count;
 }
 
-static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
+static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
 {
 	if (i->count > count)
 		i->count = count;
@@ -104,7 +104,7 @@ static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
  * reexpand a previously truncated iterator; count must be no more than how much
  * we had shrunk it.
  */
-static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
+static inline void iov_iter_reexpand(struct iov_iter *i, u64 count)
 {
 	i->count = count;
 }


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-21 23:09             ` Theodore Ts'o
@ 2014-06-21 23:49               ` Al Viro
  2014-06-22  0:03                 ` James Bottomley
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2014-06-21 23:49 UTC (permalink / raw)
  To: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sat, Jun 21, 2014 at 07:09:22PM -0400, Theodore Ts'o wrote:
> On Sat, Jun 21, 2014 at 06:53:07AM +0100, Al Viro wrote:
> > 
> > ed include/linux/uio.h <<EOF
> > /iov_iter_truncate/s/size_t/u64/
> > w
> > q
> > EOF
> > 
> > Could you check if that fixes the sucker?
> 
> The following patch (attached at the end) appears to fix the problem,
> but looking at uio.h, I'm completely confused about *why* it fixes the
> problem.  In particular, iov_iter_iovec() makes no sense to me at all,
> and I don't understand how the calculation of iov_len makes any sense:
> 
> 		.iov_len = min(iter->count,
> 			       iter->iov->iov_len - iter->iov_offset),

Eh?   We have iov[0].iov_base..iov[0].iov_base+iov[0].iov_len - 1 for
area covered by the first iovec.  First iov_offset bytes have already
been consumed.  And at most count bytes matter.  So yes, this iov_len
will give you equivalent first iovec.

Said that, iov_iter_iovec() will die shortly - it's a rudiment of older
code, with almost no users left.  But yes, it is correct.

> It also looks like uio.h is mostly about offsets to memory pointers,
> and so why this would make a difference when the issue is the block
> device offset goes above 2**30?

It is, and your patch is a huge overkill.

> There must be deep magic going on here, and so I don't know if your
> s/size_t/u64/g substitation also extends to the various functions that
> have size_t in them:

No, it does not.  It's specifically about iov_iter_truncate(); moreover,
it matters to only one caller of that sucker.  Namely,

static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
        struct file *file = iocb->ki_filp;
        struct inode *bd_inode = file->f_mapping->host;
        loff_t size = i_size_read(bd_inode);
        loff_t pos = iocb->ki_pos;

        if (pos >= size)
                return 0;

        size -= pos;
        iov_iter_truncate(to, size);
        return generic_file_read_iter(iocb, to);
}

What happens here is capping to->count, to guarantee that we won't even look
at anything past the end of block device.  Alternative fix would be to
have
	if (pos >= size)
		return 0;
	if (to->size + pos > size) {
		/* note that size - pos fits into size_t in this case,
		 * so it's OK to pass it to iov_iter_truncate().
		 */
		iov_iter_truncate(to, size - pos);
	}
        return generic_file_read_iter(iocb, to);
in there.  Other callers are passing it size_t values already, so we don't
need similar checks there.

Or we can make iov_iter_truncate() take an arbitrary u64 argument, seeing that
it's inlined anyway.  IMO it's more robust that way...

Anyway, does the following alone fix the problem you are seeing?

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ddfdb53..dbb02d4 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -94,7 +94,7 @@ static inline size_t iov_iter_count(struct iov_iter *i)
 	return i->count;
 }
 
-static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
+static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
 {
 	if (i->count > count)
 		i->count = count;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-21 23:49               ` Al Viro
@ 2014-06-22  0:03                 ` James Bottomley
  2014-06-22  0:26                   ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2014-06-22  0:03 UTC (permalink / raw)
  To: Al Viro
  Cc: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sun, 2014-06-22 at 00:49 +0100, Al Viro wrote:
> On Sat, Jun 21, 2014 at 07:09:22PM -0400, Theodore Ts'o wrote:
> > On Sat, Jun 21, 2014 at 06:53:07AM +0100, Al Viro wrote:
> > > 
> > > ed include/linux/uio.h <<EOF
> > > /iov_iter_truncate/s/size_t/u64/
> > > w
> > > q
> > > EOF
> > > 
> > > Could you check if that fixes the sucker?
> > 
> > The following patch (attached at the end) appears to fix the problem,
> > but looking at uio.h, I'm completely confused about *why* it fixes the
> > problem.  In particular, iov_iter_iovec() makes no sense to me at all,
> > and I don't understand how the calculation of iov_len makes any sense:
> > 
> > 		.iov_len = min(iter->count,
> > 			       iter->iov->iov_len - iter->iov_offset),
> 
> Eh?   We have iov[0].iov_base..iov[0].iov_base+iov[0].iov_len - 1 for
> area covered by the first iovec.  First iov_offset bytes have already
> been consumed.  And at most count bytes matter.  So yes, this iov_len
> will give you equivalent first iovec.
> 
> Said that, iov_iter_iovec() will die shortly - it's a rudiment of older
> code, with almost no users left.  But yes, it is correct.
> 
> > It also looks like uio.h is mostly about offsets to memory pointers,
> > and so why this would make a difference when the issue is the block
> > device offset goes above 2**30?
> 
> It is, and your patch is a huge overkill.
> 
> > There must be deep magic going on here, and so I don't know if your
> > s/size_t/u64/g substitation also extends to the various functions that
> > have size_t in them:
> 
> No, it does not.  It's specifically about iov_iter_truncate(); moreover,
> it matters to only one caller of that sucker.  Namely,
> 
> static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> {
>         struct file *file = iocb->ki_filp;
>         struct inode *bd_inode = file->f_mapping->host;
>         loff_t size = i_size_read(bd_inode);
>         loff_t pos = iocb->ki_pos;
> 
>         if (pos >= size)
>                 return 0;
> 
>         size -= pos;
>         iov_iter_truncate(to, size);
>         return generic_file_read_iter(iocb, to);
> }
> 
> What happens here is capping to->count, to guarantee that we won't even look
> at anything past the end of block device.  Alternative fix would be to
> have
> 	if (pos >= size)
> 		return 0;
> 	if (to->size + pos > size) {
> 		/* note that size - pos fits into size_t in this case,
> 		 * so it's OK to pass it to iov_iter_truncate().
> 		 */
> 		iov_iter_truncate(to, size - pos);
> 	}
>         return generic_file_read_iter(iocb, to);
> in there.  Other callers are passing it size_t values already, so we don't
> need similar checks there.
> 
> Or we can make iov_iter_truncate() take an arbitrary u64 argument, seeing that
> it's inlined anyway.  IMO it's more robust that way...
> 
> Anyway, does the following alone fix the problem you are seeing?
> 
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index ddfdb53..dbb02d4 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -94,7 +94,7 @@ static inline size_t iov_iter_count(struct iov_iter *i)
>  	return i->count;
>  }
>  
> -static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
> +static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
>  {
>  	if (i->count > count)
>  		i->count = count;

Al, how can that work?  i->count is size_t, which is 32 bit, so we're
going to get truncation errors. I could see this possibly working if
count in struct iov_iter becomes u64 (which is going to have a lot of
knock on consequences, but it seems to me that at least kvec.iov_len and
iov_iter.iov_offset have to become u64 as well.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-22  0:03                 ` James Bottomley
@ 2014-06-22  0:26                   ` Al Viro
  2014-06-22  0:32                     ` James Bottomley
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2014-06-22  0:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sat, Jun 21, 2014 at 05:03:20PM -0700, James Bottomley wrote:

> > Anyway, does the following alone fix the problem you are seeing?
> > 
> > diff --git a/include/linux/uio.h b/include/linux/uio.h
> > index ddfdb53..dbb02d4 100644
> > --- a/include/linux/uio.h
> > +++ b/include/linux/uio.h
> > @@ -94,7 +94,7 @@ static inline size_t iov_iter_count(struct iov_iter *i)
> >  	return i->count;
> >  }
> >  
> > -static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
> > +static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
> >  {
> >  	if (i->count > count)
> >  		i->count = count;
> 
> Al, how can that work?  i->count is size_t, which is 32 bit, so we're
> going to get truncation errors.

No, we are not.  Look:
	* comparison promotes both operands to u64 here, so its result is
accurate, no matter how large count is.  They are compared as natural
numbers.
	* assignment converts count to size_t, which *would* truncate for
values that are greater than the maximal value representable by size_t.
But in that case it's by definition greater than i->count, so we do not
reach that assignment at all.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-22  0:26                   ` Al Viro
@ 2014-06-22  0:32                     ` James Bottomley
  2014-06-22  0:53                       ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2014-06-22  0:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sun, 2014-06-22 at 01:26 +0100, Al Viro wrote:
> On Sat, Jun 21, 2014 at 05:03:20PM -0700, James Bottomley wrote:
> 
> > > Anyway, does the following alone fix the problem you are seeing?
> > > 
> > > diff --git a/include/linux/uio.h b/include/linux/uio.h
> > > index ddfdb53..dbb02d4 100644
> > > --- a/include/linux/uio.h
> > > +++ b/include/linux/uio.h
> > > @@ -94,7 +94,7 @@ static inline size_t iov_iter_count(struct iov_iter *i)
> > >  	return i->count;
> > >  }
> > >  
> > > -static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
> > > +static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
> > >  {
> > >  	if (i->count > count)
> > >  		i->count = count;
> > 
> > Al, how can that work?  i->count is size_t, which is 32 bit, so we're
> > going to get truncation errors.
> 
> No, we are not.  Look:
> 	* comparison promotes both operands to u64 here, so its result is
> accurate, no matter how large count is.  They are compared as natural
> numbers.

True ... figured this out 10 seconds after sending the email.

> 	* assignment converts count to size_t, which *would* truncate for
> values that are greater than the maximal value representable by size_t.
> But in that case it's by definition greater than i->count, so we do not
> reach that assignment at all.

OK, so what I still don't get is why isn't the compiler warning when we
truncate a u64 to a u32?  We should get that warning in your new code,
and we should have got that warning in fs/block_dev.c where it would
have pinpointed the actual problem.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-22  0:32                     ` James Bottomley
@ 2014-06-22  0:53                       ` Al Viro
  2014-06-22  1:00                         ` Al Viro
  2014-06-22  1:00                         ` 32-bit bug in iovec iterator changes James Bottomley
  0 siblings, 2 replies; 22+ messages in thread
From: Al Viro @ 2014-06-22  0:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sat, Jun 21, 2014 at 05:32:44PM -0700, James Bottomley wrote:
> > No, we are not.  Look:
> > 	* comparison promotes both operands to u64 here, so its result is
> > accurate, no matter how large count is.  They are compared as natural
> > numbers.
> 
> True ... figured this out 10 seconds after sending the email.
> 
> > 	* assignment converts count to size_t, which *would* truncate for
> > values that are greater than the maximal value representable by size_t.
> > But in that case it's by definition greater than i->count, so we do not
> > reach that assignment at all.
> 
> OK, so what I still don't get is why isn't the compiler warning when we
> truncate a u64 to a u32?  We should get that warning in your new code,
> and we should have got that warning in fs/block_dev.c where it would
> have pinpointed the actual problem.

In which universe?

extern void f(unsigned int);

void g(unsigned long x)
{
	f(x);
}

is perfectly valid C, with no warnings in sight.  f(1UL << 32) might
give one, but not this...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-22  0:53                       ` Al Viro
@ 2014-06-22  1:00                         ` Al Viro
  2014-06-22 11:50                           ` Theodore Ts'o
  2014-06-22  1:00                         ` 32-bit bug in iovec iterator changes James Bottomley
  1 sibling, 1 reply; 22+ messages in thread
From: Al Viro @ 2014-06-22  1:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sun, Jun 22, 2014 at 01:53:52AM +0100, Al Viro wrote:
> On Sat, Jun 21, 2014 at 05:32:44PM -0700, James Bottomley wrote:
> > > No, we are not.  Look:
> > > 	* comparison promotes both operands to u64 here, so its result is
> > > accurate, no matter how large count is.  They are compared as natural
> > > numbers.
> > 
> > True ... figured this out 10 seconds after sending the email.
> > 
> > > 	* assignment converts count to size_t, which *would* truncate for
> > > values that are greater than the maximal value representable by size_t.
> > > But in that case it's by definition greater than i->count, so we do not
> > > reach that assignment at all.
> > 
> > OK, so what I still don't get is why isn't the compiler warning when we
> > truncate a u64 to a u32?  We should get that warning in your new code,
> > and we should have got that warning in fs/block_dev.c where it would
> > have pinpointed the actual problem.
> 
> In which universe?
> 
> extern void f(unsigned int);
> 
> void g(unsigned long x)
> {
> 	f(x);
> }
> 
> is perfectly valid C, with no warnings in sight.  f(1UL << 32) might
> give one, but not this...

PS: I agree that it's worth careful commenting, obviously, but before sending
it to Linus (*with* comments) I want to get a confirmation that this one-liner
actually fixes what Ted is seeing.  I have reproduced it here, and that change
makes the breakage go away in my testing, but I'd like to make sure that we are
seeing the same thing.  Ted?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-22  0:53                       ` Al Viro
  2014-06-22  1:00                         ` Al Viro
@ 2014-06-22  1:00                         ` James Bottomley
  1 sibling, 0 replies; 22+ messages in thread
From: James Bottomley @ 2014-06-22  1:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Theodore Ts'o, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sun, 2014-06-22 at 01:53 +0100, Al Viro wrote:
> On Sat, Jun 21, 2014 at 05:32:44PM -0700, James Bottomley wrote:
> > > No, we are not.  Look:
> > > 	* comparison promotes both operands to u64 here, so its result is
> > > accurate, no matter how large count is.  They are compared as natural
> > > numbers.
> > 
> > True ... figured this out 10 seconds after sending the email.
> > 
> > > 	* assignment converts count to size_t, which *would* truncate for
> > > values that are greater than the maximal value representable by size_t.
> > > But in that case it's by definition greater than i->count, so we do not
> > > reach that assignment at all.
> > 
> > OK, so what I still don't get is why isn't the compiler warning when we
> > truncate a u64 to a u32?  We should get that warning in your new code,
> > and we should have got that warning in fs/block_dev.c where it would
> > have pinpointed the actual problem.
> 
> In which universe?
> 
> extern void f(unsigned int);
> 
> void g(unsigned long x)
> {
> 	f(x);
> }

In the one where the code is compiled with -Wconversion ... I'm just
surprised, I thought we had this enabled.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 32-bit bug in iovec iterator changes
  2014-06-22  1:00                         ` Al Viro
@ 2014-06-22 11:50                           ` Theodore Ts'o
  2014-06-23  7:44                             ` [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes) Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-22 11:50 UTC (permalink / raw)
  To: Al Viro
  Cc: James Bottomley, Dave Chinner, Jens Axboe, linux-kernel, linux-scsi

On Sun, Jun 22, 2014 at 02:00:32AM +0100, Al Viro wrote:
> 
> PS: I agree that it's worth careful commenting, obviously, but
> before sending it to Linus (*with* comments) I want to get a
> confirmation that this one-liner actually fixes what Ted is seeing.
> I have reproduced it here, and that change makes the breakage go
> away in my testing, but I'd like to make sure that we are seeing the
> same thing.  Ted?

Hep, that fixes things.  Thanks for explaining what was going on!

     	  		 	    	       - Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes)
  2014-06-22 11:50                           ` Theodore Ts'o
@ 2014-06-23  7:44                             ` Al Viro
  2014-06-23 15:43                               ` Theodore Ts'o
                                                 ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Al Viro @ 2014-06-23  7:44 UTC (permalink / raw)
  To: Linus Torvalds, Theodore Ts'o, James Bottomley, Dave Chinner,
	Jens Axboe, linux-kernel, linux-scsi

On Sun, Jun 22, 2014 at 07:50:07AM -0400, Theodore Ts'o wrote:
> On Sun, Jun 22, 2014 at 02:00:32AM +0100, Al Viro wrote:
> > 
> > PS: I agree that it's worth careful commenting, obviously, but
> > before sending it to Linus (*with* comments) I want to get a
> > confirmation that this one-liner actually fixes what Ted is seeing.
> > I have reproduced it here, and that change makes the breakage go
> > away in my testing, but I'd like to make sure that we are seeing the
> > same thing.  Ted?
> 
> Hep, that fixes things.  Thanks for explaining what was going on!

OK, here it is, hopefully with sufficient comments:
    
blkdev_read_iter() wants to cap the iov_iter by the amount of
data remaining to the end of device.  That's what iov_iter_truncate()
is for (trim iter->count if it's above the given limit).  So far,
so good, but the argument of iov_iter_truncate() is size_t, so on
32bit boxen (in case of a large device) we end up with that upper
limit truncated down to 32 bits *before* comparing it with iter->count.

Easily fixed by making iov_iter_truncate() take 64bit argument -
it does the right thing after such change (we only reach the
assignment in there when the current value of iter->count is greater
than the limit, i.e. for anything that would get truncated we don't
reach the assignment at all) and that argument is not the new
value of iter->count - it's an upper limit for such.

The overhead of passing u64 is not an issue - the thing is inlined,
so callers passing size_t won't pay any penalty.

Reported-by: Theodore Tso <tytso@mit.edu>
Tested-by: Theodore Tso <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ddfdb53..17ae7e3 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -94,8 +94,20 @@ static inline size_t iov_iter_count(struct iov_iter *i)
 	return i->count;
 }
 
-static inline void iov_iter_truncate(struct iov_iter *i, size_t count)
+/*
+ * Cap the iov_iter by given limit; note that the second argument is
+ * *not* the new size - it's upper limit for such.  Passing it a value
+ * greater than the amount of data in iov_iter is fine - it'll just do
+ * nothing in that case.
+ */
+static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
 {
+	/*
+	 * count doesn't have to fit in size_t - comparison extends both
+	 * operands to u64 here and any value that would be truncated by
+	 * conversion in assignement is by definition greater than all
+	 * values of size_t, including old i->count.
+	 */
 	if (i->count > count)
 		i->count = count;
 }

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes)
  2014-06-23  7:44                             ` [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes) Al Viro
@ 2014-06-23 15:43                               ` Theodore Ts'o
  2014-06-24 12:33                                 ` One Thousand Gnomes
  2014-06-25 16:56                               ` Linus Torvalds
  2014-06-26 15:27                               ` Bruno Wolff III
  2 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2014-06-23 15:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, James Bottomley, Dave Chinner, Jens Axboe,
	linux-kernel, linux-scsi

On Mon, Jun 23, 2014 at 08:44:40AM +0100, Al Viro wrote:
> 
> OK, here it is, hopefully with sufficient comments:

The comments look really good.  I assume you'll get this to
Linus in time for 3.16-rc3?

Many thanks!!

					- Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes)
  2014-06-23 15:43                               ` Theodore Ts'o
@ 2014-06-24 12:33                                 ` One Thousand Gnomes
  0 siblings, 0 replies; 22+ messages in thread
From: One Thousand Gnomes @ 2014-06-24 12:33 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Al Viro, Linus Torvalds, James Bottomley, Dave Chinner,
	Jens Axboe, linux-kernel, linux-scsi

On Mon, 23 Jun 2014 11:43:02 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Mon, Jun 23, 2014 at 08:44:40AM +0100, Al Viro wrote:
> > 
> > OK, here it is, hopefully with sufficient comments:
> 
> The comments look really good.  I assume you'll get this to
> Linus in time for 3.16-rc3?

Fixes the 32GB 'can't partition' bug I reported too.

Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes)
  2014-06-23  7:44                             ` [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes) Al Viro
  2014-06-23 15:43                               ` Theodore Ts'o
@ 2014-06-25 16:56                               ` Linus Torvalds
  2014-06-26 15:27                               ` Bruno Wolff III
  2 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2014-06-25 16:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Theodore Ts'o, James Bottomley, Dave Chinner, Jens Axboe,
	Linux Kernel Mailing List, Linux SCSI List

Al,
 just checking - did you expect me to take this from the email, or are
you preparing a pull request?

               Linus

On Mon, Jun 23, 2014 at 12:44 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> OK, here it is, hopefully with sufficient comments:

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes)
  2014-06-23  7:44                             ` [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes) Al Viro
  2014-06-23 15:43                               ` Theodore Ts'o
  2014-06-25 16:56                               ` Linus Torvalds
@ 2014-06-26 15:27                               ` Bruno Wolff III
  2 siblings, 0 replies; 22+ messages in thread
From: Bruno Wolff III @ 2014-06-26 15:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Theodore Ts'o, James Bottomley, Dave Chinner,
	Jens Axboe, linux-kernel, linux-scsi

On Mon, Jun 23, 2014 at 08:44:40 +0100,
  Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
>blkdev_read_iter() wants to cap the iov_iter by the amount of
>data remaining to the end of device.  That's what iov_iter_truncate()
>is for (trim iter->count if it's above the given limit).  So far,
>so good, but the argument of iov_iter_truncate() is size_t, so on
>32bit boxen (in case of a large device) we end up with that upper
>limit truncated down to 32 bits *before* comparing it with iter->count.

This seems to fix a problem I had 
(https://bugzilla.kernel.org/show_bug.cgi?id=78711) with a partition device 
(/dev/sda3) being zero size on 3.16 kernels. However I am having some 
other issues with 3.16 on i686 and the amount of testing was the raid 
array using /dev/sda3 appeared to start (which it hadn't previously), but 
the system hung before finishing the boot process.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-06-26 15:35 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-19 15:35 BUG: scheduling while atomic in blk_mq codepath? Theodore Ts'o
2014-06-19 15:59 ` Jens Axboe
2014-06-19 15:59   ` Jens Axboe
2014-06-19 16:08   ` Theodore Ts'o
2014-06-19 16:21     ` Theodore Ts'o
2014-06-19 22:38       ` Dave Chinner
2014-06-21  3:51         ` 32-bit bug in iovec iterator changes Theodore Ts'o
2014-06-21  5:53           ` Al Viro
2014-06-21 23:09             ` Theodore Ts'o
2014-06-21 23:49               ` Al Viro
2014-06-22  0:03                 ` James Bottomley
2014-06-22  0:26                   ` Al Viro
2014-06-22  0:32                     ` James Bottomley
2014-06-22  0:53                       ` Al Viro
2014-06-22  1:00                         ` Al Viro
2014-06-22 11:50                           ` Theodore Ts'o
2014-06-23  7:44                             ` [regression] fix 32-bit breakage in block device read(2) (was Re: 32-bit bug in iovec iterator changes) Al Viro
2014-06-23 15:43                               ` Theodore Ts'o
2014-06-24 12:33                                 ` One Thousand Gnomes
2014-06-25 16:56                               ` Linus Torvalds
2014-06-26 15:27                               ` Bruno Wolff III
2014-06-22  1:00                         ` 32-bit bug in iovec iterator changes James Bottomley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.