* [PATCH 1/5] bh: Prevent panic on invalid BHs
2017-04-06 12:02 [PATCH 0/5] falloc on blockdevice: what possibly can go whong? Dmitry Monakhov
@ 2017-04-06 12:02 ` Dmitry Monakhov
2017-04-06 15:42 ` Christoph Hellwig
2017-04-06 12:02 ` [PATCH 2/5] block: protect bdevname from null pointer bdev Dmitry Monakhov
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: darrick.wong, axboe, tytso, jack, hch, Dmitry Monakhov
- Convert BUG_ON to WARN_ON+EIO on submit_bh.
Leave BUG_ON(!bh->b_end_io) as is because this is static bug in
submission logic which can not be handled at runtime anyway.
unmapped BH is also special case which signal about user
misbehavior, so just dump error messageю
- guard __find_get_block from null pointer bdev.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/buffer.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 9196f2a..4c8ce74 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1355,8 +1355,12 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
struct buffer_head *
__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
{
- struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
+ struct buffer_head *bh;
+
+ if (WARN_ON_ONCE(!bdev))
+ return NULL;
+ bh = lookup_bh_lru(bdev, block, size);
if (bh == NULL) {
/* __find_get_block_slow will mark the page accessed */
bh = __find_get_block_slow(bdev, block);
@@ -3099,11 +3103,18 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
{
struct bio *bio;
- BUG_ON(!buffer_locked(bh));
- BUG_ON(!buffer_mapped(bh));
BUG_ON(!bh->b_end_io);
- BUG_ON(buffer_delay(bh));
- BUG_ON(buffer_unwritten(bh));
+
+ if (WARN_ON_ONCE(!buffer_locked(bh)))
+ goto bad_bh;
+ if (WARN_ON_ONCE(buffer_delay(bh)))
+ goto bad_bh;
+ if (WARN_ON_ONCE(buffer_unwritten(bh)))
+ goto bad_bh;
+ if (unlikely(!buffer_mapped(bh))) {
+ buffer_io_error(bh, ", bh not mapped");
+ goto bad_bh;
+ }
/*
* Only clear out a write error when rewriting
@@ -3143,6 +3154,9 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
submit_bio(bio);
return 0;
+bad_bh:
+ bh->b_end_io(bh, 0);
+ return -EIO;
}
int _submit_bh(int op, int op_flags, struct buffer_head *bh,
--
2.9.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] bh: Prevent panic on invalid BHs
2017-04-06 12:02 ` [PATCH 1/5] bh: Prevent panic on invalid BHs Dmitry Monakhov
@ 2017-04-06 15:42 ` Christoph Hellwig
2017-04-06 16:01 ` Dmitry Monakhov
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2017-04-06 15:42 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: linux-kernel, darrick.wong, axboe, tytso, jack, hch
This look ok, but how did you manage to trigger this case? I think
we might have a deeper problem here.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] bh: Prevent panic on invalid BHs
2017-04-06 15:42 ` Christoph Hellwig
@ 2017-04-06 16:01 ` Dmitry Monakhov
0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 16:01 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-kernel, darrick.wong, axboe, tytso, jack, hch
Christoph Hellwig <hch@infradead.org> writes:
> This look ok, but how did you manage to trigger this case?
# testcases
# TEST1
# Via bug in fallocate
truncate -l 1G img
losetup /dev/loop img
mkfs.ext4 -qF /dev/loop0
mkdir m
mount /dev/loop0 m
# command above truncate bdevs pagecache
xfs_io -c "falloc -k 0 32G" -d /dev/loop0
for ((i=0;i<100;i++));do
xfs_io -c "pwrite 0 4k" -d m/test-$i;
done
sync
# TEST2: NBD close_sock -> kill_bdev
mkdir -p a/mnt
cd a
truncate -s 1G img
mkfs.ext4 -qF img
qemu-nbd -c /dev/nbd0 img
mount /dev/nbd0 /mnt
cp -r /bin/ /mnt&
# Disconnect nbd while cp is active
qemu-nbd -d /dev/nbd0
sync
> I think
> we might have a deeper problem here.
Probably. It seems that !buffer_locked(bh) case should stay BUG_ON
because it is hard to make semi-correct decesion here.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/5] block: protect bdevname from null pointer bdev
2017-04-06 12:02 [PATCH 0/5] falloc on blockdevice: what possibly can go whong? Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 1/5] bh: Prevent panic on invalid BHs Dmitry Monakhov
@ 2017-04-06 12:02 ` Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 3/5] bio: Protect submit_bio from bdevless bio-s Dmitry Monakhov
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: darrick.wong, axboe, tytso, jack, hch, Dmitry Monakhov
Some callers (usually error paths) call bdevname with null bdev.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
block/partition-generic.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb990..284de18 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -46,6 +46,8 @@ char *disk_name(struct gendisk *hd, int partno, char *buf)
const char *bdevname(struct block_device *bdev, char *buf)
{
+ if (unlikely(!bdev))
+ return snprintf(buf, BDEVNAME_SIZE, "unknown-block(null)");
return disk_name(bdev->bd_disk, bdev->bd_part->partno, buf);
}
--
2.9.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] bio: Protect submit_bio from bdevless bio-s
2017-04-06 12:02 [PATCH 0/5] falloc on blockdevice: what possibly can go whong? Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 1/5] bh: Prevent panic on invalid BHs Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 2/5] block: protect bdevname from null pointer bdev Dmitry Monakhov
@ 2017-04-06 12:02 ` Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 4/5] jbd2: use stable bdev pointer Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 5/5] block: truncate page cache only when necessary on fallocate Dmitry Monakhov
4 siblings, 0 replies; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: darrick.wong, axboe, tytso, jack, hch, Dmitry Monakhov
Idially this type of check should be handled at generic_make_request_checks
, but it is too late for bdevless bios, bad pointer was already
dereferenced at that point.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
block/blk-core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index 275c1e4..e583ded 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2065,6 +2065,11 @@ EXPORT_SYMBOL(generic_make_request);
*/
blk_qc_t submit_bio(struct bio *bio)
{
+ if (WARN_ON_ONCE(!bio->bi_bdev)) {
+ bio_io_error(bio);
+ return BLK_QC_T_NONE;
+ }
+
/*
* If it's a regular read/write or a barrier with data attached,
* go through the normal accounting stuff before submission.
--
2.9.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] jbd2: use stable bdev pointer
2017-04-06 12:02 [PATCH 0/5] falloc on blockdevice: what possibly can go whong? Dmitry Monakhov
` (2 preceding siblings ...)
2017-04-06 12:02 ` [PATCH 3/5] bio: Protect submit_bio from bdevless bio-s Dmitry Monakhov
@ 2017-04-06 12:02 ` Dmitry Monakhov
2017-04-06 12:02 ` [PATCH 5/5] block: truncate page cache only when necessary on fallocate Dmitry Monakhov
4 siblings, 0 replies; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: darrick.wong, axboe, tytso, jack, hch, Dmitry Monakhov
This prevent us from panic if someone invalidate bh under us.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/jbd2/revoke.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index f9aefcd..e3b791d 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -459,7 +459,8 @@ int jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
* state machine will get very upset later on. */
if (need_cancel) {
struct buffer_head *bh2;
- bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
+ bh2 = __find_get_block(journal->j_dev, bh->b_blocknr,
+ bh->b_size);
if (bh2) {
if (bh2 != bh)
clear_buffer_revoked(bh2);
--
2.9.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] block: truncate page cache only when necessary on fallocate
2017-04-06 12:02 [PATCH 0/5] falloc on blockdevice: what possibly can go whong? Dmitry Monakhov
` (3 preceding siblings ...)
2017-04-06 12:02 ` [PATCH 4/5] jbd2: use stable bdev pointer Dmitry Monakhov
@ 2017-04-06 12:02 ` Dmitry Monakhov
2017-04-06 15:43 ` Christoph Hellwig
4 siblings, 1 reply; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 12:02 UTC (permalink / raw)
To: linux-kernel; +Cc: darrick.wong, axboe, tytso, jack, hch, Dmitry Monakhov
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/block_dev.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2eca00e..f4b13e1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2075,7 +2075,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
{
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
struct request_queue *q = bdev_get_queue(bdev);
- struct address_space *mapping;
+ struct address_space *mapping = bdev->bd_inode->i_mapping;
loff_t end = start + len - 1;
loff_t isize;
int error;
@@ -2102,13 +2102,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
if ((start | len) & (bdev_logical_block_size(bdev) - 1))
return -EINVAL;
- /* Invalidate the page cache, including dirty pages. */
- mapping = bdev->bd_inode->i_mapping;
- truncate_inode_pages_range(mapping, start, end);
-
switch (mode) {
case FALLOC_FL_ZERO_RANGE:
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+ truncate_inode_pages_range(mapping, start, end);
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
GFP_KERNEL, false);
break;
@@ -2116,12 +2113,14 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
/* Only punch if the device can do zeroing discard. */
if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
return -EOPNOTSUPP;
+ truncate_inode_pages_range(mapping, start, end);
error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, 0);
break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
+ truncate_inode_pages_range(mapping, start, end);
error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, 0);
break;
--
2.9.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 5/5] block: truncate page cache only when necessary on fallocate
2017-04-06 12:02 ` [PATCH 5/5] block: truncate page cache only when necessary on fallocate Dmitry Monakhov
@ 2017-04-06 15:43 ` Christoph Hellwig
2017-04-06 15:51 ` Dmitry Monakhov
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2017-04-06 15:43 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: linux-kernel, darrick.wong, axboe, tytso, jack, hch
why?
On Thu, Apr 06, 2017 at 04:02:49PM +0400, Dmitry Monakhov wrote:
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
> fs/block_dev.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 2eca00e..f4b13e1 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -2075,7 +2075,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> {
> struct block_device *bdev = I_BDEV(bdev_file_inode(file));
> struct request_queue *q = bdev_get_queue(bdev);
> - struct address_space *mapping;
> + struct address_space *mapping = bdev->bd_inode->i_mapping;
> loff_t end = start + len - 1;
> loff_t isize;
> int error;
> @@ -2102,13 +2102,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> if ((start | len) & (bdev_logical_block_size(bdev) - 1))
> return -EINVAL;
>
> - /* Invalidate the page cache, including dirty pages. */
> - mapping = bdev->bd_inode->i_mapping;
> - truncate_inode_pages_range(mapping, start, end);
> -
> switch (mode) {
> case FALLOC_FL_ZERO_RANGE:
> case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> + truncate_inode_pages_range(mapping, start, end);
> error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
> GFP_KERNEL, false);
> break;
> @@ -2116,12 +2113,14 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> /* Only punch if the device can do zeroing discard. */
> if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
> return -EOPNOTSUPP;
> + truncate_inode_pages_range(mapping, start, end);
> error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> GFP_KERNEL, 0);
> break;
> case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> if (!blk_queue_discard(q))
> return -EOPNOTSUPP;
> + truncate_inode_pages_range(mapping, start, end);
> error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> GFP_KERNEL, 0);
> break;
> --
> 2.9.3
>
---end quoted text---
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 5/5] block: truncate page cache only when necessary on fallocate
2017-04-06 15:43 ` Christoph Hellwig
@ 2017-04-06 15:51 ` Dmitry Monakhov
0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Monakhov @ 2017-04-06 15:51 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-kernel, darrick.wong, axboe, tytso, jack, hch
Christoph Hellwig <hch@infradead.org> writes:
> why?
because it is not good thing to truncate page cache and fiew lines later
realize that feature is not supported !blk_queue_discard(q) and return ENOTSUPP.
Event more: if mode == FALLOC_FL_KEEP_SIZE then we do nothing and
return ENOTSUPP unconditionally.
IMHO (mode == FALLOC_FL_KEEP_SIZE) is sane API for thin-provision blkdevs
to preallocate space in advance. Nobody use it at the moment, but it may
be usefull in future.
>
> On Thu, Apr 06, 2017 at 04:02:49PM +0400, Dmitry Monakhov wrote:
>> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>> ---
>> fs/block_dev.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 2eca00e..f4b13e1 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -2075,7 +2075,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>> {
>> struct block_device *bdev = I_BDEV(bdev_file_inode(file));
>> struct request_queue *q = bdev_get_queue(bdev);
>> - struct address_space *mapping;
>> + struct address_space *mapping = bdev->bd_inode->i_mapping;
>> loff_t end = start + len - 1;
>> loff_t isize;
>> int error;
>> @@ -2102,13 +2102,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>> if ((start | len) & (bdev_logical_block_size(bdev) - 1))
>> return -EINVAL;
>>
>> - /* Invalidate the page cache, including dirty pages. */
>> - mapping = bdev->bd_inode->i_mapping;
>> - truncate_inode_pages_range(mapping, start, end);
>> -
>> switch (mode) {
>> case FALLOC_FL_ZERO_RANGE:
>> case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
>> + truncate_inode_pages_range(mapping, start, end);
>> error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>> GFP_KERNEL, false);
>> break;
>> @@ -2116,12 +2113,14 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>> /* Only punch if the device can do zeroing discard. */
>> if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
>> return -EOPNOTSUPP;
>> + truncate_inode_pages_range(mapping, start, end);
>> error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>> GFP_KERNEL, 0);
>> break;
>> case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
>> if (!blk_queue_discard(q))
>> return -EOPNOTSUPP;
>> + truncate_inode_pages_range(mapping, start, end);
>> error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>> GFP_KERNEL, 0);
>> break;
>> --
>> 2.9.3
>>
> ---end quoted text---
^ permalink raw reply [flat|nested] 10+ messages in thread