* [bug report] NBD: rbd-nbd + ext4 stuck after nbd resized
From: lining @ 2020-10-19 3:29 UTC
To: josef, axboe, linux-block, yunchuan.wen, ceph-users; +Cc: donglifekernel
Hi kernel and Ceph community,
We ran into an issue that mainly involves the (kernel) nbd driver and (Ceph) rbd-nbd.
After some investigation, I found that the root cause seems to be related to the change in the nbd block size.
I am not sure whether this is a bug in the nbd driver or in rbd-nbd, but the problem is reproducible.
What happened:
Accessing the mount point of an ext4-formatted nbd device always hangs after the nbd device has been resized.
Environment information:
- kernel: v4.19.25 or master
- rbd-nbd(ceph): v12.2.0 Luminous or master
- filesystem on the nbd device: ext4
Steps to reproduce:
1. rbd create --size 2G rbdpool/foo # create a 2G rbd image
2. rbd-nbd map rbdpool/foo # map the rbd image as the local block device /dev/nbd0; the block size is 512 (the default that rbd-nbd sets when it maps the device)
3. mkfs.ext4 /dev/nbd0 # create an ext4 filesystem on nbd0; only nbd + ext4 reproduces the problem
4. mount /dev/nbd0 /mnt # mount nbd0 on /mnt
5. rbd resize --size 4G rbdpool/foo # expand the backing rbd image from 2G to 4G
6. ls /mnt # `ls` hangs here forever
ln@ubuntu:linux>$ ps -ef |grep mnt
root 8670 7519 98 10:16 pts/5 00:28:46 ls --color=auto /mnt/
ln 9508 9293 0 10:45 pts/6 00:00:00 grep --color=auto mnt
ln@ubuntu:linux>$ sudo cat /proc/8670/stack
[<0>] io_schedule+0x1a/0x40
[<0>] __lock_page+0x105/0x150
[<0>] pagecache_get_page+0x199/0x2c0
[<0>] __getblk_gfp+0xef/0x290
[<0>] ext4_getblk+0x83/0x1a0
[<0>] ext4_bread+0x26/0xb0
[<0>] __ext4_read_dirblock+0x34/0x2c0
[<0>] htree_dirblock_to_tree+0x56/0x1c0
[<0>] ext4_htree_fill_tree+0xad/0x330
[<0>] ext4_readdir+0x6a3/0x980
[<0>] iterate_dir+0x9e/0x1a0
[<0>] ksys_getdents64+0xa0/0x130
[<0>] __x64_sys_getdents64+0x1e/0x30
[<0>] do_syscall_64+0x5e/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff
Some investigation on the kernel side:
Using git bisect, I found that the problem is related to this commit: https://github.com/torvalds/linux/commit/9a9c3c02eacecf4bfde74b08ed32749a4929a2cf .
A kernel with this commit (9a9c3c02) reproduces the problem; reverting the commit makes the problem disappear.
Analysis of how the nbd block size changes:
1. rbd-nbd map rbdpool/foo
   => ioctl NBD_SET_BLKSIZE 512
   => nbd_size_set()
   => nbd_size_update(nbd)
   => {
        bdev = bdget_disk(nbd->disk, 0);
        bd_set_size(bdev, config->bytesize);
        set_blocksize(bdev, config->blksize);  <= 512
      }
2. mkfs.ext4 /dev/nbd0
3. mount /dev/nbd0 /mnt
   => vfs mount
   => ext4_mount()
   => …
   => sb_set_blocksize()
   => set_blocksize(bdev, 4096)  <= mounting ext4 sets the nbd block size to 4096
4. rbd resize --size 4G rbdpool/foo
   => ioctl NBD_SET_SIZE 4G  <= rbd-nbd updates the total size of the nbd device
   => nbd_size_set()
   => nbd_size_update(nbd)
   => {
        bdev = bdget_disk(nbd->disk, 0);
        bd_set_size(bdev, config->bytesize);
        set_blocksize(bdev, config->blksize);  <= the block size is set back to 512. This seems to be the root cause.
      }
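The sequence above can be replayed as a toy model to make the mismatch explicit. This is only an illustrative sketch, not kernel code: the class `ToyNbd` and its fields are hypothetical stand-ins for `config->blksize`, `config->bytesize` and the block size that `set_blocksize()` changes on the block device.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class ToyNbd:
    """Toy stand-in for the state touched in the trace (not kernel code)."""
    config_blksize: int = 0               # models config->blksize
    config_bytesize: int = 0              # models config->bytesize
    bdev_block_size: int = 0              # value last passed to set_blocksize()
    fs_block_size: Optional[int] = None   # block size the mounted fs relies on

    def nbd_size_update(self):
        # Models set_blocksize(bdev, config->blksize) from the trace.
        self.bdev_block_size = self.config_blksize

    def map(self, bytesize, blksize=512):
        # Step 1: rbd-nbd sets block size 512 when it maps the device.
        self.config_bytesize = bytesize
        self.config_blksize = blksize
        self.nbd_size_update()

    def mount_ext4(self, fs_block_size=4096):
        # Step 3: sb_set_blocksize() -> set_blocksize(bdev, 4096).
        self.fs_block_size = fs_block_size
        self.bdev_block_size = fs_block_size

    def resize(self, bytesize):
        # Step 4: the size update also re-applies config->blksize.
        self.config_bytesize = bytesize
        self.nbd_size_update()

    def consistent(self):
        # True while the bdev block size matches what the filesystem set.
        return self.fs_block_size is None or self.bdev_block_size == self.fs_block_size


dev = ToyNbd()
dev.map(2 * GIB)
dev.mount_ext4()
print(dev.bdev_block_size, dev.consistent())   # block size as set by the mount
dev.resize(4 * GIB)
print(dev.bdev_block_size, dev.consistent())   # block size after the resize
```

In this model the device and the mounted filesystem agree on 4096 until the resize, after which the device is back at 512 while ext4 still expects 4096.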
* Re: [bug report] NBD: rbd-nbd + ext4 stuck after nbd resized
From: lining @ 2020-10-27 2:35 UTC
To: Ming Lei
Cc: josef, axboe, linux-block, nbd, yunchuan.wen, ceph-users,
donglifekernel, magicdx, yanhaishuang
Hello Ming,
Thanks for following up on this issue. It can be reproduced on a v5.9 kernel.
I reproduced it just now. Here are the details.
ln@ubuntu:linux>$ git describe HEAD
v5.9-14722-gd76913908102
ln@ubuntu:linux>$ uname -a
Linux ubuntu 5.9.0+ #3 SMP Mon Oct 26 16:56:48 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
ln@ubuntu:~>$ sudo bash -x repro.sh
+ umount /tmp/mntnbd
umount: /tmp/mntnbd: no mount point specified.
+ rbd-nbd unmap kcp/foo
rbd-nbd: kcp/foo is not mapped
+ rbd rm kcp/foo
Removing image: 100% complete...done.
+ rbd create -s 2G kcp/foo
+ rbd-nbd map kcp/foo
/dev/nbd0
+ mkfs.ext4 /dev/nbd0
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: f4b9635c-152f-4042-b9ca-602428628cf0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
+ mkdir -p /tmp/mntnbd
+ mount /dev/nbd0 /tmp/mntnbd
+ rbd resize kcp/foo --size 4G
Resizing image: 100% complete...done.
ln@ubuntu:~>$ ls /tmp/mntnbd/
^C^C
ln@ubuntu:~>$ top
top - 10:30:19 up 7 min, 2 users, load average: 2.06, 1.63, 0.82
Tasks: 378 total, 2 running, 376 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 8.3 sy, 0.0 ni, 91.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15787.1 total, 13036.7 free, 970.5 used, 1779.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14529.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3020 ln 20 0 6508 828 724 R 100.0 0.0 5:48.08 ls
199 root 20 0 0 0 0 I 0.3 0.0 0:00.18 kworker/10:2-events
1058 root 20 0 0 0 0 I 0.3 0.0 0:00.03 kworker/8:2-events
ln@ubuntu:~>$ dmesg
...
[ 75.279029] EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
[ 78.490171] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 78.490212] #PF: supervisor read access in kernel mode
[ 78.490223] #PF: error_code(0x0000) - not-present page
[ 78.490254] PGD 0 P4D 0
[ 78.490262] Oops: 0000 [#1] SMP PTI
[ 78.490271] CPU: 9 PID: 2972 Comm: ext4lazyinit Not tainted 5.9.0+ #3
[ 78.490297] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[ 78.490321] RIP: 0010:__ext4_journal_get_write_access+0x2c/0x120
[ 78.490347] Code: 44 00 00 55 48 89 e5 41 57 49 89 cf 41 56 41 55 41 54 49 89 d4 53 48 83 ec 18 48 89 7d d0 89 75 cc e8 78 74 7b 00 49 8b 47 30 <4c> 8b 68 10 4d 85 ed 74 2f 49 8b 85 d8 00 00 00 49 8b 9d 80 03 00
[ 78.490379] RSP: 0018:ffffb0f581793dd0 EFLAGS: 00010246
[ 78.490389] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff9167954c4000
[ 78.490402] RDX: ffff91679550f690 RSI: 000000000000061f RDI: ffffffff84c4aa50
[ 78.490416] RBP: ffffb0f581793e10 R08: 0000000000001ff5 R09: 0000000000000001
[ 78.490428] R10: ffff916784cb0a00 R11: 000000000a002b8c R12: ffff91679550f690
[ 78.490441] R13: ffff9167901ce000 R14: 0000000000000200 R15: ffff9167954c4000
[ 78.490454] FS: 0000000000000000(0000) GS:ffff916aae040000(0000) knlGS:0000000000000000
[ 78.490469] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.490479] CR2: 0000000000000010 CR3: 000000010a1d8004 CR4: 00000000003706e0
[ 78.490537] Call Trace:
[ 78.490547] ? __ext4_journal_start_sb+0x106/0x120
[ 78.490558] ext4_init_inode_table+0x168/0x390
[ 78.490976] ext4_lazyinit_thread+0x38b/0x520
[ 78.491359] kthread+0x114/0x150
[ 78.491603] ? ext4_journalled_writepage_callback+0x60/0x60
[ 78.491849] ? kthread_park+0x90/0x90
[ 78.492103] ret_from_fork+0x22/0x30
[ 78.492348] Modules linked in: nbd rfcomm xt_conntrack xt_MASQUERADE
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter
bnep bonding vsock_loopback vmw_vsock_virtio_transport_common vsock
binfmt_misc intel_rapl_msr intel_rapl_common kvm_intel kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
crypto_simd cryptd glue_helper rapl btusb btrtl btbcm btintel bluetooth
joydev vmw_balloon psmouse input_leds ecdh_generic ecc e1000 vmw_vmci
i2c_piix4 mac_hid sch_fq_codel btrfs blake2b_generic libcrc32c xor
zstd_compress raid6_pq overlay iptable_filter ip6table_filter ip6_tables
br_netfilter serio_raw bridge mptspi scsi_transport_spi ahci mptscsih
libahci mptbase pata_acpi stp llc arp_tables vmwgfx hid_generic
drm_kms_helper usbhid hid syscopyarea sysfillrect sysimgblt fb_sys_fops
cec ttm drm parport_pc ppdev lp parport ip_tables x_tables autofs4
[ 78.495718] CR2: 0000000000000010
[ 78.496074] ---[ end trace d98825069bfe2e2a ]---
[ 78.496425] RIP: 0010:__ext4_journal_get_write_access+0x2c/0x120
[ 78.496775] Code: 44 00 00 55 48 89 e5 41 57 49 89 cf 41 56 41 55 41 54 49 89 d4 53 48 83 ec 18 48 89 7d d0 89 75 cc e8 78 74 7b 00 49 8b 47 30 <4c> 8b 68 10 4d 85 ed 74 2f 49 8b 85 d8 00 00 00 49 8b 9d 80 03 00
[ 78.497834] RSP: 0018:ffffb0f581793dd0 EFLAGS: 00010246
[ 78.498259] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff9167954c4000
[ 78.498606] RDX: ffff91679550f690 RSI: 000000000000061f RDI: ffffffff84c4aa50
[ 78.498944] RBP: ffffb0f581793e10 R08: 0000000000001ff5 R09: 0000000000000001
[ 78.499295] R10: ffff916784cb0a00 R11: 000000000a002b8c R12: ffff91679550f690
[ 78.499630] R13: ffff9167901ce000 R14: 0000000000000200 R15: ffff9167954c4000
[ 78.499964] FS: 0000000000000000(0000) GS:ffff916aae040000(0000) knlGS:0000000000000000
[ 78.500316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.500655] CR2: 0000000000000010 CR3: 000000010a1d8004 CR4: 00000000003706e0
...
On 2020/10/27 9:18, Ming Lei wrote:
> On Wed, Oct 21, 2020 at 09:08:10AM +0800, lining wrote:
>> (Sorry for sending this mail again, this one adds nbd@other.debian.org)
>>
>> Hi kernel and Ceph community,
>>
>> We ran into an issue that mainly involves the (kernel) nbd driver and (Ceph) rbd-nbd.
>> After some investigation, I found that the root cause seems to be related to the change in the nbd block size.
>>
>> I am not sure whether this is a bug in the nbd driver or in rbd-nbd, but the problem is reproducible.
>>
>>
>> What happened:
>> Accessing the mount point of an ext4-formatted nbd device always hangs after the nbd device has been resized.
>>
>>
>> Environment information:
>> - kernel: v4.19.25 or master
>> - rbd-nbd(ceph): v12.2.0 Luminous or master
>> - filesystem on the nbd device: ext4
>
> Hello lining,
>
> Can you reproduce this issue on a v5.9 kernel?
>
>
> Thanks,
> Ming
>
* Re: [bug report] NBD: rbd-nbd + ext4 stuck after nbd resized
From: Ming Lei @ 2020-10-27 1:18 UTC
To: lining
Cc: josef, axboe, linux-block, nbd, yunchuan.wen, ceph-users, donglifekernel
On Wed, Oct 21, 2020 at 09:08:10AM +0800, lining wrote:
> (Sorry for sending this mail again, this one adds nbd@other.debian.org)
>
> Hi kernel and Ceph community,
>
> We ran into an issue that mainly involves the (kernel) nbd driver and (Ceph) rbd-nbd.
> After some investigation, I found that the root cause seems to be related to the change in the nbd block size.
>
> I am not sure whether this is a bug in the nbd driver or in rbd-nbd, but the problem is reproducible.
>
>
> What happened:
> Accessing the mount point of an ext4-formatted nbd device always hangs after the nbd device has been resized.
>
>
> Environment information:
> - kernel: v4.19.25 or master
> - rbd-nbd(ceph): v12.2.0 Luminous or master
> - filesystem on the nbd device: ext4
Hello lining,
Can you reproduce this issue on a v5.9 kernel?
Thanks,
Ming
* [bug report] NBD: rbd-nbd + ext4 stuck after nbd resized
From: lining @ 2020-10-21 1:08 UTC
To: josef, axboe, linux-block, nbd, yunchuan.wen, ceph-users
Cc: lining, donglifekernel
(Sorry for sending this mail again, this one adds nbd@other.debian.org)