* [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
@ 2016-08-25 10:59 Marcin Mirosław
  2016-08-25 11:04 ` Marcin Mirosław
  0 siblings, 1 reply; 7+ messages in thread
From: Marcin Mirosław @ 2016-08-25 10:59 UTC (permalink / raw)
  To: linux-bcache

Hi!

I did:
# echo readonly > /sys/<...>/cache0/state

I then tried to write, but the filesystem was read-only. Next I did:
# echo active > /sys/<...>/cache0/state

And I got:
[ 1417.679079] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): required
member dm-10 going RO, forcing fs RO
[ 1417.764650] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): dm-10 read
only
[ 1504.428743] ------------[ cut here ]------------
[ 1504.428769] kernel BUG at drivers/md/bcache/super.c:1561!
[ 1504.428777] invalid opcode: 0000 [#1] PREEMPT SMP
[ 1504.428783] Modules linked in: bcache libcrc32c tun mousedev nouveau
wmi video fbcon bitblit softcursor font backlight vboxnetadp(O)
i2c_algo_bit ttm drm_kms_helper cfbfillrect vboxnetflt(O) syscopyarea
cfbimgblt sysfillrect vboxdrv(O) sysimgblt fb_sys_fops snd_ens1371
snd_ac97_codec cfbcopyarea coretemp drm ac97_bus agpgart hwmon
snd_rawmidi snd_pcm fb kvm_intel kvm snd_timer fbdev irqbypass e1000e
psmouse snd soundcore evdev i2c_i801 ptp i82975x_edac acpi_cpufreq
edac_core pps_core lpc_ich mfd_core 8250 processor 8250_base button
serial_core zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO)
zlib_deflate sr_mod cdrom sata_sil
[ 1504.428957] CPU: 0 PID: 31355 Comm: bash Tainted: P           O
4.7.0-bcache+ #5
[ 1504.428964] Hardware name:                  /D975XBX, BIOS
BX97510J.86A.1487.2007.0902.1724 09/02/2007
[ 1504.428972] task: ffff8800a09e8000 ti: ffff880016784000 task.ti:
ffff880016784000
[ 1504.428979] RIP: 0010:[<ffffffffc0952f63>]  [<ffffffffc0952f63>]
__bch_cache_read_write+0xe3/0xf0 [bcache]
[ 1504.429026] RSP: 0018:ffff880016787d30  EFLAGS: 00010202
[ 1504.429031] RAX: ffffffffc09706b0 RBX: ffff8800c820d000 RCX:
0000000000000000
[ 1504.429038] RDX: 0000000000000000 RSI: 0000000000000246 RDI:
ffff8800c820d000
[ 1504.429044] RBP: ffff880016787d48 R08: ffff88014fc80000 R09:
00000000000003ea
[ 1504.429050] R10: 00000000000003fa R11: 0000000000000000 R12:
ffff88008bc80000
[ 1504.429056] R13: ffff8800c820d2e0 R14: ffff8800c785ac50 R15:
ffff88008bc80000
[ 1504.429063] FS:  00007f7b45c31700(0000) GS:ffff88014fc00000(0000)
knlGS:0000000000000000
[ 1504.429070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1504.429076] CR2: 0000000000547bc0 CR3: 00000000cf167000 CR4:
00000000000006f0
[ 1504.429082] Stack:
[ 1504.429086]  ffffffffc0957831 ffffffffc0973ec0 0000000000000007
ffff880016787db8
[ 1504.429099]  ffffffffc095a96d ffff8800c820d000 ffff880016787d01
0000000000000002
[ 1504.429110]  ffff8800cfb79300 ffff88014a0a4aa0 ffff880149c51240
000000008ea8ec62
[ 1504.429123] Call Trace:
[ 1504.429142]  [<ffffffffc0957831>] ? bch_cache_read_write+0x51/0xa0
[bcache]
[ 1504.429163]  [<ffffffffc095a96d>] __bch_cache_store+0x5ad/0x650 [bcache]
[ 1504.429183]  [<ffffffffc095aa44>] bch_cache_store+0x34/0x50 [bcache]
[ 1504.429192]  [<ffffffff81206272>] sysfs_kf_write+0x32/0x40
[ 1504.429199]  [<ffffffff812057f3>] kernfs_fop_write+0x113/0x190
[ 1504.429207]  [<ffffffff8118e5c2>] __vfs_write+0x32/0x150
[ 1504.429215]  [<ffffffff812e8a33>] ? __this_cpu_preempt_check+0x13/0x20
[ 1504.429223]  [<ffffffff8109f201>] ? update_fast_ctr+0x41/0x70
[ 1504.429230]  [<ffffffff8109f262>] ? percpu_down_read+0x12/0x50
[ 1504.429236]  [<ffffffff8118f8c3>] vfs_write+0xb3/0x1b0
[ 1504.429243]  [<ffffffff81190ce0>] SyS_write+0x50/0xc0
[ 1504.429249]  [<ffffffff811adbde>] ? __close_fd+0x9e/0xc0
[ 1504.429257]  [<ffffffff8157431f>] entry_SYSCALL_64_fastpath+0x17/0x93
[ 1504.429262] Code: 8b 03 48 85 c0 75 eb 65 ff 0d 4a 94 6b 3f 74 05 e9
4f ff ff ff e8 c6 f0 6a c0 e9 45 ff ff ff e8 bc f0 6a c0 31 c0 5b 41 5c
5d c3 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 8b 87 90 00 00 00
[ 1504.429372] RIP  [<ffffffffc0952f63>]
__bch_cache_read_write+0xe3/0xf0 [bcache]
[ 1504.429394]  RSP <ffff880016787d30>
[ 1504.432174] ---[ end trace 88c06790c8a7571e ]---
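
The two sysfs writes described above can be sketched as a small shell helper.
This is only an illustration: the function name toggle_state and its argument
are hypothetical, and the real sysfs path is elided in the report
(/sys/<...>/cache0/state), so the caller has to supply it. On the affected
4.7.0-bcache+ kernel the second write is what hit the BUG, so this should only
be run on a disposable test filesystem.

```sh
#!/bin/sh
# Hypothetical helper, not part of bcache or bcachefs tooling.
# Writes "readonly" and then "active" to the state file passed as $1.
# The path is an assumption supplied by the caller; the report elides it.
toggle_state() {
    state_file=$1
    # Refuse to continue if the state file cannot be written.
    [ -w "$state_file" ] || { echo "not writable: $state_file" >&2; return 1; }
    # First transition: mark the device read-only.
    echo readonly > "$state_file" || return 1
    # In the report, a write to the filesystem was attempted at this point
    # and failed because the filesystem had been forced read-only.
    # Second transition: the one that triggered the BUG in the report.
    echo active > "$state_file" || return 1
}
```

Usage would be along the lines of `toggle_state "$STATE_FILE"`, where
STATE_FILE is set by hand to the device's state attribute.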


* Re: [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
  2016-08-25 10:59 [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device Marcin Mirosław
@ 2016-08-25 11:04 ` Marcin Mirosław
  2016-08-25 11:44   ` Marcin Mirosław
  0 siblings, 1 reply; 7+ messages in thread
From: Marcin Mirosław @ 2016-08-25 11:04 UTC (permalink / raw)
  To: linux-bcache

On 25.08.2016 at 12:59, Marcin Mirosław wrote:
> Hi!
> 
> I did:
> # echo readonly > /sys/<...>/cache0/state
> 
> I then tried to write, but the filesystem was read-only. Next I did:
> # echo active > /sys/<...>/cache0/state
> 
> And I got:
> [ 1417.679079] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): required
> member dm-10 going RO, forcing fs RO
> [ 1417.764650] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): dm-10 read
> only
> [ 1504.428743] ------------[ cut here ]------------
> [ 1504.428769] kernel BUG at drivers/md/bcache/super.c:1561!
> [ 1504.428777] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 1504.428783] Modules linked in: bcache libcrc32c tun mousedev nouveau
> wmi video fbcon bitblit softcursor font backlight vboxnetadp(O)
> i2c_algo_bit ttm drm_kms_helper cfbfillrect vboxnetflt(O) syscopyarea
> cfbimgblt sysfillrect vboxdrv(O) sysimgblt fb_sys_fops snd_ens1371
> snd_ac97_codec cfbcopyarea coretemp drm ac97_bus agpgart hwmon
> snd_rawmidi snd_pcm fb kvm_intel kvm snd_timer fbdev irqbypass e1000e
> psmouse snd soundcore evdev i2c_i801 ptp i82975x_edac acpi_cpufreq
> edac_core pps_core lpc_ich mfd_core 8250 processor 8250_base button
> serial_core zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO)
> zlib_deflate sr_mod cdrom sata_sil
> [ 1504.428957] CPU: 0 PID: 31355 Comm: bash Tainted: P           O
> 4.7.0-bcache+ #5
> [ 1504.428964] Hardware name:                  /D975XBX, BIOS
> BX97510J.86A.1487.2007.0902.1724 09/02/2007
> [ 1504.428972] task: ffff8800a09e8000 ti: ffff880016784000 task.ti:
> ffff880016784000
> [ 1504.428979] RIP: 0010:[<ffffffffc0952f63>]  [<ffffffffc0952f63>]
> __bch_cache_read_write+0xe3/0xf0 [bcache]
> [ 1504.429026] RSP: 0018:ffff880016787d30  EFLAGS: 00010202
> [ 1504.429031] RAX: ffffffffc09706b0 RBX: ffff8800c820d000 RCX:
> 0000000000000000
> [ 1504.429038] RDX: 0000000000000000 RSI: 0000000000000246 RDI:
> ffff8800c820d000
> [ 1504.429044] RBP: ffff880016787d48 R08: ffff88014fc80000 R09:
> 00000000000003ea
> [ 1504.429050] R10: 00000000000003fa R11: 0000000000000000 R12:
> ffff88008bc80000
> [ 1504.429056] R13: ffff8800c820d2e0 R14: ffff8800c785ac50 R15:
> ffff88008bc80000
> [ 1504.429063] FS:  00007f7b45c31700(0000) GS:ffff88014fc00000(0000)
> knlGS:0000000000000000
> [ 1504.429070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1504.429076] CR2: 0000000000547bc0 CR3: 00000000cf167000 CR4:
> 00000000000006f0
> [ 1504.429082] Stack:
> [ 1504.429086]  ffffffffc0957831 ffffffffc0973ec0 0000000000000007
> ffff880016787db8
> [ 1504.429099]  ffffffffc095a96d ffff8800c820d000 ffff880016787d01
> 0000000000000002
> [ 1504.429110]  ffff8800cfb79300 ffff88014a0a4aa0 ffff880149c51240
> 000000008ea8ec62
> [ 1504.429123] Call Trace:
> [ 1504.429142]  [<ffffffffc0957831>] ? bch_cache_read_write+0x51/0xa0
> [bcache]
> [ 1504.429163]  [<ffffffffc095a96d>] __bch_cache_store+0x5ad/0x650 [bcache]
> [ 1504.429183]  [<ffffffffc095aa44>] bch_cache_store+0x34/0x50 [bcache]
> [ 1504.429192]  [<ffffffff81206272>] sysfs_kf_write+0x32/0x40
> [ 1504.429199]  [<ffffffff812057f3>] kernfs_fop_write+0x113/0x190
> [ 1504.429207]  [<ffffffff8118e5c2>] __vfs_write+0x32/0x150
> [ 1504.429215]  [<ffffffff812e8a33>] ? __this_cpu_preempt_check+0x13/0x20
> [ 1504.429223]  [<ffffffff8109f201>] ? update_fast_ctr+0x41/0x70
> [ 1504.429230]  [<ffffffff8109f262>] ? percpu_down_read+0x12/0x50
> [ 1504.429236]  [<ffffffff8118f8c3>] vfs_write+0xb3/0x1b0
> [ 1504.429243]  [<ffffffff81190ce0>] SyS_write+0x50/0xc0
> [ 1504.429249]  [<ffffffff811adbde>] ? __close_fd+0x9e/0xc0
> [ 1504.429257]  [<ffffffff8157431f>] entry_SYSCALL_64_fastpath+0x17/0x93
> [ 1504.429262] Code: 8b 03 48 85 c0 75 eb 65 ff 0d 4a 94 6b 3f 74 05 e9
> 4f ff ff ff e8 c6 f0 6a c0 e9 45 ff ff ff e8 bc f0 6a c0 31 c0 5b 41 5c
> 5d c3 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 8b 87 90 00 00 00
> [ 1504.429372] RIP  [<ffffffffc0952f63>]
> __bch_cache_read_write+0xe3/0xf0 [bcache]
> [ 1504.429394]  RSP <ffff880016787d30>
> [ 1504.432174] ---[ end trace 88c06790c8a7571e ]---




And now I've got data corruption:
# md5sum *
md5sum: gentoo: Jest katalogiem  /* in English: Is a directory */
md5sum: proxmox-ve_3.1-93bf03d4-8.iso: Błąd wejścia/wyjścia  /* in English: I/O error */

dmesg:
[ 1857.151644] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): IO error
on dm-10 for checksum error
[ 1857.349356] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): IO error
on dm-10 for checksum error


* Re: [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
  2016-08-25 11:04 ` Marcin Mirosław
@ 2016-08-25 11:44   ` Marcin Mirosław
  2016-08-25 12:03     ` Marcin Mirosław
  0 siblings, 1 reply; 7+ messages in thread
From: Marcin Mirosław @ 2016-08-25 11:44 UTC (permalink / raw)
  To: linux-bcache

On 25.08.2016 at 13:04, Marcin Mirosław wrote:
> And now I've got data corruption:
> # md5sum *
> md5sum: gentoo: Jest katalogiem  /* in English: Is a directory */
> md5sum: proxmox-ve_3.1-93bf03d4-8.iso: Błąd wejścia/wyjścia  /* in English: I/O error */
> 
> dmesg:
> [ 1857.151644] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): IO error
> on dm-10 for checksum error
> [ 1857.349356] bcache (181708b5-8f1d-4533-bad9-c3ebfea4db0c): IO error
> on dm-10 for checksum error

# umount /mnt/test
hangs forever.
I issued
# echo t >/proc/sysrq-trigger

[...]
Aug 25 13:42:11 localhost kernel: [ 4321.952551] umount          D
ffff8800cf5cbce8     0  2620  20218 0x00000000
Aug 25 13:42:11 localhost kernel: [ 4321.952555]  ffff8800cf5cbce8
ffff8800cf5cbcf8 ffff88014a1a0000 ffff8800a0899b40
Aug 25 13:42:11 localhost kernel: [ 4321.952558]  ffff880149868000
ffff8800cf5cc000 ffff8800cf5cbe20 ffff8800cf5cbe18
Aug 25 13:42:11 localhost kernel: [ 4321.952562]  ffff8800a0899b40
ffff8800a089a238 ffff8800cf5cbd00 ffffffff8156fcea
Aug 25 13:42:11 localhost kernel: [ 4321.952566] Call Trace:
Aug 25 13:42:11 localhost kernel: [ 4321.952568]  [<ffffffff8156fcea>]
schedule+0x3a/0x90
Aug 25 13:42:11 localhost kernel: [ 4321.952571]  [<ffffffff81572f93>]
schedule_timeout+0x1b3/0x280
Aug 25 13:42:11 localhost kernel: [ 4321.952573]  [<ffffffff8157001f>] ?
preempt_schedule+0x1f/0x30
Aug 25 13:42:11 localhost kernel: [ 4321.952576]  [<ffffffff81002016>] ?
___preempt_schedule+0x16/0x18
Aug 25 13:42:11 localhost kernel: [ 4321.952579]  [<ffffffff815711f5>]
wait_for_completion+0xd5/0x110
Aug 25 13:42:11 localhost kernel: [ 4321.952581]  [<ffffffff81082760>] ?
wake_up_q+0x70/0x70
Aug 25 13:42:11 localhost kernel: [ 4321.952597]  [<ffffffffc0938c40>]
bch_kill_sb+0x90/0xa0 [bcache]
Aug 25 13:42:11 localhost kernel: [ 4321.952600]  [<ffffffff811921ae>]
deactivate_locked_super+0x3e/0x70
Aug 25 13:42:11 localhost kernel: [ 4321.952602]  [<ffffffff81192647>]
deactivate_super+0x57/0x60
Aug 25 13:42:11 localhost kernel: [ 4321.952605]  [<ffffffff811af9ea>]
cleanup_mnt+0x3a/0x80
Aug 25 13:42:11 localhost kernel: [ 4321.952607]  [<ffffffff811afa6d>]
__cleanup_mnt+0xd/0x10
Aug 25 13:42:11 localhost kernel: [ 4321.952609]  [<ffffffff8107609c>]
task_work_run+0x7c/0xa0
Aug 25 13:42:11 localhost kernel: [ 4321.952612]  [<ffffffff810022ef>]
exit_to_usermode_loop+0x9f/0xb0
Aug 25 13:42:11 localhost kernel: [ 4321.952615]  [<ffffffff81002b78>]
syscall_return_slowpath+0x48/0x60
Aug 25 13:42:11 localhost kernel: [ 4321.952618]  [<ffffffff81574399>]
entry_SYSCALL_64_fastpath+0x91/0x93
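
A lighter-weight way to spot a task stuck like the umount above (state D,
uninterruptible sleep) is to filter ps output rather than dumping every task
with sysrq-t. The sketch below is an assumption-laden illustration, not what
was run in this report: filter_dstate is a hypothetical helper name, and it
expects "PID STAT COMM" lines on stdin.

```sh
#!/bin/sh
# Hypothetical helper: print "PID COMM" for every input line whose second
# field (the process state) starts with D, i.e. uninterruptible sleep.
# Input format is assumed to be "PID STAT COMM", one task per line.
# Only the first word of COMM is printed if the name contains spaces.
filter_dstate() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Example use on a live system (procps ps; "=" suppresses the header row):
#   ps -eo pid=,stat=,comm= | filter_dstate
```

For a PID found this way, /proc/<pid>/stack may show the kernel stack of the
blocked task, provided the kernel was built with stack trace support and the
reader is root.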


* Re: [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
  2016-08-25 11:44   ` Marcin Mirosław
@ 2016-08-25 12:03     ` Marcin Mirosław
  2016-08-26  2:10       ` Kent Overstreet
  0 siblings, 1 reply; 7+ messages in thread
From: Marcin Mirosław @ 2016-08-25 12:03 UTC (permalink / raw)
  To: linux-bcache

https://lwn.net/Articles/655183/ :
"Caveat: don't try to use tiering and checksumming or compression at the
same time yet, the read path needs to be reworked to handle both at the
same time."

Is it still valid?


* Re: [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
  2016-08-25 12:03     ` Marcin Mirosław
@ 2016-08-26  2:10       ` Kent Overstreet
  2016-08-26  2:17         ` Christopher James Halse Rogers
  2016-08-26  9:00         ` Marcin Mirosław
  0 siblings, 2 replies; 7+ messages in thread
From: Kent Overstreet @ 2016-08-26  2:10 UTC (permalink / raw)
  To: Marcin Mirosław, chris; +Cc: linux-bcache

On Thu, Aug 25, 2016 at 02:03:33PM +0200, Marcin Mirosław wrote:
> https://lwn.net/Articles/655183/ :
> "Caveat: don't try to use tiering and checksumming or compression at the
> same time yet, the read path needs to be reworked to handle both at the
> same time."
> 
> Is it still valid?

No, I did do that read path reworking, they should work.

However, I haven't been focusing on or exercising the multi device stuff in
quite a while - my main priority has been making single device filesystems rock
solid and finishing off compression and such.

Tiering ought to work, but can you hold off on exercising anything else? e.g.
the active/RO transition stuff - I'm going to have to spend a fair amount of
time digging into that code and figuring out what makes sense when the time
comes.

The checksum error is highly concerning though - that was related to messing
with cache0/state, correct? I think Christopher is using tiering with
checksumming enabled, can you confirm?


* Re: [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
  2016-08-26  2:10       ` Kent Overstreet
@ 2016-08-26  2:17         ` Christopher James Halse Rogers
  2016-08-26  9:00         ` Marcin Mirosław
  1 sibling, 0 replies; 7+ messages in thread
From: Christopher James Halse Rogers @ 2016-08-26  2:17 UTC (permalink / raw)
  To: linux-bcache



On Fri, Aug 26, 2016 at 12:10 PM, Kent Overstreet 
<kent.overstreet@gmail.com> wrote:
> On Thu, Aug 25, 2016 at 02:03:33PM +0200, Marcin Mirosław wrote:
>>  https://lwn.net/Articles/655183/ :
>>  "Caveat: don't try to use tiering and checksumming or compression 
>> at the
>>  same time yet, the read path needs to be reworked to handle both at 
>> the
>>  same time."
>> 
>>  Is it still valid?
> 
> No, I did do that read path reworking, they should work.
> 
> However, I haven't been focusing on or exercising the multi device 
> stuff in
> quite awhile - my main priority has been making single device 
> filesystems rock
> solid and finishing off compression and such.
> 
> Tiering ought to work, but can you hold off on exercising anything 
> else? e.g.
> the active/RO transition stuff - I'm going to have to spend a fair 
> amount of
> time digging into that code and figuring out what makes sense when 
> the time
> comes.
> 
> The checksum error is highly concerning though - that was related to 
> messing
> with cache0/state, correct? I think Christopher is using tiering with
> checksumming enabled, can you confirm?

Yup. I've got two tiered filesystems; one nvme in front of two HDDs 
with crc32 checksums, and one regular SSD in front of a HDD with crc32 
and lz4 compression.

Neither has had significant problems.


* Re: [bcachefs][tier] kernel BUG at drivers/md/bcache/super.c:1561! after double changing state of device
  2016-08-26  2:10       ` Kent Overstreet
  2016-08-26  2:17         ` Christopher James Halse Rogers
@ 2016-08-26  9:00         ` Marcin Mirosław
  1 sibling, 0 replies; 7+ messages in thread
From: Marcin Mirosław @ 2016-08-26  9:00 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 26.08.2016 at 04:10, Kent Overstreet wrote:
> On Thu, Aug 25, 2016 at 02:03:33PM +0200, Marcin Mirosław wrote:
>> https://lwn.net/Articles/655183/ :
>> "Caveat: don't try to use tiering and checksumming or compression at the
>> same time yet, the read path needs to be reworked to handle both at the
>> same time."
>>
>> Is it still valid?
> 
> No, I did do that read path reworking, they should work.
> 
> However, I haven't been focusing on or exercising the multi device stuff in
> quite awhile - my main priority has been making single device filesystems rock
> solid and finishing off compression and such.
> 
> Tiering ought to work, but can you hold off on exercising anything else? e.g.
> the active/RO transition stuff - I'm going to have to spend a fair amount of
> time digging into that code and figuring out what makes sense when the time
> comes.

OK, I'll focus on single-disk bcachefs.


> The checksum error is highly concerning though - that was related to messing
> with cache0/state, correct? I think Christopher is using tiering with
> checksumming enabled, can you confirm?

Yes, I only echoed "readonly" and then "active" to cache0/state. That was
enough to trigger the BUG in kern.log.

Marcin

