* [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
@ 2016-09-15  9:36 Marcin Mirosław
  2016-09-15  9:39 ` Marcin Mirosław
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Marcin Mirosław @ 2016-09-15  9:36 UTC (permalink / raw)
  To: linux-bcache

Hi!
I was playing with an fs without tiering, using it as a tmp dir for
compilation. Then I changed these options in sysfs:
echo crc64 > options/data_checksum
echo crc64 > options/metadata_checksum
echo crc64 > options/str_hash

After a couple of minutes I got:
[ 8372.574346] bcache (dm-10): IO error on dm-10 for checksum error
[ 8372.680196] bcache (dm-10): IO error on dm-10 for checksum error
[ 8464.361860] bcache (dm-10): IO error on dm-10 for checksum error
[ 8466.146966] bcache (dm-10): IO error on dm-10 for checksum error
[ 8466.995095] bcache (dm-10): IO error on dm-10 for checksum error
[ 8469.199749] bcache (dm-10): IO error on dm-10 for checksum error
[ 8469.441408] bcache (dm-10): IO error on dm-10 for checksum error
[ 8469.722676] bcache (dm-10): IO error on dm-10 for checksum error
[ 8469.827055] bcache (dm-10): IO error on dm-10 for checksum error
[ 8470.038869] bcache (dm-10): IO error on dm-10 for checksum error
[ 8470.236663] bcache (dm-10): IO error on dm-10 for checksum error
[ 8470.427094] bcache (dm-10): IO error on dm-10 for checksum error
[ 8472.030519] bcache (dm-10): IO error on dm-10 for checksum error
[ 8473.098820] bcache (dm-10): IO error on dm-10 for checksum error
[ 8916.491297] bcache (dm-10): IO error on dm-10 for checksum error
[ 8916.715057] bcache (dm-10): IO error on dm-10 for checksum error
[ 8916.715111] bcache (dm-10): too many IO errors on dm-10, setting
filesystem RO
[ 8916.733056] bcache (dm-10): IO error on dm-10 for checksum error
[ 8916.733125] bcache (dm-10): dm-10 read only
[ 8916.733161] bcache (dm-10): too many IO errors on dm-10, setting
device RO
[ 8916.988286] bcache (dm-10): IO error: read only
[ 8916.988545] bcache (dm-10): IO error: read only


Is this due to changing str_hash?
Marcin


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-15  9:36 [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?) Marcin Mirosław
@ 2016-09-15  9:39 ` Marcin Mirosław
  2016-09-16  2:12 ` Kent Overstreet
  2016-09-16  3:33 ` Kent Overstreet
  2 siblings, 0 replies; 12+ messages in thread
From: Marcin Mirosław @ 2016-09-15  9:39 UTC (permalink / raw)
  To: linux-bcache

And I can't mount the fs again:
[ 9620.478666] bcache (dm-10): journal replay done, 2218 keys in 100
entries, seq 63824
[ 9623.561759] bcache: bch_open_as_blockdevs() register_cache_set err
error gcing inode nlinks
[ 9623.584080] bcache (dm-10): stopped

Marcin


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-15  9:36 [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?) Marcin Mirosław
  2016-09-15  9:39 ` Marcin Mirosław
@ 2016-09-16  2:12 ` Kent Overstreet
  2016-09-16  3:33 ` Kent Overstreet
  2 siblings, 0 replies; 12+ messages in thread
From: Kent Overstreet @ 2016-09-16  2:12 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Thu, Sep 15, 2016 at 11:36:14AM +0200, Marcin Mirosław wrote:
> Hi!
> I was playing with an fs without tiering, using it as a tmp dir for
> compilation. Then I changed these options in sysfs:
> echo crc64 > options/data_checksum
> echo crc64 > options/metadata_checksum
> echo crc64 > options/str_hash
> 
> After a couple of minutes I got:
> [ 8372.574346] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8372.680196] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8464.361860] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8466.146966] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8466.995095] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.199749] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.441408] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.722676] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.827055] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8470.038869] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8470.236663] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8470.427094] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8472.030519] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8473.098820] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.491297] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.715057] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.715111] bcache (dm-10): too many IO errors on dm-10, setting
> filesystem RO
> [ 8916.733056] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.733125] bcache (dm-10): dm-10 read only
> [ 8916.733161] bcache (dm-10): too many IO errors on dm-10, setting
> device RO
> [ 8916.988286] bcache (dm-10): IO error: read only
> [ 8916.988545] bcache (dm-10): IO error: read only
> 
> 
> Is this due to changing str_hash?

Damn, you're finding all the bugs :)

I'm trying to reproduce it. It's probably due to changing data_checksum though,
not str_hash (and changing all of those at runtime should definitely work! just
neglected to test that, it seems...)


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-15  9:36 [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?) Marcin Mirosław
  2016-09-15  9:39 ` Marcin Mirosław
  2016-09-16  2:12 ` Kent Overstreet
@ 2016-09-16  3:33 ` Kent Overstreet
  2016-09-16  8:07   ` Marcin Mirosław
  2 siblings, 1 reply; 12+ messages in thread
From: Kent Overstreet @ 2016-09-16  3:33 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Thu, Sep 15, 2016 at 11:36:14AM +0200, Marcin Mirosław wrote:
> Hi!
> I was playing with an fs without tiering, using it as a tmp dir for
> compilation. Then I changed these options in sysfs:
> echo crc64 > options/data_checksum
> echo crc64 > options/metadata_checksum
> echo crc64 > options/str_hash
> 
> After a couple of minutes I got:
> [ 8372.574346] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8372.680196] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8464.361860] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8466.146966] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8466.995095] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.199749] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.441408] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.722676] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8469.827055] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8470.038869] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8470.236663] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8470.427094] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8472.030519] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8473.098820] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.491297] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.715057] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.715111] bcache (dm-10): too many IO errors on dm-10, setting
> filesystem RO
> [ 8916.733056] bcache (dm-10): IO error on dm-10 for checksum error
> [ 8916.733125] bcache (dm-10): dm-10 read only
> [ 8916.733161] bcache (dm-10): too many IO errors on dm-10, setting
> device RO
> [ 8916.988286] bcache (dm-10): IO error: read only
> [ 8916.988545] bcache (dm-10): IO error: read only

Ok, it turns out the crc64 for data checksums code was just fubar. Fix is up
(the fix does change how crc64 is computed for bios though, so it'll be
incompatible with your existing filesystem).

Also pushed a patch that adds some more error messages to fs-gc, we should
figure out why it wouldn't mount. I can't think of any reason why data checksum
errors would've caused that.


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16  3:33 ` Kent Overstreet
@ 2016-09-16  8:07   ` Marcin Mirosław
  2016-09-16  8:38     ` Kent Overstreet
  0 siblings, 1 reply; 12+ messages in thread
From: Marcin Mirosław @ 2016-09-16  8:07 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 16.09.2016 at 05:33, Kent Overstreet wrote:
> On Thu, Sep 15, 2016 at 11:36:14AM +0200, Marcin Mirosław wrote:
>> Hi!
>> I was playing with an fs without tiering, using it as a tmp dir for
>> compilation. Then I changed these options in sysfs:
>> echo crc64 > options/data_checksum
>> echo crc64 > options/metadata_checksum
>> echo crc64 > options/str_hash
>>
>> After a couple of minutes I got:
>> [ 8372.574346] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8372.680196] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8464.361860] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8466.146966] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8466.995095] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8469.199749] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8469.441408] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8469.722676] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8469.827055] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8470.038869] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8470.236663] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8470.427094] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8472.030519] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8473.098820] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8916.491297] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8916.715057] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8916.715111] bcache (dm-10): too many IO errors on dm-10, setting
>> filesystem RO
>> [ 8916.733056] bcache (dm-10): IO error on dm-10 for checksum error
>> [ 8916.733125] bcache (dm-10): dm-10 read only
>> [ 8916.733161] bcache (dm-10): too many IO errors on dm-10, setting
>> device RO
>> [ 8916.988286] bcache (dm-10): IO error: read only
>> [ 8916.988545] bcache (dm-10): IO error: read only
> 
> Ok, it turns out the crc64 for data checksums code was just fubar. Fix is up
> (the fix does change how crc64 is computed for bios though, so it'll be
> incompatible with your existing filesystem).
> 
> Also pushed a patch that adds some more error messages to fs-gc, we should
> figure out why it wouldn't mount. I can't think of any reason why data checksum
> errors would've caused that.

Hi Kent, hi all,
when I tried to mount the fs that had trouble yesterday, I got:
[  494.296818] bcache (dm-10): dm-10: journal checksum bad (got
18446744072224191025 expect 2809606705), sector 2048u
[  494.309973] bcache (dm-10): dm-10: journal checksum bad (got
18446744073320597786 expect 3906013466), sector 2304u
[  494.311597] bcache (dm-10): dm-10: journal checksum bad (got
18446744070980686285 expect 1566101965), sector 2560u
[  494.313038] bcache (dm-10): dm-10: journal checksum bad (got
18446744073177643543 expect 3763059223), sector 2816u
[  494.324082] bcache (dm-10): dm-10: journal checksum bad (got
18446744070081456445 expect 666872125), sector 3072u
[... many similar lines...]
[  495.000229] bcache (dm-10): dm-10: journal checksum bad (got
18446744071270315299 expect 1855730979), sector 90368u
[  495.001373] bcache (dm-10): dm-10: journal checksum bad (got
18446744070901133954 expect 1486549634), sector 90624u
[  495.002696] bcache (dm-10): dm-10: journal checksum bad (got
18446744071373615633 expect 1959031313), sector 90880u
[  496.618084] bcache (dm-10): journal replay error: -28
[  496.618124] bcache: bch_open_as_blockdevs() register_cache_set err
journal replay failed
[  496.796085] bcache (dm-10): stopped


What does str_hash do?

Today I formatted the block device and again played with changing
"compression, data_checksum, metadata_checksum, str_hash". I was
changing options during intensive writes to the fs. Twice I got a hard
lockup of the kernel, with no chance of getting dmesg. After the first
lockup I couldn't mount the fs again due to:
kernel: [  260.141942] bcache: bch_open_as_blockdevs()
register_cache_set err bad btree root

So: format -> testing -> hard lockup. The second time I could mount
the fs again:
kernel: [  234.920846] bcache (dm-11): journal replay done, 29 keys in 1
entries, seq 3447

I'm thinking about using netconsole, but I'm not sure I'll have time
for this before Tuesday.

Thanks,
Marcin


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16  8:07   ` Marcin Mirosław
@ 2016-09-16  8:38     ` Kent Overstreet
  2016-09-16  9:02       ` Marcin Mirosław
  0 siblings, 1 reply; 12+ messages in thread
From: Kent Overstreet @ 2016-09-16  8:38 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Fri, Sep 16, 2016 at 10:07:30AM +0200, Marcin Mirosław wrote:
> Hi Kent, hi all,
> when I tried to mount the fs that had trouble yesterday, I got:
> [  494.296818] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744072224191025 expect 2809606705), sector 2048u
> [  494.309973] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744073320597786 expect 3906013466), sector 2304u
> [  494.311597] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744070980686285 expect 1566101965), sector 2560u
> [  494.313038] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744073177643543 expect 3763059223), sector 2816u
> [  494.324082] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744070081456445 expect 666872125), sector 3072u
> [... many similar lines...]
> [  495.000229] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744071270315299 expect 1855730979), sector 90368u
> [  495.001373] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744070901133954 expect 1486549634), sector 90624u
> [  495.002696] bcache (dm-10): dm-10: journal checksum bad (got
> 18446744071373615633 expect 1959031313), sector 90880u
> [  496.618084] bcache (dm-10): journal replay error: -28
> [  496.618124] bcache: bch_open_as_blockdevs() register_cache_set err
> journal replay failed
> [  496.796085] bcache (dm-10): stopped

Damn, metadata checksums were getting truncated too... Those checksum errors are
a result of the patch fixing that and not truncating the checksum anymore.
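
The truncation can be seen in the numbers quoted above: in every log line,
the low 32 bits of the "got" value equal the "expect" value. A minimal
Python sketch that checks this, using only the values from the log (the
helper name low32_matches is made up for illustration and is not bcache
code):

```python
# (got, expect) pairs copied from the "journal checksum bad" lines above.
PAIRS = [
    (18446744072224191025, 2809606705),
    (18446744073320597786, 3906013466),
    (18446744070980686285, 1566101965),
    (18446744073177643543, 3763059223),
    (18446744070081456445, 666872125),
    (18446744071270315299, 1855730979),
    (18446744070901133954, 1486549634),
    (18446744071373615633, 1959031313),
]

MASK32 = 0xFFFFFFFF


def low32_matches(got: int, expect: int) -> bool:
    """Return True if 'expect' equals 'got' truncated to its low 32 bits."""
    return got & MASK32 == expect


for got, expect in PAIRS:
    # Print both values in hex so the shared low half is easy to see.
    print(f"got {got:#018x}  expect {expect:#010x}  "
          f"low32 match: {low32_matches(got, expect)}")
```

All eight pairs match, which is consistent with one side of the comparison
having been truncated to 32 bits. What the upper 32 bits represent is not
stated in this thread.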

Well, if you want you could try testing with a5e2d9aaea to see what the error
was in fs gc, but regardless you're going to have to end up reformatting (you said
nothing important was on this filesystem, right?)

> What does str_hash do?

It selects the hash function used for indexing strings - dirents and xattrs are
indexed by hash.
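
As a rough illustration of "indexed by hash", here is a toy Python model of
a directory whose entries are stored under hash(name) with a selectable
hash function. It is only a sketch under stated assumptions: the class and
function names are hypothetical, zlib.crc32 (plain CRC-32) stands in for
crc32c, and keeping the first 8 bytes of the SHA-1 digest is an arbitrary
choice. None of this reflects the actual bcachefs key layout.

```python
import hashlib
import zlib


def name_hash(name: bytes, algo: str) -> int:
    """Map a name to an integer index key with the selected hash function."""
    if algo == "crc32":
        # Plain CRC-32 from zlib, used here as a stand-in for crc32c.
        return zlib.crc32(name)
    if algo == "sha1":
        # Keep the first 8 bytes of the 20-byte SHA-1 digest (assumption).
        return int.from_bytes(hashlib.sha1(name).digest()[:8], "little")
    raise ValueError(f"unknown hash: {algo}")


class HashedDir:
    """Toy directory: entries live under hash(name), not under the name."""

    def __init__(self, algo: str):
        self.algo = algo
        self.entries = {}  # key -> list of (name, inode); list absorbs collisions

    def insert(self, name: bytes, inum: int) -> None:
        key = name_hash(name, self.algo)
        self.entries.setdefault(key, []).append((name, inum))

    def lookup(self, name: bytes):
        # Hash the name again and compare names within the matching bucket.
        for stored, inum in self.entries.get(name_hash(name, self.algo), []):
            if stored == name:
                return inum
        return None


d = HashedDir("sha1")
d.insert(b"Makefile", 4097)
print(d.lookup(b"Makefile"))  # 4097
print(d.lookup(b"missing"))   # None
```

In this toy model a lookup must use the same hash function that was used
on insert, since the key is derived from the name at both ends.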

> Today I formatted the block device and again played with changing
> "compression, data_checksum, metadata_checksum, str_hash". I was
> changing options during intensive writes to the fs. Twice I got a hard
> lockup of the kernel, with no chance of getting dmesg. After the first
> lockup I couldn't mount the fs again due to:
> kernel: [  260.141942] bcache: bch_open_as_blockdevs()
> register_cache_set err bad btree root
> 
> So: format -> testing -> hard lockup. The second time I could mount
> the fs again:
> kernel: [  234.920846] bcache (dm-11): journal replay done, 29 keys in 1
> entries, seq 3447
> 
> I'm thinking about using netconsole, but I'm not sure I'll have time
> for this before Tuesday.

I haven't yet tried randomly flipping the compression type at runtime, I'll try
that now...


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16  8:38     ` Kent Overstreet
@ 2016-09-16  9:02       ` Marcin Mirosław
  2016-09-16  9:16         ` Kent Overstreet
  0 siblings, 1 reply; 12+ messages in thread
From: Marcin Mirosław @ 2016-09-16  9:02 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 16.09.2016 at 10:38, Kent Overstreet wrote:
[...]
> I haven't yet tried randomly flipping the compression type at runtime, I'll try
> that now...

Don't forget to change the other options too; simply changing the
compression type I tested some time ago and it worked ;) I also think
that heavy writes while making the changes are important.

Marcin


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16  9:02       ` Marcin Mirosław
@ 2016-09-16  9:16         ` Kent Overstreet
  2016-09-16 11:17           ` Marcin Mirosław
  0 siblings, 1 reply; 12+ messages in thread
From: Kent Overstreet @ 2016-09-16  9:16 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Fri, Sep 16, 2016 at 11:02:10AM +0200, Marcin Mirosław wrote:
> On 16.09.2016 at 10:38, Kent Overstreet wrote:
> [...]
> > I haven't yet tried randomly flipping the compression type at runtime, I'll try
> > that now...
> 
> Don't forget to change the other options too; simply changing the
> compression type I tested some time ago and it worked ;) I also think
> that heavy writes while making the changes are important.

Yeah, I think you're right about heavy writes - the one other bug remotely like
this that's been reported was an intermittent deadlock under heavy write load.

But "heavy write workload" describes a lot of the tests I already have, so I'm
not sure what I'm missing. Argh.


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16  9:16         ` Kent Overstreet
@ 2016-09-16 11:17           ` Marcin Mirosław
  2016-09-16 11:24             ` Kent Overstreet
  0 siblings, 1 reply; 12+ messages in thread
From: Marcin Mirosław @ 2016-09-16 11:17 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 16.09.2016 at 11:16, Kent Overstreet wrote:
> On Fri, Sep 16, 2016 at 11:02:10AM +0200, Marcin Mirosław wrote:
>> On 16.09.2016 at 10:38, Kent Overstreet wrote:
>> [...]
>>> I haven't yet tried randomly flipping the compression type at runtime, I'll try
>>> that now...
>>
>> Don't forget to change the other options too; simply changing the
>> compression type I tested some time ago and it worked ;) I also think
>> that heavy writes while making the changes are important.
> 
> Yeah, I think you're right about heavy writes - the one other bug remotely like
> this that's been reported was an intermittent deadlock under heavy write load.
> 
> But "heavy write workload" describes a lot of the tests I already have, so I'm
> not sure what I'm missing. Argh.

I used rsync to make noise :) OK, with netconsole's help I have:
[11055.485337] bcache (dm-11): journal replay done, 0 keys in 1 entries,
seq 3451
< now I start rsync on the previously used fs; the problem appears soon >
[11159.293119] Kernel panic - not syncing: stack-protector: Kernel stack
is corrupted in: ffffffffc095b021
[11159.293157] CPU: 0 PID: 30023 Comm: rsync Tainted: P           O
4.7.0-bcache+ #2
[11159.293166] Hardware name: .   .  /IP35 Pro XE(Intel P35-ICH9R), BIOS
6.00 PG 09/09/2008
[11159.293176]  0000000000000086 00000000414f6380 ffff88002814fae0 ffffffff812cbe0d
[11159.293537] Kernel Offset: disabled
[11159.296006] ---[ end Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffffffc095b021

Line breaks may be mangled; I used tcpdump to capture the messages.

Marcin


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16 11:17           ` Marcin Mirosław
@ 2016-09-16 11:24             ` Kent Overstreet
  2016-09-16 12:27               ` Marcin Mirosław
  0 siblings, 1 reply; 12+ messages in thread
From: Kent Overstreet @ 2016-09-16 11:24 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Fri, Sep 16, 2016 at 01:17:39PM +0200, Marcin Mirosław wrote:
> On 16.09.2016 at 11:16, Kent Overstreet wrote:
> > On Fri, Sep 16, 2016 at 11:02:10AM +0200, Marcin Mirosław wrote:
> >> On 16.09.2016 at 10:38, Kent Overstreet wrote:
> >> [...]
> >>> I haven't yet tried randomly flipping the compression type at runtime, I'll try
> >>> that now...
> >>
> >> Don't forget to change the other options too; simply changing the
> >> compression type I tested some time ago and it worked ;) I also think
> >> that heavy writes while making the changes are important.
> > 
> > Yeah, I think you're right about heavy writes - the one other bug remotely like
> > this that's been reported was an intermittent deadlock under heavy write load.
> > 
> > But "heavy write workload" describes a lot of the tests I already have, so I'm
> > not sure what I'm missing. Argh.
> 
> I used rsync to make noise :) OK, with netconsole's help I have:
> [11055.485337] bcache (dm-11): journal replay done, 0 keys in 1 entries,
> seq 3451
> < now I start rsync on the previously used fs; the problem appears soon >
> [11159.293119] Kernel panic - not syncing: stack-protector: Kernel stack
> is corrupted in: ffffffffc095b021
> [11159.293157] CPU: 0 PID: 30023 Comm: rsync Tainted: P           O
> 4.7.0-bcache+ #2
> [11159.293166] Hardware name: .   .  /IP35 Pro XE(Intel P35-ICH9R), BIOS
> 6.00 PG 09/09/2008
> [11159.293176]  0000000000000086 00000000414f6380 ffff88002814fae0 ffffffff812cbe0d
> [11159.293537] Kernel Offset: disabled
> [11159.296006] ---[ end Kernel panic - not syncing: stack-protector:
> Kernel stack is corrupted in: ffffffffc095b021

Can you see what ffffffffc095b021 is with addr2line?

addr2line -i -e vmlinux ffffffffc095b021
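
For anyone scripting this step, a small Python sketch that pulls the
address out of the stack-protector panic message and builds the same
addr2line command line. The function names are hypothetical, and the
regular expression assumes the message format shown in this thread,
tolerating a line wrap between the words:

```python
import re

# Matches the panic text even when the mail client wrapped it mid-sentence.
PANIC_RE = re.compile(
    r"Kernel\s+stack\s+is\s+corrupted\s+in:\s+([0-9a-f]{16})"
)


def corrupted_in(log: str):
    """Return the address from a stack-protector panic message, or None."""
    match = PANIC_RE.search(log)
    return match.group(1) if match else None


def addr2line_argv(addr: str, image: str = "vmlinux"):
    """Build the addr2line invocation used in this thread as an argv list."""
    return ["addr2line", "-i", "-e", image, addr]


log = ("[11159.293119] Kernel panic - not syncing: stack-protector: "
       "Kernel stack\nis corrupted in: ffffffffc095b021")
addr = corrupted_in(log)
print(" ".join(addr2line_argv(addr)))
```

The argv list can be passed to subprocess.run(). Note that addr2line can
only resolve addresses contained in the image it is given, so an address
inside a loadable module will not resolve against vmlinux.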


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16 11:24             ` Kent Overstreet
@ 2016-09-16 12:27               ` Marcin Mirosław
  2016-09-16 12:36                 ` Kent Overstreet
  0 siblings, 1 reply; 12+ messages in thread
From: Marcin Mirosław @ 2016-09-16 12:27 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 16.09.2016 at 13:24, Kent Overstreet wrote:
> addr2line -i -e vmlinux ffffffffc095b021

# addr2line -i -e vmlinux ffffffffc095b021
??:0

bcache is compiled as a module on this host :(
I recompiled the kernel with bcache built in. Now I'm getting much
more:

> [  172.035755] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff814883d1
> [  172.035755] 
> [  172.035789] CPU: 1 PID: 3949 Comm: rsync Tainted: P           O    4.7.0-bcache+ #3
> [  172.035804] Hardware name: .   .  /IP35 Pro XE(Intel P35-ICH9R), BIOS 6.00 PG 09/09/2008
> [  172.035814]  0000000000000086 0000000058043aa7 ffff8800bd39bae0 ffffffff812cbe2d
> [  172.035837]  ffffffff8178a260 ffff8800bd39bb78 ffff8800bd39bb68 ffffffff81128a91
> [  172.035861]  ffff880000000010 ffff8800bd39bb78 ffff8800bd39bb10 0000000058043aa7
> [  172.035883] Call Trace:
> [  172.035895]  [<ffffffff812cbe2d>] dump_stack+0x4f/0x72
> [  172.035905]  [<ffffffff81128a91>] panic+0xd3/0x219
> [  172.035914]  [<ffffffff812942a4>] ? sha1_final+0x94/0x110
> [  172.035924]  [<ffffffff814883d1>] ? bch_xattr_hash+0x2b1/0x2d0
> [  172.035934]  [<ffffffff81058f24>] __stack_chk_fail+0x14/0x30
> [  172.035942]  [<ffffffff814883d1>] bch_xattr_hash+0x2b1/0x2d0
> [  172.035950]  [<ffffffff81488459>] xattr_hash_key+0x9/0x10
> [  172.035960]  [<ffffffff814884e4>] ? bch_xattr_get+0x84/0x1a0
> [  172.035969]  [<ffffffff8145cd69>] ? __bch_write_inode+0x289/0x2d0
> [  172.035978]  [<ffffffff81435438>] ? bch_get_acl+0x48/0x2b0
> [  172.035988]  [<ffffffff811ee9c1>] ? get_acl+0x71/0xf0
> [  172.035996]  [<ffffffff811eea8a>] ? posix_acl_chmod+0x4a/0xe0
> [  172.036005]  [<ffffffff8145d13f>] ? bch_setattr+0x7f/0xa0
> [  172.036015]  [<ffffffff811ac98e>] ? notify_change+0x23e/0x350
> [  172.036024]  [<ffffffff8118c6a9>] ? chmod_common+0x89/0x140
> [  172.036033]  [<ffffffff8118d991>] ? SyS_fchmod+0x31/0x50
> [  172.036042]  [<ffffffff815d0c1f>] ? entry_SYSCALL_64_fastpath+0x17/0x93
> [  172.036062] Kernel Offset: disabled
> [  172.036071] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff814883d1
> [  172.036071]
> [  172.036088] ------------[ cut here ]------------
> [  172.036114] WARNING: CPU: 1 PID: 3949 at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> [  172.037526] Modules linked in: netconsole configfs tun dm_snapshot dm_bufio mousedev pci_stub nouveau vboxpci(O) vboxnetadp(O) vboxnetflt(O) fbcon bitblit wmi softcursor font video ttm drm_kms_helper syscopyarea sysfillrect sysimgblt vboxdrv(O) fb_sys_fops drm nvidiafb cfbfillrect cfbimgblt vgastate cfbcopyarea i2c_algo_bit backlight fb_ddc fb coretemp hwmon fbdev snd_hda_codec_realtek snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass psmouse snd_hda_core snd_pcm evdev r8169 i2c_i801 snd_timer intel_agp mii acpi_cpufreq intel_gtt 8250 snd thermal agpgart soundcore processor 8250_base shpchp serial_core lpc_ich button mfd_core zfs(PO) zunicode(PO) sr_mod cdrom pata_acpi zcommon(PO) znvpair(PO) spl(O) zavl(PO) pata_jmicron
> [  172.040086] CPU: 1 PID: 3949 Comm: rsync Tainted: P           O    4.7.0-bcache+ #3
> [  172.040086] Hardware name: .   .  /IP35 Pro XE(Intel P35-ICH9R), BIOS 6.00 PG 09/09/2008
> [  172.040086]  0000000000000086 0000000058043aa7 ffff88014fc83da0 ffffffff812cbe2d
> [  172.040086]  0000000000000000 0000000000000000 ffff88014fc83de0 ffffffff810591d6
> [  172.040086]  0000007d4fc96200 0000000000000000 0000000000000001 000000000000c888
> [  172.040086] Call Trace:
> [  172.040086]  <IRQ>  [<ffffffff812cbe2d>] dump_stack+0x4f/0x72
> [  172.040086]  [<ffffffff810591d6>] __warn+0xc6/0xe0
> [  172.040086]  [<ffffffff810592f8>] warn_slowpath_null+0x18/0x20
> [  172.040086]  [<ffffffff81039f19>] native_smp_send_reschedule+0x39/0x40
> [  172.040086]  [<ffffffff81092ed1>] trigger_load_balance+0x131/0x210
> [  172.040086]  [<ffffffff81083959>] scheduler_tick+0x99/0xd0
> [  172.040086]  [<ffffffff810d05c0>] ? tick_sched_handle.isra.12+0x60/0x60
> [  172.040086]  [<ffffffff810c137c>] update_process_times+0x4c/0x60
> [  172.040086]  [<ffffffff810d0580>] tick_sched_handle.isra.12+0x20/0x60
> [  172.040086]  [<ffffffff810d05f8>] tick_sched_timer+0x38/0x70
> [  172.040086]  [<ffffffff810c1bf0>] __hrtimer_run_queues+0xb0/0x2c0
> [  172.040086]  [<ffffffff810c2467>] hrtimer_interrupt+0xa7/0x1a0
> [  172.040086]  [<ffffffff8103bee1>] local_apic_timer_interrupt+0x31/0x50
> [  172.040086]  [<ffffffff8103c8e8>] smp_apic_timer_interrupt+0x38/0x50
> [  172.040086]  [<ffffffff815d180f>] apic_timer_interrupt+0x7f/0x90
> [  172.040086]  <EOI>  [<ffffffff81128b99>] ? panic+0x1db/0x219
> [  172.040086]  [<ffffffff812942a4>] ? sha1_final+0x94/0x110
> [  172.040086]  [<ffffffff814883d1>] ? bch_xattr_hash+0x2b1/0x2d0
> [  172.040086]  [<ffffffff81058f24>] __stack_chk_fail+0x14/0x30
> [  172.040086]  [<ffffffff814883d1>] bch_xattr_hash+0x2b1/0x2d0
> [  172.040086]  [<ffffffff81488459>] xattr_hash_key+0x9/0x10
> [  172.040086]  [<ffffffff814884e4>] ? bch_xattr_get+0x84/0x1a0
> [  172.040086]  [<ffffffff8145cd69>] ? __bch_write_inode+0x289/0x2d0
> [  172.040086]  [<ffffffff81435438>] ? bch_get_acl+0x48/0x2b0
> [  172.040086]  [<ffffffff811ee9c1>] ? get_acl+0x71/0xf0
> [  172.040086]  [<ffffffff811eea8a>] ? posix_acl_chmod+0x4a/0xe0
> [  172.040086]  [<ffffffff8145d13f>] ? bch_setattr+0x7f/0xa0
> [  172.040086]  [<ffffffff811ac98e>] ? notify_change+0x23e/0x350
> [  172.040086]  [<ffffffff8118c6a9>] ? chmod_common+0x89/0x140
> [  172.040086]  [<ffffffff8118d991>] ? SyS_fchmod+0x31/0x50
> [  172.040086]  [<ffffffff815d0c1f>] ? entry_SYSCALL_64_fastpath+0x17/0x93
> [  172.040086] ---[ end trace eaafefe4328420aa ]---

And the winner is...:
# addr2line -i -e vmlinux ffffffff814883d1
/usr/src/linux-bcache/drivers/md/bcache/xattr.c:33
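
When reading a trace like the one above, it helps to separate the entries
marked with "?" from the rest: on x86 the "?" marks an address that was
found on the stack but could not be verified as part of the current call
chain. A short Python sketch that does this for an excerpt of the first
trace (the regular expression and function name are illustrative
assumptions, not a kernel tool):

```python
import re

# One call-trace entry: [<address>] optional "? " marker, then symbol+off/len.
FRAME_RE = re.compile(r"\[<([0-9a-f]{16})>\] (\? )?(\S+)")

TRACE = """
 [<ffffffff812cbe2d>] dump_stack+0x4f/0x72
 [<ffffffff81128a91>] panic+0xd3/0x219
 [<ffffffff812942a4>] ? sha1_final+0x94/0x110
 [<ffffffff814883d1>] ? bch_xattr_hash+0x2b1/0x2d0
 [<ffffffff81058f24>] __stack_chk_fail+0x14/0x30
 [<ffffffff814883d1>] bch_xattr_hash+0x2b1/0x2d0
 [<ffffffff81488459>] xattr_hash_key+0x9/0x10
 [<ffffffff814884e4>] ? bch_xattr_get+0x84/0x1a0
"""


def frames(trace: str):
    """Yield (symbol, verified) for each call-trace entry in the text."""
    for _addr, marker, symbol in FRAME_RE.findall(trace):
        # An empty marker means the entry was not flagged with "?".
        yield symbol, not marker


verified = [symbol for symbol, ok in frames(TRACE) if ok]
for symbol in verified:
    print(symbol)
```

Since the panic here is itself a report of stack corruption, even the
unmarked entries should be read with some caution.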

_Current_ options are:
# cat /sys/fs/bcache/1005562c-b899-4a27-bfe5-e7d9e89d4bf2/options/*
none lz4 [gzip]
none crc32c [crc64]
1
continue [remount-ro] panic
10
0
0
none crc32c [crc64]
1
0
0
0
crc32c crc64 siphash [sha1]
0
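
The listing above uses square brackets to mark the active choice of each
multi-valued option. A minimal Python sketch for reading that format (the
function name is hypothetical; because the cat output does not show file
names, the sketch does not try to say which line belongs to which option):

```python
def parse_option(line: str):
    """Split an option line into (choices, selected).

    A line such as "none crc32c [crc64]" lists every choice and brackets
    the active one. A plain value such as "10" has no choice list, so it
    is returned as ([], "10").
    """
    tokens = line.split()
    selected = [t[1:-1] for t in tokens
                if t.startswith("[") and t.endswith("]")]
    if len(selected) != 1:
        return [], line.strip()
    choices = [t.strip("[]") for t in tokens]
    return choices, selected[0]


for line in ["none lz4 [gzip]", "continue [remount-ro] panic",
             "crc32c crc64 siphash [sha1]", "10"]:
    print(parse_option(line))
```

Applied to the output above, the last multi-valued line has sha1 selected,
which is presumably the str_hash setting given the sha1_final entry in the
call trace.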


Marcin


* Re: [bcachefs] bcache (dm-10): IO error on dm-10 for checksum error (due to change of str_hash?)
  2016-09-16 12:27               ` Marcin Mirosław
@ 2016-09-16 12:36                 ` Kent Overstreet
  0 siblings, 0 replies; 12+ messages in thread
From: Kent Overstreet @ 2016-09-16 12:36 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Fri, Sep 16, 2016 at 02:27:11PM +0200, Marcin Mirosław wrote:
> On 16.09.2016 at 13:24, Kent Overstreet wrote:
> > addr2line -i -e vmlinux ffffffffc095b021
> 
> # addr2line -i -e vmlinux ffffffffc095b021
> ??:0
> 
> bcache is compiled as a module on this host :(
> I recompiled the kernel with bcache built in. Now I'm getting much
> more:

The sha1 code - oh, I know what's going on. I'll work on a fix tomorrow.


