* Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
@ 2015-08-30 8:54 Vojtech Pavlik
2015-08-31 14:39 ` Emmanuel Florac
2015-09-05 11:06 ` Jens-U. Mozdzen
0 siblings, 2 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-30 8:54 UTC (permalink / raw)
To: kmo; +Cc: linux-bcache
Hello Kent, list,
I have been using bcache for quite a while now. This particular setup is
a 6TB RAID5 mdraid, with a 200 GB SSD for caching, with bcache running
in writeback mode with a fixed low writeback rate and a btrfs filesystem
on top. Currently on kernel 4.1.6.
The system has survived some crashes and some power outages, however
always recovered without visible damage.
btrfs scrub finishes in a couple hours and doesn't report any errors, so
the filesystem data are consistent and readable.
Some months ago, I started observing large latencies when servicing some
I/O requests, these events would come minutes apart and would result in
applications stuck in the D state for tens of seconds.
I started debugging and found out that several factors contributed,
namely I had a large readahead value set up, disabling readahead helped.
The kernel switched to a multiqueue I/O scheduler, which didn't have
request merging, going back to deadline helped. But didn't help enough.
Then I noticed that during those situations where the system was slow, and
processes stuck in D, bcache_writeback CPU usage was soaring all the way to
saturating a core, showing this backtrace, spending time in refill_keybuf_fn():
NMI backtrace for cpu 6
CPU: 6 PID: 1743 Comm: bcache_writebac Not tainted 4.1.6-2.gce0123d-desktop #1
Hardware name: Gigabyte Technology Co., Ltd. Z87MX-D3H/Z87MX-D3H-CF, BIOS F5 08/02/2013
task: ffff8807ecf32250 ti: ffff8808045c4000 task.ti: ffff8808045c4000
RIP: 0010:[<ffffffffa0bdf592>] [<ffffffffa0bdf592>] refill_keybuf_fn+0x12/0x220 [bcache]
RSP: 0018:ffff8808045c7b68 EFLAGS: 00000292
RAX: ffffffffa0bdf580 RBX: 0000000000000000 RCX: ffffc900073fb000
RDX: ffff8807e4d4cca0 RSI: ffff8807fb22e800 RDI: ffff8808045c7db8
RBP: ffff8808045c7db8 R08: ffff8807ea466000 R09: 0000000000000001
R10: 0000000000000008 R11: 000007ffffffffff R12: ffff8808045c7db8
R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8807fb22e8d0
FS: 0000000000000000(0000) GS:ffff88082f380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5c080b4660 CR3: 0000000002e0f000 CR4: 00000000001426e0
Stack:
ffffffffa0bde137 0000000000000000 ffff8807fb22e800 0000000000000000
ffff8807fb22e800 ffff8808045c7db8 0000000000000000 ffffffffa0be2120
ffff8807ea4a0000 ffffffffa0bdf580 ffffffffa0bdee84 00000001a0bdcfc3
Call Trace:
[<ffffffffa0be2120>] bch_btree_map_keys_recurse+0x50/0x170 [bcache]
[<ffffffffa0be21a4>] bch_btree_map_keys_recurse+0xd4/0x170 [bcache]
[<ffffffffa0be5207>] bch_btree_map_keys+0x167/0x1c0 [bcache]
[<ffffffffa0be5309>] bch_refill_keybuf+0xa9/0x200 [bcache]
[<ffffffffa0bf9d3e>] bch_writeback_thread+0x11e/0x430 [bcache]
[<ffffffff810876d1>] kthread+0xc1/0xe0
[<ffffffff816b07e2>] ret_from_fork+0x42/0x70
Code: e8 24 e8 ac e0 48 83 c4 08 48 89 d8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 0f 1f 44 00 00 41 55 41 54 55 53 48 89 fd 48 83 ec 18 <48> 8b 77 40 48 8b 5f 38 48 8b 3a 48 8b 0e 48 89 f8 25 ff ff 0f
Changing the configuration to writeback_percent=40 helped. For some
time at least.
When the issue returned, without any further changes to the system, I
started investigating deeper. Since writeback_percent was large, also
the amount of dirty data was large. Before poking deeper, I decided I
want to clear the dirty data entierly. So I set the system to
cache_mode=writethrough and watched the dirty data trickle to the
backing device.
But then it stopped at 2.8G and didn't progress any further. The
bcache_writeback thread was at 100% CPU usage again and system was near
unusable. Reverting to writeback made the system responsive again.
A ftrace shows this in a tight loop over and over again:
bcache_writebac-1707 [004] .... 1064.836033: bch_btree_iter_next_filter <-bch_btree_map_keys_recurse
bcache_writebac-1707 [004] .... 1064.836033: bch_extent_bad <-bch_btree_iter_next_filter
bcache_writebac-1707 [004] .... 1064.836033: __bch_extent_invalid <-bch_extent_bad
bcache_writebac-1707 [004] .... 1064.836033: __ptr_invalid <-__bch_extent_invalid
bcache_writebac-1707 [004] .... 1064.836033: refill_keybuf_fn <-bch_btree_map_keys_recurse
bcache_writebac-1707 [004] .... 1064.836033: dirty_pred <-refill_keybuf_fn
A full trace can be found at http://twilight.ucw.cz/trace.bz2 and starts
with a working system, then shows the bad behavior (most of the lines)
and ends with a recovery after writeback mode was enabled again.
I also tried detaching the cache device, however, that resulted in the
same behavior that only was possible to stop by a reboot, since
bcache was tring to flush the dirty data no matter what.
So my conclusion is that there is an extent on the cache device, that's
declared 'bad' by bcache for some reason, most likely past corruption.
And when bcache_writeback hits it, and has nothing else to work on,
it immediately returns to it and tries again, with no progress.
I have tried enabling the "expensive debug checks" on the device, but it
didn't show any additional information in the system log when hitting
the bad extents.
Since I need to recover the system from this situation, my current plan
is to copy all the filesystem contents to another media and then rebuild
the bcache setup from scratch.
I consider this a rather serious bug, even though it is most likely
caused by the cache device being corrupted. Any hints?
Best regards,
--
Vojtech Pavlik
Director SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-30 8:54 Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach Vojtech Pavlik
@ 2015-08-31 14:39 ` Emmanuel Florac
2015-08-31 14:49 ` Vojtech Pavlik
2015-09-05 11:06 ` Jens-U. Mozdzen
1 sibling, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2015-08-31 14:39 UTC (permalink / raw)
To: Vojtech Pavlik; +Cc: kmo, linux-bcache
Le Sun, 30 Aug 2015 10:54:42 +0200
Vojtech Pavlik <vojtech@suse.cz> écrivait:
> Then I noticed that during those situations where the system was
> slow, and processes stuck in D, bcache_writeback CPU usage was
> soaring all the way to saturating a core,
In my experience, bcache_writeback stays in Wait state, therefore
always saturate a core: any machine I'm running bcache on has a
constant load of 1.00 even when completely idle.
> showing this backtrace,
> spending time in refill_keybuf_fn():
<snip>
> Changing the configuration to writeback_percent=40 helped. For some
> time at least.
>
> When the issue returned, without any further changes to the system, I
> started investigating deeper. Since writeback_percent was large, also
> the amount of dirty data was large.
In my case, when dirty data reaches the upper limit (i.e. when the
amount of dirty data equals the writeback_percent * backing device
size ), and it occurs regularly, the system just freezes...
> Before poking deeper, I decided I
> want to clear the dirty data entierly. So I set the system to
> cache_mode=writethrough and watched the dirty data trickle to the
> backing device.
>
> But then it stopped at 2.8G and didn't progress any further. The
> bcache_writeback thread was at 100% CPU usage again and system was
> near unusable. Reverting to writeback made the system responsive
> again.
The bcache_writeback stays at 100% _even_ when in writethrough mode,
alas. So this looks normal. However dirty_data definitely should drop
to zero...
<snip>
> I consider this a rather serious bug, even though it is most likely
> caused by the cache device being corrupted. Any hints?
Did you check what "smartctl -a" has to say about your backing device,
and maybe your spinning drives too? Just in case...
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 14:39 ` Emmanuel Florac
@ 2015-08-31 14:49 ` Vojtech Pavlik
2015-08-31 15:04 ` Kent Overstreet
2015-08-31 15:09 ` Emmanuel Florac
0 siblings, 2 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 14:49 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: kmo, linux-bcache
On Mon, Aug 31, 2015 at 04:39:37PM +0200, Emmanuel Florac wrote:
> > Then I noticed that during those situations where the system was
> > slow, and processes stuck in D, bcache_writeback CPU usage was
> > soaring all the way to saturating a core,
>
> In my experience, bcache_writeback stays in Wait state, therefore
> always saturate a core: any machine I'm running bcache on has a
> constant load of 1.00 even when completely idle.
In this situation, I see it in an "R" state.
> > showing this backtrace,
> > spending time in refill_keybuf_fn():
> <snip>
> > Changing the configuration to writeback_percent=40 helped. For some
> > time at least.
> >
> > When the issue returned, without any further changes to the system, I
> > started investigating deeper. Since writeback_percent was large, also
> > the amount of dirty data was large.
>
> In my case, when dirty data reaches the upper limit (i.e. when the
> amount of dirty data equals the writeback_percent * backing device
> size ), and it occurs regularly, the system just freezes...
That may be a similar symptom.
> > Before poking deeper, I decided I
> > want to clear the dirty data entierly. So I set the system to
> > cache_mode=writethrough and watched the dirty data trickle to the
> > backing device.
> >
> > But then it stopped at 2.8G and didn't progress any further. The
> > bcache_writeback thread was at 100% CPU usage again and system was
> > near unusable. Reverting to writeback made the system responsive
> > again.
>
> The bcache_writeback stays at 100% _even_ when in writethrough mode,
> alas. So this looks normal. However dirty_data definitely should drop
> to zero...
This most certainly isn't normal. The ftrace shows it's looping in a
loop doing nothing useful.
> <snip>
> > I consider this a rather serious bug, even though it is most likely
> > caused by the cache device being corrupted. Any hints?
>
> Did you check what "smartctl -a" has to say about your backing device,
> and maybe your spinning drives too? Just in case...
Yes, they're fine. The backing device is a RAID5.
--
Vojtech Pavlik
Director SuSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 14:49 ` Vojtech Pavlik
@ 2015-08-31 15:04 ` Kent Overstreet
2015-08-31 16:45 ` Vojtech Pavlik
2015-08-31 16:54 ` Vojtech Pavlik
2015-08-31 15:09 ` Emmanuel Florac
1 sibling, 2 replies; 19+ messages in thread
From: Kent Overstreet @ 2015-08-31 15:04 UTC (permalink / raw)
To: Vojtech Pavlik; +Cc: Emmanuel Florac, kmo, linux-bcache
On Mon, Aug 31, 2015 at 04:49:49PM +0200, Vojtech Pavlik wrote:
> On Mon, Aug 31, 2015 at 04:39:37PM +0200, Emmanuel Florac wrote:
>
> > > Then I noticed that during those situations where the system was
> > > slow, and processes stuck in D, bcache_writeback CPU usage was
> > > soaring all the way to saturating a core,
> >
> > In my experience, bcache_writeback stays in Wait state, therefore
> > always saturate a core: any machine I'm running bcache on has a
> > constant load of 1.00 even when completely idle.
>
> In this situation, I see it in an "R" state.
>
> > > showing this backtrace,
> > > spending time in refill_keybuf_fn():
> > <snip>
> > > Changing the configuration to writeback_percent=40 helped. For some
> > > time at least.
> > >
> > > When the issue returned, without any further changes to the system, I
> > > started investigating deeper. Since writeback_percent was large, also
> > > the amount of dirty data was large.
> >
> > In my case, when dirty data reaches the upper limit (i.e. when the
> > amount of dirty data equals the writeback_percent * backing device
> > size ), and it occurs regularly, the system just freezes...
>
> That may be a similar symptom.
I suspect there's two different bugs here.
- I'm starting to suspect there's a bug in the dirty data accounting, and it's
getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
that would explain it spinning when there actually isn't any work for it to
do.
- with a large enough amount of data, the 30 second writeback_delay may be
insufficient; if it takes longer than that just to scan the entire keyspace
it'll never get a chance to sleep. try bumping writeback_delay up and see if
that helps.
the ratelimiting on scanning for dirty data needs to be changed to something
more sophisticated, the existing fixed delay is problematic.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 14:49 ` Vojtech Pavlik
2015-08-31 15:04 ` Kent Overstreet
@ 2015-08-31 15:09 ` Emmanuel Florac
2015-08-31 15:54 ` Vojtech Pavlik
1 sibling, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2015-08-31 15:09 UTC (permalink / raw)
To: Vojtech Pavlik; +Cc: kmo, linux-bcache
Le Mon, 31 Aug 2015 16:49:49 +0200
Vojtech Pavlik <vojtech@suse.cz> écrivait:
> >
> > The bcache_writeback stays at 100% _even_ when in writethrough mode,
> > alas. So this looks normal. However dirty_data definitely should
> > drop to zero...
>
> This most certainly isn't normal. The ftrace shows it's looping in a
> loop doing nothing useful.
>
Yep, I've had a quick look at the code, however it's not very
inspiring. It very much looks like there's a missing test. From your
trace it looks like it finds a key_bad (?) -- from the struct
bch_extent_keys_ops in bcache/extents.c -- but there is no code anywhere
to manage this case apparently.
Ah, and I can't download your complete trace, too : twilight.ucw.cz
isn't responding. :)
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 15:09 ` Emmanuel Florac
@ 2015-08-31 15:54 ` Vojtech Pavlik
0 siblings, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 15:54 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: kmo, linux-bcache
On Mon, Aug 31, 2015 at 05:09:28PM +0200, Emmanuel Florac wrote:
> Le Mon, 31 Aug 2015 16:49:49 +0200
> Vojtech Pavlik <vojtech@suse.cz> écrivait:
>
> > >
> > > The bcache_writeback stays at 100% _even_ when in writethrough mode,
> > > alas. So this looks normal. However dirty_data definitely should
> > > drop to zero...
> >
> > This most certainly isn't normal. The ftrace shows it's looping in a
> > loop doing nothing useful.
> >
>
> Yep, I've had a quick look at the code, however it's not very
> inspiring. It very much looks like there's a missing test. From your
> trace it looks like it finds a key_bad (?) -- from the struct
> bch_extent_keys_ops in bcache/extents.c -- but there is no code anywhere
> to manage this case apparently.
>
> Ah, and I can't download your complete trace, too : twilight.ucw.cz
> isn't responding. :)
Sadly, the bcache array crashed hard earlier today (with a bug in and the machine
doesn't boot anymore. I'm recovering it now.
--
Vojtech Pavlik
Director SuSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 15:04 ` Kent Overstreet
@ 2015-08-31 16:45 ` Vojtech Pavlik
2015-08-31 16:53 ` Kent Overstreet
2015-08-31 16:54 ` Vojtech Pavlik
1 sibling, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 16:45 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Emmanuel Florac, kmo, linux-bcache
On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:
> I suspect there's two different bugs here.
>
> - I'm starting to suspect there's a bug in the dirty data accounting, and it's
> getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
> that would explain it spinning when there actually isn't any work for it to
> do.
That may be the case, but doesn't quite match my observation. Using this
command line:
echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; while true; do if top -b -n 1 | grep 'R.*bcache_write'; then date; echo looping; echo writeback > cache_mode; echo 40 > writeback_percent; sleep 1; echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; echo fixed; cat /sys/block/bcache0/bcache/dirty_data; fi; done
I managed to get down to about 200 MB od dirty data reported. If the
reporting was off by a fixed offset, I wouldn't be getting the 100% CPU
and running bcache_writeback at 5GB of dirty data already.
At least unless the accounting of dirty data is very wrong and
fluctuating.
> - with a large enough amount of data, the 30 second writeback_delay may be
> insufficient; if it takes longer than that just to scan the entire keyspace
> it'll never get a chance to sleep. try bumping writeback_delay up and see if
> that helps.
That shouldn't be the case when the amount of dirty data is below a
gigabyte, or is it?
> the ratelimiting on scanning for dirty data needs to be changed to something
> more sophisticated, the existing fixed delay is problematic.
--
Vojtech Pavlik
Director SuSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 16:45 ` Vojtech Pavlik
@ 2015-08-31 16:53 ` Kent Overstreet
2015-08-31 17:09 ` Vojtech Pavlik
2015-09-01 13:34 ` Vojtech Pavlik
0 siblings, 2 replies; 19+ messages in thread
From: Kent Overstreet @ 2015-08-31 16:53 UTC (permalink / raw)
To: Vojtech Pavlik; +Cc: Emmanuel Florac, kmo, linux-bcache
On Mon, Aug 31, 2015 at 06:45:31PM +0200, Vojtech Pavlik wrote:
> On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:
>
> > I suspect there's two different bugs here.
> >
> > - I'm starting to suspect there's a bug in the dirty data accounting, and it's
> > getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
> > that would explain it spinning when there actually isn't any work for it to
> > do.
>
> That may be the case, but doesn't quite match my observation. Using this
> command line:
>
> echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; while true; do if top -b -n 1 | grep 'R.*bcache_write'; then date; echo looping; echo writeback > cache_mode; echo 40 > writeback_percent; sleep 1; echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; echo fixed; cat /sys/block/bcache0/bcache/dirty_data; fi; done
>
> I managed to get down to about 200 MB od dirty data reported. If the
> reporting was off by a fixed offset, I wouldn't be getting the 100% CPU
> and running bcache_writeback at 5GB of dirty data already.
>
> At least unless the accounting of dirty data is very wrong and
> fluctuating.
Huh, that's good to know then.
> > - with a large enough amount of data, the 30 second writeback_delay may be
> > insufficient; if it takes longer than that just to scan the entire keyspace
> > it'll never get a chance to sleep. try bumping writeback_delay up and see if
> > that helps.
>
> That shouldn't be the case when the amount of dirty data is below a
> gigabyte, or is it?
No - it has to scan the entire btree, cached _and_ dirty data - so the scanning
gets expensive when you have lots of clean cached data and very little dirty
data, so it's supposed to ratelimit to no more than one scan every 30 seconds
(IIRC; that algorithm has gone through a couple different iterations). But if
it's taking more than 30 seconds to complete one scan... well, you see the
problem?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 15:04 ` Kent Overstreet
2015-08-31 16:45 ` Vojtech Pavlik
@ 2015-08-31 16:54 ` Vojtech Pavlik
1 sibling, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 16:54 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Emmanuel Florac, linux-bcache
On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:
> I suspect there's two different bugs here.
>
> - I'm starting to suspect there's a bug in the dirty data accounting, and it's
> getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
> that would explain it spinning when there actually isn't any work for it to
> do.
>
> - with a large enough amount of data, the 30 second writeback_delay may be
> insufficient; if it takes longer than that just to scan the entire keyspace
> it'll never get a chance to sleep. try bumping writeback_delay up and see if
> that helps.
>
> the ratelimiting on scanning for dirty data needs to be changed to something
> more sophisticated, the existing fixed delay is problematic.
Is there any difference regarding writeback problems between using 4.1.6
and your bcache-dev git branch?
It may take a bit of effort, but I should be able to get bcache-dev
installed on my system.
--
Vojtech Pavlik
Director SuSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 16:53 ` Kent Overstreet
@ 2015-08-31 17:09 ` Vojtech Pavlik
2015-09-01 13:34 ` Vojtech Pavlik
1 sibling, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 17:09 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Emmanuel Florac, kmo, linux-bcache
On Mon, Aug 31, 2015 at 08:53:54AM -0800, Kent Overstreet wrote:
> > > - with a large enough amount of data, the 30 second writeback_delay may be
> > > insufficient; if it takes longer than that just to scan the entire keyspace
> > > it'll never get a chance to sleep. try bumping writeback_delay up and see if
> > > that helps.
> >
> > That shouldn't be the case when the amount of dirty data is below a
> > gigabyte, or is it?
>
> No - it has to scan the entire btree, cached _and_ dirty data - so the scanning
> gets expensive when you have lots of clean cached data and very little dirty
> data, so it's supposed to ratelimit to no more than one scan every 30 seconds
> (IIRC; that algorithm has gone through a couple different iterations). But if
> it's taking more than 30 seconds to complete one scan... well, you see the
> problem?
That makes sense. The cache device is ~200 GB. I'll try to play with
writeback_delay when the system is up and running again. Currently I'm
doing a full backup such that I can test things without any worries.
--
Vojtech Pavlik
Director SuSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-31 16:53 ` Kent Overstreet
2015-08-31 17:09 ` Vojtech Pavlik
@ 2015-09-01 13:34 ` Vojtech Pavlik
1 sibling, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-01 13:34 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Emmanuel Florac, linux-bcache
On Mon, Aug 31, 2015 at 08:53:54AM -0800, Kent Overstreet wrote:
> > > - with a large enough amount of data, the 30 second writeback_delay may be
> > > insufficient; if it takes longer than that just to scan the entire keyspace
> > > it'll never get a chance to sleep. try bumping writeback_delay up and see if
> > > that helps.
> >
> > That shouldn't be the case when the amount of dirty data is below a
> > gigabyte, or is it?
>
> No - it has to scan the entire btree, cached _and_ dirty data - so the scanning
> gets expensive when you have lots of clean cached data and very little dirty
> data, so it's supposed to ratelimit to no more than one scan every 30 seconds
> (IIRC; that algorithm has gone through a couple different iterations). But if
> it's taking more than 30 seconds to complete one scan... well, you see the
> problem?
I tested that and set 'writeback_delay' to 600. That should be 10
minutes. No change:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1721 root 20 0 0 0 0 R 100.0 0.000 1075:09 bcache_writeback
Vojtech
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-08-30 8:54 Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach Vojtech Pavlik
2015-08-31 14:39 ` Emmanuel Florac
@ 2015-09-05 11:06 ` Jens-U. Mozdzen
2015-09-05 11:29 ` Vojtech Pavlik
1 sibling, 1 reply; 19+ messages in thread
From: Jens-U. Mozdzen @ 2015-09-05 11:06 UTC (permalink / raw)
To: linux-bcache
Hi all,
I've noticed similar oddities with our servers - not quite the same,
but close enough so I won't open a new thread:
We're running kernel 3.18.8 on server machines delivering SAN and NAS
resources. Back-end storage is a MD-RAID6 (7 WD Red 1TB 2,5"), cache is
on MD-RAID1 (2 SSD TOSHIBA PX02SMF020 200GB). As we live migrated from
128GB SSDs to the Toshibas, the cache size is still at 128 GB.
On one of the servers, I noticed excessive I/O load reported by our
monitoring tools. Having read this thread, I tried to get the amount
of dirty data down (cache_mode set to "writeback", writeback_percent
to 0, writeback_rate to 10000 and then monitoring
writeback_rate_debug), but unlike with the other server, the amount of
dirty data would not go below 186M. Unlike with the original report,
bcache_writeback wasn't at 100% but varying in its CPU usage - but
always on top of all other processes running. I/O wait was unusually
high, compared to the amount of data written.
I left the system rest over night, to find that the next day it would
not go below 197M, so that "bad spot" had changed. The load on this
server had increased - looking at the stats, it seemed like writeback
was trying to write data all the time, but for whatever reason failing
(which matched the lower limit of dirty data).
Fearing the worst, I set the cache mode to writearound to disable
further caching (amount of dirty still wouldn't drop below its border
value), stopped the clients for this server and rebooted.
Luckily, the server came up without a problem, and *I now could get
the amount of dirty data down to zero*. I switched back to writeback,
with writeback_percent to 0 and a fixed writeback_rate.
So in our case, it looks like something borked in bcache's run-time,
rather than on-disk (read: SSD cache content).
Regards,
Jens
PS: We're still facing random reboots (of unknown cause), which may
correlate with bcache's "amount dirty" being near the limit set by
writeback_percent. I'm trying to work-around this these days by
running with writeback_percent set to zero and using a writeback_rate
that lets the cache clean up over the day... so far, so good, but it's
too early to tell for sure. Since switching to the new SSDs, the
reboot rate went down to about once a week and I've made this change
only two days ago.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-09-05 11:06 ` Jens-U. Mozdzen
@ 2015-09-05 11:29 ` Vojtech Pavlik
2015-09-07 15:13 ` Jens-U. Mozdzen
0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-05 11:29 UTC (permalink / raw)
To: Jens-U. Mozdzen; +Cc: linux-bcache
On Sat, Sep 05, 2015 at 01:06:57PM +0200, Jens-U. Mozdzen wrote:
> I've noticed similar oddities with our servers - not quite the same,
> but close enough so I won't open a new thread:
>
> We're running kernel 3.18.8 on server machines delivering SAN and
> NAS resources. Back-end storage is a MD-RAID6 (7 WD Red 1TB 2,5"),
> cache is
> on MD-RAID1 (2 SSD TOSHIBA PX02SMF020 200GB). As we live migrated
> from 128GB SSDs to the Toshibas, the cache size is still at 128 GB.
>
> On one of the servers, I noticed excessive I/O load reported by our
> monitoring tools. Having read this thread, I tried to get the amount
> of dirty data down (cache_mode set to "writeback", writeback_percent
> to 0, writeback_rate to 10000 and then monitoring
> writeback_rate_debug), but unlike with the other server, the amount
> of dirty data would not go below 186M. Unlike with the original
> report, bcache_writeback wasn't at 100% but varying in its CPU usage
> - but always on top of all other processes running. I/O wait was
> unusually high, compared to the amount of data written.
This matches my situation and behavior exactly. Including the fact that
the backing device is a md raid.
> I left the system rest over night, to find that the next day it
> would not go below 197M, so that "bad spot" had changed. The load on
> this server had increased - looking at the stats, it seemed like
> writeback was trying to write data all the time, but for whatever
> reason failing (which matched the lower limit of dirty data).
>
> Fearing the worst, I set the cache mode to writearound to disable
> further caching (amount of dirty still wouldn't drop below its
> border value), stopped the clients for this server and rebooted.
>
> Luckily, the server came up without a problem, and *I now could get
> the amount of dirty data down to zero*. I switched back to
> writeback, with writeback_percent to 0 and a fixed writeback_rate.
>
> So in our case, it looks like something borked in bcache's run-time,
> rather than on-disk (read: SSD cache content).
My understanding of bcache internals is still limited: I only spent
three days gazing into the code and adding debugging. But this is what I
believe is happening:
Bcache tries to be clever and only write whole stripes to the RAID. That
is good as it avoids a R-M-W cycle in the RAID on the data.
However, the stripe optimization was added on top of the usual writeback
code and doesn't play very nicely with it, since it takes over the usage
of the last_scanned pointer, confusing the original code.
This means that bcache_writeback is able to write all complete stripes
back, but gets stuck at any random small dirty data bits.
Stuck and spinning, going to sleep only for zero-length times.
This also explains that write traffic on the device can complete some of
the small dirty bits to full stripes and eventually have them written
back.
I've sent an one-liner patch to the mailing list a few moments ago that
fixes the issue.
> PS: We're still facing random reboots (of unknown cause), which may
> correlate with bcache's "amount dirty" being near the limit set by
> writeback_percent.
Being near that limit is what bcache tries to achieve. It can be below
or above, in both cases the PID regulator will just do its thing.
> I'm trying to work-around this these days by
> running with writeback_percent set to zero and using a
> writeback_rate that lets the cache clean up over the day... so far,
> so good, but it's too early to tell for sure. Since switching to the
> new SSDs, the reboot rate went down to about once a week and I've
> made this change only two days ago.
With the fix I posted, setting writeback_percent to zero finally works
again without the risk of writeback_thread spinning.
--
Vojtech Pavlik
Director SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-09-05 11:29 ` Vojtech Pavlik
@ 2015-09-07 15:13 ` Jens-U. Mozdzen
2015-09-07 15:52 ` Vojtech Pavlik
0 siblings, 1 reply; 19+ messages in thread
From: Jens-U. Mozdzen @ 2015-09-07 15:13 UTC (permalink / raw)
To: Vojtech Pavlik, Kent Overstreet; +Cc: linux-bcache
Hi Vojtech,
Zitat von Vojtech Pavlik <vojtech@suse.com>:
> This matches my situation and behavior exactly. Including the fact that
> the backing device is a md raid.
> [...]
> I've sent an one-liner patch to the mailing list a few moments ago that
> fixes the issue.
I've added your patch to our kernel. Just for your reference, the
system is running Opensuse 13.1, with the kernel from
--- cut here ---
Information for package kernel-source:
--------------------------------------
Repository: conecenter
Name: kernel-source
Version: 3.18.8-5.1
Arch: noarch
Vendor: obs://build.opensuse.org/home:conecenter
Installed: Yes
Status: up-to-date
Installed Size: 471.5 MiB
Summary: The Linux Kernel Sources
Description:
Linux kernel sources with many fixes and improvements.
Source Timestamp: 2015-02-06 23:35:46 +0200
GIT Revision: ec2a744f14f988690583c04bd910145cd5a1f3c9
GIT Branch: stable
--- cut here ---
and the patches
bcache001.eml:Subject: [PATCH] bcache: [BUG] clear
BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
bcache002.eml:Subject: [PATCH] bcache: fix a livelock in btree lock
bcache003.eml:Subject: [PATCH] bcache: unregister reboot notifier when
bcache fails to register a block device
bcache004.eml:Subject: [PATCH] fix a leak in bch_cached_dev_run()
bcache005.eml:Subject: [PATCH] bcache: Fix writeback_thread never
writing back incomplete stripes.
I can confirm that running with writeback_percent to zero now works
much smoother (or "at all", for certain circumstances).
>> PS: We're still facing random reboots (of unknown cause), which may
>> correlate with bcache's "amount dirty" being near the limit set by
>> writeback_percent.
For a test, after a few hours running the latest patch, I switched
from writeback_percent==0 to writeback_percent==1, and had a full
kernel crash within an hour! Luckily, I still had a console open on
the machine, so I could for the first time see a hint (but not much
more) of what is going on:
--- cut here ---
Message from syslogd@san02 at Sep 7 14:56:15 ...
kernel:[74182.424659] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffffffa001a815
Message from syslogd@san02 at Sep 7 14:56:15 ...
kernel:[74182.424659]
Message from syslogd@san02 at Sep 7 14:56:15 ...
kernel:[74182.474050] Kernel Offset: 0x0 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffff9fffffff)
--- cut here ---
Since there's no stack trace, this lets much room for speculation. But
at least I now have an idea where the reboots (and two other "full
stops") might stem from: stack corruption. I have run
scripts/checkstack.pl on the bcache module and found no excessive
stack use, but checking for memset() and memcpy() in bcache's code
gave a number of hits - I'll have to have a look at them, one by one,
and hope to find my way around.
I'll give my servers at least two weeks to run with your patch and
writeback_percent==0 to see if we're hit by reboots with that code as
well. If not, I'll take that as an indicator that the implementation
of the "PID regulator" may need a closer look.
Kent, do you remember having fixed anything that might explain this
stack corruption behavior, in code later than what's included in
kernel 3.18.8?
Regards,
Jens
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-09-07 15:13 ` Jens-U. Mozdzen
@ 2015-09-07 15:52 ` Vojtech Pavlik
2015-09-07 16:01 ` Vojtech Pavlik
0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-07 15:52 UTC (permalink / raw)
To: Jens-U. Mozdzen; +Cc: Kent Overstreet, linux-bcache
On Mon, Sep 07, 2015 at 05:13:18PM +0200, Jens-U. Mozdzen wrote:
> and the patches
>
> bcache001.eml:Subject: [PATCH] bcache: [BUG] clear
> BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
> bcache002.eml:Subject: [PATCH] bcache: fix a livelock in btree lock
> bcache003.eml:Subject: [PATCH] bcache: unregister reboot notifier
> when bcache fails to register a block device
> bcache004.eml:Subject: [PATCH] fix a leak in bch_cached_dev_run()
> bcache005.eml:Subject: [PATCH] bcache: Fix writeback_thread never
> writing back incomplete stripes.
>
> I can confirm that running with writeback_percent to zero now works
> much smoother (or "at all", for certain circumstances).
I'm glad to hear that.
> >>PS: We're still facing random reboots (of unknown cause), which may
> >>correlate with bcache's "amount dirty" being near the limit set by
> >>writeback_percent.
>
> For a test, after a few hours running the latest patch, I switched
> from writeback_percent==0 to writeback_percent==1, and had a full
> kernel crash within an hour! Luckily, I still had a console open on
> the machine, so I could for the first time see a hint (but not much
> more) of what is going on:
I'm running the openSUSE most recent stable kernel, available here:
http://download.opensuse.org/repositories/Kernel:/stable/standard/
It's currently at 4.2.0 and contains all of the above patches. I've
seen crashes in __find_stripe a couple times a few months apart on older
kernels, but these aren't likely related to bcache. Similar to this:
https://bugzilla.kernel.org/show_bug.cgi?id=100321
But except for these, the system has been running stable (at
writeback_percent=40 the last few months), so I would bet on a different
source of your crashes than bcache.
> --- cut here ---
> Message from syslogd@san02 at Sep 7 14:56:15 ...
> kernel:[74182.424659] Kernel panic - not syncing: stack-protector:
> Kernel stack is corrupted in: ffffffffa001a815
>
> Message from syslogd@san02 at Sep 7 14:56:15 ...
> kernel:[74182.424659]
>
> Message from syslogd@san02 at Sep 7 14:56:15 ...
> kernel:[74182.474050] Kernel Offset: 0x0 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> --- cut here ---
Maybe you could set up a serial console? That way you'd be able to catch
all the kernel messages.
> Since there's no stack trace, this lets much room for speculation.
> But at least I now have an idea where the reboots (and two other
> "full stops") might stem from: stack corruption. I have run
> scripts/checkstack.pl on the bcache module and found no excessive
> stack use, but checking for memset() and memcpy() in bcache's code
> gave a number of hits - I'll have to have a look at them, one by
> one, and hope to find my way around.
>
> I'll give my servers at least two weeks to run with your patch and
> writeback_percent==0 to see if we're hit by reboots with that code
> as well. If not, I'll take that as an indicator that the
> implementation of the "PID regulator" may need a closer look.
>
> Kent, do you remember having fixed anything that might explain this
> stack corruption behavior, in code later than what's included in
> kernel 3.18.8?
--
Vojtech Pavlik
Director SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-09-07 15:52 ` Vojtech Pavlik
@ 2015-09-07 16:01 ` Vojtech Pavlik
[not found] ` <B7A73681-AF9A-438C-9323-B2CE3BEFCA98@profihost.ag>
0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-07 16:01 UTC (permalink / raw)
To: Jens-U. Mozdzen; +Cc: Kent Overstreet, linux-bcache
On Mon, Sep 07, 2015 at 05:52:17PM +0200, Vojtech Pavlik wrote:
> I'm running the openSUSE most recent stable kernel, available here:
>
> http://download.opensuse.org/repositories/Kernel:/stable/standard/
>
> It's currently at 4.2.0 and contains all of the above patches. I've
> seen crashes in __find_stripe a couple times a few months apart on older
> kernels, but these aren't likely related to bcache. Similar to this:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=100321
And actually that should also be fixed by latest code from Neil:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=49895bcc7e566ba455eb2996607d6fbd3447ce16
so bcache is exonerated for any crashes on my system in the past.
--
Vojtech Pavlik
Director SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
[not found] ` <B7A73681-AF9A-438C-9323-B2CE3BEFCA98@profihost.ag>
@ 2015-09-07 18:56 ` Vojtech Pavlik
2015-09-08 9:04 ` Jens-U. Mozdzen
0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-07 18:56 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Jens-U. Mozdzen, Kent Overstreet, linux-bcache
On Mon, Sep 07, 2015 at 07:58:24PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi Pavlik,
> is suse officially working on bcache or fixing bugs from customers?
Yes, we are supporting it and fixing customer problems.
--
Vojtech Pavlik
Director SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-09-07 18:56 ` Vojtech Pavlik
@ 2015-09-08 9:04 ` Jens-U. Mozdzen
2015-09-08 9:10 ` Vojtech Pavlik
0 siblings, 1 reply; 19+ messages in thread
From: Jens-U. Mozdzen @ 2015-09-08 9:04 UTC (permalink / raw)
To: Vojtech Pavlik; +Cc: linux-bcache
Hi Vojtech,
Zitat von Vojtech Pavlik <vojtech@suse.com>:
> On Mon, Sep 07, 2015 at 07:58:24PM +0200, Stefan Priebe - Profihost AG wrote:
>
>> Hi Pavlik,
>> is suse officially working on bcache or fixing bugs from customers?
>
> Yes, we are supporting it and fixing customer problems.
for which releases is that - SLES11 SPx, SLES12, ...?
Regards,
Jens
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
2015-09-08 9:04 ` Jens-U. Mozdzen
@ 2015-09-08 9:10 ` Vojtech Pavlik
0 siblings, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-08 9:10 UTC (permalink / raw)
To: Jens-U. Mozdzen; +Cc: linux-bcache
On Tue, Sep 08, 2015 at 11:04:17AM +0200, Jens-U. Mozdzen wrote:
> Zitat von Vojtech Pavlik <vojtech@suse.com>:
> >On Mon, Sep 07, 2015 at 07:58:24PM +0200, Stefan Priebe - Profihost AG wrote:
> >
> >>Hi Pavlik,
> >>is suse officially working on bcache or fixing bugs from customers?
> >
> >Yes, we are supporting it and fixing customer problems.
>
> for which releases is that - SLES11 SPx, SLES12, ...?
Starting with SLES12 forward.
--
Vojtech Pavlik
Director SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2015-09-08 9:10 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-30 8:54 Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach Vojtech Pavlik
2015-08-31 14:39 ` Emmanuel Florac
2015-08-31 14:49 ` Vojtech Pavlik
2015-08-31 15:04 ` Kent Overstreet
2015-08-31 16:45 ` Vojtech Pavlik
2015-08-31 16:53 ` Kent Overstreet
2015-08-31 17:09 ` Vojtech Pavlik
2015-09-01 13:34 ` Vojtech Pavlik
2015-08-31 16:54 ` Vojtech Pavlik
2015-08-31 15:09 ` Emmanuel Florac
2015-08-31 15:54 ` Vojtech Pavlik
2015-09-05 11:06 ` Jens-U. Mozdzen
2015-09-05 11:29 ` Vojtech Pavlik
2015-09-07 15:13 ` Jens-U. Mozdzen
2015-09-07 15:52 ` Vojtech Pavlik
2015-09-07 16:01 ` Vojtech Pavlik
[not found] ` <B7A73681-AF9A-438C-9323-B2CE3BEFCA98@profihost.ag>
2015-09-07 18:56 ` Vojtech Pavlik
2015-09-08 9:04 ` Jens-U. Mozdzen
2015-09-08 9:10 ` Vojtech Pavlik
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.