All of lore.kernel.org
 help / color / mirror / Atom feed
* Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
@ 2015-08-30  8:54 Vojtech Pavlik
  2015-08-31 14:39 ` Emmanuel Florac
  2015-09-05 11:06 ` Jens-U. Mozdzen
  0 siblings, 2 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-30  8:54 UTC (permalink / raw)
  To: kmo; +Cc: linux-bcache

Hello Kent, list,

I have been using bcache for quite a while now. This particular setup is
a 6TB RAID5 mdraid, with a 200 GB SSD for caching, with bcache running
in writeback mode with a fixed low writeback rate and a btrfs filesystem
on top. Currently on kernel 4.1.6.

The system has survived some crashes and some power outages, however
always recovered without visible damage.

btrfs scrub finishes in a couple hours and doesn't report any errors, so
the filesystem data are consistent and readable.

Some months ago, I started observing large latencies when servicing some
I/O requests, these events would come minutes apart and would result in
applications stuck in the D state for tens of seconds.

I started debugging and found out that several factors contributed,
namely I had a large readahead value set up, disabling readahead helped.
The kernel switched to a multiqueue I/O scheduler, which didn't have
request merging, going back to deadline helped. But didn't help enough.

Then I noticed that during those situations where the system was slow, and
processes stuck in D, bcache_writeback CPU usage was soaring all the way to
saturating a core, showing this backtrace, spending time in refill_keybuf_fn():

	NMI backtrace for cpu 6
	CPU: 6 PID: 1743 Comm: bcache_writebac Not tainted 4.1.6-2.gce0123d-desktop #1
	Hardware name: Gigabyte Technology Co., Ltd. Z87MX-D3H/Z87MX-D3H-CF, BIOS F5 08/02/2013
	task: ffff8807ecf32250 ti: ffff8808045c4000 task.ti: ffff8808045c4000
	RIP: 0010:[<ffffffffa0bdf592>]  [<ffffffffa0bdf592>] refill_keybuf_fn+0x12/0x220 [bcache]
	RSP: 0018:ffff8808045c7b68  EFLAGS: 00000292
	RAX: ffffffffa0bdf580 RBX: 0000000000000000 RCX: ffffc900073fb000
	RDX: ffff8807e4d4cca0 RSI: ffff8807fb22e800 RDI: ffff8808045c7db8
	RBP: ffff8808045c7db8 R08: ffff8807ea466000 R09: 0000000000000001
	R10: 0000000000000008 R11: 000007ffffffffff R12: ffff8808045c7db8
	R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8807fb22e8d0
	FS:  0000000000000000(0000) GS:ffff88082f380000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007f5c080b4660 CR3: 0000000002e0f000 CR4: 00000000001426e0
	Stack:
	 ffffffffa0bde137 0000000000000000 ffff8807fb22e800 0000000000000000
	 ffff8807fb22e800 ffff8808045c7db8 0000000000000000 ffffffffa0be2120
	 ffff8807ea4a0000 ffffffffa0bdf580 ffffffffa0bdee84 00000001a0bdcfc3
	Call Trace:
	 [<ffffffffa0be2120>] bch_btree_map_keys_recurse+0x50/0x170 [bcache]
	 [<ffffffffa0be21a4>] bch_btree_map_keys_recurse+0xd4/0x170 [bcache]
	 [<ffffffffa0be5207>] bch_btree_map_keys+0x167/0x1c0 [bcache]
	 [<ffffffffa0be5309>] bch_refill_keybuf+0xa9/0x200 [bcache]
	 [<ffffffffa0bf9d3e>] bch_writeback_thread+0x11e/0x430 [bcache]
	 [<ffffffff810876d1>] kthread+0xc1/0xe0
	 [<ffffffff816b07e2>] ret_from_fork+0x42/0x70
	Code: e8 24 e8 ac e0 48 83 c4 08 48 89 d8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 0f 1f 44 00 00 41 55 41 54 55 53 48 89 fd 48 83 ec 18 <48> 8b 77 40 48 8b 5f 38 48 8b 3a 48 8b 0e 48 89 f8 25 ff ff 0f

Changing the configuration to writeback_percent=40 helped. For some
time at least.

When the issue returned, without any further changes to the system, I
started investigating deeper. Since writeback_percent was large, also
the amount of dirty data was large. Before poking deeper, I decided I
want to clear the dirty data entierly. So I set the system to
cache_mode=writethrough and watched the dirty data trickle to the
backing device.

But then it stopped at 2.8G and didn't progress any further. The
bcache_writeback thread was at 100% CPU usage again and system was near
unusable. Reverting to writeback made the system responsive again.

A ftrace shows this in a tight loop over and over again:

	bcache_writebac-1707  [004] ....  1064.836033: bch_btree_iter_next_filter <-bch_btree_map_keys_recurse 
	bcache_writebac-1707  [004] ....  1064.836033: bch_extent_bad <-bch_btree_iter_next_filter 
	bcache_writebac-1707  [004] ....  1064.836033: __bch_extent_invalid <-bch_extent_bad 
	bcache_writebac-1707  [004] ....  1064.836033: __ptr_invalid <-__bch_extent_invalid 
	bcache_writebac-1707  [004] ....  1064.836033: refill_keybuf_fn <-bch_btree_map_keys_recurse 
	bcache_writebac-1707  [004] ....  1064.836033: dirty_pred <-refill_keybuf_fn

A full trace can be found at http://twilight.ucw.cz/trace.bz2 and starts
with a working system, then shows the bad behavior (most of the lines)
and ends with a recovery after writeback mode was enabled again.

I also tried detaching the cache device, however, that resulted in the
same behavior that only was possible to stop by a reboot, since 
bcache was tring to flush the dirty data no matter what.

So my conclusion is that there is an extent on the cache device, that's
declared 'bad' by bcache for some reason, most likely past corruption.
And when bcache_writeback hits it, and has nothing else to work on, 
it immediately returns to it and tries again, with no progress.

I have tried enabling the "expensive debug checks" on the device, but it
didn't show any additional information in the system log when hitting
the bad extents.

Since I need to recover the system from this situation, my current plan
is to copy all the filesystem contents to another media and then rebuild
the bcache setup from scratch.

I consider this a rather serious bug, even though it is most likely
caused by the cache device being corrupted. Any hints?

Best regards,
-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-30  8:54 Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach Vojtech Pavlik
@ 2015-08-31 14:39 ` Emmanuel Florac
  2015-08-31 14:49   ` Vojtech Pavlik
  2015-09-05 11:06 ` Jens-U. Mozdzen
  1 sibling, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2015-08-31 14:39 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: kmo, linux-bcache

Le Sun, 30 Aug 2015 10:54:42 +0200
Vojtech Pavlik <vojtech@suse.cz> écrivait:

> Then I noticed that during those situations where the system was
> slow, and processes stuck in D, bcache_writeback CPU usage was
> soaring all the way to saturating a core,

In my experience, bcache_writeback stays in Wait state, therefore
always saturate a core: any machine I'm running bcache on has a
constant load of 1.00 even when completely idle.

> showing this backtrace,
> spending time in refill_keybuf_fn():
 <snip>
> Changing the configuration to writeback_percent=40 helped. For some
> time at least.
> 
> When the issue returned, without any further changes to the system, I
> started investigating deeper. Since writeback_percent was large, also
> the amount of dirty data was large.

In my case, when dirty data reaches the upper limit (i.e. when the
amount of dirty data equals the writeback_percent * backing device
size ), and it occurs regularly, the system just freezes...

> Before poking deeper, I decided I
> want to clear the dirty data entierly. So I set the system to
> cache_mode=writethrough and watched the dirty data trickle to the
> backing device.
> 
> But then it stopped at 2.8G and didn't progress any further. The
> bcache_writeback thread was at 100% CPU usage again and system was
> near unusable. Reverting to writeback made the system responsive
> again.

The bcache_writeback stays at 100% _even_ when in writethrough mode,
alas. So this looks normal. However dirty_data definitely should drop
to zero...

 <snip> 
> I consider this a rather serious bug, even though it is most likely
> caused by the cache device being corrupted. Any hints?

Did you check what "smartctl -a" has to say about your backing device,
and maybe your spinning drives too? Just in case...

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 14:39 ` Emmanuel Florac
@ 2015-08-31 14:49   ` Vojtech Pavlik
  2015-08-31 15:04     ` Kent Overstreet
  2015-08-31 15:09     ` Emmanuel Florac
  0 siblings, 2 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 14:49 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: kmo, linux-bcache

On Mon, Aug 31, 2015 at 04:39:37PM +0200, Emmanuel Florac wrote:

> > Then I noticed that during those situations where the system was
> > slow, and processes stuck in D, bcache_writeback CPU usage was
> > soaring all the way to saturating a core,
> 
> In my experience, bcache_writeback stays in Wait state, therefore
> always saturate a core: any machine I'm running bcache on has a
> constant load of 1.00 even when completely idle.

In this situation, I see it in an "R" state.

> > showing this backtrace,
> > spending time in refill_keybuf_fn():
>  <snip>
> > Changing the configuration to writeback_percent=40 helped. For some
> > time at least.
> > 
> > When the issue returned, without any further changes to the system, I
> > started investigating deeper. Since writeback_percent was large, also
> > the amount of dirty data was large.
> 
> In my case, when dirty data reaches the upper limit (i.e. when the
> amount of dirty data equals the writeback_percent * backing device
> size ), and it occurs regularly, the system just freezes...

That may be a similar symptom.

> > Before poking deeper, I decided I
> > want to clear the dirty data entierly. So I set the system to
> > cache_mode=writethrough and watched the dirty data trickle to the
> > backing device.
> > 
> > But then it stopped at 2.8G and didn't progress any further. The
> > bcache_writeback thread was at 100% CPU usage again and system was
> > near unusable. Reverting to writeback made the system responsive
> > again.
> 
> The bcache_writeback stays at 100% _even_ when in writethrough mode,
> alas. So this looks normal. However dirty_data definitely should drop
> to zero...

This most certainly isn't normal. The ftrace shows it's looping in a
loop doing nothing useful.

>  <snip> 
> > I consider this a rather serious bug, even though it is most likely
> > caused by the cache device being corrupted. Any hints?
> 
> Did you check what "smartctl -a" has to say about your backing device,
> and maybe your spinning drives too? Just in case...

Yes, they're fine. The backing device is a RAID5.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 14:49   ` Vojtech Pavlik
@ 2015-08-31 15:04     ` Kent Overstreet
  2015-08-31 16:45       ` Vojtech Pavlik
  2015-08-31 16:54       ` Vojtech Pavlik
  2015-08-31 15:09     ` Emmanuel Florac
  1 sibling, 2 replies; 19+ messages in thread
From: Kent Overstreet @ 2015-08-31 15:04 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Emmanuel Florac, kmo, linux-bcache

On Mon, Aug 31, 2015 at 04:49:49PM +0200, Vojtech Pavlik wrote:
> On Mon, Aug 31, 2015 at 04:39:37PM +0200, Emmanuel Florac wrote:
> 
> > > Then I noticed that during those situations where the system was
> > > slow, and processes stuck in D, bcache_writeback CPU usage was
> > > soaring all the way to saturating a core,
> > 
> > In my experience, bcache_writeback stays in Wait state, therefore
> > always saturate a core: any machine I'm running bcache on has a
> > constant load of 1.00 even when completely idle.
> 
> In this situation, I see it in an "R" state.
> 
> > > showing this backtrace,
> > > spending time in refill_keybuf_fn():
> >  <snip>
> > > Changing the configuration to writeback_percent=40 helped. For some
> > > time at least.
> > > 
> > > When the issue returned, without any further changes to the system, I
> > > started investigating deeper. Since writeback_percent was large, also
> > > the amount of dirty data was large.
> > 
> > In my case, when dirty data reaches the upper limit (i.e. when the
> > amount of dirty data equals the writeback_percent * backing device
> > size ), and it occurs regularly, the system just freezes...
> 
> That may be a similar symptom.

I suspect there's two different bugs here.

 - I'm starting to suspect there's a bug in the dirty data accounting, and it's
   getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
   that would explain it spinning when there actually isn't any work for it to
   do.

 - with a large enough amount of data, the 30 second writeback_delay may be
   insufficient; if it takes longer than that just to scan the entire keyspace
   it'll never get a chance to sleep. try bumping writeback_delay up and see if
   that helps.

   the ratelimiting on scanning for dirty data needs to be changed to something
   more sophisticated, the existing fixed delay is problematic.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 14:49   ` Vojtech Pavlik
  2015-08-31 15:04     ` Kent Overstreet
@ 2015-08-31 15:09     ` Emmanuel Florac
  2015-08-31 15:54       ` Vojtech Pavlik
  1 sibling, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2015-08-31 15:09 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: kmo, linux-bcache

Le Mon, 31 Aug 2015 16:49:49 +0200
Vojtech Pavlik <vojtech@suse.cz> écrivait:

> > 
> > The bcache_writeback stays at 100% _even_ when in writethrough mode,
> > alas. So this looks normal. However dirty_data definitely should
> > drop to zero...  
> 
> This most certainly isn't normal. The ftrace shows it's looping in a
> loop doing nothing useful.
> 

Yep, I've had a quick look at the code, however it's not very
inspiring. It very much looks like there's a missing test. From your
trace it looks like it finds a key_bad (?) -- from the struct
bch_extent_keys_ops in bcache/extents.c -- but there is no code anywhere
to manage this case apparently.

Ah, and I can't download your complete trace, too : twilight.ucw.cz
isn't responding. :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 15:09     ` Emmanuel Florac
@ 2015-08-31 15:54       ` Vojtech Pavlik
  0 siblings, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 15:54 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: kmo, linux-bcache

On Mon, Aug 31, 2015 at 05:09:28PM +0200, Emmanuel Florac wrote:
> Le Mon, 31 Aug 2015 16:49:49 +0200
> Vojtech Pavlik <vojtech@suse.cz> écrivait:
> 
> > > 
> > > The bcache_writeback stays at 100% _even_ when in writethrough mode,
> > > alas. So this looks normal. However dirty_data definitely should
> > > drop to zero...  
> > 
> > This most certainly isn't normal. The ftrace shows it's looping in a
> > loop doing nothing useful.
> > 
> 
> Yep, I've had a quick look at the code, however it's not very
> inspiring. It very much looks like there's a missing test. From your
> trace it looks like it finds a key_bad (?) -- from the struct
> bch_extent_keys_ops in bcache/extents.c -- but there is no code anywhere
> to manage this case apparently.
> 
> Ah, and I can't download your complete trace, too : twilight.ucw.cz
> isn't responding. :)

Sadly, the bcache array crashed hard earlier today (with a bug in  and the machine
doesn't boot anymore. I'm recovering it now.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 15:04     ` Kent Overstreet
@ 2015-08-31 16:45       ` Vojtech Pavlik
  2015-08-31 16:53         ` Kent Overstreet
  2015-08-31 16:54       ` Vojtech Pavlik
  1 sibling, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 16:45 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Emmanuel Florac, kmo, linux-bcache

On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:

> I suspect there's two different bugs here.
> 
>  - I'm starting to suspect there's a bug in the dirty data accounting, and it's
>    getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
>    that would explain it spinning when there actually isn't any work for it to
>    do.

That may be the case, but doesn't quite match my observation. Using this
command line:

	echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; while true; do if top -b -n 1 | grep 'R.*bcache_write'; then date; echo looping; echo writeback > cache_mode; echo 40 > writeback_percent; sleep 1; echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; echo fixed; cat /sys/block/bcache0/bcache/dirty_data; fi; done

I managed to get down to about 200 MB od dirty data reported. If the
reporting was off by a fixed offset, I wouldn't be getting the 100% CPU
and running bcache_writeback at 5GB of dirty data already.

At least unless the accounting of dirty data is very wrong and
fluctuating.

>  - with a large enough amount of data, the 30 second writeback_delay may be
>    insufficient; if it takes longer than that just to scan the entire keyspace
>    it'll never get a chance to sleep. try bumping writeback_delay up and see if
>    that helps.

That shouldn't be the case when the amount of dirty data is below a
gigabyte, or is it?

>    the ratelimiting on scanning for dirty data needs to be changed to something
>    more sophisticated, the existing fixed delay is problematic.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 16:45       ` Vojtech Pavlik
@ 2015-08-31 16:53         ` Kent Overstreet
  2015-08-31 17:09           ` Vojtech Pavlik
  2015-09-01 13:34           ` Vojtech Pavlik
  0 siblings, 2 replies; 19+ messages in thread
From: Kent Overstreet @ 2015-08-31 16:53 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Emmanuel Florac, kmo, linux-bcache

On Mon, Aug 31, 2015 at 06:45:31PM +0200, Vojtech Pavlik wrote:
> On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:
> 
> > I suspect there's two different bugs here.
> > 
> >  - I'm starting to suspect there's a bug in the dirty data accounting, and it's
> >    getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
> >    that would explain it spinning when there actually isn't any work for it to
> >    do.
> 
> That may be the case, but doesn't quite match my observation. Using this
> command line:
> 
> 	echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; while true; do if top -b -n 1 | grep 'R.*bcache_write'; then date; echo looping; echo writeback > cache_mode; echo 40 > writeback_percent; sleep 1; echo 2 > writeback_percent; echo 0 > writeback_percent; echo 100000 > writeback_rate; echo none > cache_mode; echo fixed; cat /sys/block/bcache0/bcache/dirty_data; fi; done
> 
> I managed to get down to about 200 MB od dirty data reported. If the
> reporting was off by a fixed offset, I wouldn't be getting the 100% CPU
> and running bcache_writeback at 5GB of dirty data already.
> 
> At least unless the accounting of dirty data is very wrong and
> fluctuating.

Huh, that's good to know then.

> >  - with a large enough amount of data, the 30 second writeback_delay may be
> >    insufficient; if it takes longer than that just to scan the entire keyspace
> >    it'll never get a chance to sleep. try bumping writeback_delay up and see if
> >    that helps.
> 
> That shouldn't be the case when the amount of dirty data is below a
> gigabyte, or is it?

No - it has to scan the entire btree, cached _and_ dirty data - so the scanning
gets expensive when you have lots of clean cached data and very little dirty
data, so it's supposed to ratelimit to no more than one scan every 30 seconds
(IIRC; that algorithm has gone through a couple different iterations). But if
it's taking more than 30 seconds to complete one scan... well, you see the
problem?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 15:04     ` Kent Overstreet
  2015-08-31 16:45       ` Vojtech Pavlik
@ 2015-08-31 16:54       ` Vojtech Pavlik
  1 sibling, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 16:54 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Emmanuel Florac, linux-bcache

On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:

> I suspect there's two different bugs here.
> 
>  - I'm starting to suspect there's a bug in the dirty data accounting, and it's
>    getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
>    that would explain it spinning when there actually isn't any work for it to
>    do.
> 
>  - with a large enough amount of data, the 30 second writeback_delay may be
>    insufficient; if it takes longer than that just to scan the entire keyspace
>    it'll never get a chance to sleep. try bumping writeback_delay up and see if
>    that helps.
> 
>    the ratelimiting on scanning for dirty data needs to be changed to something
>    more sophisticated, the existing fixed delay is problematic.

Is there any difference regarding writeback problems between using 4.1.6
and your bcache-dev git branch?

It may take a bit of effort, but I should be able to get bcache-dev
installed on my system.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 16:53         ` Kent Overstreet
@ 2015-08-31 17:09           ` Vojtech Pavlik
  2015-09-01 13:34           ` Vojtech Pavlik
  1 sibling, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-08-31 17:09 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Emmanuel Florac, kmo, linux-bcache

On Mon, Aug 31, 2015 at 08:53:54AM -0800, Kent Overstreet wrote:

> > >  - with a large enough amount of data, the 30 second writeback_delay may be
> > >    insufficient; if it takes longer than that just to scan the entire keyspace
> > >    it'll never get a chance to sleep. try bumping writeback_delay up and see if
> > >    that helps.
> > 
> > That shouldn't be the case when the amount of dirty data is below a
> > gigabyte, or is it?
> 
> No - it has to scan the entire btree, cached _and_ dirty data - so the scanning
> gets expensive when you have lots of clean cached data and very little dirty
> data, so it's supposed to ratelimit to no more than one scan every 30 seconds
> (IIRC; that algorithm has gone through a couple different iterations). But if
> it's taking more than 30 seconds to complete one scan... well, you see the
> problem?

That makes sense. The cache device is ~200 GB. I'll try to play with
writeback_delay when the system is up and running again. Currently I'm
doing a full backup such that I can test things without any worries.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-31 16:53         ` Kent Overstreet
  2015-08-31 17:09           ` Vojtech Pavlik
@ 2015-09-01 13:34           ` Vojtech Pavlik
  1 sibling, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-01 13:34 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Emmanuel Florac, linux-bcache

On Mon, Aug 31, 2015 at 08:53:54AM -0800, Kent Overstreet wrote:

> > >  - with a large enough amount of data, the 30 second writeback_delay may be
> > >    insufficient; if it takes longer than that just to scan the entire keyspace
> > >    it'll never get a chance to sleep. try bumping writeback_delay up and see if
> > >    that helps.
> > 
> > That shouldn't be the case when the amount of dirty data is below a
> > gigabyte, or is it?
> 
> No - it has to scan the entire btree, cached _and_ dirty data - so the scanning
> gets expensive when you have lots of clean cached data and very little dirty
> data, so it's supposed to ratelimit to no more than one scan every 30 seconds
> (IIRC; that algorithm has gone through a couple different iterations). But if
> it's taking more than 30 seconds to complete one scan... well, you see the
> problem?

I tested that and set 'writeback_delay' to 600. That should be 10
minutes. No change:

	 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
	1721 root      20   0       0      0      0 R 100.0 0.000   1075:09 bcache_writeback

Vojtech

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-08-30  8:54 Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach Vojtech Pavlik
  2015-08-31 14:39 ` Emmanuel Florac
@ 2015-09-05 11:06 ` Jens-U. Mozdzen
  2015-09-05 11:29   ` Vojtech Pavlik
  1 sibling, 1 reply; 19+ messages in thread
From: Jens-U. Mozdzen @ 2015-09-05 11:06 UTC (permalink / raw)
  To: linux-bcache

Hi all,

I've noticed similar oddities with our servers - not quite the same,  
but close enough so I won't open a new thread:

We're running kernel 3.18.8 on server machines delivering SAN and NAS  
resources. Back-end storage is a MD-RAID6 (7 WD Red 1TB 2,5"), cache is
on MD-RAID1 (2 SSD TOSHIBA PX02SMF020 200GB). As we live migrated from  
128GB SSDs to the Toshibas, the cache size is still at 128 GB.

On one of the servers, I noticed excessive I/O load reported by our  
monitoring tools. Having read this thread, I tried to get the amount  
of dirty data down (cache_mode set to "writeback", writeback_percent  
to 0, writeback_rate to 10000 and then monitoring  
writeback_rate_debug), but unlike with the other server, the amount of  
dirty data would not go below 186M. Unlike with the original report,  
bcache_writeback wasn't at 100% but varying in its CPU usage - but  
always on top of all other processes running. I/O wait was unusually  
high, compared to the amount of data written.

I left the system rest over night, to find that the next day it would  
not go below 197M, so that "bad spot" had changed. The load on this  
server had increased - looking at the stats, it seemed like writeback  
was trying to write data all the time, but for whatever reason failing  
(which matched the lower limit of dirty data).

Fearing the worst, I set the cache mode to writearound to disable  
further caching (amount of dirty still wouldn't drop below its border  
value), stopped the clients for this server and rebooted.

Luckily, the server came up without a problem, and *I now could get  
the amount of dirty data down to zero*. I switched back to writeback,  
with writeback_percent to 0 and a fixed writeback_rate.

So in our case, it looks like something borked in bcache's run-time,  
rather than on-disk (read: SSD cache content).

Regards,
Jens

PS: We're still facing random reboots (of unknown cause), which may  
correlate with bcache's "amount dirty" being near the limit set by  
writeback_percent. I'm trying to work-around this these days by  
running with writeback_percent set to zero and using a writeback_rate  
that lets the cache clean up over the day... so far, so good, but it's  
too early to tell for sure. Since switching to the new SSDs, the  
reboot rate went down to about once a week and I've made this change  
only two days ago.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-09-05 11:06 ` Jens-U. Mozdzen
@ 2015-09-05 11:29   ` Vojtech Pavlik
  2015-09-07 15:13     ` Jens-U. Mozdzen
  0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-05 11:29 UTC (permalink / raw)
  To: Jens-U. Mozdzen; +Cc: linux-bcache

On Sat, Sep 05, 2015 at 01:06:57PM +0200, Jens-U. Mozdzen wrote:

> I've noticed similar oddities with our servers - not quite the same,
> but close enough so I won't open a new thread:
> 
> We're running kernel 3.18.8 on server machines delivering SAN and
> NAS resources. Back-end storage is a MD-RAID6 (7 WD Red 1TB 2,5"),
> cache is
> on MD-RAID1 (2 SSD TOSHIBA PX02SMF020 200GB). As we live migrated
> from 128GB SSDs to the Toshibas, the cache size is still at 128 GB.
> 
> On one of the servers, I noticed excessive I/O load reported by our
> monitoring tools. Having read this thread, I tried to get the amount
> of dirty data down (cache_mode set to "writeback", writeback_percent
> to 0, writeback_rate to 10000 and then monitoring
> writeback_rate_debug), but unlike with the other server, the amount
> of dirty data would not go below 186M. Unlike with the original
> report, bcache_writeback wasn't at 100% but varying in its CPU usage
> - but always on top of all other processes running. I/O wait was
> unusually high, compared to the amount of data written.

This matches my situation and behavior exactly. Including the fact that
the backing device is a md raid.

> I left the system rest over night, to find that the next day it
> would not go below 197M, so that "bad spot" had changed. The load on
> this server had increased - looking at the stats, it seemed like
> writeback was trying to write data all the time, but for whatever
> reason failing (which matched the lower limit of dirty data).
> 
> Fearing the worst, I set the cache mode to writearound to disable
> further caching (amount of dirty still wouldn't drop below its
> border value), stopped the clients for this server and rebooted.
> 
> Luckily, the server came up without a problem, and *I now could get
> the amount of dirty data down to zero*. I switched back to
> writeback, with writeback_percent to 0 and a fixed writeback_rate.
> 
> So in our case, it looks like something borked in bcache's run-time,
> rather than on-disk (read: SSD cache content).

My understanding of bcache internals is still limited: I only spent
three days gazing into the code and adding debugging. But this is what I
believe is happening:

Bcache tries to be clever and only write whole stripes to the RAID. That
is good as it avoids a R-M-W cycle in the RAID on the data.

However, the stripe optimization was added on top of the usual writeback
code and doesn't play very nicely with it, since it takes over the usage
of the last_scanned pointer, confusing the original code.

This means that bcache_writeback is able to write all complete stripes
back, but gets stuck at any random small dirty data bits.

Stuck and spinning, going to sleep only for zero-length times.

This also explains that write traffic on the device can complete some of
the small dirty bits to full stripes and eventually have them written
back.

I've sent an one-liner patch to the mailing list a few moments ago that
fixes the issue.

> PS: We're still facing random reboots (of unknown cause), which may
> correlate with bcache's "amount dirty" being near the limit set by
> writeback_percent. 

Being near that limit is what bcache tries to achieve. It can be below
or above, in both cases the PID regulator will just do its thing.

> I'm trying to work-around this these days by
> running with writeback_percent set to zero and using a
> writeback_rate that lets the cache clean up over the day... so far,
> so good, but it's too early to tell for sure. Since switching to the
> new SSDs, the reboot rate went down to about once a week and I've
> made this change only two days ago.

With the fix I posted, setting writeback_percent to zero finally works
again without the risk of writeback_thread spinning.

-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-09-05 11:29   ` Vojtech Pavlik
@ 2015-09-07 15:13     ` Jens-U. Mozdzen
  2015-09-07 15:52       ` Vojtech Pavlik
  0 siblings, 1 reply; 19+ messages in thread
From: Jens-U. Mozdzen @ 2015-09-07 15:13 UTC (permalink / raw)
  To: Vojtech Pavlik, Kent Overstreet; +Cc: linux-bcache

Hi Vojtech,

Zitat von Vojtech Pavlik <vojtech@suse.com>:
> This matches my situation and behavior exactly. Including the fact that
> the backing device is a md raid.
> [...]
> I've sent an one-liner patch to the mailing list a few moments ago that
> fixes the issue.

I've added your patch to our kernel. Just for your reference, the  
system is running Opensuse 13.1, with the kernel from

--- cut here ---
Information for package kernel-source:
--------------------------------------
Repository: conecenter
Name: kernel-source
Version: 3.18.8-5.1
Arch: noarch
Vendor: obs://build.opensuse.org/home:conecenter
Installed: Yes
Status: up-to-date
Installed Size: 471.5 MiB
Summary: The Linux Kernel Sources
Description:
Linux kernel sources with many fixes and improvements.


Source Timestamp: 2015-02-06 23:35:46 +0200
GIT Revision: ec2a744f14f988690583c04bd910145cd5a1f3c9
GIT Branch: stable
--- cut here ---

and the patches

bcache001.eml:Subject: [PATCH] bcache: [BUG] clear  
BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
bcache002.eml:Subject: [PATCH] bcache: fix a livelock in btree lock
bcache003.eml:Subject: [PATCH] bcache: unregister reboot notifier when  
bcache fails to register a block device
bcache004.eml:Subject: [PATCH] fix a leak in bch_cached_dev_run()
bcache005.eml:Subject: [PATCH] bcache: Fix writeback_thread never  
writing back incomplete stripes.

I can confirm that running with writeback_percent to zero now works  
much smoother (or "at all", for certain circumstances).

>> PS: We're still facing random reboots (of unknown cause), which may
>> correlate with bcache's "amount dirty" being near the limit set by
>> writeback_percent.

For a test, after a few hours running the latest patch, I switched  
from writeback_percent==0 to writeback_percent==1, and had a full  
kernel crash within an hour! Luckily, I still had a console open on  
the machine, so I could for the first time see a hint (but not much  
more) of what is going on:

--- cut here ---
Message from syslogd@san02 at Sep  7 14:56:15 ...
  kernel:[74182.424659] Kernel panic - not syncing: stack-protector:  
Kernel stack is corrupted in: ffffffffa001a815

Message from syslogd@san02 at Sep  7 14:56:15 ...
  kernel:[74182.424659]

Message from syslogd@san02 at Sep  7 14:56:15 ...
  kernel:[74182.474050] Kernel Offset: 0x0 from 0xffffffff81000000  
(relocation range: 0xffffffff80000000-0xffffffff9fffffff)
--- cut here ---

Since there's no stack trace, this lets much room for speculation. But  
at least I now have an idea where the reboots (and two other "full  
stops") might stem from: stack corruption. I have run  
scripts/checkstack.pl on the bcache module and found no excessive  
stack use, but checking for memset() and memcpy() in bcache's code  
gave a number of hits - I'll have to have a look at them, one by one,  
and hope to find my way around.

I'll give my servers at least two weeks to run with your patch and  
writeback_percent==0 to see if we're hit by reboots with that code as  
well. If not, I'll take that as an indicator that the implementation  
of the "PID regulator" may need a closer look.

Kent, do you remember having fixed anything that might explain this  
stack corruption behavior, in code later than what's included in  
kernel 3.18.8?

Regards,
Jens

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-09-07 15:13     ` Jens-U. Mozdzen
@ 2015-09-07 15:52       ` Vojtech Pavlik
  2015-09-07 16:01         ` Vojtech Pavlik
  0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-07 15:52 UTC (permalink / raw)
  To: Jens-U. Mozdzen; +Cc: Kent Overstreet, linux-bcache

On Mon, Sep 07, 2015 at 05:13:18PM +0200, Jens-U. Mozdzen wrote:

> and the patches
> 
> bcache001.eml:Subject: [PATCH] bcache: [BUG] clear
> BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
> bcache002.eml:Subject: [PATCH] bcache: fix a livelock in btree lock
> bcache003.eml:Subject: [PATCH] bcache: unregister reboot notifier
> when bcache fails to register a block device
> bcache004.eml:Subject: [PATCH] fix a leak in bch_cached_dev_run()
> bcache005.eml:Subject: [PATCH] bcache: Fix writeback_thread never
> writing back incomplete stripes.
> 
> I can confirm that running with writeback_percent to zero now works
> much smoother (or "at all", for certain circumstances).

I'm glad to hear that.

> >>PS: We're still facing random reboots (of unknown cause), which may
> >>correlate with bcache's "amount dirty" being near the limit set by
> >>writeback_percent.
> 
> For a test, after a few hours running the latest patch, I switched
> from writeback_percent==0 to writeback_percent==1, and had a full
> kernel crash within an hour! Luckily, I still had a console open on
> the machine, so I could for the first time see a hint (but not much
> more) of what is going on:

I'm running the openSUSE most recent stable kernel, available here:

	http://download.opensuse.org/repositories/Kernel:/stable/standard/

It's currently at 4.2.0 and contains all of the above patches. I've
seen crashes in __find_stripe a couple times a few months apart on older
kernels, but these aren't likely related to bcache. Similar to this:

	https://bugzilla.kernel.org/show_bug.cgi?id=100321

But except for these, the system has been running stable (at
writeback_percent=40 the last few months), so I would bet on a different
source of your crashes than bcache.

> --- cut here ---
> Message from syslogd@san02 at Sep  7 14:56:15 ...
>  kernel:[74182.424659] Kernel panic - not syncing: stack-protector:
> Kernel stack is corrupted in: ffffffffa001a815
> 
> Message from syslogd@san02 at Sep  7 14:56:15 ...
>  kernel:[74182.424659]
> 
> Message from syslogd@san02 at Sep  7 14:56:15 ...
>  kernel:[74182.474050] Kernel Offset: 0x0 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> --- cut here ---

Maybe you could set up a serial console? That way you'd be able to catch
all the kernel messages.

> Since there's no stack trace, this lets much room for speculation.
> But at least I now have an idea where the reboots (and two other
> "full stops") might stem from: stack corruption. I have run
> scripts/checkstack.pl on the bcache module and found no excessive
> stack use, but checking for memset() and memcpy() in bcache's code
> gave a number of hits - I'll have to have a look at them, one by
> one, and hope to find my way around.
> 
> I'll give my servers at least two weeks to run with your patch and
> writeback_percent==0 to see if we're hit by reboots with that code
> as well. If not, I'll take that as an indicator that the
> implementation of the "PID regulator" may need a closer look.
>
> Kent, do you remember having fixed anything that might explain this
> stack corruption behavior, in code later than what's included in
> kernel 3.18.8?

-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-09-07 15:52       ` Vojtech Pavlik
@ 2015-09-07 16:01         ` Vojtech Pavlik
       [not found]           ` <B7A73681-AF9A-438C-9323-B2CE3BEFCA98@profihost.ag>
  0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-07 16:01 UTC (permalink / raw)
  To: Jens-U. Mozdzen; +Cc: Kent Overstreet, linux-bcache

On Mon, Sep 07, 2015 at 05:52:17PM +0200, Vojtech Pavlik wrote:

> I'm running the openSUSE most recent stable kernel, available here:
> 
> 	http://download.opensuse.org/repositories/Kernel:/stable/standard/
> 
> It's currently at 4.2.0 and contains all of the above patches. I've
> seen crashes in __find_stripe a couple times a few months apart on older
> kernels, but these aren't likely related to bcache. Similar to this:
> 
> 	https://bugzilla.kernel.org/show_bug.cgi?id=100321

And actually that should also be fixed by latest code from Neil:

	https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=49895bcc7e566ba455eb2996607d6fbd3447ce16

so bcache is exonerated for any crashes on my system in the past.

-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
       [not found]           ` <B7A73681-AF9A-438C-9323-B2CE3BEFCA98@profihost.ag>
@ 2015-09-07 18:56             ` Vojtech Pavlik
  2015-09-08  9:04               ` Jens-U. Mozdzen
  0 siblings, 1 reply; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-07 18:56 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Jens-U. Mozdzen, Kent Overstreet, linux-bcache

On Mon, Sep 07, 2015 at 07:58:24PM +0200, Stefan Priebe - Profihost AG wrote:

> Hi Pavlik,
> is suse officially working on bcache or fixing bugs from customers?

Yes, we are supporting it and fixing customer problems.

-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-09-07 18:56             ` Vojtech Pavlik
@ 2015-09-08  9:04               ` Jens-U. Mozdzen
  2015-09-08  9:10                 ` Vojtech Pavlik
  0 siblings, 1 reply; 19+ messages in thread
From: Jens-U. Mozdzen @ 2015-09-08  9:04 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: linux-bcache

Hi Vojtech,

Zitat von Vojtech Pavlik <vojtech@suse.com>:
> On Mon, Sep 07, 2015 at 07:58:24PM +0200, Stefan Priebe - Profihost AG wrote:
>
>> Hi Pavlik,
>> is suse officially working on bcache or fixing bugs from customers?
>
> Yes, we are supporting it and fixing customer problems.

for which releases is that - SLES11 SPx, SLES12, ...?

Regards,
Jens

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
  2015-09-08  9:04               ` Jens-U. Mozdzen
@ 2015-09-08  9:10                 ` Vojtech Pavlik
  0 siblings, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2015-09-08  9:10 UTC (permalink / raw)
  To: Jens-U. Mozdzen; +Cc: linux-bcache

On Tue, Sep 08, 2015 at 11:04:17AM +0200, Jens-U. Mozdzen wrote:

> Zitat von Vojtech Pavlik <vojtech@suse.com>:
> >On Mon, Sep 07, 2015 at 07:58:24PM +0200, Stefan Priebe - Profihost AG wrote:
> >
> >>Hi Pavlik,
> >>is suse officially working on bcache or fixing bugs from customers?
> >
> >Yes, we are supporting it and fixing customer problems.
> 
> for which releases is that - SLES11 SPx, SLES12, ...?

Starting with SLES12 forward.

-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-09-08  9:10 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-30  8:54 Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach Vojtech Pavlik
2015-08-31 14:39 ` Emmanuel Florac
2015-08-31 14:49   ` Vojtech Pavlik
2015-08-31 15:04     ` Kent Overstreet
2015-08-31 16:45       ` Vojtech Pavlik
2015-08-31 16:53         ` Kent Overstreet
2015-08-31 17:09           ` Vojtech Pavlik
2015-09-01 13:34           ` Vojtech Pavlik
2015-08-31 16:54       ` Vojtech Pavlik
2015-08-31 15:09     ` Emmanuel Florac
2015-08-31 15:54       ` Vojtech Pavlik
2015-09-05 11:06 ` Jens-U. Mozdzen
2015-09-05 11:29   ` Vojtech Pavlik
2015-09-07 15:13     ` Jens-U. Mozdzen
2015-09-07 15:52       ` Vojtech Pavlik
2015-09-07 16:01         ` Vojtech Pavlik
     [not found]           ` <B7A73681-AF9A-438C-9323-B2CE3BEFCA98@profihost.ag>
2015-09-07 18:56             ` Vojtech Pavlik
2015-09-08  9:04               ` Jens-U. Mozdzen
2015-09-08  9:10                 ` Vojtech Pavlik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.