linux-bcache.vger.kernel.org archive mirror
* Undoing an "Auto-Stop" when Cache device has recovered?
@ 2021-03-24 20:21 Nikolaus Rath
  2021-03-25  5:29 ` Coly Li
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolaus Rath @ 2021-03-24 20:21 UTC
  To: linux-bcache

Hello,

My (writeback enabled) bcache cache device had a temporary failure, but seems to have fully recovered (it may have been overheating or a loose cable).

From the last kernel messages, it seems that bcache tried to flush the dirty data, but failed, and then stopped the cache device.

After a reboot, the bcacheX device indeed no longer has an associated cache set.

I think in my case the cache device is in perfect shape again and still has all the data, so I would really like bcache to attach it again so that the dirty cache data is not lost.

Is there a way to do that?

(Yes, I will still replace the device afterwards)

(I am pretty sure that just re-attaching the cacheset will make bcache forget that there was a previous association and will wipe the corresponding metadata).

Best,
Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«



* Re: Undoing an "Auto-Stop" when Cache device has recovered?
  2021-03-24 20:21 Undoing an "Auto-Stop" when Cache device has recovered? Nikolaus Rath
@ 2021-03-25  5:29 ` Coly Li
  2021-03-25  6:16   ` Nikolaus Rath
  0 siblings, 1 reply; 5+ messages in thread
From: Coly Li @ 2021-03-25  5:29 UTC
  To: Nikolaus Rath; +Cc: linux-bcache

On 3/25/21 4:21 AM, Nikolaus Rath wrote:
> Hello,
> 
> My (writeback enabled) bcache cache device had a temporary failure, but seems to have fully recovered (it may have been overheating or a loose cable).
> 
> From the last kernel messages, it seems that bcache tried to flush the dirty data, but failed, and then stopped the cache device.
> 
> After a reboot, the bcacheX device indeed no longer has an associated cache set.
> 
> I think in my case the cache device is in perfect shape again and still has all the data, so I would really like bcache to attach it again so that the dirty cache data is not lost.
> 
> Is there a way to do that?
> 
> (Yes, I will still replace the device afterwards)
> 
> (I am pretty sure that just re-attaching the cacheset will make bcache forget that there was a previous association and will wipe the corresponding metadata).
> 

Hi Nikolaus,

Do you have the kernel log? It depends on whether the cache set is clean
or not. For a clean cache set, the cache set is detached, and the next
reattach will invalidate all existing cached data. If the cache set is
dirty and all existing data is wiped, that would be fishy...

Thanks.

Coly Li


* Re: Undoing an "Auto-Stop" when Cache device has recovered?
  2021-03-25  5:29 ` Coly Li
@ 2021-03-25  6:16   ` Nikolaus Rath
  2021-04-06 12:16     ` Marc Smith
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolaus Rath @ 2021-03-25  6:16 UTC
  To: Coly Li; +Cc: linux-bcache


On Thu, 25 Mar 2021, at 05:29, Coly Li wrote:
> Hi Nikolaus,
> 
> Do you have the kernel log? It depends on whether the cache set is clean
> or not. For a clean cache set, the cache set is detached, and the next
> reattach will invalidate all existing cached data. If the cache set is
> dirty and all existing data is wiped, that would be fishy...

Hi Coly,

I'm not sure I understand. I believe there is dirty data on the cacheset (it was effectively disconnected in the middle of operations). Also, if it wasn't dirty then there would be no need to re-attach it (all the important data would be on the backing device).

On the other hand, after a reboot the cache set shows up in /sys/fs/bcache - just not associated with any backing device. So I guess from that point of view it is clean?
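One way to check this without guessing is to read the state that bcache exports in sysfs. A minimal Python sketch follows, assuming the usual layout: `/sys/block/<dev>/bcache/state` for a backing device (documented values: `no cache`, `clean`, `dirty`, `inconsistent`) and `bdev*` links under `/sys/fs/bcache/<cset-uuid>/` for attached backing devices. The helper names and the `sysfs_root` parameter are made up for illustration, and the code only reads.

```python
from pathlib import Path


def backing_state(dev: str, sysfs_root: str = "/sys") -> str:
    """Return the bcache state string of a backing device such as 'bcache0'."""
    state_file = Path(sysfs_root) / "block" / dev / "bcache" / "state"
    return state_file.read_text().strip()


def attached_backing_links(sysfs_root: str = "/sys") -> dict[str, list[str]]:
    """Map each registered cache set UUID to the names of its bdev* entries."""
    result = {}
    base = Path(sysfs_root) / "fs" / "bcache"
    for entry in sorted(base.iterdir()):
        # Cache sets are directories; control files such as 'register' are skipped.
        if entry.is_dir():
            result[entry.name] = sorted(p.name for p in entry.glob("bdev*"))
    return result
```

An empty list next to the cache set's UUID, together with `no cache` for bcacheX, would match the situation described above. It says nothing about whether the dirty data on the cache device is still intact.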

The kernel logs are on the affected bcache, and I have avoided doing anything with it (including mounting). I took a few pictures of the last visible messages on the console before rebooting, though. For example, here is where the problem starts:

First ATA errors: https://drive.google.com/file/d/1_vr-JBWZjajzbWyXUSmtn4faNH6072ut/view?usp=sharing
First bcache errors: https://drive.google.com/file/d/1XLCWDi6G2lP1JiVitZTtIqzB4QqxXv2-/view?usp=sharing

Does that help?

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: Undoing an "Auto-Stop" when Cache device has recovered?
  2021-03-25  6:16   ` Nikolaus Rath
@ 2021-04-06 12:16     ` Marc Smith
  2021-04-06 12:37       ` Kai Krakow
  0 siblings, 1 reply; 5+ messages in thread
From: Marc Smith @ 2021-04-06 12:16 UTC
  To: Nikolaus Rath; +Cc: Coly Li, linux-bcache

On Thu, Mar 25, 2021 at 2:18 AM Nikolaus Rath <nikolaus@rath.org> wrote:
>
>
> On Thu, 25 Mar 2021, at 05:29, Coly Li wrote:
> > Hi Nikolaus,
> >
> > Do you have the kernel log? It depends on whether the cache set is clean
> > or not. For a clean cache set, the cache set is detached, and the next
> > reattach will invalidate all existing cached data. If the cache set is
> > dirty and all existing data is wiped, that would be fishy...
>
> Hi Coly,
>
> I'm not sure I understand. I believe there is dirty data on the cacheset (it was effectively disconnected in the middle of operations). Also, if it wasn't dirty then there would be no need to re-attach it (all the important data would be on the backing device).
>
> On the other hand, after a reboot the cache set shows up in /sys/fs/bcache - just not associated with any backing device. So I guess from that point of view it is clean?

I actually have experienced very similar behavior with a transient
cache device failure (it's not totally dead) and just posted here
recently: https://marc.info/?l=linux-bcache&m=161642940714578&w=1

My thought was to use "panic" in the 'errors' sysfs attribute so the
machine panics instead of detaching the cache device. Otherwise, it
seems the cache device gets detached with dirty data present, and the
backing device is started (yet data is not present).
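For reference, a minimal Python sketch of that setting. It assumes the cache set exposes `/sys/fs/bcache/<cset-uuid>/errors`, that reading it lists the choices with the active one in square brackets (for example `[unregister] panic`), and that writing a choice name selects it. The function names and the `sysfs_root` parameter are hypothetical, and writing to the real file needs root.

```python
from pathlib import Path


def selected_option(text: str) -> str:
    """Return the bracketed entry from a sysfs choice list."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no selected option in {text!r}")


def set_error_policy(cset_uuid: str, policy: str, sysfs_root: str = "/sys") -> str:
    """Write the policy to the cache set's 'errors' file; return the previous one."""
    if policy not in ("unregister", "panic"):
        raise ValueError(f"unknown policy: {policy}")
    errors_file = Path(sysfs_root) / "fs" / "bcache" / cset_uuid / "errors"
    previous = selected_option(errors_file.read_text())
    errors_file.write_text(policy)
    return previous
```

Returning the previous value makes it easy to restore the old policy once the cache device has been replaced.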

I'll work on reproducing the original case with the "unregister" value
and provide logs, as it sounds like this behavior is unexpected (eg, a
cache device should only detach if there is NO dirty data present).

--Marc



* Re: Undoing an "Auto-Stop" when Cache device has recovered?
  2021-04-06 12:16     ` Marc Smith
@ 2021-04-06 12:37       ` Kai Krakow
  0 siblings, 0 replies; 5+ messages in thread
From: Kai Krakow @ 2021-04-06 12:37 UTC
  To: Marc Smith; +Cc: Nikolaus Rath, Coly Li, linux-bcache

On Tue, Apr 6, 2021 at 14:16, Marc Smith <msmith626@gmail.com> wrote:

> My thought was to use "panic" in the 'errors' sysfs attribute so the
> machine panics instead of detaching the cache device. Otherwise, it
> seems the cache device gets detached with dirty data present, and the
> backing device is started (yet data is not present).
>
> I'll work on reproducing the original case with the "unregister" value
> and provide logs, as it sounds like this behavior is unexpected (eg, a
> cache device should only detach if there is NO dirty data present).

It could be useful to switch the caching mode to write-around at the
same time, so no data would be written to the device accidentally.
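A minimal Python sketch of that switch, assuming the backing device exposes `/sys/block/<dev>/bcache/cache_mode` and accepts the mode names `writethrough`, `writeback`, `writearound` and `none`. The function name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path

CACHE_MODES = ("writethrough", "writeback", "writearound", "none")


def set_cache_mode(dev: str, mode: str = "writearound", sysfs_root: str = "/sys") -> None:
    """Write a caching mode to the backing device's 'cache_mode' file."""
    if mode not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    mode_file = Path(sysfs_root) / "block" / dev / "bcache" / "cache_mode"
    mode_file.write_text(mode)
```

Changing the mode only affects new writes. Dirty data already sitting on the cache device is not flushed by this.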

For consistency reasons, the backing device should become inaccessible
when the cache device with dirty data goes away. If the cache is
clean, the cache can just be detached. So it should have an "auto"
option which does either the one or the other thing depending on
caching state.
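The proposed "auto" behaviour can be sketched as a small decision function. This is only an illustration in Python of the rule described above, not existing bcache code: the function name and the returned labels are made up, and the state strings are assumed to be the ones bcache reports in the backing device's `state` file.

```python
def action_on_cache_failure(backing_state: str) -> str:
    """Pick what to do with a backing device when its cache device fails."""
    state = backing_state.strip()
    if state in ("clean", "no cache"):
        # Nothing is held only on the cache, so the backing device is complete.
        return "detach"
    if state in ("dirty", "inconsistent"):
        # The backing device alone is stale; keep it inaccessible.
        return "stop"
    raise ValueError(f"unexpected state: {backing_state!r}")
```

For example, `action_on_cache_failure("dirty")` yields `"stop"`, while a clean device yields `"detach"`.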

