linux-bcache.vger.kernel.org archive mirror
From: Coly Li <colyli@suse.de>
To: Kai Krakow <kai@kaishome.de>
Cc: "linux-bcache@vger.kernel.org" <linux-bcache@vger.kernel.org>,
	"吴本卿(云桌面 福州)" <wubenqing@ruijie.com.cn>
Subject: Re: Dirty data loss after cache disk error recovery
Date: Fri, 7 May 2021 20:11:50 +0800
Message-ID: <70b9cdd0-ace9-9ee7-19c7-5c47a4d2fce9@suse.de>
In-Reply-To: <CAC2ZOYuBhFbpZeRnnc-1-Vt-tV_3iwkf3i21+YjVukYkx7J7YQ@mail.gmail.com>

On 4/29/21 2:51 AM, Kai Krakow wrote:
>> I think this behavior was introduced by https://lwn.net/Articles/748226/
>>
>> So above is my late review. ;-)
>>
>> (around commit 7e027ca4b534b6b99a7c0471e13ba075ffa3f482 if you cannot
>> access LWN for reasons[tm])
> 
> The problem may actually come from a different code path which retires
> the cache on metadata error:
> 
> commit 804f3c6981f5e4a506a8f14dc284cb218d0659ae
> "bcache: fix cached_dev->count usage for bch_cache_set_error()"
> 
> It probably should consider whether there's any dirty data. As a first
> step, it may be sufficient to run a BUG_ON(there_is_dirty_data) (this
> would kill the bcache thread, which may not be a good idea) or even freeze
> the system with an unrecoverable error, or at least stop the device to
> prevent any IO with possibly stale data (because retiring throws away
> dirty data). A good solution would be if the "with dirty data" error
> path could somehow force the attached file system into read-only mode,
> maybe by just reporting IO errors when this bdev is accessed through
> bcache.


There is an option to panic the system when the cache device fails. It
is the "errors" sysfs file of the cache set, whose available options are
"unregister" and "panic". The default is "unregister"; if you set it to
"panic", panic() will be called.
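
For reference, a minimal Python sketch of inspecting and changing that
setting from user space. It assumes the attribute lives at
/sys/fs/bcache/<cset-uuid>/errors (<cset-uuid> is a placeholder for the
real cache set UUID) and that, like other bcache choice attributes,
reading it lists every option with the active one in brackets, e.g.
"[unregister] panic". The helpers parse_choice_list() and
set_error_action() are hypothetical, not part of any bcache tool.

```python
from __future__ import annotations

from pathlib import Path

# Placeholder path (assumption): substitute the actual cache set UUID.
ERRORS_ATTR = "/sys/fs/bcache/<cset-uuid>/errors"


def parse_choice_list(text: str) -> tuple[list[str], str | None]:
    """Split a choice list such as "[unregister] panic" into
    (all options, currently selected option)."""
    options: list[str] = []
    selected: str | None = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # the bracketed entry is the active one
            selected = token
        options.append(token)
    return options, selected


def set_error_action(attr_path: str, action: str) -> None:
    """Write a new error action after checking it is one of the offered options."""
    path = Path(attr_path)
    options, _ = parse_choice_list(path.read_text())
    if action not in options:
        raise ValueError(f"{action!r} is not one of {options}")
    path.write_text(action + "\n")
```

Run as root against an existing cache set, set_error_action(ERRORS_ATTR,
"panic") should be roughly equivalent to
`echo panic > /sys/fs/bcache/<cset-uuid>/errors`. Validating against the
options read back from the file avoids hard-coding the list of actions.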

While a cache set is attached, making the bcache device read-only does
not prevent metadata I/O on the cache device (e.g. when data being read
is added to the cache). If the cache device is really disconnected, that
will be problematic too.

The "auto" and "always" options apply to the "unregister" error action.
When I improved the device failure handling, I did not add a new error
action; all my work was to make the "unregister" action work better.
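
The "auto" and "always" values are, as I understand it, the choices of
the per-backing-device stop_when_cache_set_failed sysfs attribute that
came with the device failure handling work. The Python model below is a
simplified sketch of my reading of conditional_stop_bcache_device() in
drivers/md/bcache/super.c, i.e. what happens to each attached bcache
device once a failed cache set is unregistered. It is not the kernel
code; StopPolicy and should_stop_bcache_device() are hypothetical names.

```python
from enum import Enum


class StopPolicy(Enum):
    """Values of the (assumed) stop_when_cache_set_failed attribute."""
    AUTO = "auto"
    ALWAYS = "always"


def should_stop_bcache_device(policy: StopPolicy, has_dirty: bool) -> bool:
    """Simplified model: after the cache set fails and is unregistered,
    decide whether the bcache device on top of one backing device is stopped."""
    if policy is StopPolicy.ALWAYS:
        return True  # stop unconditionally
    # "auto": a clean backing device stays usable, because all cached data
    # was already written back; a dirty one is stopped, because its newest
    # data lived only on the failed cache and serving I/O from the backing
    # device would return stale blocks.
    return has_dirty


for policy in StopPolicy:
    for dirty in (False, True):
        action = "stop" if should_stop_bcache_device(policy, dirty) else "keep alive"
        print(f"{policy.value:6} dirty={dirty!s:5} -> {action}")
```

In this model only the ("auto", clean) combination keeps the bcache
device alive, which matches the idea of making "unregister" safer
without adding a new error action.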

Adding a new "stop" error action IMHO doesn't make things better. When
the cache device is disconnected, there is always a risk that some
cached data or metadata has not been written to the cache device.
Permitting the cache device to be re-attached to the backing device may
then introduce "silent data loss", which might be even worse. That is
why I didn't add a new error action in the device failure handling
patch set.
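
To make the "silent data loss" risk concrete, here is a toy Python model
(a hypothetical class, not bcache's data structures or on-disk format):
the cache misses a write while it is disconnected, and once it is
re-attached and its old contents are trusted again, a read returns the
stale copy without any I/O error. Write-through is used only to keep the
toy small.

```python
class ToyCachedDevice:
    """Toy model of a cache in front of a backing device.
    Blocks are ints and contents are bytes; nothing here mirrors bcache internals."""

    def __init__(self, backing: dict) -> None:
        self.backing = backing  # block -> data on the backing device
        self.cache = {}         # block -> copy the cache believes is valid
        self.attached = True

    def write(self, block: int, data: bytes) -> None:
        if self.attached:
            self.cache[block] = data  # keep the cached copy current
        self.backing[block] = data    # write-through keeps the toy simple

    def read(self, block: int) -> bytes:
        if self.attached and block in self.cache:
            return self.cache[block]  # cache hit: backing device not consulted
        return self.backing[block]

    def detach(self) -> None:
        self.attached = False  # cache device disconnected, I/O bypasses it

    def reattach(self) -> None:
        self.attached = True   # old cache contents are trusted again


dev = ToyCachedDevice({0: b"v0"})
dev.write(0, b"v1")    # cached and written to the backing device
dev.detach()
dev.write(0, b"v2")    # only the backing device sees this update
dev.reattach()
print(dev.read(0))     # b'v1': stale data, and no error is reported
print(dev.backing[0])  # b'v2': the newest data is still on disk
```

Nothing in the read path can tell that block 0 is stale, which is why
the failure is silent rather than surfacing as an I/O error.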

Thanks.

Coly Li


Thread overview: 17+ messages
2021-04-20  3:17 Dirty data loss after cache disk error recovery 吴本卿(云桌面 福州)
2021-04-28 18:30 ` Kai Krakow
2021-04-28 18:39   ` Kai Krakow
2021-04-28 18:51     ` Kai Krakow
2021-05-07 12:11       ` Coly Li [this message]
2021-05-07 14:56         ` Kai Krakow
     [not found]           ` <6ab4d6a-de99-6464-cb2-ad66d0918446@ewheeler.net>
2023-09-06 22:56             ` Kai Krakow
     [not found]               ` <7cadf9ff-b496-5567-9d60-f0af48122595@ewheeler.net>
2023-09-07 12:00                 ` Kai Krakow
2023-09-07 19:10                   ` Eric Wheeler
2023-09-12  6:54                 ` 邹明哲
     [not found]                   ` <f2fcf354-29ec-e2f7-b251-fb9b7d36f4@ewheeler.net>
2023-10-11 16:19                     ` Kai Krakow
2023-10-16 23:39                       ` Eric Wheeler
2023-10-17  0:33                         ` Kai Krakow
2023-10-17  0:39                           ` Kai Krakow
2023-10-11 16:29                     ` Kai Krakow
2021-05-07 12:13     ` Coly Li
2023-10-17  1:57 ` Coly Li
