All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Wheeler <bcache@lists.ewheeler.net>
To: Coly Li <colyli@suse.de>
Cc: tang.junhui@zte.com.cn, linux-bcache@vger.kernel.org,
	linux-bcache-owner@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH] bcache: only recovery I/O error for writethrough mode
Date: Thu, 13 Jul 2017 00:53:55 +0000 (UTC)	[thread overview]
Message-ID: <alpine.LRH.2.11.1707130012390.9683@mail.ewheeler.net> (raw)
In-Reply-To: <be0287d1-9af2-659c-0d40-5fb774e3b178@suse.de>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3765 bytes --]

On Wed, 12 Jul 2017, Coly Li wrote:

> On 2017/7/12 上午10:01, tang.junhui@zte.com.cn wrote:
> >>I meant "it is very necessary for data base applications which always
> >>use *writeback* mode and not switch to other mode during all their
> >>online time."  ^_^
> > 
> > I know, it is necessary, but not enough. who can promise they will not
> > switch during online time? This patch is logical imperfectly.
> 
> Yes, I agree with you. Since Eric mentions dirty data map, an improved
> fix shows up in my head,
> 
> When cache device disconnected from system,
> 0) If in non mode, do nothing.

Does non mode guarantee that nothing is dirty?  I'm not convinced of that.  
I think you can set non mode with dirty blocks. (Correct me if I'm wrong 
here.)

> 1) If in writeback/writethough/writearound mode, and dirty map is clean,
>    - switch to non mode
>    - continue to handle I/O without cache device

Sure, that makes sense.  

> 2) If in writeback mode, and dirty map is not clean,

You would want to do a dirty map lookup for each IO.  How about this:

2) If in _any_ mode, and dirty map is dirty for *that specific block*:

  If WRITE request completely overlaps the dirty segment then clear 
	the dirty flag and pass through to the backing dev.
  otherwise:
    - return -EIO immediately for WRITE request
    - return -EIO immediately for READ request (*)

If the WRITE request completely overlaps the dirty segment as indicated 
from the in-memory metadata, then clear its dirty flag and write to the 
backing device.  Whatever was dirty isn't important anymore as it was 
overwritten.

Unless there is a good reason to diverge, we would want this recovery 
logic would be the same for failed IOs from an existing cachedev (eg, with 
badblocks), and for cachedevs that are altogether missing.


> 3) If not in writeback mode, and dirty map is not clean. It means the
> cache mode is switched from writeback mode with dirty data lost, then
>    - returns -EIO immediately for WRITE request
>    - returns -EIO immediately for READ request (*)

For #2,3, do a dirty map lookup for every IO: if the block is clean, pass 
it to the backing device.  Only -EIO if the request cannot be recovered 
(block is dirty) and invoke pr_crit_once() to notify the user.  We want 
all IO requests to succeed to the extent possible.

I think #3 is the same case as #2.  The logic is the same whether its is 
now or ever was in writeback mode, regardless of the current mode.
 
> (*) NOTE:
> A sysfs entry "recovery_io_error" can be add here, which is disabled as
> default. If it is enabled, if a READ request does not hit dirty map,
> bcache will provide it from backing device.

Resilience first!  This should default on.  

Sometimes systems run for months with bad sectors, and this case is no 
different.  Let the bcache users' IOs succeed if possible but notify them 
with pr_crit_once().

A reboot might loose data.  They could be lucky with important data in the 
page cache; notifying them without killing the device because it is dirty 
might give them a chance to do a hot backup before rebooting (or 
stop/starting bcache).

Since the dirty map is still in memory, that information is 
useful for recovery.  After a reboot the dirty map is lost---and with it 
the data about what is consistent and what is not. 

For example, if LVM snapshots sit atop of the bcache volume, then you 
could `dd` them off.  If you hit an error, you know that copy is at least 
partially inconsistent and can try an older snapshot until one is found 
which is old enough to be 100% consistent.  Without the dirty map, you 
would only be guessing at which volume is actually consistent.

Let users set recovery_io_error=0 for those who really want to fail early.

--
Eric Wheeler

  reply	other threads:[~2017-07-13  0:53 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-10 11:18 [PATCH] bcache: only recovery I/O error for writethrough mode Coly Li
2017-07-10 21:46 ` Kai Krakow
2017-07-11  3:55   ` Coly Li
2017-07-11 17:42     ` Eric Wheeler
2017-07-12  1:32       ` Coly Li
     [not found] ` <OFCB2AA6A0.E15FCC3D-ON4825815B.0001EEF5-4825815B.00033093@zte.com.cn>
2017-07-12  1:37   ` Coly Li
     [not found]     ` <OFAAA836F5.65C1F846-ON4825815B.00093270-4825815B.0009741B@zte.com.cn>
2017-07-12  1:45       ` Coly Li
     [not found]         ` <OF7159B9D9.1C66A3AE-ON4825815B.000AD084-4825815B.000B252E@zte.com.cn>
2017-07-12  3:20           ` Coly Li
2017-07-13  0:53             ` Eric Wheeler [this message]
2017-07-13  6:47               ` handling cache device disconnection more properly ( was Re: [PATCH] bcache: only recovery I/O error for writethrough mode) Coly Li
2017-07-13  6:47                 ` Coly Li
2017-07-24 19:19                 ` Eric Wheeler
2017-07-26 17:08                   ` Coly Li
2017-07-26 20:08                     ` Eric Wheeler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LRH.2.11.1707130012390.9683@mail.ewheeler.net \
    --to=bcache@lists.ewheeler.net \
    --cc=colyli@suse.de \
    --cc=linux-bcache-owner@vger.kernel.org \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=tang.junhui@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.