All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] bcache: recover data from backing device when read request hit clean
@ 2017-11-17  3:51 Rui Hua
  2017-11-17  4:02 ` Michael Lyle
  2017-11-17  6:01 ` Coly Li
  0 siblings, 2 replies; 19+ messages in thread
From: Rui Hua @ 2017-11-17  3:51 UTC (permalink / raw)
  To: Coly Li, Michael Lyle, Kent Overstreet; +Cc: linux-bcache, linux-block

When we send a read request and hit the clean data in cache device, there
is a situation called cache read race in bcache(see the commit in the tail
of cache_look_up(), the following explaination just copy from there):
The bucket we're reading from might be reused while our bio is in flight,
and we could then end up reading the wrong data. We guard against this
by checking (in bch_cache_read_endio()) if the pointer is stale again;
if so, we treat it as an error (s->iop.error = -EINTR) and reread from
the backing device (but we don't pass that error up anywhere)

It should be noted that cache read race happened under normal
circumstances, not the circumstance when SSD failed, it was counted
and shown in  /sys/fs/bcache/XXX/internal/cache_read_races.

Without this patch, when we use writeback mode, we will never reread from
the backing device when cache read race happend, until the whole cache
device is clean, because the condition
(s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR)
will be passed up, at last, user will receive -EINTR when it's bio end,
this is not suitable, and wield to up-application.

In this patch, we use s->read_dirty_data to judge whether the read
request hit dirty data in cache device, it is safe to reread data from
the backing device when the read request hit clean data. This can not
only handle cache read race, but also recover data when failed read
request from cache device.

Signed-off-by: Hua Rui <huarui.dev@gmail.com>
---
 drivers/md/bcache/request.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3a7aed7..f02fefb 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -708,16 +708,15 @@ static void cached_dev_read_error(struct closure *cl)
 {
        struct search *s = container_of(cl, struct search, cl);
        struct bio *bio = &s->bio.bio;
-       struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);

        /*
-        * If cache device is dirty (dc->has_dirty is non-zero), then
-        * recovery a failed read request from cached device may get a
-        * stale data back. So read failure recovery is only permitted
-        * when cache device is clean.
+        * If read request hit dirty data (s->read_dirty_data is true),
+        * then recovery a failed read request from cached device may
+        * get a stale data back. So read failure recovery is only
+        * permitted when read request hit clean data in cache device,
+        * or when cache read race happend.
         */
-       if (s->recoverable &&
-           (dc && !atomic_read(&dc->has_dirty))) {
+       if (s->recoverable && !s->read_dirty_data) {
                /* Retry from the backing device: */
                trace_bcache_read_retry(s->orig_bio);

-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17  3:51 [PATCH] bcache: recover data from backing device when read request hit clean Rui Hua
@ 2017-11-17  4:02 ` Michael Lyle
  2017-11-17  4:25   ` Rui Hua
  2017-11-17  6:01 ` Coly Li
  1 sibling, 1 reply; 19+ messages in thread
From: Michael Lyle @ 2017-11-17  4:02 UTC (permalink / raw)
  To: Rui Hua, Coly Li, Kent Overstreet; +Cc: linux-bcache, linux-block

Hi, Rui Hua--

On 11/16/2017 07:51 PM, Rui Hua wrote:
> In this patch, we use s->read_dirty_data to judge whether the read
> request hit dirty data in cache device, it is safe to reread data from
> the backing device when the read request hit clean data. This can not
> only handle cache read race, but also recover data when failed read
> request from cache device.

I haven't read over all of this yet, to understand the read race.  But
this change can't be safe-- read_dirty_data only is set to true if the
metadata on the cache device is correctly read.  If that fails, the flag
may not be set and we could read stale data on the backing device.

Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17  4:02 ` Michael Lyle
@ 2017-11-17  4:25   ` Rui Hua
  2017-11-18  0:57     ` Michael Lyle
  0 siblings, 1 reply; 19+ messages in thread
From: Rui Hua @ 2017-11-17  4:25 UTC (permalink / raw)
  To: Michael Lyle; +Cc: Coly Li, Kent Overstreet, linux-bcache, linux-block

Hi, Michael--

Thanks for the quickly reply.

If the metadata on the cache device was NOT correctly read,
cached_dev_read_error() will not be called, so the recovery check will
not be executed. so this patch is safe.
s->iop.error was not set when faild to read metadata on the cache
device, so the  cached_dev_bio_complete() will be called instead of
cached_dev_read_error()

2017-11-17 12:02 GMT+08:00 Michael Lyle <mlyle@lyle.org>:
> Hi, Rui Hua--
>
> On 11/16/2017 07:51 PM, Rui Hua wrote:
>> In this patch, we use s->read_dirty_data to judge whether the read
>> request hit dirty data in cache device, it is safe to reread data from
>> the backing device when the read request hit clean data. This can not
>> only handle cache read race, but also recover data when failed read
>> request from cache device.
>
> I haven't read over all of this yet, to understand the read race.  But
> this change can't be safe-- read_dirty_data only is set to true if the
> metadata on the cache device is correctly read.  If that fails, the flag
> may not be set and we could read stale data on the backing device.
>
> Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17  3:51 [PATCH] bcache: recover data from backing device when read request hit clean Rui Hua
  2017-11-17  4:02 ` Michael Lyle
@ 2017-11-17  6:01 ` Coly Li
  2017-11-17  7:26   ` Rui Hua
  1 sibling, 1 reply; 19+ messages in thread
From: Coly Li @ 2017-11-17  6:01 UTC (permalink / raw)
  To: Rui Hua, Coly Li, Michael Lyle, Kent Overstreet; +Cc: linux-bcache, linux-block

On 17/11/2017 11:51 AM, Rui Hua wrote:
> When we send a read request and hit the clean data in cache device, there
> is a situation called cache read race in bcache(see the commit in the tail
> of cache_look_up(), the following explaination just copy from there):
> The bucket we're reading from might be reused while our bio is in flight,
> and we could then end up reading the wrong data. We guard against this
> by checking (in bch_cache_read_endio()) if the pointer is stale again;
> if so, we treat it as an error (s->iop.error = -EINTR) and reread from
> the backing device (but we don't pass that error up anywhere)
> 

Hi Rui Hua,

I see the point, and agree with you this should be fixed.
Indeed return -EINTR to upper layer makes sense. A read request may
cover several keys, and when a key encounters read race, bcache may
return result of previous keys before read race encountered, and tells
upper layer code this is not full completed by returnning -EINTR. This
is a defined behavior by POSIX.

The reason why I still agree to fix it is, most of upper layer code just
check if bio->bi_status is 0 or not, they don't check whether
bio->bi_status is -EAGAIN or -EINTR.

> It should be noted that cache read race happened under normal
> circumstances, not the circumstance when SSD failed, it was counted
> and shown in  /sys/fs/bcache/XXX/internal/cache_read_races.
> 

Yes, this is not for real I/O error on cache device, the I/O error code
is set by bcache code. I didn't notice this case, and my test didn't
cover such situation. Thank for point out this !

> Without this patch, when we use writeback mode, we will never reread from
> the backing device when cache read race happend, until the whole cache
> device is clean, because the condition

I assume this is a race condition, that means after the race window, the
KEY associated to the reused bucket will be invalided too. So a second
read will trigger a cache miss, and re-read the data back into cache
device. So it won't be "never". The problem is, if upper layer code
treats it as a failed I/O, and it happens to be metadata of upper layer
code, then some negative result may happens. For example file system
turns into read-only mode.

> (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
> cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR)
> will be passed up, at last, user will receive -EINTR when it's bio end,
> this is not suitable, and wield to up-application.

As I said, most of upper layer code only checks whether bio->bi_status
is 0, otherwise that's error. We need to give such error a second try.

> 
> In this patch, we use s->read_dirty_data to judge whether the read
> request hit dirty data in cache device, it is safe to reread data from
> the backing device when the read request hit clean data. This can not
> only handle cache read race, but also recover data when failed read
> request from cache device.

In your another reply to Michael, you mentioned if bcache internal node
I/O failed, cached_dev_read_error() won't be called. I check the code
with your hint, it seems make sense. Also I am not sure whether the
internal node I/O failure is handled properly or not, I feel for leaf
node, your modification works for both writeback mode, and a
writethrough mode changed from a writeback mode with dirty data.

It seems my previous patch neglects the internal btree node I/O failure,
because I mistakenly thought both internal and leaf node I/O failure
will go into cached_dev_read_error(). Thanks for correct me and provide
a comprehensive fix.

A positive side effect is, Eric, Kent and Michael they all want to
permit data recovery from backing device is the failed data is clean on
cache device. Your patch permits more read request to be served, and
fixs a read race at same time, cool :-)

For users want aggressive I/O error notifying of cache device (SSD), I
think it can be solved some other error accounting methods. We don't
discuss in this thread.

> 
> Signed-off-by: Hua Rui <huarui.dev@gmail.com>

Reviewed-by: Coly Li <colyli@suse.de>

P.S could you please also take a look on btree internal node I/O failure
? Thanks in advance.

Coly Li

> ---
>  drivers/md/bcache/request.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 3a7aed7..f02fefb 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -708,16 +708,15 @@ static void cached_dev_read_error(struct closure *cl)
>  {
>         struct search *s = container_of(cl, struct search, cl);
>         struct bio *bio = &s->bio.bio;
> -       struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
> 
>         /*
> -        * If cache device is dirty (dc->has_dirty is non-zero), then
> -        * recovery a failed read request from cached device may get a
> -        * stale data back. So read failure recovery is only permitted
> -        * when cache device is clean.
> +        * If read request hit dirty data (s->read_dirty_data is true),
> +        * then recovery a failed read request from cached device may
> +        * get a stale data back. So read failure recovery is only
> +        * permitted when read request hit clean data in cache device,
> +        * or when cache read race happend.
>          */
> -       if (s->recoverable &&
> -           (dc && !atomic_read(&dc->has_dirty))) {
> +       if (s->recoverable && !s->read_dirty_data) {
>                 /* Retry from the backing device: */
>                 trace_bcache_read_retry(s->orig_bio);
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17  6:01 ` Coly Li
@ 2017-11-17  7:26   ` Rui Hua
       [not found]     ` <D393CD04-8C7A-4A8C-B23A-6C68D5544E34@profihost.ag>
  0 siblings, 1 reply; 19+ messages in thread
From: Rui Hua @ 2017-11-17  7:26 UTC (permalink / raw)
  To: Coly Li; +Cc: Coly Li, Michael Lyle, Kent Overstreet, linux-bcache, linux-block

Hi, Coly--
Thanks for your detailed review notes.

2017-11-17 14:01 GMT+08:00 Coly Li <colyli@suse.de>:

>> Without this patch, when we use writeback mode, we will never reread from
>> the backing device when cache read race happend, until the whole cache
>> device is clean, because the condition
>
> I assume this is a race condition, that means after the race window, the
> KEY associated to the reused bucket will be invalided too. So a second
> read will trigger a cache miss, and re-read the data back into cache
> device. So it won't be "never". The problem is, if upper layer code
> treats it as a failed I/O, and it happens to be metadata of upper layer
> code, then some negative result may happens. For example file system
> turns into read-only mode.

Yes, your example about metadata of upper layer code provides a good reference ,
Indeed, I met the xfs metadata error a few days ago when I use xfs on bcache,
under heavy load, I got the following xfs error:
localhost kernel: XFS (bcache0): metadata I/O error: block 0x3e3e2af0
("xfs_trans_read_buf_map") error 4 numblks 16
And I can confirm it was caused by the read race.

The word "never" I used in above is not suitable:-/, thanks for your point,
 what I want to say is we will not recover clean data UNTIL the whole cache
device become clean, or in writethrough mode.

>
> P.S could you please also take a look on btree internal node I/O failure
> ? Thanks in advance.
>
At least, for now, I find that when cache_lookup() traverse btree,
if bch_btree_node_get() return ERR_PTR(-EIO), it will not pass to upper layer,
cached_dev_bio_complete() will be called, just like there is no any IO error,
I think this is another bug. I'll deep into code and find more detail
when I have time.

Thanks,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
       [not found]     ` <D393CD04-8C7A-4A8C-B23A-6C68D5544E34@profihost.ag>
@ 2017-11-17 10:20         ` Rui Hua
  0 siblings, 0 replies; 19+ messages in thread
From: Rui Hua @ 2017-11-17 10:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Coly Li, Coly Li, Michael Lyle, Kent Overstreet, linux-bcache,
	linux-block

Hi, Stefan

2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG <s.priebe@profihost=
.ag>:
> I=E2=80=98m getting the same xfs error message under high load. Does this=
 patch fix
> it?
>
Did you applied the patch "bcache: only permit to recovery read error
when cache device is clean" ?
If you did, maybe this patch can fix it. And you'd better check
/sys/fs/bcache/XXX/internal/cache_read_races in your environment,
meanwhile, it should not be zero when you get that err message.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
@ 2017-11-17 10:20         ` Rui Hua
  0 siblings, 0 replies; 19+ messages in thread
From: Rui Hua @ 2017-11-17 10:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Coly Li, Coly Li, Michael Lyle, Kent Overstreet, linux-bcache,
	linux-block

Hi, Stefan

2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
> I‘m getting the same xfs error message under high load. Does this patch fix
> it?
>
Did you applied the patch "bcache: only permit to recovery read error
when cache device is clean" ?
If you did, maybe this patch can fix it. And you'd better check
/sys/fs/bcache/XXX/internal/cache_read_races in your environment,
meanwhile, it should not be zero when you get that err message.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17 10:20         ` Rui Hua
  (?)
@ 2017-11-17 12:38         ` Stefan Priebe - Profihost AG
  -1 siblings, 0 replies; 19+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-11-17 12:38 UTC (permalink / raw)
  To: Rui Hua
  Cc: Coly Li, Coly Li, Michael Lyle, Kent Overstreet, linux-bcache,
	linux-block


Am 17.11.2017 um 11:20 schrieb Rui Hua:
> Hi, Stefan
> 
> 2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
>> I‘m getting the same xfs error message under high load. Does this patch fix
>> it?
>>
> Did you applied the patch "bcache: only permit to recovery read error
> when cache device is clean" ?
> If you did, maybe this patch can fix it. And you'd better check
> /sys/fs/bcache/XXX/internal/cache_read_races in your environment,
> meanwhile, it should not be zero when you get that err message.
> 


Yes this patch is applied.

Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17 10:20         ` Rui Hua
  (?)
  (?)
@ 2017-11-17 12:57         ` Eddie Chapman
  2017-11-17 13:22           ` Coly Li
  -1 siblings, 1 reply; 19+ messages in thread
From: Eddie Chapman @ 2017-11-17 12:57 UTC (permalink / raw)
  To: Rui Hua, Stefan Priebe - Profihost AG
  Cc: Coly Li, Michael Lyle, Kent Overstreet, linux-bcache, linux-block

On 17/11/17 10:20, Rui Hua wrote:
> Hi, Stefan
> 
> 2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
>> I‘m getting the same xfs error message under high load. Does this patch fix
>> it?
>>
> Did you applied the patch "bcache: only permit to recovery read error
> when cache device is clean" ?
> If you did, maybe this patch can fix it. And you'd better check
> /sys/fs/bcache/XXX/internal/cache_read_races in your environment,
> meanwhile, it should not be zero when you get that err message.

Hi all,

I have 3 servers running a very recent 4.9 stable release, with several 
recent bcache patches cherry picked, including V4 of "bcache: only 
permit to recovery read error when cache device is clean".

In the 3 weeks since using these cherry picks I've experienced a very 
small number of isolated read errors in the layer above bcache, on all 3 
servers.

On one of the servers, 2 out of the 6 bcache resources have a value of 1 
in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on these same 
2 bcache resources where one read error has occurred on the upper layer. 
The other 4 bcache resources have 0 in cache_read_races and I haven't 
had any read errors on the layers above them.

On another server, I have 1 bcache resource out of 10 with a value of 5 
in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on that 
bcache resource where a read error occurred on one occasion. The other 9 
bcache resources have 0 in cache_read_races, and no read errors have 
occurred on the layers above any of them.

On the 3rd server where some read errors occurred, I cannot verify if 
there were positive values in cache_read_races as I moved the data from 
there onto other storage, and shut down the bcache resources where the 
errors occurred.

If I can provide any other info which might help with this issue, please 
let me know.

regards,
Eddie

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17 12:57         ` Eddie Chapman
@ 2017-11-17 13:22           ` Coly Li
  2017-11-17 13:43             ` Eddie Chapman
  0 siblings, 1 reply; 19+ messages in thread
From: Coly Li @ 2017-11-17 13:22 UTC (permalink / raw)
  To: Eddie Chapman
  Cc: Rui Hua, Stefan Priebe - Profihost AG, Coly Li, Michael Lyle,
	Kent Overstreet, linux-bcache, linux-block

On 17/11/2017 8:57 PM, Eddie Chapman wrote:
> On 17/11/17 10:20, Rui Hua wrote:
>> Hi, Stefan
>>
>> 2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG
>> <s.priebe@profihost.ag>:
>>> I‘m getting the same xfs error message under high load. Does this
>>> patch fix
>>> it?
>>>
>> Did you applied the patch "bcache: only permit to recovery read error
>> when cache device is clean" ?
>> If you did, maybe this patch can fix it. And you'd better check
>> /sys/fs/bcache/XXX/internal/cache_read_races in your environment,
>> meanwhile, it should not be zero when you get that err message.
> 
> Hi all,
> 
> I have 3 servers running a very recent 4.9 stable release, with several
> recent bcache patches cherry picked, including V4 of "bcache: only
> permit to recovery read error when cache device is clean".
> 
> In the 3 weeks since using these cherry picks I've experienced a very
> small number of isolated read errors in the layer above bcache, on all 3
> servers.
> 
> On one of the servers, 2 out of the 6 bcache resources have a value of 1
> in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on these same
> 2 bcache resources where one read error has occurred on the upper layer.
> The other 4 bcache resources have 0 in cache_read_races and I haven't
> had any read errors on the layers above them.
> 
> On another server, I have 1 bcache resource out of 10 with a value of 5
> in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on that
> bcache resource where a read error occurred on one occasion. The other 9
> bcache resources have 0 in cache_read_races, and no read errors have
> occurred on the layers above any of them.
> 
> On the 3rd server where some read errors occurred, I cannot verify if
> there were positive values in cache_read_races as I moved the data from
> there onto other storage, and shut down the bcache resources where the
> errors occurred.
> 
> If I can provide any other info which might help with this issue, please
> let me know.

Hi Eddie,

This is very informative, thank you so much :-)

Coly Li

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17 13:22           ` Coly Li
@ 2017-11-17 13:43             ` Eddie Chapman
  0 siblings, 0 replies; 19+ messages in thread
From: Eddie Chapman @ 2017-11-17 13:43 UTC (permalink / raw)
  To: Coly Li
  Cc: Rui Hua, Stefan Priebe - Profihost AG, Coly Li, Michael Lyle,
	Kent Overstreet, linux-bcache, linux-block

On 17/11/17 13:22, Coly Li wrote:
> On 17/11/2017 8:57 PM, Eddie Chapman wrote:
>> On 17/11/17 10:20, Rui Hua wrote:
>>> Hi, Stefan
>>>
>>> 2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG
>>> <s.priebe@profihost.ag>:
>>>> I‘m getting the same xfs error message under high load. Does this
>>>> patch fix
>>>> it?
>>>>
>>> Did you applied the patch "bcache: only permit to recovery read error
>>> when cache device is clean" ?
>>> If you did, maybe this patch can fix it. And you'd better check
>>> /sys/fs/bcache/XXX/internal/cache_read_races in your environment,
>>> meanwhile, it should not be zero when you get that err message.
>>
>> Hi all,
>>
>> I have 3 servers running a very recent 4.9 stable release, with several
>> recent bcache patches cherry picked, including V4 of "bcache: only
>> permit to recovery read error when cache device is clean".
>>
>> In the 3 weeks since using these cherry picks I've experienced a very
>> small number of isolated read errors in the layer above bcache, on all 3
>> servers.
>>
>> On one of the servers, 2 out of the 6 bcache resources have a value of 1
>> in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on these same
>> 2 bcache resources where one read error has occurred on the upper layer.
>> The other 4 bcache resources have 0 in cache_read_races and I haven't
>> had any read errors on the layers above them.
>>
>> On another server, I have 1 bcache resource out of 10 with a value of 5
>> in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on that
>> bcache resource where a read error occurred on one occasion. The other 9
>> bcache resources have 0 in cache_read_races, and no read errors have
>> occurred on the layers above any of them.
>>
>> On the 3rd server where some read errors occurred, I cannot verify if
>> there were positive values in cache_read_races as I moved the data from
>> there onto other storage, and shut down the bcache resources where the
>> errors occurred.
>>
>> If I can provide any other info which might help with this issue, please
>> let me know.
> 
> Hi Eddie,
> 
> This is very informative, thank you so much :-)
> 
> Coly Li

Hi Coly,

You are most welcome. Another interesting info, but maybe it is 
unrelated/coincidence: the bcache resources where the errors occurred, 
the underlying backing device was a raid adapter that is quite a lot 
slower than the (different) underlying physical storage on the other 
bcache resources that do not have read races. Up to now I had suspected 
a driver issue with this raid adapter as causing the read errors, so I 
started the process of gradually retiring the adapter on these servers 
in the last 3 weeks. Anyway, in light of this issue coming up here I'm 
wondering if this is significant in suggesting possibly that the read 
races are more likely to occur if the backing storage is quite slow. Or 
maybe not.

Eddie

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-17  4:25   ` Rui Hua
@ 2017-11-18  0:57     ` Michael Lyle
  2017-11-18 17:00       ` Rui Hua
  2017-11-22  5:13       ` Michael Lyle
  0 siblings, 2 replies; 19+ messages in thread
From: Michael Lyle @ 2017-11-18  0:57 UTC (permalink / raw)
  To: Rui Hua; +Cc: Coly Li, Kent Overstreet, linux-bcache, linux-block

Rui Hua--

Thanks for the clarification.  You're correct, it doesn't get called,
and it goes to read_complete.  However, read_error should probably get
called.  How would you suggest handling this-- should we initially set
read_dirty_data true and clear it if it is all clean?  (and change the
condition to properly go into the error path).  As it is, the patch
makes things no worse but the error path is very unsafe.

Mike

On Thu, Nov 16, 2017 at 8:25 PM, Rui Hua <huarui.dev@gmail.com> wrote:
> Hi, Michael--
>
> Thanks for the quickly reply.
>
> If the metadata on the cache device was NOT correctly read,
> cached_dev_read_error() will not be called, so the recovery check will
> not be executed. so this patch is safe.
> s->iop.error was not set when faild to read metadata on the cache
> device, so the  cached_dev_bio_complete() will be called instead of
> cached_dev_read_error()
>
> 2017-11-17 12:02 GMT+08:00 Michael Lyle <mlyle@lyle.org>:
>> Hi, Rui Hua--
>>
>> On 11/16/2017 07:51 PM, Rui Hua wrote:
>>> In this patch, we use s->read_dirty_data to judge whether the read
>>> request hit dirty data in cache device, it is safe to reread data from
>>> the backing device when the read request hit clean data. This can not
>>> only handle cache read race, but also recover data when failed read
>>> request from cache device.
>>
>> I haven't read over all of this yet, to understand the read race.  But
>> this change can't be safe-- read_dirty_data only is set to true if the
>> metadata on the cache device is correctly read.  If that fails, the flag
>> may not be set and we could read stale data on the backing device.
>>
>> Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-18  0:57     ` Michael Lyle
@ 2017-11-18 17:00       ` Rui Hua
  2017-11-22  5:13       ` Michael Lyle
  1 sibling, 0 replies; 19+ messages in thread
From: Rui Hua @ 2017-11-18 17:00 UTC (permalink / raw)
  To: Michael Lyle; +Cc: Coly Li, Kent Overstreet, linux-bcache, linux-block

Hi, Michael

> Thanks for the clarification.  You're correct, it doesn't get called,
> and it goes to read_complete.  However, read_error should probably get
> called.  How would you suggest handling this-- should we initially set
> read_dirty_data true and clear it if it is all clean?  (and change the
> condition to properly go into the error path).  As it is, the patch
> makes things no worse but the error path is very unsafe.

Yes, init read_dirty_data to true is an option, but I think it is not
so easy to judge whether it is all clean, because a read request may
cover several keys(just as Coly mentioned in previous reply), probably
some of them are dirty, others are clean, and cache_lookup_fn() was
called on each of these keys when read request search them on btree,
which means we should count and mark them in some place?

Maybe we can set s->iop.status when metadata I/O error, and clear
s->recoverable at the same time (add the code in cache_lookup(), after
bch_btree_map_keys() returns), then it can goto the read_error path,
and will not goto the recovery code.  This option is just a idea,
certainly, we should consider other exception cases.

Thanks,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-18  0:57     ` Michael Lyle
  2017-11-18 17:00       ` Rui Hua
@ 2017-11-22  5:13       ` Michael Lyle
  2017-11-22  6:04         ` Rui Hua
  2017-11-23 15:09         ` Eddie Chapman
  1 sibling, 2 replies; 19+ messages in thread
From: Michael Lyle @ 2017-11-22  5:13 UTC (permalink / raw)
  To: Rui Hua; +Cc: Coly Li, Kent Overstreet, linux-bcache, linux-block

Reviewed-by: Michael Lyle <mlyle@lyle.org>

Please note though, that your patches have all tabs expanded to spaces,
which makes them difficult to apply.  I have fixed things these times
but please try to submit them in correct form in the future.

Thanks,

Mike

On 11/17/2017 04:57 PM, Michael Lyle wrote:
> Rui Hua--
> 
> Thanks for the clarification.  You're correct, it doesn't get called,
> and it goes to read_complete.  However, read_error should probably get
> called.  How would you suggest handling this-- should we initially set
> read_dirty_data true and clear it if it is all clean?  (and change the
> condition to properly go into the error path).  As it is, the patch
> makes things no worse but the error path is very unsafe.
> 
> Mike
> 
> On Thu, Nov 16, 2017 at 8:25 PM, Rui Hua <huarui.dev@gmail.com> wrote:
>> Hi, Michael--
>>
>> Thanks for the quickly reply.
>>
>> If the metadata on the cache device was NOT correctly read,
>> cached_dev_read_error() will not be called, so the recovery check will
>> not be executed. so this patch is safe.
>> s->iop.error was not set when faild to read metadata on the cache
>> device, so the  cached_dev_bio_complete() will be called instead of
>> cached_dev_read_error()
>>
>> 2017-11-17 12:02 GMT+08:00 Michael Lyle <mlyle@lyle.org>:
>>> Hi, Rui Hua--
>>>
>>> On 11/16/2017 07:51 PM, Rui Hua wrote:
>>>> In this patch, we use s->read_dirty_data to judge whether the read
>>>> request hit dirty data in cache device, it is safe to reread data from
>>>> the backing device when the read request hit clean data. This can not
>>>> only handle cache read race, but also recover data when failed read
>>>> request from cache device.
>>>
>>> I haven't read over all of this yet, to understand the read race.  But
>>> this change can't be safe-- read_dirty_data only is set to true if the
>>> metadata on the cache device is correctly read.  If that fails, the flag
>>> may not be set and we could read stale data on the backing device.
>>>
>>> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-22  5:13       ` Michael Lyle
@ 2017-11-22  6:04         ` Rui Hua
  2017-11-23 15:09         ` Eddie Chapman
  1 sibling, 0 replies; 19+ messages in thread
From: Rui Hua @ 2017-11-22  6:04 UTC (permalink / raw)
  To: Michael Lyle; +Cc: Coly Li, Kent Overstreet, linux-bcache, linux-block

Hi, Michael

Thanks for your reminder. I'll checkpatch carefully next time.

Thanks

2017-11-22 13:13 GMT+08:00 Michael Lyle <mlyle@lyle.org>:
> Reviewed-by: Michael Lyle <mlyle@lyle.org>
>
> Please note though, that your patches have all tabs expanded to spaces,
> which makes them difficult to apply.  I have fixed things these times
> but please try to submit them in correct form in the future.
>
> Thanks,
>
> Mike
>
> On 11/17/2017 04:57 PM, Michael Lyle wrote:
>> Rui Hua--
>>
>> Thanks for the clarification.  You're correct, it doesn't get called,
>> and it goes to read_complete.  However, read_error should probably get
>> called.  How would you suggest handling this-- should we initially set
>> read_dirty_data true and clear it if it is all clean?  (and change the
>> condition to properly go into the error path).  As it is, the patch
>> makes things no worse but the error path is very unsafe.
>>
>> Mike
>>
>> On Thu, Nov 16, 2017 at 8:25 PM, Rui Hua <huarui.dev@gmail.com> wrote:
>>> Hi, Michael--
>>>
>>> Thanks for the quickly reply.
>>>
>>> If the metadata on the cache device was NOT correctly read,
>>> cached_dev_read_error() will not be called, so the recovery check will
>>> not be executed. so this patch is safe.
>>> s->iop.error was not set when faild to read metadata on the cache
>>> device, so the  cached_dev_bio_complete() will be called instead of
>>> cached_dev_read_error()
>>>
>>> 2017-11-17 12:02 GMT+08:00 Michael Lyle <mlyle@lyle.org>:
>>>> Hi, Rui Hua--
>>>>
>>>> On 11/16/2017 07:51 PM, Rui Hua wrote:
>>>>> In this patch, we use s->read_dirty_data to judge whether the read
>>>>> request hit dirty data in cache device, it is safe to reread data from
>>>>> the backing device when the read request hit clean data. This can not
>>>>> only handle cache read race, but also recover data when failed read
>>>>> request from cache device.
>>>>
>>>> I haven't read over all of this yet, to understand the read race.  But
>>>> this change can't be safe-- read_dirty_data only is set to true if the
>>>> metadata on the cache device is correctly read.  If that fails, the flag
>>>> may not be set and we could read stale data on the backing device.
>>>>
>>>> Mike
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-22  5:13       ` Michael Lyle
  2017-11-22  6:04         ` Rui Hua
@ 2017-11-23 15:09         ` Eddie Chapman
  2017-11-23 16:31           ` Coly Li
  1 sibling, 1 reply; 19+ messages in thread
From: Eddie Chapman @ 2017-11-23 15:09 UTC (permalink / raw)
  To: Michael Lyle, Rui Hua, Coly Li; +Cc: linux-bcache, Kent Overstreet

On 22/11/17 05:13, Michael Lyle wrote:
> Reviewed-by: Michael Lyle <mlyle@lyle.org>

<snip>

I have a little dilemma now related to this patch and I wonder if I 
could ask you guys for your opinion?

I have been hit and am being hit by the bug this patch fixes every few 
days at the moment, on writeback bcache resources. When it hits it 
causes a failure in the layer above, and this leads to other problems 
that can and has led to loss of data (not in bcache, just that the read 
failure causes a chain of events in upper layer, separate issue, won't 
clutter this mail with that). So, I am very keen to resolve this 
urgently in my environment.

I am running stable 4.9, and I have cherry picked several recent bcache 
patches, including "bcache: only permit to recovery read error
when cache device is clean", which I understand causes this issue.

I have a dilemma now whether I should apply the new patch in this 
thread, which should fix the issue, or revert "only permit to recovery 
read error" patch.

The dilemma is because I don't like the idea of reverting the "only 
permit to recovery read error" patch since it appears to fix an 
important problem.  Or am I overestimating the problem (maybe that's why 
the original patch has not been backported to stable kernels, maybe the 
original issue quite rare)?

But I am also wary of applying the new patch as Michael only just added 
to his tree so hasn't had much baking time.

I wondered what is the opinion of you guys which of these two I should 
choose?

(BTW, please don't say "just don't cherry pick onto 4.9". I am aware of 
the risks of cherry picking, as discussed recently on this list. I am 
happy with the level of risk and I am being very careful in what I pick. 
In this case I have not been "burnt" by cherry picking since I would 
have hit the same issue if I would have just upgraded to 4.14 which 
contains the original "only permit to recovery read error" patch.)

( N.B. I understand that any opinion/advice I receive is just that. I 
take complete responsibility for what I decide :-) )

Thanks!
Eddie

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-23 15:09         ` Eddie Chapman
@ 2017-11-23 16:31           ` Coly Li
  2017-11-23 18:08             ` Eddie Chapman
  0 siblings, 1 reply; 19+ messages in thread
From: Coly Li @ 2017-11-23 16:31 UTC (permalink / raw)
  To: Eddie Chapman; +Cc: Michael Lyle, Rui Hua, linux-bcache, Kent Overstreet

On 23/11/2017 11:09 PM, Eddie Chapman wrote:
> On 22/11/17 05:13, Michael Lyle wrote:
>> Reviewed-by: Michael Lyle <mlyle@lyle.org>
> 
> <snip>
> 
> I have a little dilemma now related to this patch and I wonder if I
> could ask you guys for your opinion?
> 

Hi Eddie,

As author of  "only permit to recovery read error" patch, let me try to
response your question.

> I have been hit and am being hit by the bug this patch fixes every few
> days at the moment, on writeback bcache resources. When it hits it
> causes a failure in the layer above, and this leads to other problems
> that can and has led to loss of data (not in bcache, just that the read
> failure causes a chain of events in upper layer, separate issue, won't
> clutter this mail with that). So, I am very keen to resolve this
> urgently in my environment.
> 
> I am running stable 4.9, and I have cherry picked several recent bcache
> patches, including "bcache: only permit to recovery read error
> when cache device is clean", which I understand causes this issue.
> 
> I have a dilemma now whether I should apply the new patch in this
> thread, which should fix the issue, or revert "only permit to recovery
> read error" patch.
> 

I suggest you to take one of the 2 actions,
- apply "bcache: recover data from backing device when read request hit
clean" patch, or
- revert "only permit to recovery read error" patch, and add the
modification as "bcache: recover data from backing device when read
request hit clean" does.

> The dilemma is because I don't like the idea of reverting the "only
> permit to recovery read error" patch since it appears to fix an
> important problem.  Or am I overestimating the problem (maybe that's why
> the original patch has not been backported to stable kernels, maybe the
> original issue quite rare)?
> 

"only permit to recovery read error" patch fixes a real data corruption
issue with broken cache device (SSD), which was reported from production
environment. If you don't have all these 2 patches, your data may be
corrupted in silence when SSD is broken.

> But I am also wary of applying the new patch as Michael only just added
> to his tree so hasn't had much baking time.
> 

The new one "bcache: recover data from backing device when read request
hit clean" can be treated as an improvement of my patch. Because when I
wrote that patch, I didn't notice metadata and data failure on cache
device were in different code path, I have to only check ds->has_dirty.
Rui points out the metadata failure is in another code pathm and buggy
too. He posts another patch to fix it, and you may find it in
linux-bcache mailing list.

When I wrote the "only permit to recovery read error" patch, I didn't
notice the read race existed neither, until Junhui and Rui noticed me.
The read race only happens on clean data (because its bucket is clean
and re-allocated to different write request), so the new patch permits
data recovery from backing device if a read failed on clean data.

The result for this new fix is, less I/O error may returns to upper
layer code, especially the bogus I/O error generated by btree read race.

> I wondered what is the opinion of you guys which of these two I should
> choose?

If you only do patching, you need to have both of them. If you only want
one patch, you may rebase this patch to 4.9 kernel and ignore the first one.

> 
> (BTW, please don't say "just don't cherry pick onto 4.9". I am aware of
> the risks of cherry picking, as discussed recently on this list. I am
> happy with the level of risk and I am being very careful in what I pick.
> In this case I have not been "burnt" by cherry picking since I would
> have hit the same issue if I would have just upgraded to 4.14 which
> contains the original "only permit to recovery read error" patch.)
> 
> ( N.B. I understand that any opinion/advice I receive is just that. I
> take complete responsibility for what I decide :-) )

One more comment, when you apply both patches, I/O failure on broken
cache device (SSD) is still risky. As Rui points out in his patch,
"The read request might meet error when searching the btree, but the
error was not handled in cache_lookup(), and this kind of metadata
failure will not go into cached_dev_read_error(), finally, the upper
layer will receive bi_status=0."

Rui's patch "bcache: return IOERR to upper layer when read request meet
metadata error" is also added inti Mike's tree, if you may consider
testing it, that will be very helpful to us.

Thank you for asking great question :-)

-- 
Coly Li

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-23 16:31           ` Coly Li
@ 2017-11-23 18:08             ` Eddie Chapman
  2017-11-24 12:50               ` Rui Hua
  0 siblings, 1 reply; 19+ messages in thread
From: Eddie Chapman @ 2017-11-23 18:08 UTC (permalink / raw)
  To: Coly Li; +Cc: Michael Lyle, Rui Hua, linux-bcache, Kent Overstreet

Hi Coly,

On 23/11/17 16:31, Coly Li wrote:
> On 23/11/2017 11:09 PM, Eddie Chapman wrote:
>> On 22/11/17 05:13, Michael Lyle wrote:
>>> Reviewed-by: Michael Lyle <mlyle@lyle.org>
>>
>> <snip>
>>
>> I have a little dilemma now related to this patch and I wonder if I
>> could ask you guys for your opinion?
>>
> 
> Hi Eddie,
> 
> As author of  "only permit to recovery read error" patch, let me try to
> response your question.
> 
>> I have been hit and am being hit by the bug this patch fixes every few
>> days at the moment, on writeback bcache resources. When it hits it
>> causes a failure in the layer above, and this leads to other problems
>> that can and has led to loss of data (not in bcache, just that the read
>> failure causes a chain of events in upper layer, separate issue, won't
>> clutter this mail with that). So, I am very keen to resolve this
>> urgently in my environment.
>>
>> I am running stable 4.9, and I have cherry picked several recent bcache
>> patches, including "bcache: only permit to recovery read error
>> when cache device is clean", which I understand causes this issue.
>>
>> I have a dilemma now whether I should apply the new patch in this
>> thread, which should fix the issue, or revert "only permit to recovery
>> read error" patch.
>>
> 
> I suggest you to take one of the 2 actions,
> - apply "bcache: recover data from backing device when read request hit
> clean" patch, or
> - revert "only permit to recovery read error" patch, and add the
> modification as "bcache: recover data from backing device when read
> request hit clean" does.

Thanks for clarifying what the 2 best options are.

>> The dilemma is because I don't like the idea of reverting the "only
>> permit to recovery read error" patch since it appears to fix an
>> important problem.  Or am I overestimating the problem (maybe that's why
>> the original patch has not been backported to stable kernels, maybe the
>> original issue quite rare)?
>>
> 
> "only permit to recovery read error" patch fixes a real data corruption
> issue with broken cache device (SSD), which was reported from production
> environment. If you don't have all these 2 patches, your data may be
> corrupted in silence when SSD is broken.

OK, I think that confirms to me then that I'm definitely keeping the 
"only permit to recovery read error" patch :-)

That sounds like a very significant bug fix. I'm wondering, shouldn't 
"only permit to recovery read error" patch be backported to the stable 
kernels like 4.9, 4.4? Or maybe it doesn't fully meet the stable 
inclusion criteria. I know the patch was CC'd to stable but it has not 
been picked up. Maybe because it was not in Linus' tree yet when it was 
originally CC'd.

>> But I am also wary of applying the new patch as Michael only just added
>> to his tree so hasn't had much baking time.
>>
> 
> The new one "bcache: recover data from backing device when read request
> hit clean" can be treated as an improvement of my patch. Because when I
> wrote that patch, I didn't notice metadata and data failure on cache
> device were in different code path, I have to only check ds->has_dirty.

OK, thanks, that makes me feel a little better about applying it.

> Rui points out the metadata failure is in another code pathm and buggy
> too. He posts another patch to fix it, and you may find it in
> linux-bcache mailing list.

Yes, I noticed that patch and was thinking whether to apply that one too.

> When I wrote the "only permit to recovery read error" patch, I didn't
> notice the read race existed neither, until Junhui and Rui noticed me.
> The read race only happens on clean data (because its bucket is clean
> and re-allocated to different write request), so the new patch permits
> data recovery from backing device if a read failed on clean data.
> 
> The result for this new fix is, less I/O error may returns to upper
> layer code, especially the bogus I/O error generated by btree read race.

Thanks for explaining and the detail, it is much appreciated :-) And I 
hope I don't come across as complaining about this patch in any way :-) 
I know this is difficult work, and, hey, even with the new bug 
introduced, the new bug is a *lot* less painful than the original bug 
you fixed :-)

>> I wondered what is the opinion of you guys which of these two I should
>> choose?
> 
> If you only do patching, you need to have both of them. If you only want
> one patch, you may rebase this patch to 4.9 kernel and ignore the first one.
> 
>>
>> (BTW, please don't say "just don't cherry pick onto 4.9". I am aware of
>> the risks of cherry picking, as discussed recently on this list. I am
>> happy with the level of risk and I am being very careful in what I pick.
>> In this case I have not been "burnt" by cherry picking since I would
>> have hit the same issue if I would have just upgraded to 4.14 which
>> contains the original "only permit to recovery read error" patch.)
>>
>> ( N.B. I understand that any opinion/advice I receive is just that. I
>> take complete responsibility for what I decide :-) )
> 
> One more comment, when you apply both patches, I/O failure on broken
> cache device (SSD) is still risky. As Rui points out in his patch,
> "The read request might meet error when searching the btree, but the
> error was not handled in cache_lookup(), and this kind of metadata
> failure will not go into cached_dev_read_error(), finally, the upper
> layer will receive bi_status=0."
> 
> Rui's patch "bcache: return IOERR to upper layer when read request meet
> metadata error" is also added inti Mike's tree, if you may consider
> testing it, that will be very helpful to us.

You have convinced me that the best way forward is to keep "only permit 
to recovery read error", and also apply both of the new patches.   I 
think I will wait a few more days to give a little more "baking" time 
for the new patches, then will try them and report back.

> 
> Thank you for asking great question :-)
> 

Thanks!
Eddie

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bcache: recover data from backing device when read request hit clean
  2017-11-23 18:08             ` Eddie Chapman
@ 2017-11-24 12:50               ` Rui Hua
  0 siblings, 0 replies; 19+ messages in thread
From: Rui Hua @ 2017-11-24 12:50 UTC (permalink / raw)
  To: Eddie Chapman; +Cc: Coly Li, Michael Lyle, linux-bcache, Kent Overstreet

Hi, Eddie,

I totally agree with Coly's suggestion about how to use the two new
patches in your environment. It will be very helpful to us if you try
them.

Coly,

Thank you very much for sorting out the ins and outs about the
patches, the explanation is very helpful and necessary.

Thanks.

2017-11-24 2:08 GMT+08:00 Eddie Chapman <eddie@ehuk.net>:
> Hi Coly,
>
>
> On 23/11/17 16:31, Coly Li wrote:
>>
>> On 23/11/2017 11:09 PM, Eddie Chapman wrote:
>>>
>>> On 22/11/17 05:13, Michael Lyle wrote:
>>>>
>>>> Reviewed-by: Michael Lyle <mlyle@lyle.org>
>>>
>>>
>>> <snip>
>>>
>>> I have a little dilemma now related to this patch and I wonder if I
>>> could ask you guys for your opinion?
>>>
>>
>> Hi Eddie,
>>
>> As author of  "only permit to recovery read error" patch, let me try to
>> response your question.
>>
>>> I have been hit and am being hit by the bug this patch fixes every few
>>> days at the moment, on writeback bcache resources. When it hits it
>>> causes a failure in the layer above, and this leads to other problems
>>> that can and has led to loss of data (not in bcache, just that the read
>>> failure causes a chain of events in upper layer, separate issue, won't
>>> clutter this mail with that). So, I am very keen to resolve this
>>> urgently in my environment.
>>>
>>> I am running stable 4.9, and I have cherry picked several recent bcache
>>> patches, including "bcache: only permit to recovery read error
>>> when cache device is clean", which I understand causes this issue.
>>>
>>> I have a dilemma now whether I should apply the new patch in this
>>> thread, which should fix the issue, or revert "only permit to recovery
>>> read error" patch.
>>>
>>
>> I suggest you to take one of the 2 actions,
>> - apply "bcache: recover data from backing device when read request hit
>> clean" patch, or
>> - revert "only permit to recovery read error" patch, and add the
>> modification as "bcache: recover data from backing device when read
>> request hit clean" does.
>
>
> Thanks for clarifying what the 2 best options are.
>
>>> The dilemma is because I don't like the idea of reverting the "only
>>> permit to recovery read error" patch since it appears to fix an
>>> important problem.  Or am I overestimating the problem (maybe that's why
>>> the original patch has not been backported to stable kernels, maybe the
>>> original issue quite rare)?
>>>
>>
>> "only permit to recovery read error" patch fixes a real data corruption
>> issue with broken cache device (SSD), which was reported from production
>> environment. If you don't have all these 2 patches, your data may be
>> corrupted in silence when SSD is broken.
>
>
> OK, I think that confirms to me then that I'm definitely keeping the "only
> permit to recovery read error" patch :-)
>
> That sounds like a very significant bug fix. I'm wondering, shouldn't "only
> permit to recovery read error" patch be backported to the stable kernels
> like 4.9, 4.4? Or maybe it doesn't fully meet the stable inclusion criteria.
> I know the patch was CC'd to stable but it has not been picked up. Maybe
> because it was not in Linus' tree yet when it was originally CC'd.
>
>>> But I am also wary of applying the new patch as Michael only just added
>>> to his tree so hasn't had much baking time.
>>>
>>
>> The new one "bcache: recover data from backing device when read request
>> hit clean" can be treated as an improvement of my patch. Because when I
>> wrote that patch, I didn't notice metadata and data failure on cache
>> device were in different code path, I have to only check ds->has_dirty.
>
>
> OK, thanks, that makes me feel a little better about applying it.
>
>> Rui points out the metadata failure is in another code pathm and buggy
>> too. He posts another patch to fix it, and you may find it in
>> linux-bcache mailing list.
>
>
> Yes, I noticed that patch and was thinking whether to apply that one too.
>
>> When I wrote the "only permit to recovery read error" patch, I didn't
>> notice the read race existed neither, until Junhui and Rui noticed me.
>> The read race only happens on clean data (because its bucket is clean
>> and re-allocated to different write request), so the new patch permits
>> data recovery from backing device if a read failed on clean data.
>>
>> The result for this new fix is, less I/O error may returns to upper
>> layer code, especially the bogus I/O error generated by btree read race.
>
>
> Thanks for explaining and the detail, it is much appreciated :-) And I hope
> I don't come across as complaining about this patch in any way :-) I know
> this is difficult work, and, hey, even with the new bug introduced, the new
> bug is a *lot* less painful than the original bug you fixed :-)
>
>>> I wondered what is the opinion of you guys which of these two I should
>>> choose?
>>
>>
>> If you only do patching, you need to have both of them. If you only want
>> one patch, you may rebase this patch to 4.9 kernel and ignore the first
>> one.
>>
>>>
>>> (BTW, please don't say "just don't cherry pick onto 4.9". I am aware of
>>> the risks of cherry picking, as discussed recently on this list. I am
>>> happy with the level of risk and I am being very careful in what I pick.
>>> In this case I have not been "burnt" by cherry picking since I would
>>> have hit the same issue if I would have just upgraded to 4.14 which
>>> contains the original "only permit to recovery read error" patch.)
>>>
>>> ( N.B. I understand that any opinion/advice I receive is just that. I
>>> take complete responsibility for what I decide :-) )
>>
>>
>> One more comment, when you apply both patches, I/O failure on broken
>> cache device (SSD) is still risky. As Rui points out in his patch,
>> "The read request might meet error when searching the btree, but the
>> error was not handled in cache_lookup(), and this kind of metadata
>> failure will not go into cached_dev_read_error(), finally, the upper
>> layer will receive bi_status=0."
>>
>> Rui's patch "bcache: return IOERR to upper layer when read request meet
>> metadata error" is also added inti Mike's tree, if you may consider
>> testing it, that will be very helpful to us.
>
>
> You have convinced me that the best way forward is to keep "only permit to
> recovery read error", and also apply both of the new patches.   I think I
> will wait a few more days to give a little more "baking" time for the new
> patches, then will try them and report back.
>
>>
>> Thank you for asking great question :-)
>>
>
> Thanks!
> Eddie

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-11-24 12:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-17  3:51 [PATCH] bcache: recover data from backing device when read request hit clean Rui Hua
2017-11-17  4:02 ` Michael Lyle
2017-11-17  4:25   ` Rui Hua
2017-11-18  0:57     ` Michael Lyle
2017-11-18 17:00       ` Rui Hua
2017-11-22  5:13       ` Michael Lyle
2017-11-22  6:04         ` Rui Hua
2017-11-23 15:09         ` Eddie Chapman
2017-11-23 16:31           ` Coly Li
2017-11-23 18:08             ` Eddie Chapman
2017-11-24 12:50               ` Rui Hua
2017-11-17  6:01 ` Coly Li
2017-11-17  7:26   ` Rui Hua
     [not found]     ` <D393CD04-8C7A-4A8C-B23A-6C68D5544E34@profihost.ag>
2017-11-17 10:20       ` Rui Hua
2017-11-17 10:20         ` Rui Hua
2017-11-17 12:38         ` Stefan Priebe - Profihost AG
2017-11-17 12:57         ` Eddie Chapman
2017-11-17 13:22           ` Coly Li
2017-11-17 13:43             ` Eddie Chapman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.