All of lore.kernel.org
* [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
@ 2019-12-02 10:24 kungf
  2019-12-02 11:08 ` Coly Li
  0 siblings, 1 reply; 8+ messages in thread
From: kungf @ 2019-12-02 10:24 UTC (permalink / raw)
  To: colyli; +Cc: kent.overstreet, linux-bcache, linux-kernel, kungf

Data may be lost in the following scenario in writeback mode:
1. client writes data1 to bcache
2. client calls fdatasync
3. bcache flushes the cache set and the backing device
If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.

Signed-off-by: kungf <wings.wyang@gmail.com>
---
 drivers/md/bcache/writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4a40f9eadeaf..e5cecb60569e 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
 	 */
 	if (KEY_DIRTY(&w->key)) {
 		dirty_init(w);
-		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
+		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
 		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
 		bio_set_dev(&io->bio, io->dc->bdev);
 		io->bio.bi_end_io	= dirty_endio;
-- 
2.17.1



* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
  2019-12-02 10:24 [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode kungf
@ 2019-12-02 11:08 ` Coly Li
  2019-12-02 19:34   ` Eric Wheeler
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Coly Li @ 2019-12-02 11:08 UTC (permalink / raw)
  To: kungf; +Cc: kent.overstreet, linux-bcache, linux-kernel

On 2019/12/2 6:24 PM, kungf wrote:
> Data may be lost in the following scenario in writeback mode:
> 1. client writes data1 to bcache
> 2. client calls fdatasync
> 3. bcache flushes the cache set and the backing device
> If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
> 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
> So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
> 

Hi,

Did you encounter this problem in a real workload? With the bcache
journal, I don't see the possibility of data loss from your description.

Correct me if I am wrong.

Coly Li

> Signed-off-by: kungf <wings.wyang@gmail.com>
> ---
>  drivers/md/bcache/writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4a40f9eadeaf..e5cecb60569e 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
>  	 */
>  	if (KEY_DIRTY(&w->key)) {
>  		dirty_init(w);
> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
>  		bio_set_dev(&io->bio, io->dc->bdev);
>  		io->bio.bi_end_io	= dirty_endio;
> 



* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
  2019-12-02 11:08 ` Coly Li
@ 2019-12-02 19:34   ` Eric Wheeler
  2019-12-03 14:21     ` Coly Li
  2019-12-03 11:39   ` kungf
       [not found]   ` <CAMo46+q-usqgwHWhk2mcKMvK9yMY2kfN-t10wyqm+pv8L0HqYA@mail.gmail.com>
  2 siblings, 1 reply; 8+ messages in thread
From: Eric Wheeler @ 2019-12-02 19:34 UTC (permalink / raw)
  To: Coly Li; +Cc: kungf, kent.overstreet, linux-bcache, linux-kernel


On Mon, 2 Dec 2019, Coly Li wrote:
> On 2019/12/2 6:24 PM, kungf wrote:
> > Data may be lost in the following scenario in writeback mode:
> > 1. client writes data1 to bcache
> > 2. client calls fdatasync
> > 3. bcache flushes the cache set and the backing device
> > If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
> > 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
> > So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
> > 
> 
> Hi,
> 
> Did you encounter this problem in a real workload? With the bcache
> journal, I don't see the possibility of data loss from your description.
> 
> Correct me if I am wrong.
> 
> Coly Li

If this does become necessary, then we should have a sysfs or superblock 
flag to disable FUA for those with RAID BBUs.

--
Eric Wheeler




> > Signed-off-by: kungf <wings.wyang@gmail.com>
> > ---
> >  drivers/md/bcache/writeback.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >  	 */
> >  	if (KEY_DIRTY(&w->key)) {
> >  		dirty_init(w);
> > -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >  		bio_set_dev(&io->bio, io->dc->bdev);
> >  		io->bio.bi_end_io	= dirty_endio;
> > 
> 
> 


* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
  2019-12-02 11:08 ` Coly Li
  2019-12-02 19:34   ` Eric Wheeler
@ 2019-12-03 11:39   ` kungf
       [not found]   ` <CAMo46+q-usqgwHWhk2mcKMvK9yMY2kfN-t10wyqm+pv8L0HqYA@mail.gmail.com>
  2 siblings, 0 replies; 8+ messages in thread
From: kungf @ 2019-12-03 11:39 UTC (permalink / raw)
  To: Coly Li; +Cc: kent.overstreet, linux-bcache, linux-kernel

On Mon, 2 Dec 2019 19:08:59 +0800
Coly Li <colyli@suse.de> wrote:

> On 2019/12/2 6:24 pm, kungf wrote:
> > Data may be lost in the following scenario in writeback mode:
> > 1. client writes data1 to bcache
> > 2. client calls fdatasync
> > 3. bcache flushes the cache set and the backing device
> > If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
> > 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
> > So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
> > 
> 
> Hi,
> 
> Did you encounter this problem in a real workload? With the bcache
> journal, I don't see the possibility of data loss from your description.
> 
> Correct me if I am wrong.
> 
> Coly Li

Hi Coly,

Sorry to confuse you. As far as I know, the write_dirty function writes dirty data to the backing device without FUA, and write_dirty_finish then marks the dirty key clean;
this means the data indexed by that key will not be written back again. Am I wrong?
The only flush of the backing device I can find is when bcache receives a PREFLUSH bio. Is there any other place where it is flushed, e.g. by the journal?

I ran a test that writes to bcache with dd and then detaches it, running blktrace on the cache and backing devices at the same time.
1. disable writeback
# echo 0 > /sys/block/bcache0/bcache/writeback_running
2. write data with fdatasync
#dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
3. detach and trigger writeback
#echo b1f40ca5-37a3-4852-9abf-6abed96d71db >/sys/block/bcache0/bcache/detach

The text below is the blkparse result.
From the cache device's blktrace below, we can see the 16 KiB write to the cache set, followed by a flush with op FWFSM (PREFLUSH|WRITE|FUA|SYNC|META):
```
  8,160 33        1     0.000000000 222844  A   W 630609920 + 32 <- (8,167) 1464320
  8,167 33        2     0.000000478 222844  Q   W 630609920 + 32 [dd]
  8,167 33        3     0.000006167 222844  G   W 630609920 + 32 [dd]
  8,167 33        5     0.000011385 222844  I   W 630609920 + 32 [dd]
  8,167 33        6     0.000023890   948  D   W 630609920 + 32 [kworker/33:1H]
  8,167 33        7     0.000111203     0  C   W 630609920 + 32 [0]
  8,160 34        1     0.000167029 215616  A FWFSM 629153808 + 8 <- (8,167) 8208
  8,167 34        2     0.000167490 215616  Q FWFSM 629153808 + 8 [kworker/34:2]
  8,167 34        3     0.000169061 215616  G FWFSM 629153808 + 8 [kworker/34:2]
  8,167 34        4     0.000301308   949  D WFSM 629153808 + 8 [kworker/34:1H]
  8,167 34        5     0.000348832     0  C WFSM 629153808 + 8 [0]
  8,167 34        6     0.000349612     0  C WFSM 629153808 [0]
```

From the backing device's blktrace below, the backing device first gets a flush op FWS (PREFLUSH|WRITE|SYNC) because we stopped writeback, then gets a plain W op after the detach:
the 16 KiB of data is written back to the backing device, and after that the backing device never gets another flush op. This means the 16 KiB we wrote is not safe
on the backing device, even though we wrote it with fdatasync.
```
  8,144 33        1     0.000000000 222844  Q WSM 8 + 8 [dd]
  8,144 33        2     0.000016609 222844  G WSM 8 + 8 [dd]
  8,144 33        5     0.000020710 222844  I WSM 8 + 8 [dd]
  8,144 33        6     0.000031967   948  D WSM 8 + 8 [kworker/33:1H]
  8,144 33        7     0.000152945 88631  C  WS 16 + 32 [0]
  8,144 34        1     0.000186127 215616  Q FWS [kworker/34:2]
  8,144 34        2     0.000187006 215616  G FWS [kworker/34:2]
  8,144 33        8     0.000326761     0  C WSM 8 + 8 [0]
  8,144 34        3     0.020195027     0  C  WS 16 [0]
  8,144 34        4     0.020195904     0  C FWS 16 [0]
  8,144 23        1    19.415130395 215884  Q   W 16 + 32 [kworker/23:2]
  8,144 23        2    19.415132072 215884  G   W 16 + 32 [kworker/23:2]
  8,144 23        3    19.415133134 215884  I   W 16 + 32 [kworker/23:2]
  8,144 23        4    19.415137776  1215  D   W 16 + 32 [kworker/23:1H]
  8,144 23        5    19.416607260     0  C   W 16 + 32 [0]
  8,144 24        1    19.416640754 222593  Q WSM 8 + 8 [bcache_writebac]
  8,144 24        2    19.416642698 222593  G WSM 8 + 8 [bcache_writebac]
  8,144 24        3    19.416643505 222593  I WSM 8 + 8 [bcache_writebac]
  8,144 24        4    19.416650589  1107  D WSM 8 + 8 [kworker/24:1H]
  8,144 24        5    19.416865258     0  C WSM 8 + 8 [0]
  8,144 24        6    19.416871350 221889  Q WSM 8 + 8 [kworker/24:1]
  8,144 24        7    19.416872201 221889  G WSM 8 + 8 [kworker/24:1]
  8,144 24        8    19.416872542 221889  I WSM 8 + 8 [kworker/24:1]
  8,144 24        9    19.416875458  1107  D WSM 8 + 8 [kworker/24:1H]
  8,144 24       10    19.417076935     0  C WSM 8 + 8 [0]
```
> 
> > Signed-off-by: kungf <wings.wyang@gmail.com>
> > ---
> >  drivers/md/bcache/writeback.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >  	 */
> >  	if (KEY_DIRTY(&w->key)) {
> >  		dirty_init(w);
> > -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >  		bio_set_dev(&io->bio, io->dc->bdev);
> >  		io->bio.bi_end_io	= dirty_endio;
> > 
> 
-- 
kungf <wings.wyang@gmail.com>


* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
       [not found]   ` <CAMo46+q-usqgwHWhk2mcKMvK9yMY2kfN-t10wyqm+pv8L0HqYA@mail.gmail.com>
@ 2019-12-03 14:09     ` Coly Li
  0 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2019-12-03 14:09 UTC (permalink / raw)
  To: kungf; +Cc: kent.overstreet, linux-bcache, linux-kernel

On 2019/12/3 3:16 PM, kungf wrote:
> 
> 
> On Mon, 2 Dec 2019 at 19:09, Coly Li <colyli@suse.de> wrote:
>>
>> On 2019/12/2 6:24 PM, kungf wrote:
>> > Data may be lost in the following scenario in writeback mode:
>> > 1. client writes data1 to bcache
>> > 2. client calls fdatasync
>> > 3. bcache flushes the cache set and the backing device
>> > If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
>> > 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
>> > So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
>> >
>>
>> Hi,
>>
>> Did you encounter this problem in a real workload? With the bcache
>> journal, I don't see the possibility of data loss from your description.
>>
>> Correct me if I am wrong.
>>
>> Coly Li
>>
> Hi Coly,
> 
> Sorry to confuse you. As far as I know, the write_dirty function writes
> dirty data to the backing device without FUA, and write_dirty_finish
> then marks the dirty key clean; this means the data indexed by that key
> will not be written back again. Am I wrong?

Yes, you are right. This is the behavior as designed. We don't guarantee
that the data is always on the platter; this is what most storage systems do.

> The only flush of the backing device I can find is when bcache receives
> a PREFLUSH bio. Is there any other place where it is flushed, e.g. by
> the journal?
> 

A storage system flushes its buffers when the upper layer requires it;
that means if an application wants its written data flushed to the
platter, it should explicitly issue a flush request.

What you observe and test is all as designed, IMHO. The I/O stack does
not guarantee that any data is persistent on the storage media unless an
explicit flush request is received from the upper layer and completed
back to it.

Coly Li

> I made a test that writes to bcache with dd and then detaches it,
> running blktrace on the cache and backing devices at the same time.
> 1. disable writeback
> # echo 0 > /sys/block/bcache0/bcache/writeback_running
> 2. write data with fdatasync
> #dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
> 3. detach and trigger writeback
> #echo b1f40ca5-37a3-4852-9abf-6abed96d71db >/sys/block/bcache0/bcache/detach
> 
> The text below is the blkparse result.
> From the cache device's blktrace below, we can see the 16 KiB write to
> the cache set, followed by a flush with op FWFSM (PREFLUSH|WRITE|FUA|SYNC|META):
> ```
>   8,160 33        1     0.000000000 222844  A   W 630609920 + 32 <-
> (8,167) 1464320
>   8,167 33        2     0.000000478 222844  Q   W 630609920 + 32 [dd]
>   8,167 33        3     0.000006167 222844  G   W 630609920 + 32 [dd]
>   8,167 33        5     0.000011385 222844  I   W 630609920 + 32 [dd]
>   8,167 33        6     0.000023890   948  D   W 630609920 + 32
> [kworker/33:1H]
>   8,167 33        7     0.000111203     0  C   W 630609920 + 32 [0]
>   8,160 34        1     0.000167029 215616  A FWFSM 629153808 + 8 <-
> (8,167) 8208
>   8,167 34        2     0.000167490 215616  Q FWFSM 629153808 + 8
> [kworker/34:2]
>   8,167 34        3     0.000169061 215616  G FWFSM 629153808 + 8
> [kworker/34:2]
>   8,167 34        4     0.000301308   949  D WFSM 629153808 + 8
> [kworker/34:1H]
>   8,167 34        5     0.000348832     0  C WFSM 629153808 + 8 [0]
>   8,167 34        6     0.000349612     0  C WFSM 629153808 [0]
> ```
> 
> From the backing device's blktrace below, the backing device first gets
> a flush op FWS (PREFLUSH|WRITE|SYNC) because we stopped writeback, then
> gets a plain W op after the detach: the 16 KiB of data is written back
> to the backing device, and after that the backing device never gets
> another flush op. It means that the 16 KiB we wrote is not safe in the
> backing device, even though we wrote it with fdatasync.
> ```
>   8,144 33        1     0.000000000 222844  Q WSM 8 + 8 [dd]
>   8,144 33        2     0.000016609 222844  G WSM 8 + 8 [dd]
>   8,144 33        5     0.000020710 222844  I WSM 8 + 8 [dd]
>   8,144 33        6     0.000031967   948  D WSM 8 + 8 [kworker/33:1H]
>   8,144 33        7     0.000152945 88631  C  WS 16 + 32 [0]
>   8,144 34        1     0.000186127 215616  Q FWS [kworker/34:2]
>   8,144 34        2     0.000187006 215616  G FWS [kworker/34:2]
>   8,144 33        8     0.000326761     0  C WSM 8 + 8 [0]
>   8,144 34        3     0.020195027     0  C  WS 16 [0]
>   8,144 34        4     0.020195904     0  C FWS 16 [0]
>   8,144 23        1    19.415130395 215884  Q   W 16 + 32 [kworker/23:2]
>   8,144 23        2    19.415132072 215884  G   W 16 + 32 [kworker/23:2]
>   8,144 23        3    19.415133134 215884  I   W 16 + 32 [kworker/23:2]
>   8,144 23        4    19.415137776  1215  D   W 16 + 32 [kworker/23:1H]
>   8,144 23        5    19.416607260     0  C   W 16 + 32 [0]
>   8,144 24        1    19.416640754 222593  Q WSM 8 + 8 [bcache_writebac]
>   8,144 24        2    19.416642698 222593  G WSM 8 + 8 [bcache_writebac]
>   8,144 24        3    19.416643505 222593  I WSM 8 + 8 [bcache_writebac]
>   8,144 24        4    19.416650589  1107  D WSM 8 + 8 [kworker/24:1H]
>   8,144 24        5    19.416865258     0  C WSM 8 + 8 [0]
>   8,144 24        6    19.416871350 221889  Q WSM 8 + 8 [kworker/24:1]
>   8,144 24        7    19.416872201 221889  G WSM 8 + 8 [kworker/24:1]
>   8,144 24        8    19.416872542 221889  I WSM 8 + 8 [kworker/24:1]
>   8,144 24        9    19.416875458  1107  D WSM 8 + 8 [kworker/24:1H]
>   8,144 24       10    19.417076935     0  C WSM 8 + 8 [0]
> ```
> 


* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
  2019-12-02 19:34   ` Eric Wheeler
@ 2019-12-03 14:21     ` Coly Li
  2019-12-04 10:55       ` Coly Li
  2019-12-06  0:04       ` Eric Wheeler
  0 siblings, 2 replies; 8+ messages in thread
From: Coly Li @ 2019-12-03 14:21 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: kungf, kent.overstreet, linux-bcache, linux-kernel

On 2019/12/3 3:34 AM, Eric Wheeler wrote:
> On Mon, 2 Dec 2019, Coly Li wrote:
>> On 2019/12/2 6:24 PM, kungf wrote:
>>> Data may be lost in the following scenario in writeback mode:
>>> 1. client writes data1 to bcache
>>> 2. client calls fdatasync
>>> 3. bcache flushes the cache set and the backing device
>>> If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
>>> 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
>>> So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
>>>
>>
>> Hi,
>>
>> Did you encounter this problem in a real workload? With the bcache
>> journal, I don't see the possibility of data loss from your description.
>>
>> Correct me if I am wrong.
>>
>> Coly Li
> 
> If this does become necessary, then we should have a sysfs or superblock 
> flag to disable FUA for those with RAID BBUs.

Hi Eric,

I doubt it is necessary to add the FUA tag to all writeback bios. If a
power failure happens after the dirty data is written to the backing
device and the bkey turns clean, a following read request will still go
to the cache device, because the LBA can be indexed no matter whether it
is dirty or clean. Unless the bkey is invalidated from the B+tree, reads
always go to the cache device first in writeback mode. If a power
failure happens before the cached bkey turns from dirty to clean, the
result is just an extra writeback bio flushed from the cache device to
the backing device with identical data. Compared with adding the FUA tag
to all writeback bios (which would be really slow), the extra writeback
I/Os after a power failure are more acceptable to me.

Coly Li

> 
>>> Signed-off-by: kungf <wings.wyang@gmail.com>
>>> ---
>>>  drivers/md/bcache/writeback.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 4a40f9eadeaf..e5cecb60569e 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
>>>  	 */
>>>  	if (KEY_DIRTY(&w->key)) {
>>>  		dirty_init(w);
>>> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
>>> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
>>>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
>>>  		bio_set_dev(&io->bio, io->dc->bdev);
>>>  		io->bio.bi_end_io	= dirty_endio;
>>>
>>


* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
  2019-12-03 14:21     ` Coly Li
@ 2019-12-04 10:55       ` Coly Li
  2019-12-06  0:04       ` Eric Wheeler
  1 sibling, 0 replies; 8+ messages in thread
From: Coly Li @ 2019-12-04 10:55 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: kungf, kent.overstreet, linux-bcache, linux-kernel

On 2019/12/3 10:21 PM, Coly Li wrote:
> On 2019/12/3 3:34 AM, Eric Wheeler wrote:
>> On Mon, 2 Dec 2019, Coly Li wrote:
>>> On 2019/12/2 6:24 PM, kungf wrote:
>>>> Data may be lost in the following scenario in writeback mode:
>>>> 1. client writes data1 to bcache
>>>> 2. client calls fdatasync
>>>> 3. bcache flushes the cache set and the backing device
>>>> If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
>>>> 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
>>>> So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
>>>>
>>>
>>> Hi,
>>>
>>> Did you encounter this problem in a real workload? With the bcache
>>> journal, I don't see the possibility of data loss from your description.
>>>
>>> Correct me if I am wrong.
>>>
>>> Coly Li
>>
>> If this does become necessary, then we should have a sysfs or superblock 
>> flag to disable FUA for those with RAID BBUs.
> 
> Hi Eric,
> 
> I doubt it is necessary to add the FUA tag to all writeback bios. If a
> power failure happens after the dirty data is written to the backing
> device and the bkey turns clean, a following read request will still go
> to the cache device, because the LBA can be indexed no matter whether it
> is dirty or clean. Unless the bkey is invalidated from the B+tree, reads
> always go to the cache device first in writeback mode. If a power
> failure happens before the cached bkey turns from dirty to clean, the
> result is just an extra writeback bio flushed from the cache device to
> the backing device with identical data. Compared with adding the FUA tag
> to all writeback bios (which would be really slow), the extra writeback
> I/Os after a power failure are more acceptable to me.

Hi Eric,

I have come to realize what the problem is. Yes, there is a potential
possibility. With FUA the writeback performance will be very low; it is
quite tricky...

Coly Li


* Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
  2019-12-03 14:21     ` Coly Li
  2019-12-04 10:55       ` Coly Li
@ 2019-12-06  0:04       ` Eric Wheeler
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Wheeler @ 2019-12-06  0:04 UTC (permalink / raw)
  To: Coly Li; +Cc: kungf, kent.overstreet, linux-bcache, linux-kernel


On Tue, 3 Dec 2019, Coly Li wrote:

> On 2019/12/3 3:34 AM, Eric Wheeler wrote:
> > On Mon, 2 Dec 2019, Coly Li wrote:
> >> On 2019/12/2 6:24 PM, kungf wrote:
> >>> Data may be lost in the following scenario in writeback mode:
> >>> 1. client writes data1 to bcache
> >>> 2. client calls fdatasync
> >>> 3. bcache flushes the cache set and the backing device
> >>> If data1 has not been written back to the backing device by this point, it is only guaranteed to be safe in the cache.
> >>> 4. the cache then writes data1 back to the backing device with plain REQ_OP_WRITE
> >>> So data1 is not guaranteed to be on non-volatile storage; it may be lost on a power interruption.
> >>>
> >>
> >> Hi,
> >>
> >> Did you encounter this problem in a real workload? With the bcache
> >> journal, I don't see the possibility of data loss from your description.
> >>
> >> Correct me if I am wrong.
> >>
> >> Coly Li
> > 
> > If this does become necessary, then we should have a sysfs or superblock 
> > flag to disable FUA for those with RAID BBUs.
> 
> Hi Eric,
> 
> I doubt it is necessary to add the FUA tag to all writeback bios. If a
> power failure happens after the dirty data is written to the backing
> device and the bkey turns clean, a following read request will still go
> to the cache device, because the LBA can be indexed no matter whether it
> is dirty or clean. Unless the bkey is invalidated from the B+tree, reads
> always go to the cache device first in writeback mode. If a power
> failure happens before the cached bkey turns from dirty to clean, the
> result is just an extra writeback bio flushed from the cache device to
> the backing device with identical data. Compared with adding the FUA tag
> to all writeback bios (which would be really slow), the extra writeback
> I/Os after a power failure are more acceptable to me.

I agree.  FWIW, I just learned about /sys/block/sdX/queue/write_cache from 
Nikos Tsironis <ntsironis@arrikto.com>.  Thus, my requested flag for a FUA 
bypass isn't necessary anyway, even if you did want FUA there, because 
FUAs are stripped when a blockdev's write_cache is set to "write through" 
(QUEUE_FLAG_WC cleared).

----------------------------------------------------------------------
This happens in generic_make_request_checks():

              /*
               * Filter flush bio's early so that make_request based
               * drivers without flush support don't have to worry
               * about them.
               */
              if (op_is_flush(bio->bi_opf) &&
                  !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
                      bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
                      if (!nr_sectors) {
                              status = BLK_STS_OK;
                              goto end_io;
                      }
              }
----------------------------------------------------------------------

-Eric

> 
> Coly Li
> 
> > 
> >>> Signed-off-by: kungf <wings.wyang@gmail.com>
> >>> ---
> >>>  drivers/md/bcache/writeback.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> >>> index 4a40f9eadeaf..e5cecb60569e 100644
> >>> --- a/drivers/md/bcache/writeback.c
> >>> +++ b/drivers/md/bcache/writeback.c
> >>> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >>>  	 */
> >>>  	if (KEY_DIRTY(&w->key)) {
> >>>  		dirty_init(w);
> >>> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> >>> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >>>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >>>  		bio_set_dev(&io->bio, io->dc->bdev);
> >>>  		io->bio.bi_end_io	= dirty_endio;
> >>>
> >>
> 


end of thread, other threads:[~2019-12-06  0:04 UTC | newest]

Thread overview: 8+ messages
-- links below jump to the message on this page --
2019-12-02 10:24 [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode kungf
2019-12-02 11:08 ` Coly Li
2019-12-02 19:34   ` Eric Wheeler
2019-12-03 14:21     ` Coly Li
2019-12-04 10:55       ` Coly Li
2019-12-06  0:04       ` Eric Wheeler
2019-12-03 11:39   ` kungf
     [not found]   ` <CAMo46+q-usqgwHWhk2mcKMvK9yMY2kfN-t10wyqm+pv8L0HqYA@mail.gmail.com>
2019-12-03 14:09     ` Coly Li
