linux-f2fs-devel.lists.sourceforge.net archive mirror
 help / color / mirror / Atom feed
* [PATCH] f2fs: fix race between write_checkpoint and write_begin
@ 2018-11-12 12:01 Sheng Yong
  2018-11-12 15:12 ` Chao Yu
  0 siblings, 1 reply; 4+ messages in thread
From: Sheng Yong @ 2018-11-12 12:01 UTC (permalink / raw)
  To: yuchao0, jaegeuk; +Cc: linux-f2fs-devel

The following race could lead to inconsistent SIT bitmap:

Task A                          Task B
======                          ======
f2fs_write_checkpoint
  block_operations
    f2fs_lock_all
      down_write(node_change)
      down_write(node_write)
      ... sync ...
      up_write(node_change)
                                f2fs_file_write_iter
                                  set_inode_flag(FI_NO_PREALLOC)
                                  ......
                                  f2fs_write_begin(index=0, has inline data)
                                    prepare_write_begin
                                      __do_map_lock(AIO) => down_read(node_change)
                                      f2fs_convert_inline_page => update SIT
                                      __do_map_lock(AIO) => up_read(node_change)
  f2fs_flush_sit_entries <= inconsistent SIT
  finish write checkpoint
  sudden-power-off

If SPO occurs after checkpoint is finished, SIT bitmap will be set
incorrectly. This patch uses node_write to avoid the race condition.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
---
 fs/f2fs/data.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 106f116466bf..dc5b251aee86 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
 			if (inode->i_nlink)
 				set_inline_node(ipage);
 		} else {
+			down_read(&sbi->node_write);
 			err = f2fs_convert_inline_page(&dn, page);
+			up_read(&sbi->node_write);
 			if (err)
 				goto out;
 			if (dn.data_blkaddr == NULL_ADDR)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] f2fs: fix race between write_checkpoint and write_begin
  2018-11-12 12:01 [PATCH] f2fs: fix race between write_checkpoint and write_begin Sheng Yong
@ 2018-11-12 15:12 ` Chao Yu
  2018-11-13  0:34   ` Jaegeuk Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Chao Yu @ 2018-11-12 15:12 UTC (permalink / raw)
  To: Sheng Yong, yuchao0, jaegeuk; +Cc: linux-f2fs-devel

On 2018-11-12 20:01, Sheng Yong wrote:
> The following race could lead to inconsistent SIT bitmap:
> 
> Task A                          Task B
> ======                          ======
> f2fs_write_checkpoint
>   block_operations
>     f2fs_lock_all
>       down_write(node_change)
>       down_write(node_write)
>       ... sync ...
>       up_write(node_change)
>                                 f2fs_file_write_iter
>                                   set_inode_flag(FI_NO_PREALLOC)
>                                   ......
>                                   f2fs_write_begin(index=0, has inline data)
>                                     prepare_write_begin
>                                       __do_map_lock(AIO) => down_read(node_change)
>                                       f2fs_convert_inline_page => update SIT
>                                       __do_map_lock(AIO) => up_read(node_change)
>   f2fs_flush_sit_entries <= inconsistent SIT
>   finish write checkpoint
>   sudden-power-off
> 
> If SPO occurs after checkpoint is finished, SIT bitmap will be set
> incorrectly. This patch uses node_write to avoid the race condition.

Good catch!

> 
> Signed-off-by: Sheng Yong <shengyong1@huawei.com>
> ---
>  fs/f2fs/data.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 106f116466bf..dc5b251aee86 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
>  			if (inode->i_nlink)
>  				set_inline_node(ipage);
>  		} else {
> +			down_read(&sbi->node_write);

This makes lock dependence more complicated, how about calling f2fs_lock_op for
inline data conversion case:

        struct extent_info ei = {0,0,0};
+       int flag;
        int err = 0;

-       if (f2fs_has_inline_data(inode) ||
-                       (pos & PAGE_MASK) >= i_size_read(inode)) {
-               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
+       if (f2fs_has_inline_data(inode)) {
+               if (pos + len > MAX_INLINE_DATA(inode))
+                       flag = F2FS_GET_BLOCK_PRE_DIO;
+               else if (pos & PAGE_MASK) >= i_size_read(inode))
+                       flag = F2FS_GET_BLOCK_PRE_AIO;
+               __do_map_lock(sbi, flag, true);

                        if (err || dn.data_blkaddr == NULL_ADDR) {
                                f2fs_put_dnode(&dn);
-                               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
-                                                               true);
+                               flag = F2FS_GET_BLOCK_PRE_AIO;
+                               __do_map_lock(sbi, flag, true);
                                locked = true;

 unlock_out:
        if (locked)
-               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
+               __do_map_lock(sbi, flag, false);

Thanks,

>  			err = f2fs_convert_inline_page(&dn, page);
> +			up_read(&sbi->node_write);
>  			if (err)
>  				goto out;
>  			if (dn.data_blkaddr == NULL_ADDR)
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] f2fs: fix race between write_checkpoint and write_begin
  2018-11-12 15:12 ` Chao Yu
@ 2018-11-13  0:34   ` Jaegeuk Kim
  2018-11-13  3:42     ` Chao Yu
  0 siblings, 1 reply; 4+ messages in thread
From: Jaegeuk Kim @ 2018-11-13  0:34 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel

On 11/12, Chao Yu wrote:
> On 2018-11-12 20:01, Sheng Yong wrote:
> > The following race could lead to inconsistent SIT bitmap:
> > 
> > Task A                          Task B
> > ======                          ======
> > f2fs_write_checkpoint
> >   block_operations
> >     f2fs_lock_all
> >       down_write(node_change)
> >       down_write(node_write)
> >       ... sync ...
> >       up_write(node_change)
> >                                 f2fs_file_write_iter
> >                                   set_inode_flag(FI_NO_PREALLOC)
> >                                   ......
> >                                   f2fs_write_begin(index=0, has inline data)
> >                                     prepare_write_begin
> >                                       __do_map_lock(AIO) => down_read(node_change)
> >                                       f2fs_convert_inline_page => update SIT
> >                                       __do_map_lock(AIO) => up_read(node_change)
> >   f2fs_flush_sit_entries <= inconsistent SIT
> >   finish write checkpoint
> >   sudden-power-off
> > 
> > If SPO occurs after checkpoint is finished, SIT bitmap will be set
> > incorrectly. This patch uses node_write to avoid the race condition.
> 
> Good catch!
> 
> > 
> > Signed-off-by: Sheng Yong <shengyong1@huawei.com>
> > ---
> >  fs/f2fs/data.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 106f116466bf..dc5b251aee86 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
> >  			if (inode->i_nlink)
> >  				set_inline_node(ipage);
> >  		} else {
> > +			down_read(&sbi->node_write);
> 
> This makes lock dependence more complicated, how about calling f2fs_lock_op for
> inline data conversion case:
> 
>         struct extent_info ei = {0,0,0};
> +       int flag;
>         int err = 0;
> 
> -       if (f2fs_has_inline_data(inode) ||
> -                       (pos & PAGE_MASK) >= i_size_read(inode)) {
> -               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
> +       if (f2fs_has_inline_data(inode)) {
> +               if (pos + len > MAX_INLINE_DATA(inode))
> +                       flag = F2FS_GET_BLOCK_PRE_DIO;

This is not DIO case. I think calling f2fs_lock_op() explicitly might be better.


> +               else if (pos & PAGE_MASK) >= i_size_read(inode))
> +                       flag = F2FS_GET_BLOCK_PRE_AIO;
> +               __do_map_lock(sbi, flag, true);
> 
>                         if (err || dn.data_blkaddr == NULL_ADDR) {
>                                 f2fs_put_dnode(&dn);
> -                               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
> -                                                               true);
> +                               flag = F2FS_GET_BLOCK_PRE_AIO;
> +                               __do_map_lock(sbi, flag, true);
>                                 locked = true;
> 
>  unlock_out:
>         if (locked)
> -               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
> +               __do_map_lock(sbi, flag, false);
> 
> Thanks,
> 
> >  			err = f2fs_convert_inline_page(&dn, page);
> > +			up_read(&sbi->node_write);
> >  			if (err)
> >  				goto out;
> >  			if (dn.data_blkaddr == NULL_ADDR)
> > 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] f2fs: fix race between write_checkpoint and write_begin
  2018-11-13  0:34   ` Jaegeuk Kim
@ 2018-11-13  3:42     ` Chao Yu
  0 siblings, 0 replies; 4+ messages in thread
From: Chao Yu @ 2018-11-13  3:42 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu; +Cc: linux-f2fs-devel

On 2018/11/13 8:34, Jaegeuk Kim wrote:
> On 11/12, Chao Yu wrote:
>> On 2018-11-12 20:01, Sheng Yong wrote:
>>> The following race could lead to inconsistent SIT bitmap:
>>>
>>> Task A                          Task B
>>> ======                          ======
>>> f2fs_write_checkpoint
>>>   block_operations
>>>     f2fs_lock_all
>>>       down_write(node_change)
>>>       down_write(node_write)
>>>       ... sync ...
>>>       up_write(node_change)
>>>                                 f2fs_file_write_iter
>>>                                   set_inode_flag(FI_NO_PREALLOC)
>>>                                   ......
>>>                                   f2fs_write_begin(index=0, has inline data)
>>>                                     prepare_write_begin
>>>                                       __do_map_lock(AIO) => down_read(node_change)
>>>                                       f2fs_convert_inline_page => update SIT
>>>                                       __do_map_lock(AIO) => up_read(node_change)
>>>   f2fs_flush_sit_entries <= inconsistent SIT
>>>   finish write checkpoint
>>>   sudden-power-off
>>>
>>> If SPO occurs after checkpoint is finished, SIT bitmap will be set
>>> incorrectly. This patch uses node_write to avoid the race condition.
>>
>> Good catch!
>>
>>>
>>> Signed-off-by: Sheng Yong <shengyong1@huawei.com>
>>> ---
>>>  fs/f2fs/data.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 106f116466bf..dc5b251aee86 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
>>>  			if (inode->i_nlink)
>>>  				set_inline_node(ipage);
>>>  		} else {
>>> +			down_read(&sbi->node_write);
>>
>> This makes lock dependence more complicated, how about calling f2fs_lock_op for
>> inline data conversion case:
>>
>>         struct extent_info ei = {0,0,0};
>> +       int flag;
>>         int err = 0;
>>
>> -       if (f2fs_has_inline_data(inode) ||
>> -                       (pos & PAGE_MASK) >= i_size_read(inode)) {
>> -               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
>> +       if (f2fs_has_inline_data(inode)) {
>> +               if (pos + len > MAX_INLINE_DATA(inode))
>> +                       flag = F2FS_GET_BLOCK_PRE_DIO;
> 
> This is not DIO case. I think calling f2fs_lock_op() explicitly might be better.

Agreed.

> 
> 
>> +               else if (pos & PAGE_MASK) >= i_size_read(inode))
>> +                       flag = F2FS_GET_BLOCK_PRE_AIO;

Sorry, there is a missing case that @flag doesn't be initialized, please
note that, Sheng Yong.

Thanks,

>> +               __do_map_lock(sbi, flag, true);
>>
>>                         if (err || dn.data_blkaddr == NULL_ADDR) {
>>                                 f2fs_put_dnode(&dn);
>> -                               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
>> -                                                               true);
>> +                               flag = F2FS_GET_BLOCK_PRE_AIO;
>> +                               __do_map_lock(sbi, flag, true);
>>                                 locked = true;
>>
>>  unlock_out:
>>         if (locked)
>> -               __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
>> +               __do_map_lock(sbi, flag, false);
>>
>> Thanks,
>>
>>>  			err = f2fs_convert_inline_page(&dn, page);
>>> +			up_read(&sbi->node_write);
>>>  			if (err)
>>>  				goto out;
>>>  			if (dn.data_blkaddr == NULL_ADDR)
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-11-13  3:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-12 12:01 [PATCH] f2fs: fix race between write_checkpoint and write_begin Sheng Yong
2018-11-12 15:12 ` Chao Yu
2018-11-13  0:34   ` Jaegeuk Kim
2018-11-13  3:42     ` Chao Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).