* [PATCH] f2fs: fix race between write_checkpoint and write_begin
@ 2018-11-12 12:01 Sheng Yong
2018-11-12 15:12 ` Chao Yu
0 siblings, 1 reply; 4+ messages in thread
From: Sheng Yong @ 2018-11-12 12:01 UTC (permalink / raw)
To: yuchao0, jaegeuk; +Cc: linux-f2fs-devel
The following race could lead to inconsistent SIT bitmap:
Task A Task B
====== ======
f2fs_write_checkpoint
block_operations
f2fs_lock_all
down_write(node_change)
down_write(node_write)
... sync ...
up_write(node_change)
f2fs_file_write_iter
set_inode_flag(FI_NO_PREALLOC)
......
f2fs_write_begin(index=0, has inline data)
prepare_write_begin
__do_map_lock(AIO) => down_read(node_change)
f2fs_convert_inline_page => update SIT
__do_map_lock(AIO) => up_read(node_change)
f2fs_flush_sit_entries <= inconsistent SIT
finish write checkpoint
sudden-power-off
If SPO occurs after checkpoint is finished, SIT bitmap will be set
incorrectly. This patch uses node_write to avoid the race condition.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
---
fs/f2fs/data.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 106f116466bf..dc5b251aee86 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
if (inode->i_nlink)
set_inline_node(ipage);
} else {
+ down_read(&sbi->node_write);
err = f2fs_convert_inline_page(&dn, page);
+ up_read(&sbi->node_write);
if (err)
goto out;
if (dn.data_blkaddr == NULL_ADDR)
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] f2fs: fix race between write_checkpoint and write_begin
2018-11-12 12:01 [PATCH] f2fs: fix race between write_checkpoint and write_begin Sheng Yong
@ 2018-11-12 15:12 ` Chao Yu
2018-11-13 0:34 ` Jaegeuk Kim
0 siblings, 1 reply; 4+ messages in thread
From: Chao Yu @ 2018-11-12 15:12 UTC (permalink / raw)
To: Sheng Yong, yuchao0, jaegeuk; +Cc: linux-f2fs-devel
On 2018-11-12 20:01, Sheng Yong wrote:
> The following race could lead to inconsistent SIT bitmap:
>
> Task A Task B
> ====== ======
> f2fs_write_checkpoint
> block_operations
> f2fs_lock_all
> down_write(node_change)
> down_write(node_write)
> ... sync ...
> up_write(node_change)
> f2fs_file_write_iter
> set_inode_flag(FI_NO_PREALLOC)
> ......
> f2fs_write_begin(index=0, has inline data)
> prepare_write_begin
> __do_map_lock(AIO) => down_read(node_change)
> f2fs_convert_inline_page => update SIT
> __do_map_lock(AIO) => up_read(node_change)
> f2fs_flush_sit_entries <= inconsistent SIT
> finish write checkpoint
> sudden-power-off
>
> If SPO occurs after checkpoint is finished, SIT bitmap will be set
> incorrectly. This patch uses node_write to avoid the race condition.
Good catch!
>
> Signed-off-by: Sheng Yong <shengyong1@huawei.com>
> ---
> fs/f2fs/data.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 106f116466bf..dc5b251aee86 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
> if (inode->i_nlink)
> set_inline_node(ipage);
> } else {
> + down_read(&sbi->node_write);
This makes lock dependence more complicated, how about calling f2fs_lock_op for
inline data conversion case:
struct extent_info ei = {0,0,0};
+ int flag;
int err = 0;
- if (f2fs_has_inline_data(inode) ||
- (pos & PAGE_MASK) >= i_size_read(inode)) {
- __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
+ if (f2fs_has_inline_data(inode)) {
+ if (pos + len > MAX_INLINE_DATA(inode))
+ flag = F2FS_GET_BLOCK_PRE_DIO;
+ else if (pos & PAGE_MASK) >= i_size_read(inode))
+ flag = F2FS_GET_BLOCK_PRE_AIO;
+ __do_map_lock(sbi, flag, true);
if (err || dn.data_blkaddr == NULL_ADDR) {
f2fs_put_dnode(&dn);
- __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
- true);
+ flag = F2FS_GET_BLOCK_PRE_AIO;
+ __do_map_lock(sbi, flag, true);
locked = true;
unlock_out:
if (locked)
- __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
+ __do_map_lock(sbi, flag, false);
Thanks,
> err = f2fs_convert_inline_page(&dn, page);
> + up_read(&sbi->node_write);
> if (err)
> goto out;
> if (dn.data_blkaddr == NULL_ADDR)
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] f2fs: fix race between write_checkpoint and write_begin
2018-11-12 15:12 ` Chao Yu
@ 2018-11-13 0:34 ` Jaegeuk Kim
2018-11-13 3:42 ` Chao Yu
0 siblings, 1 reply; 4+ messages in thread
From: Jaegeuk Kim @ 2018-11-13 0:34 UTC (permalink / raw)
To: Chao Yu; +Cc: linux-f2fs-devel
On 11/12, Chao Yu wrote:
> On 2018-11-12 20:01, Sheng Yong wrote:
> > The following race could lead to inconsistent SIT bitmap:
> >
> > Task A Task B
> > ====== ======
> > f2fs_write_checkpoint
> > block_operations
> > f2fs_lock_all
> > down_write(node_change)
> > down_write(node_write)
> > ... sync ...
> > up_write(node_change)
> > f2fs_file_write_iter
> > set_inode_flag(FI_NO_PREALLOC)
> > ......
> > f2fs_write_begin(index=0, has inline data)
> > prepare_write_begin
> > __do_map_lock(AIO) => down_read(node_change)
> > f2fs_convert_inline_page => update SIT
> > __do_map_lock(AIO) => up_read(node_change)
> > f2fs_flush_sit_entries <= inconsistent SIT
> > finish write checkpoint
> > sudden-power-off
> >
> > If SPO occurs after checkpoint is finished, SIT bitmap will be set
> > incorrectly. This patch uses node_write to avoid the race condition.
>
> Good catch!
>
> >
> > Signed-off-by: Sheng Yong <shengyong1@huawei.com>
> > ---
> > fs/f2fs/data.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 106f116466bf..dc5b251aee86 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
> > if (inode->i_nlink)
> > set_inline_node(ipage);
> > } else {
> > + down_read(&sbi->node_write);
>
> This makes lock dependence more complicated, how about calling f2fs_lock_op for
> inline data conversion case:
>
> struct extent_info ei = {0,0,0};
> + int flag;
> int err = 0;
>
> - if (f2fs_has_inline_data(inode) ||
> - (pos & PAGE_MASK) >= i_size_read(inode)) {
> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
> + if (f2fs_has_inline_data(inode)) {
> + if (pos + len > MAX_INLINE_DATA(inode))
> + flag = F2FS_GET_BLOCK_PRE_DIO;
This is not DIO case. I think calling f2fs_lock_op() explicitly might be better.
> + else if (pos & PAGE_MASK) >= i_size_read(inode))
> + flag = F2FS_GET_BLOCK_PRE_AIO;
> + __do_map_lock(sbi, flag, true);
>
> if (err || dn.data_blkaddr == NULL_ADDR) {
> f2fs_put_dnode(&dn);
> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
> - true);
> + flag = F2FS_GET_BLOCK_PRE_AIO;
> + __do_map_lock(sbi, flag, true);
> locked = true;
>
> unlock_out:
> if (locked)
> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
> + __do_map_lock(sbi, flag, false);
>
> Thanks,
>
> > err = f2fs_convert_inline_page(&dn, page);
> > + up_read(&sbi->node_write);
> > if (err)
> > goto out;
> > if (dn.data_blkaddr == NULL_ADDR)
> >
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] f2fs: fix race between write_checkpoint and write_begin
2018-11-13 0:34 ` Jaegeuk Kim
@ 2018-11-13 3:42 ` Chao Yu
0 siblings, 0 replies; 4+ messages in thread
From: Chao Yu @ 2018-11-13 3:42 UTC (permalink / raw)
To: Jaegeuk Kim, Chao Yu; +Cc: linux-f2fs-devel
On 2018/11/13 8:34, Jaegeuk Kim wrote:
> On 11/12, Chao Yu wrote:
>> On 2018-11-12 20:01, Sheng Yong wrote:
>>> The following race could lead to inconsistent SIT bitmap:
>>>
>>> Task A Task B
>>> ====== ======
>>> f2fs_write_checkpoint
>>> block_operations
>>> f2fs_lock_all
>>> down_write(node_change)
>>> down_write(node_write)
>>> ... sync ...
>>> up_write(node_change)
>>> f2fs_file_write_iter
>>> set_inode_flag(FI_NO_PREALLOC)
>>> ......
>>> f2fs_write_begin(index=0, has inline data)
>>> prepare_write_begin
>>> __do_map_lock(AIO) => down_read(node_change)
>>> f2fs_convert_inline_page => update SIT
>>> __do_map_lock(AIO) => up_read(node_change)
>>> f2fs_flush_sit_entries <= inconsistent SIT
>>> finish write checkpoint
>>> sudden-power-off
>>>
>>> If SPO occurs after checkpoint is finished, SIT bitmap will be set
>>> incorrectly. This patch uses node_write to avoid the race condition.
>>
>> Good catch!
>>
>>>
>>> Signed-off-by: Sheng Yong <shengyong1@huawei.com>
>>> ---
>>> fs/f2fs/data.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 106f116466bf..dc5b251aee86 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -2355,7 +2355,9 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
>>> if (inode->i_nlink)
>>> set_inline_node(ipage);
>>> } else {
>>> + down_read(&sbi->node_write);
>>
>> This makes lock dependence more complicated, how about calling f2fs_lock_op for
>> inline data conversion case:
>>
>> struct extent_info ei = {0,0,0};
>> + int flag;
>> int err = 0;
>>
>> - if (f2fs_has_inline_data(inode) ||
>> - (pos & PAGE_MASK) >= i_size_read(inode)) {
>> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
>> + if (f2fs_has_inline_data(inode)) {
>> + if (pos + len > MAX_INLINE_DATA(inode))
>> + flag = F2FS_GET_BLOCK_PRE_DIO;
>
> This is not DIO case. I think calling f2fs_lock_op() explicitly might be better.
Agreed.
>
>
>> + else if (pos & PAGE_MASK) >= i_size_read(inode))
>> + flag = F2FS_GET_BLOCK_PRE_AIO;
Sorry, there is a missing case that @flag doesn't be initialized, please
note that, Sheng Yong.
Thanks,
>> + __do_map_lock(sbi, flag, true);
>>
>> if (err || dn.data_blkaddr == NULL_ADDR) {
>> f2fs_put_dnode(&dn);
>> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
>> - true);
>> + flag = F2FS_GET_BLOCK_PRE_AIO;
>> + __do_map_lock(sbi, flag, true);
>> locked = true;
>>
>> unlock_out:
>> if (locked)
>> - __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
>> + __do_map_lock(sbi, flag, false);
>>
>> Thanks,
>>
>>> err = f2fs_convert_inline_page(&dn, page);
>>> + up_read(&sbi->node_write);
>>> if (err)
>>> goto out;
>>> if (dn.data_blkaddr == NULL_ADDR)
>>>
>
> .
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-11-13 3:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-12 12:01 [PATCH] f2fs: fix race between write_checkpoint and write_begin Sheng Yong
2018-11-12 15:12 ` Chao Yu
2018-11-13 0:34 ` Jaegeuk Kim
2018-11-13 3:42 ` Chao Yu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).