Linux-f2fs-devel Archive on lore.kernel.org
 help / color / Atom feed
* [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
@ 2020-04-30 10:58 Sayali Lokhande
  2020-05-06  6:21 ` Chao Yu
  2020-05-08 16:10 ` Jaegeuk Kim
  0 siblings, 2 replies; 10+ messages in thread
From: Sayali Lokhande @ 2020-04-30 10:58 UTC (permalink / raw)
  To: jaegeuk, yuchao0, linux-f2fs-devel; +Cc: linux-kernel

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A		Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(&sbi->cp_rwsem);

                        - open()
                         - igrab()
                        - write() write inline data
                        - unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
			-close()
			-iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
           - down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
---
 fs/f2fs/checkpoint.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 5ba649e..97b6378 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1219,21 +1219,19 @@ static int block_operations(struct f2fs_sb_info *sbi)
 		goto retry_flush_quotas;
 	}
 
-retry_flush_nodes:
 	down_write(&sbi->node_write);
 
 	if (get_pages(sbi, F2FS_DIRTY_NODES)) {
 		up_write(&sbi->node_write);
+		up_write(&sbi->node_change);
+		f2fs_unlock_all(sbi);
 		atomic_inc(&sbi->wb_sync_req[NODE]);
 		err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
 		atomic_dec(&sbi->wb_sync_req[NODE]);
-		if (err) {
-			up_write(&sbi->node_change);
-			f2fs_unlock_all(sbi);
+		if (err)
 			goto out;
-		}
 		cond_resched();
-		goto retry_flush_nodes;
+		goto retry_flush_quotas;
 	}
 
 	/*
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-04-30 10:58 [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint Sayali Lokhande
@ 2020-05-06  6:21 ` Chao Yu
  2020-05-08 16:10 ` Jaegeuk Kim
  1 sibling, 0 replies; 10+ messages in thread
From: Chao Yu @ 2020-05-06  6:21 UTC (permalink / raw)
  To: Sayali Lokhande, jaegeuk, linux-f2fs-devel; +Cc: linux-kernel

On 2020/4/30 18:58, Sayali Lokhande wrote:
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
> 
> Call stack :
> 
> Thread A		Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
>  - f2fs_lock_all(sbi);
>   - down_write(&sbi->cp_rwsem);
> 
>                         - open()
>                          - igrab()
>                         - write() write inline data
>                         - unlink()
> - f2fs_sync_node_pages()
>  - if (is_inline_node(page))
>   - flush_inline_data()
>    - ilookup()
>      page = f2fs_pagecache_get_page()
>      if (!page)
>       goto iput_out;
>      iput_out:
> 			-close()
> 			-iput()
>        iput(inode);
>        - f2fs_evict_inode()
>         - f2fs_truncate_blocks()
>          - f2fs_lock_op()
>            - down_read(&sbi->cp_rwsem);
> 
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-04-30 10:58 [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint Sayali Lokhande
  2020-05-06  6:21 ` Chao Yu
@ 2020-05-08 16:10 ` Jaegeuk Kim
  2020-05-09  3:02   ` Chao Yu
  1 sibling, 1 reply; 10+ messages in thread
From: Jaegeuk Kim @ 2020-05-08 16:10 UTC (permalink / raw)
  To: Sayali Lokhande; +Cc: linux-kernel, linux-f2fs-devel

Hi Sayali,

In order to address the perf regression, how about this?

From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Fri, 8 May 2020 09:08:37 -0700
Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A		Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(&sbi->cp_rwsem);

                        - open()
                         - igrab()
                        - write() write inline data
                        - unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
			-close()
			-iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
           - down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/node.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 1db8cabf727ef..626d7daca09de 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
 				goto continue_unlock;
 			}
 
-			/* flush inline_data */
-			if (is_inline_node(page)) {
+			/* flush inline_data, if it's not sync path. */
+			if (do_balance && is_inline_node(page)) {
 				clear_inline_node(page);
 				unlock_page(page);
 				flush_inline_data(sbi, ino_of_node(page));
-- 
2.26.2.645.ge9eca65c58-goog



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-08 16:10 ` Jaegeuk Kim
@ 2020-05-09  3:02   ` Chao Yu
  2020-05-09 19:03     ` Jaegeuk Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2020-05-09  3:02 UTC (permalink / raw)
  To: Jaegeuk Kim, Sayali Lokhande; +Cc: linux-kernel, linux-f2fs-devel

On 2020/5/9 0:10, Jaegeuk Kim wrote:
> Hi Sayali,
> 
> In order to address the perf regression, how about this?
> 
>>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Fri, 8 May 2020 09:08:37 -0700
> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> 
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
> 
> Call stack :
> 
> Thread A		Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
>  - f2fs_lock_all(sbi);
>   - down_write(&sbi->cp_rwsem);
> 
>                         - open()
>                          - igrab()
>                         - write() write inline data
>                         - unlink()
> - f2fs_sync_node_pages()
>  - if (is_inline_node(page))
>   - flush_inline_data()
>    - ilookup()
>      page = f2fs_pagecache_get_page()
>      if (!page)
>       goto iput_out;
>      iput_out:
> 			-close()
> 			-iput()
>        iput(inode);
>        - f2fs_evict_inode()
>         - f2fs_truncate_blocks()
>          - f2fs_lock_op()
>            - down_read(&sbi->cp_rwsem);
> 
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/node.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..626d7daca09de 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>  				goto continue_unlock;
>  			}
>  
> -			/* flush inline_data */
> -			if (is_inline_node(page)) {
> +			/* flush inline_data, if it's not sync path. */
> +			if (do_balance && is_inline_node(page)) {

IIRC, this flow was designed to avoid running out of free space issue
during checkpoint:

2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")

The sceanrio is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

I guess this may cause failing one case of fstest.


Since block_operations->f2fs_sync_inode_meta has synced inode cache to
inode page, so in block_operations->f2fs_sync_node_pages, could we
check nlink before flush_inline_data():

if (is_inline_node(page)) {
	if (IS_INODE(page) && raw_inode_page->i_links) {
		flush_inline_data()
	}
}


>  				clear_inline_node(page);
>  				unlock_page(page);
>  				flush_inline_data(sbi, ino_of_node(page));
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-09  3:02   ` Chao Yu
@ 2020-05-09 19:03     ` Jaegeuk Kim
  2020-05-11  1:28       ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Jaegeuk Kim @ 2020-05-09 19:03 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel

On 05/09, Chao Yu wrote:
> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> > Hi Sayali,
> > 
> > In order to address the perf regression, how about this?
> > 
> >>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Fri, 8 May 2020 09:08:37 -0700
> > Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> > 
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> > 
> > Call stack :
> > 
> > Thread A		Thread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> >  - f2fs_lock_all(sbi);
> >   - down_write(&sbi->cp_rwsem);
> > 
> >                         - open()
> >                          - igrab()
> >                         - write() write inline data
> >                         - unlink()
> > - f2fs_sync_node_pages()
> >  - if (is_inline_node(page))
> >   - flush_inline_data()
> >    - ilookup()
> >      page = f2fs_pagecache_get_page()
> >      if (!page)
> >       goto iput_out;
> >      iput_out:
> > 			-close()
> > 			-iput()
> >        iput(inode);
> >        - f2fs_evict_inode()
> >         - f2fs_truncate_blocks()
> >          - f2fs_lock_op()
> >            - down_read(&sbi->cp_rwsem);
> > 
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/node.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 1db8cabf727ef..626d7daca09de 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >  				goto continue_unlock;
> >  			}
> >  
> > -			/* flush inline_data */
> > -			if (is_inline_node(page)) {
> > +			/* flush inline_data, if it's not sync path. */
> > +			if (do_balance && is_inline_node(page)) {
> 
> IIRC, this flow was designed to avoid running out of free space issue
> during checkpoint:
> 
> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> 
> The sceanrio is:
> 1. create fully node blocks
> 2. flush node blocks
> 3. write inline_data for all the node blocks again
> 4. flush node blocks redundantly
> 
> I guess this may cause failing one case of fstest.

Yeah, actually I was hitting 204 failure, and thus, revised like this.
Now, I don't see any regression in fstest.

From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
From: Sayali Lokhande <sayalil@codeaurora.org>
Date: Thu, 30 Apr 2020 16:28:29 +0530
Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A		Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(&sbi->cp_rwsem);

                        - open()
                         - igrab()
                        - write() write inline data
                        - unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
			-close()
			-iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
           - down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/checkpoint.c |  9 ++++++++-
 fs/f2fs/f2fs.h       |  4 ++--
 fs/f2fs/node.c       | 10 +++++-----
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d49f7a01d8a26..928aea4ff663d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
 	};
 	int err = 0, cnt = 0;
 
+	/*
+	 * Let's flush node pages first to flush inline_data.
+	 * We'll actually guarantee everything below under f2fs_lock_all.
+	 */
+	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
+
 retry_flush_quotas:
 	f2fs_lock_all(sbi);
 	if (__need_flush_quota(sbi)) {
@@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
 	if (get_pages(sbi, F2FS_DIRTY_NODES)) {
 		up_write(&sbi->node_write);
 		atomic_inc(&sbi->wb_sync_req[NODE]);
-		err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
+		err = f2fs_sync_node_pages(sbi, &wbc, false,
+					true, FS_CP_NODE_IO);
 		atomic_dec(&sbi->wb_sync_req[NODE]);
 		if (err) {
 			up_write(&sbi->node_change);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d916540f12813..ac6ae42b9dd4e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
 			struct writeback_control *wbc, bool atomic,
 			unsigned int *seq_id);
 int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
-			struct writeback_control *wbc,
-			bool do_balance, enum iostat_type io_type);
+		struct writeback_control *wbc,
+		bool do_balance, bool sync, enum iostat_type io_type);
 int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
 bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
 void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 1db8cabf727ef..fd00a8c119088 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page)
 }
 
 int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
-				struct writeback_control *wbc,
-				bool do_balance, enum iostat_type io_type)
+			struct writeback_control *wbc,
+			bool do_balance, bool sync, enum iostat_type io_type)
 {
 	pgoff_t index;
 	struct pagevec pvec;
@@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
 				goto continue_unlock;
 			}
 
-			/* flush inline_data */
-			if (is_inline_node(page)) {
+			/* flush inline_data, if it's async context. */
+			if (!sync && is_inline_node(page)) {
 				clear_inline_node(page);
 				unlock_page(page);
 				flush_inline_data(sbi, ino_of_node(page));
@@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space *mapping,
 
 	diff = nr_pages_to_write(sbi, NODE, wbc);
 	blk_start_plug(&plug);
-	f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO);
+	f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO);
 	blk_finish_plug(&plug);
 	wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff);
 
-- 
2.26.2.645.ge9eca65c58-goog



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-09 19:03     ` Jaegeuk Kim
@ 2020-05-11  1:28       ` Chao Yu
  2020-05-11 22:11         ` Jaegeuk Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2020-05-11  1:28 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel

On 2020/5/10 3:03, Jaegeuk Kim wrote:
> On 05/09, Chao Yu wrote:
>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>> Hi Sayali,
>>>
>>> In order to address the perf regression, how about this?
>>>
>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <jaegeuk@kernel.org>
>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A		Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>>  - f2fs_lock_all(sbi);
>>>   - down_write(&sbi->cp_rwsem);
>>>
>>>                         - open()
>>>                          - igrab()
>>>                         - write() write inline data
>>>                         - unlink()
>>> - f2fs_sync_node_pages()
>>>  - if (is_inline_node(page))
>>>   - flush_inline_data()
>>>    - ilookup()
>>>      page = f2fs_pagecache_get_page()
>>>      if (!page)
>>>       goto iput_out;
>>>      iput_out:
>>> 			-close()
>>> 			-iput()
>>>        iput(inode);
>>>        - f2fs_evict_inode()
>>>         - f2fs_truncate_blocks()
>>>          - f2fs_lock_op()
>>>            - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> ---
>>>  fs/f2fs/node.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 1db8cabf727ef..626d7daca09de 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>  				goto continue_unlock;
>>>  			}
>>>  
>>> -			/* flush inline_data */
>>> -			if (is_inline_node(page)) {
>>> +			/* flush inline_data, if it's not sync path. */
>>> +			if (do_balance && is_inline_node(page)) {
>>
>> IIRC, this flow was designed to avoid running out of free space issue
>> during checkpoint:
>>
>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>
>> The sceanrio is:
>> 1. create fully node blocks
>> 2. flush node blocks
>> 3. write inline_data for all the node blocks again
>> 4. flush node blocks redundantly
>>
>> I guess this may cause failing one case of fstest.
> 
> Yeah, actually I was hitting 204 failure, and thus, revised like this.
> Now, I don't see any regression in fstest.
> 
>>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> From: Sayali Lokhande <sayalil@codeaurora.org>
> Date: Thu, 30 Apr 2020 16:28:29 +0530
> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> 
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
> 
> Call stack :
> 
> Thread A		Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
>  - f2fs_lock_all(sbi);
>   - down_write(&sbi->cp_rwsem);
> 
>                         - open()
>                          - igrab()
>                         - write() write inline data
>                         - unlink()
> - f2fs_sync_node_pages()
>  - if (is_inline_node(page))
>   - flush_inline_data()
>    - ilookup()
>      page = f2fs_pagecache_get_page()
>      if (!page)
>       goto iput_out;
>      iput_out:
> 			-close()
> 			-iput()
>        iput(inode);
>        - f2fs_evict_inode()
>         - f2fs_truncate_blocks()
>          - f2fs_lock_op()
>            - down_read(&sbi->cp_rwsem);
> 
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/checkpoint.c |  9 ++++++++-
>  fs/f2fs/f2fs.h       |  4 ++--
>  fs/f2fs/node.c       | 10 +++++-----
>  3 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d49f7a01d8a26..928aea4ff663d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
>  	};
>  	int err = 0, cnt = 0;
>  
> +	/*
> +	 * Let's flush node pages first to flush inline_data.
> +	 * We'll actually guarantee everything below under f2fs_lock_all.
> +	 */
> +	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);

It is possible that user write a large number of inline data in between
f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
race condition.

Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
flush inode page twice which is unneeded.

f2fs_sync_node_pages() --- flush dirty inode page
f2fs_lock_all()
...
f2fs_sync_inode_meta() --- update dirty inode page
f2fs_sync_node_pages() --- flush dirty inode page again.

Thanks,

> +
>  retry_flush_quotas:
>  	f2fs_lock_all(sbi);
>  	if (__need_flush_quota(sbi)) {
> @@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
>  	if (get_pages(sbi, F2FS_DIRTY_NODES)) {
>  		up_write(&sbi->node_write);
>  		atomic_inc(&sbi->wb_sync_req[NODE]);
> -		err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
> +		err = f2fs_sync_node_pages(sbi, &wbc, false,
> +					true, FS_CP_NODE_IO);
>  		atomic_dec(&sbi->wb_sync_req[NODE]);
>  		if (err) {
>  			up_write(&sbi->node_change);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index d916540f12813..ac6ae42b9dd4e 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>  			struct writeback_control *wbc, bool atomic,
>  			unsigned int *seq_id);
>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> -			struct writeback_control *wbc,
> -			bool do_balance, enum iostat_type io_type);
> +		struct writeback_control *wbc,
> +		bool do_balance, bool sync, enum iostat_type io_type);
>  int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
>  bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
>  void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..fd00a8c119088 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page)
>  }
>  
>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> -				struct writeback_control *wbc,
> -				bool do_balance, enum iostat_type io_type)
> +			struct writeback_control *wbc,
> +			bool do_balance, bool sync, enum iostat_type io_type)
>  {
>  	pgoff_t index;
>  	struct pagevec pvec;
> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>  				goto continue_unlock;
>  			}
>  
> -			/* flush inline_data */
> -			if (is_inline_node(page)) {
> +			/* flush inline_data, if it's async context. */
> +			if (!sync && is_inline_node(page)) {
>  				clear_inline_node(page);
>  				unlock_page(page);
>  				flush_inline_data(sbi, ino_of_node(page));
> @@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>  
>  	diff = nr_pages_to_write(sbi, NODE, wbc);
>  	blk_start_plug(&plug);
> -	f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO);
> +	f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO);
>  	blk_finish_plug(&plug);
>  	wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff);
>  
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-11  1:28       ` Chao Yu
@ 2020-05-11 22:11         ` Jaegeuk Kim
  2020-05-12  1:57           ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Jaegeuk Kim @ 2020-05-11 22:11 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel

On 05/11, Chao Yu wrote:
> On 2020/5/10 3:03, Jaegeuk Kim wrote:
> > On 05/09, Chao Yu wrote:
> >> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> >>> Hi Sayali,
> >>>
> >>> In order to address the perf regression, how about this?
> >>>
> >>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> Date: Fri, 8 May 2020 09:08:37 -0700
> >>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> >>>
> >>> There could be a scenario where f2fs_sync_node_pages gets
> >>> called during checkpoint, which in turn tries to flush
> >>> inline data and calls iput(). This results in deadlock as
> >>> iput() tries to hold cp_rwsem, which is already held at the
> >>> beginning by checkpoint->block_operations().
> >>>
> >>> Call stack :
> >>>
> >>> Thread A		Thread B
> >>> f2fs_write_checkpoint()
> >>> - block_operations(sbi)
> >>>  - f2fs_lock_all(sbi);
> >>>   - down_write(&sbi->cp_rwsem);
> >>>
> >>>                         - open()
> >>>                          - igrab()
> >>>                         - write() write inline data
> >>>                         - unlink()
> >>> - f2fs_sync_node_pages()
> >>>  - if (is_inline_node(page))
> >>>   - flush_inline_data()
> >>>    - ilookup()
> >>>      page = f2fs_pagecache_get_page()
> >>>      if (!page)
> >>>       goto iput_out;
> >>>      iput_out:
> >>> 			-close()
> >>> 			-iput()
> >>>        iput(inode);
> >>>        - f2fs_evict_inode()
> >>>         - f2fs_truncate_blocks()
> >>>          - f2fs_lock_op()
> >>>            - down_read(&sbi->cp_rwsem);
> >>>
> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> ---
> >>>  fs/f2fs/node.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> >>> index 1db8cabf727ef..626d7daca09de 100644
> >>> --- a/fs/f2fs/node.c
> >>> +++ b/fs/f2fs/node.c
> >>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >>>  				goto continue_unlock;
> >>>  			}
> >>>  
> >>> -			/* flush inline_data */
> >>> -			if (is_inline_node(page)) {
> >>> +			/* flush inline_data, if it's not sync path. */
> >>> +			if (do_balance && is_inline_node(page)) {
> >>
> >> IIRC, this flow was designed to avoid running out of free space issue
> >> during checkpoint:
> >>
> >> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>
> >> The sceanrio is:
> >> 1. create fully node blocks
> >> 2. flush node blocks
> >> 3. write inline_data for all the node blocks again
> >> 4. flush node blocks redundantly
> >>
> >> I guess this may cause failing one case of fstest.
> > 
> > Yeah, actually I was hitting 204 failure, and thus, revised like this.
> > Now, I don't see any regression in fstest.
> > 
> >>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> > From: Sayali Lokhande <sayalil@codeaurora.org>
> > Date: Thu, 30 Apr 2020 16:28:29 +0530
> > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> > 
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> > 
> > Call stack :
> > 
> > Thread A		Thread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> >  - f2fs_lock_all(sbi);
> >   - down_write(&sbi->cp_rwsem);
> > 
> >                         - open()
> >                          - igrab()
> >                         - write() write inline data
> >                         - unlink()
> > - f2fs_sync_node_pages()
> >  - if (is_inline_node(page))
> >   - flush_inline_data()
> >    - ilookup()
> >      page = f2fs_pagecache_get_page()
> >      if (!page)
> >       goto iput_out;
> >      iput_out:
> > 			-close()
> > 			-iput()
> >        iput(inode);
> >        - f2fs_evict_inode()
> >         - f2fs_truncate_blocks()
> >          - f2fs_lock_op()
> >            - down_read(&sbi->cp_rwsem);
> > 
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/checkpoint.c |  9 ++++++++-
> >  fs/f2fs/f2fs.h       |  4 ++--
> >  fs/f2fs/node.c       | 10 +++++-----
> >  3 files changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index d49f7a01d8a26..928aea4ff663d 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
> >  	};
> >  	int err = 0, cnt = 0;
> >  
> > +	/*
> > +	 * Let's flush node pages first to flush inline_data.
> > +	 * We'll actually guarantee everything below under f2fs_lock_all.
> > +	 */
> > +	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
> 
> It is possible that user write a large number of inline data in between
> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
> race condition.
> 
> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
> flush inode page twice which is unneeded.
> 
> f2fs_sync_node_pages() --- flush dirty inode page
> f2fs_lock_all()
> ...
> f2fs_sync_inode_meta() --- update dirty inode page
> f2fs_sync_node_pages() --- flush dirty inode page again.
> 

Another version:

From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
From: Sayali Lokhande <sayalil@codeaurora.org>
Date: Thu, 30 Apr 2020 16:28:29 +0530
Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A		Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(&sbi->cp_rwsem);

                        - open()
                         - igrab()
                        - write() write inline data
                        - unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
			-close()
			-iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
           - down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/checkpoint.c |  5 +++++
 fs/f2fs/f2fs.h       |  1 +
 fs/f2fs/node.c       | 51 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d49f7a01d8a26..79e605f38f4fa 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
 	};
 	int err = 0, cnt = 0;
 
+	/*
+	 * Let's flush inline_data in dirty node pages.
+	 */
+	f2fs_flush_inline_data(sbi);
+
 retry_flush_quotas:
 	f2fs_lock_all(sbi);
 	if (__need_flush_quota(sbi)) {
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2a8ea81c52a15..7f3d259e7e376 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
 struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
 struct page *f2fs_get_node_page_ra(struct page *parent, int start);
 int f2fs_move_node_page(struct page *node_page, int gc_type);
+int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
 int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
 			struct writeback_control *wbc, bool atomic,
 			unsigned int *seq_id);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 1db8cabf727ef..e632de10aedab 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
 	return true;
 }
 
+int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
+{
+	pgoff_t index = 0;
+	struct pagevec pvec;
+	int nr_pages;
+	int ret = 0;
+
+	pagevec_init(&pvec);
+
+	while ((nr_pages = pagevec_lookup_tag(&pvec,
+			NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
+		int i;
+
+		for (i = 0; i < nr_pages; i++) {
+			struct page *page = pvec.pages[i];
+
+			if (!IS_DNODE(page))
+				continue;
+
+			lock_page(page);
+
+			if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
+continue_unlock:
+				unlock_page(page);
+				continue;
+			}
+
+			if (!PageDirty(page)) {
+				/* someone wrote it for us */
+				goto continue_unlock;
+			}
+
+			/* flush inline_data, if it's async context. */
+			if (is_inline_node(page)) {
+				clear_inline_node(page);
+				unlock_page(page);
+				flush_inline_data(sbi, ino_of_node(page));
+				continue;
+			}
+			unlock_page(page);
+		}
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+	return ret;
+}
+
 int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
 				struct writeback_control *wbc,
 				bool do_balance, enum iostat_type io_type)
@@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
 				goto continue_unlock;
 			}
 
-			/* flush inline_data */
-			if (is_inline_node(page)) {
+			/* flush inline_data, if it's async context. */
+			if (do_balance && is_inline_node(page)) {
 				clear_inline_node(page);
 				unlock_page(page);
 				flush_inline_data(sbi, ino_of_node(page));
-- 
2.26.2.645.ge9eca65c58-goog



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-11 22:11         ` Jaegeuk Kim
@ 2020-05-12  1:57           ` Chao Yu
  2020-05-12  3:24             ` Jaegeuk Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2020-05-12  1:57 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel

On 2020/5/12 6:11, Jaegeuk Kim wrote:
> On 05/11, Chao Yu wrote:
>> On 2020/5/10 3:03, Jaegeuk Kim wrote:
>>> On 05/09, Chao Yu wrote:
>>>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>>>> Hi Sayali,
>>>>>
>>>>> In order to address the perf regression, how about this?
>>>>>
>>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>>>> From: Jaegeuk Kim <jaegeuk@kernel.org>
>>>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>>>
>>>>> There could be a scenario where f2fs_sync_node_pages gets
>>>>> called during checkpoint, which in turn tries to flush
>>>>> inline data and calls iput(). This results in deadlock as
>>>>> iput() tries to hold cp_rwsem, which is already held at the
>>>>> beginning by checkpoint->block_operations().
>>>>>
>>>>> Call stack :
>>>>>
>>>>> Thread A		Thread B
>>>>> f2fs_write_checkpoint()
>>>>> - block_operations(sbi)
>>>>>  - f2fs_lock_all(sbi);
>>>>>   - down_write(&sbi->cp_rwsem);
>>>>>
>>>>>                         - open()
>>>>>                          - igrab()
>>>>>                         - write() write inline data
>>>>>                         - unlink()
>>>>> - f2fs_sync_node_pages()
>>>>>  - if (is_inline_node(page))
>>>>>   - flush_inline_data()
>>>>>    - ilookup()
>>>>>      page = f2fs_pagecache_get_page()
>>>>>      if (!page)
>>>>>       goto iput_out;
>>>>>      iput_out:
>>>>> 			-close()
>>>>> 			-iput()
>>>>>        iput(inode);
>>>>>        - f2fs_evict_inode()
>>>>>         - f2fs_truncate_blocks()
>>>>>          - f2fs_lock_op()
>>>>>            - down_read(&sbi->cp_rwsem);
>>>>>
>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
>>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>>>> ---
>>>>>  fs/f2fs/node.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>> index 1db8cabf727ef..626d7daca09de 100644
>>>>> --- a/fs/f2fs/node.c
>>>>> +++ b/fs/f2fs/node.c
>>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>>  				goto continue_unlock;
>>>>>  			}
>>>>>  
>>>>> -			/* flush inline_data */
>>>>> -			if (is_inline_node(page)) {
>>>>> +			/* flush inline_data, if it's not sync path. */
>>>>> +			if (do_balance && is_inline_node(page)) {
>>>>
>>>> IIRC, this flow was designed to avoid running out of free space issue
>>>> during checkpoint:
>>>>
>>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>
>>>> The sceanrio is:
>>>> 1. create fully node blocks
>>>> 2. flush node blocks
>>>> 3. write inline_data for all the node blocks again
>>>> 4. flush node blocks redundantly
>>>>
>>>> I guess this may cause failing one case of fstest.
>>>
>>> Yeah, actually I was hitting 204 failure, and thus, revised like this.
>>> Now, I don't see any regression in fstest.
>>>
>>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
>>> From: Sayali Lokhande <sayalil@codeaurora.org>
>>> Date: Thu, 30 Apr 2020 16:28:29 +0530
>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A		Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>>  - f2fs_lock_all(sbi);
>>>   - down_write(&sbi->cp_rwsem);
>>>
>>>                         - open()
>>>                          - igrab()
>>>                         - write() write inline data
>>>                         - unlink()
>>> - f2fs_sync_node_pages()
>>>  - if (is_inline_node(page))
>>>   - flush_inline_data()
>>>    - ilookup()
>>>      page = f2fs_pagecache_get_page()
>>>      if (!page)
>>>       goto iput_out;
>>>      iput_out:
>>> 			-close()
>>> 			-iput()
>>>        iput(inode);
>>>        - f2fs_evict_inode()
>>>         - f2fs_truncate_blocks()
>>>          - f2fs_lock_op()
>>>            - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> ---
>>>  fs/f2fs/checkpoint.c |  9 ++++++++-
>>>  fs/f2fs/f2fs.h       |  4 ++--
>>>  fs/f2fs/node.c       | 10 +++++-----
>>>  3 files changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index d49f7a01d8a26..928aea4ff663d 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
>>>  	};
>>>  	int err = 0, cnt = 0;
>>>  
>>> +	/*
>>> +	 * Let's flush node pages first to flush inline_data.
>>> +	 * We'll actually guarantee everything below under f2fs_lock_all.
>>> +	 */
>>> +	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
>>
>> It is possible that user write a large number of inline data in between
>> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
>> race condition.
>>
>> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
>> flush inode page twice which is unneeded.
>>
>> f2fs_sync_node_pages() --- flush dirty inode page
>> f2fs_lock_all()
>> ...
>> f2fs_sync_inode_meta() --- update dirty inode page
>> f2fs_sync_node_pages() --- flush dirty inode page again.
>>
> 
> Another version:
> 
>>From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
> From: Sayali Lokhande <sayalil@codeaurora.org>
> Date: Thu, 30 Apr 2020 16:28:29 +0530
> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> 
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
> 
> Call stack :
> 
> Thread A		Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
>  - f2fs_lock_all(sbi);
>   - down_write(&sbi->cp_rwsem);
> 
>                         - open()
>                          - igrab()
>                         - write() write inline data
>                         - unlink()
> - f2fs_sync_node_pages()
>  - if (is_inline_node(page))
>   - flush_inline_data()
>    - ilookup()
>      page = f2fs_pagecache_get_page()
>      if (!page)
>       goto iput_out;
>      iput_out:
> 			-close()
> 			-iput()
>        iput(inode);
>        - f2fs_evict_inode()
>         - f2fs_truncate_blocks()
>          - f2fs_lock_op()
>            - down_read(&sbi->cp_rwsem);
> 
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/checkpoint.c |  5 +++++
>  fs/f2fs/f2fs.h       |  1 +
>  fs/f2fs/node.c       | 51 ++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 55 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d49f7a01d8a26..79e605f38f4fa 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
>  	};
>  	int err = 0, cnt = 0;
>  
> +	/*
> +	 * Let's flush inline_data in dirty node pages.
> +	 */
> +	f2fs_flush_inline_data(sbi);

Still there is a gap, user can write a large number of inline data here...

Would that be enough? I doubt we can suffer this issue in below pathes
as well:

- block_operations
 - f2fs_sync_dirty_inodes
  - iput
 - f2fs_sync_inode_meta
  - iput

Thanks,

> +
>  retry_flush_quotas:
>  	f2fs_lock_all(sbi);
>  	if (__need_flush_quota(sbi)) {
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2a8ea81c52a15..7f3d259e7e376 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
>  struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
>  struct page *f2fs_get_node_page_ra(struct page *parent, int start);
>  int f2fs_move_node_page(struct page *node_page, int gc_type);
> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
>  int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>  			struct writeback_control *wbc, bool atomic,
>  			unsigned int *seq_id);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..e632de10aedab 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
>  	return true;
>  }
>  
> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
> +{
> +	pgoff_t index = 0;
> +	struct pagevec pvec;
> +	int nr_pages;
> +	int ret = 0;
> +
> +	pagevec_init(&pvec);
> +
> +	while ((nr_pages = pagevec_lookup_tag(&pvec,
> +			NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
> +		int i;
> +
> +		for (i = 0; i < nr_pages; i++) {
> +			struct page *page = pvec.pages[i];
> +
> +			if (!IS_DNODE(page))
> +				continue;
> +
> +			lock_page(page);
> +
> +			if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
> +continue_unlock:
> +				unlock_page(page);
> +				continue;
> +			}
> +
> +			if (!PageDirty(page)) {
> +				/* someone wrote it for us */
> +				goto continue_unlock;
> +			}
> +
> +			/* flush inline_data, if it's async context. */
> +			if (is_inline_node(page)) {
> +				clear_inline_node(page);
> +				unlock_page(page);
> +				flush_inline_data(sbi, ino_of_node(page));
> +				continue;
> +			}
> +			unlock_page(page);
> +		}
> +		pagevec_release(&pvec);
> +		cond_resched();
> +	}
> +	return ret;
> +}
> +
>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>  				struct writeback_control *wbc,
>  				bool do_balance, enum iostat_type io_type)
> @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>  				goto continue_unlock;
>  			}
>  
> -			/* flush inline_data */
> -			if (is_inline_node(page)) {
> +			/* flush inline_data, if it's async context. */
> +			if (do_balance && is_inline_node(page)) {
>  				clear_inline_node(page);
>  				unlock_page(page);
>  				flush_inline_data(sbi, ino_of_node(page));
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-12  1:57           ` Chao Yu
@ 2020-05-12  3:24             ` Jaegeuk Kim
  2020-05-12  6:49               ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Jaegeuk Kim @ 2020-05-12  3:24 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel

On 05/12, Chao Yu wrote:
> On 2020/5/12 6:11, Jaegeuk Kim wrote:
> > On 05/11, Chao Yu wrote:
> >> On 2020/5/10 3:03, Jaegeuk Kim wrote:
> >>> On 05/09, Chao Yu wrote:
> >>>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> >>>>> Hi Sayali,
> >>>>>
> >>>>> In order to address the perf regression, how about this?
> >>>>>
> >>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> >>>>> From: Jaegeuk Kim <jaegeuk@kernel.org>
> >>>>> Date: Fri, 8 May 2020 09:08:37 -0700
> >>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> >>>>>
> >>>>> There could be a scenario where f2fs_sync_node_pages gets
> >>>>> called during checkpoint, which in turn tries to flush
> >>>>> inline data and calls iput(). This results in deadlock as
> >>>>> iput() tries to hold cp_rwsem, which is already held at the
> >>>>> beginning by checkpoint->block_operations().
> >>>>>
> >>>>> Call stack :
> >>>>>
> >>>>> Thread A		Thread B
> >>>>> f2fs_write_checkpoint()
> >>>>> - block_operations(sbi)
> >>>>>  - f2fs_lock_all(sbi);
> >>>>>   - down_write(&sbi->cp_rwsem);
> >>>>>
> >>>>>                         - open()
> >>>>>                          - igrab()
> >>>>>                         - write() write inline data
> >>>>>                         - unlink()
> >>>>> - f2fs_sync_node_pages()
> >>>>>  - if (is_inline_node(page))
> >>>>>   - flush_inline_data()
> >>>>>    - ilookup()
> >>>>>      page = f2fs_pagecache_get_page()
> >>>>>      if (!page)
> >>>>>       goto iput_out;
> >>>>>      iput_out:
> >>>>> 			-close()
> >>>>> 			-iput()
> >>>>>        iput(inode);
> >>>>>        - f2fs_evict_inode()
> >>>>>         - f2fs_truncate_blocks()
> >>>>>          - f2fs_lock_op()
> >>>>>            - down_read(&sbi->cp_rwsem);
> >>>>>
> >>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> >>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> >>>>> ---
> >>>>>  fs/f2fs/node.c | 4 ++--
> >>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> >>>>> index 1db8cabf727ef..626d7daca09de 100644
> >>>>> --- a/fs/f2fs/node.c
> >>>>> +++ b/fs/f2fs/node.c
> >>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >>>>>  				goto continue_unlock;
> >>>>>  			}
> >>>>>  
> >>>>> -			/* flush inline_data */
> >>>>> -			if (is_inline_node(page)) {
> >>>>> +			/* flush inline_data, if it's not sync path. */
> >>>>> +			if (do_balance && is_inline_node(page)) {
> >>>>
> >>>> IIRC, this flow was designed to avoid running out of free space issue
> >>>> during checkpoint:
> >>>>
> >>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>>>
> >>>> The sceanrio is:
> >>>> 1. create fully node blocks
> >>>> 2. flush node blocks
> >>>> 3. write inline_data for all the node blocks again
> >>>> 4. flush node blocks redundantly
> >>>>
> >>>> I guess this may cause failing one case of fstest.
> >>>
> >>> Yeah, actually I was hitting 204 failure, and thus, revised like this.
> >>> Now, I don't see any regression in fstest.
> >>>
> >>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> >>> From: Sayali Lokhande <sayalil@codeaurora.org>
> >>> Date: Thu, 30 Apr 2020 16:28:29 +0530
> >>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> >>>
> >>> There could be a scenario where f2fs_sync_node_pages gets
> >>> called during checkpoint, which in turn tries to flush
> >>> inline data and calls iput(). This results in deadlock as
> >>> iput() tries to hold cp_rwsem, which is already held at the
> >>> beginning by checkpoint->block_operations().
> >>>
> >>> Call stack :
> >>>
> >>> Thread A		Thread B
> >>> f2fs_write_checkpoint()
> >>> - block_operations(sbi)
> >>>  - f2fs_lock_all(sbi);
> >>>   - down_write(&sbi->cp_rwsem);
> >>>
> >>>                         - open()
> >>>                          - igrab()
> >>>                         - write() write inline data
> >>>                         - unlink()
> >>> - f2fs_sync_node_pages()
> >>>  - if (is_inline_node(page))
> >>>   - flush_inline_data()
> >>>    - ilookup()
> >>>      page = f2fs_pagecache_get_page()
> >>>      if (!page)
> >>>       goto iput_out;
> >>>      iput_out:
> >>> 			-close()
> >>> 			-iput()
> >>>        iput(inode);
> >>>        - f2fs_evict_inode()
> >>>         - f2fs_truncate_blocks()
> >>>          - f2fs_lock_op()
> >>>            - down_read(&sbi->cp_rwsem);
> >>>
> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> ---
> >>>  fs/f2fs/checkpoint.c |  9 ++++++++-
> >>>  fs/f2fs/f2fs.h       |  4 ++--
> >>>  fs/f2fs/node.c       | 10 +++++-----
> >>>  3 files changed, 15 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index d49f7a01d8a26..928aea4ff663d 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
> >>>  	};
> >>>  	int err = 0, cnt = 0;
> >>>  
> >>> +	/*
> >>> +	 * Let's flush node pages first to flush inline_data.
> >>> +	 * We'll actually guarantee everything below under f2fs_lock_all.
> >>> +	 */
> >>> +	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
> >>
> >> It is possible that user write a large number of inline data in between
> >> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
> >> race condition.
> >>
> >> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
> >> flush inode page twice which is unneeded.
> >>
> >> f2fs_sync_node_pages() --- flush dirty inode page
> >> f2fs_lock_all()
> >> ...
> >> f2fs_sync_inode_meta() --- update dirty inode page
> >> f2fs_sync_node_pages() --- flush dirty inode page again.
> >>
> > 
> > Another version:
> > 
> >>From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
> > From: Sayali Lokhande <sayalil@codeaurora.org>
> > Date: Thu, 30 Apr 2020 16:28:29 +0530
> > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> > 
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> > 
> > Call stack :
> > 
> > Thread A		Thread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> >  - f2fs_lock_all(sbi);
> >   - down_write(&sbi->cp_rwsem);
> > 
> >                         - open()
> >                          - igrab()
> >                         - write() write inline data
> >                         - unlink()
> > - f2fs_sync_node_pages()
> >  - if (is_inline_node(page))
> >   - flush_inline_data()
> >    - ilookup()
> >      page = f2fs_pagecache_get_page()
> >      if (!page)
> >       goto iput_out;
> >      iput_out:
> > 			-close()
> > 			-iput()
> >        iput(inode);
> >        - f2fs_evict_inode()
> >         - f2fs_truncate_blocks()
> >          - f2fs_lock_op()
> >            - down_read(&sbi->cp_rwsem);
> > 
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> > Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/checkpoint.c |  5 +++++
> >  fs/f2fs/f2fs.h       |  1 +
> >  fs/f2fs/node.c       | 51 ++++++++++++++++++++++++++++++++++++++++++--
> >  3 files changed, 55 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index d49f7a01d8a26..79e605f38f4fa 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
> >  	};
> >  	int err = 0, cnt = 0;
> >  
> > +	/*
> > +	 * Let's flush inline_data in dirty node pages.
> > +	 */
> > +	f2fs_flush_inline_data(sbi);
> 
> Still there is a gap, user can write a large number of inline data here...

I think generic/204 is the case, and I don't hit a panic with this patch.

> 
> Would that be enough? I doubt we can suffer this issue in below pathes
> as well:

I don't think so, since iput is called after f2fs_unlock_all().

> 
> - block_operations
>  - f2fs_sync_dirty_inodes
>   - iput
>  - f2fs_sync_inode_meta
>   - iput
> 
> Thanks,
> 
> > +
> >  retry_flush_quotas:
> >  	f2fs_lock_all(sbi);
> >  	if (__need_flush_quota(sbi)) {
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 2a8ea81c52a15..7f3d259e7e376 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
> >  struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
> >  struct page *f2fs_get_node_page_ra(struct page *parent, int start);
> >  int f2fs_move_node_page(struct page *node_page, int gc_type);
> > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
> >  int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
> >  			struct writeback_control *wbc, bool atomic,
> >  			unsigned int *seq_id);
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 1db8cabf727ef..e632de10aedab 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
> >  	return true;
> >  }
> >  
> > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
> > +{
> > +	pgoff_t index = 0;
> > +	struct pagevec pvec;
> > +	int nr_pages;
> > +	int ret = 0;
> > +
> > +	pagevec_init(&pvec);
> > +
> > +	while ((nr_pages = pagevec_lookup_tag(&pvec,
> > +			NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
> > +		int i;
> > +
> > +		for (i = 0; i < nr_pages; i++) {
> > +			struct page *page = pvec.pages[i];
> > +
> > +			if (!IS_DNODE(page))
> > +				continue;
> > +
> > +			lock_page(page);
> > +
> > +			if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
> > +continue_unlock:
> > +				unlock_page(page);
> > +				continue;
> > +			}
> > +
> > +			if (!PageDirty(page)) {
> > +				/* someone wrote it for us */
> > +				goto continue_unlock;
> > +			}
> > +
> > +			/* flush inline_data, if it's async context. */
> > +			if (is_inline_node(page)) {
> > +				clear_inline_node(page);
> > +				unlock_page(page);
> > +				flush_inline_data(sbi, ino_of_node(page));
> > +				continue;
> > +			}
> > +			unlock_page(page);
> > +		}
> > +		pagevec_release(&pvec);
> > +		cond_resched();
> > +	}
> > +	return ret;
> > +}
> > +
> >  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >  				struct writeback_control *wbc,
> >  				bool do_balance, enum iostat_type io_type)
> > @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >  				goto continue_unlock;
> >  			}
> >  
> > -			/* flush inline_data */
> > -			if (is_inline_node(page)) {
> > +			/* flush inline_data, if it's async context. */
> > +			if (do_balance && is_inline_node(page)) {
> >  				clear_inline_node(page);
> >  				unlock_page(page);
> >  				flush_inline_data(sbi, ino_of_node(page));
> > 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
  2020-05-12  3:24             ` Jaegeuk Kim
@ 2020-05-12  6:49               ` Chao Yu
  0 siblings, 0 replies; 10+ messages in thread
From: Chao Yu @ 2020-05-12  6:49 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel

On 2020/5/12 11:24, Jaegeuk Kim wrote:
> On 05/12, Chao Yu wrote:
>> On 2020/5/12 6:11, Jaegeuk Kim wrote:
>>> On 05/11, Chao Yu wrote:
>>>> On 2020/5/10 3:03, Jaegeuk Kim wrote:
>>>>> On 05/09, Chao Yu wrote:
>>>>>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>>>>>> Hi Sayali,
>>>>>>>
>>>>>>> In order to address the perf regression, how about this?
>>>>>>>
>>>>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>>>>>> From: Jaegeuk Kim <jaegeuk@kernel.org>
>>>>>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>>>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>>>>>
>>>>>>> There could be a scenario where f2fs_sync_node_pages gets
>>>>>>> called during checkpoint, which in turn tries to flush
>>>>>>> inline data and calls iput(). This results in deadlock as
>>>>>>> iput() tries to hold cp_rwsem, which is already held at the
>>>>>>> beginning by checkpoint->block_operations().
>>>>>>>
>>>>>>> Call stack :
>>>>>>>
>>>>>>> Thread A		Thread B
>>>>>>> f2fs_write_checkpoint()
>>>>>>> - block_operations(sbi)
>>>>>>>  - f2fs_lock_all(sbi);
>>>>>>>   - down_write(&sbi->cp_rwsem);
>>>>>>>
>>>>>>>                         - open()
>>>>>>>                          - igrab()
>>>>>>>                         - write() write inline data
>>>>>>>                         - unlink()
>>>>>>> - f2fs_sync_node_pages()
>>>>>>>  - if (is_inline_node(page))
>>>>>>>   - flush_inline_data()
>>>>>>>    - ilookup()
>>>>>>>      page = f2fs_pagecache_get_page()
>>>>>>>      if (!page)
>>>>>>>       goto iput_out;
>>>>>>>      iput_out:
>>>>>>> 			-close()
>>>>>>> 			-iput()
>>>>>>>        iput(inode);
>>>>>>>        - f2fs_evict_inode()
>>>>>>>         - f2fs_truncate_blocks()
>>>>>>>          - f2fs_lock_op()
>>>>>>>            - down_read(&sbi->cp_rwsem);
>>>>>>>
>>>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
>>>>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>>>>>> ---
>>>>>>>  fs/f2fs/node.c | 4 ++--
>>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>>> index 1db8cabf727ef..626d7daca09de 100644
>>>>>>> --- a/fs/f2fs/node.c
>>>>>>> +++ b/fs/f2fs/node.c
>>>>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>>>>  				goto continue_unlock;
>>>>>>>  			}
>>>>>>>  
>>>>>>> -			/* flush inline_data */
>>>>>>> -			if (is_inline_node(page)) {
>>>>>>> +			/* flush inline_data, if it's not sync path. */
>>>>>>> +			if (do_balance && is_inline_node(page)) {
>>>>>>
>>>>>> IIRC, this flow was designed to avoid running out of free space issue
>>>>>> during checkpoint:
>>>>>>
>>>>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>>>
>>>>>> The sceanrio is:
>>>>>> 1. create fully node blocks
>>>>>> 2. flush node blocks
>>>>>> 3. write inline_data for all the node blocks again
>>>>>> 4. flush node blocks redundantly
>>>>>>
>>>>>> I guess this may cause failing one case of fstest.
>>>>>
>>>>> Yeah, actually I was hitting 204 failure, and thus, revised like this.
>>>>> Now, I don't see any regression in fstest.
>>>>>
>>>>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
>>>>> From: Sayali Lokhande <sayalil@codeaurora.org>
>>>>> Date: Thu, 30 Apr 2020 16:28:29 +0530
>>>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>>>>>
>>>>> There could be a scenario where f2fs_sync_node_pages gets
>>>>> called during checkpoint, which in turn tries to flush
>>>>> inline data and calls iput(). This results in deadlock as
>>>>> iput() tries to hold cp_rwsem, which is already held at the
>>>>> beginning by checkpoint->block_operations().
>>>>>
>>>>> Call stack :
>>>>>
>>>>> Thread A		Thread B
>>>>> f2fs_write_checkpoint()
>>>>> - block_operations(sbi)
>>>>>  - f2fs_lock_all(sbi);
>>>>>   - down_write(&sbi->cp_rwsem);
>>>>>
>>>>>                         - open()
>>>>>                          - igrab()
>>>>>                         - write() write inline data
>>>>>                         - unlink()
>>>>> - f2fs_sync_node_pages()
>>>>>  - if (is_inline_node(page))
>>>>>   - flush_inline_data()
>>>>>    - ilookup()
>>>>>      page = f2fs_pagecache_get_page()
>>>>>      if (!page)
>>>>>       goto iput_out;
>>>>>      iput_out:
>>>>> 			-close()
>>>>> 			-iput()
>>>>>        iput(inode);
>>>>>        - f2fs_evict_inode()
>>>>>         - f2fs_truncate_blocks()
>>>>>          - f2fs_lock_op()
>>>>>            - down_read(&sbi->cp_rwsem);
>>>>>
>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
>>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>>>> ---
>>>>>  fs/f2fs/checkpoint.c |  9 ++++++++-
>>>>>  fs/f2fs/f2fs.h       |  4 ++--
>>>>>  fs/f2fs/node.c       | 10 +++++-----
>>>>>  3 files changed, 15 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>>> index d49f7a01d8a26..928aea4ff663d 100644
>>>>> --- a/fs/f2fs/checkpoint.c
>>>>> +++ b/fs/f2fs/checkpoint.c
>>>>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
>>>>>  	};
>>>>>  	int err = 0, cnt = 0;
>>>>>  
>>>>> +	/*
>>>>> +	 * Let's flush node pages first to flush inline_data.
>>>>> +	 * We'll actually guarantee everything below under f2fs_lock_all.
>>>>> +	 */
>>>>> +	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
>>>>
>>>> It is possible that user write a large number of inline data in between
>>>> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
>>>> race condition.
>>>>
>>>> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
>>>> flush inode page twice which is unneeded.
>>>>
>>>> f2fs_sync_node_pages() --- flush dirty inode page
>>>> f2fs_lock_all()
>>>> ...
>>>> f2fs_sync_inode_meta() --- update dirty inode page
>>>> f2fs_sync_node_pages() --- flush dirty inode page again.
>>>>
>>>
>>> Another version:
>>>
>>> >From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
>>> From: Sayali Lokhande <sayalil@codeaurora.org>
>>> Date: Thu, 30 Apr 2020 16:28:29 +0530
>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A		Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>>  - f2fs_lock_all(sbi);
>>>   - down_write(&sbi->cp_rwsem);
>>>
>>>                         - open()
>>>                          - igrab()
>>>                         - write() write inline data
>>>                         - unlink()
>>> - f2fs_sync_node_pages()
>>>  - if (is_inline_node(page))
>>>   - flush_inline_data()
>>>    - ilookup()
>>>      page = f2fs_pagecache_get_page()
>>>      if (!page)
>>>       goto iput_out;
>>>      iput_out:
>>> 			-close()
>>> 			-iput()
>>>        iput(inode);
>>>        - f2fs_evict_inode()
>>>         - f2fs_truncate_blocks()
>>>          - f2fs_lock_op()
>>>            - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>> Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> ---
>>>  fs/f2fs/checkpoint.c |  5 +++++
>>>  fs/f2fs/f2fs.h       |  1 +
>>>  fs/f2fs/node.c       | 51 ++++++++++++++++++++++++++++++++++++++++++--
>>>  3 files changed, 55 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index d49f7a01d8a26..79e605f38f4fa 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
>>>  	};
>>>  	int err = 0, cnt = 0;
>>>  
>>> +	/*
>>> +	 * Let's flush inline_data in dirty node pages.
>>> +	 */
>>> +	f2fs_flush_inline_data(sbi);
>>
>> Still there is a gap, user can write a large number of inline data here...
> 
> I think generic/204 is the case, and I don't hit a panic with this patch.

Yes, may be triggered by a variant of generic/204 which is not exist yet, and
it's a rare corner case.

> 
>>
>> Would that be enough? I doubt we can suffer this issue in below pathes
>> as well:
> 
> I don't think so, since iput is called after f2fs_unlock_all().

That's correct.

Thanks,

> 
>>
>> - block_operations
>>  - f2fs_sync_dirty_inodes
>>   - iput
>>  - f2fs_sync_inode_meta
>>   - iput
>>
>> Thanks,
>>
>>> +
>>>  retry_flush_quotas:
>>>  	f2fs_lock_all(sbi);
>>>  	if (__need_flush_quota(sbi)) {
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 2a8ea81c52a15..7f3d259e7e376 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
>>>  struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
>>>  struct page *f2fs_get_node_page_ra(struct page *parent, int start);
>>>  int f2fs_move_node_page(struct page *node_page, int gc_type);
>>> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
>>>  int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>>  			struct writeback_control *wbc, bool atomic,
>>>  			unsigned int *seq_id);
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 1db8cabf727ef..e632de10aedab 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
>>>  	return true;
>>>  }
>>>  
>>> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
>>> +{
>>> +	pgoff_t index = 0;
>>> +	struct pagevec pvec;
>>> +	int nr_pages;
>>> +	int ret = 0;
>>> +
>>> +	pagevec_init(&pvec);
>>> +
>>> +	while ((nr_pages = pagevec_lookup_tag(&pvec,
>>> +			NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
>>> +		int i;
>>> +
>>> +		for (i = 0; i < nr_pages; i++) {
>>> +			struct page *page = pvec.pages[i];
>>> +
>>> +			if (!IS_DNODE(page))
>>> +				continue;
>>> +
>>> +			lock_page(page);
>>> +
>>> +			if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
>>> +continue_unlock:
>>> +				unlock_page(page);
>>> +				continue;
>>> +			}
>>> +
>>> +			if (!PageDirty(page)) {
>>> +				/* someone wrote it for us */
>>> +				goto continue_unlock;
>>> +			}
>>> +
>>> +			/* flush inline_data, if it's async context. */
>>> +			if (is_inline_node(page)) {
>>> +				clear_inline_node(page);
>>> +				unlock_page(page);
>>> +				flush_inline_data(sbi, ino_of_node(page));
>>> +				continue;
>>> +			}
>>> +			unlock_page(page);
>>> +		}
>>> +		pagevec_release(&pvec);
>>> +		cond_resched();
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>  				struct writeback_control *wbc,
>>>  				bool do_balance, enum iostat_type io_type)
>>> @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>  				goto continue_unlock;
>>>  			}
>>>  
>>> -			/* flush inline_data */
>>> -			if (is_inline_node(page)) {
>>> +			/* flush inline_data, if it's async context. */
>>> +			if (do_balance && is_inline_node(page)) {
>>>  				clear_inline_node(page);
>>>  				unlock_page(page);
>>>  				flush_inline_data(sbi, ino_of_node(page));
>>>
> .
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-30 10:58 [f2fs-dev] [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint Sayali Lokhande
2020-05-06  6:21 ` Chao Yu
2020-05-08 16:10 ` Jaegeuk Kim
2020-05-09  3:02   ` Chao Yu
2020-05-09 19:03     ` Jaegeuk Kim
2020-05-11  1:28       ` Chao Yu
2020-05-11 22:11         ` Jaegeuk Kim
2020-05-12  1:57           ` Chao Yu
2020-05-12  3:24             ` Jaegeuk Kim
2020-05-12  6:49               ` Chao Yu

Linux-f2fs-devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-f2fs-devel/0 linux-f2fs-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-f2fs-devel linux-f2fs-devel/ https://lore.kernel.org/linux-f2fs-devel \
		linux-f2fs-devel@lists.sourceforge.net
	public-inbox-index linux-f2fs-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/net.sourceforge.lists.linux-f2fs-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git