linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] f2fs: fix a deadlock in fsync
@ 2013-08-05 12:02 Jin Xu
  2013-08-06 12:46 ` Jaegeuk Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Jin Xu @ 2013-08-05 12:02 UTC (permalink / raw)
  To: jaegeuk.kim; +Cc: linux-fsdevel, jinuxstyle, linux-kernel, linux-f2fs-devel

From: Jin Xu <jinuxstyle@gmail.com>

This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on a same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
 ...
 - f2fs_write_data_pages
 ...
  - update_extent_cache
  ...
   - update_inode
    - wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
 - f2fs_write_data_pages
  - mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on a inode page that has
been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else could be able to submit the cached bio to block layer for
writeback. This is because the fsync thread already hold a sbi->fs_lock and
the sbi->writepages lock, causing the bdi thread being blocked when attempt
to write data pages for the same inode. At the same time, f2fs_gc thread
does not notice the situation and could not help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on a inode page
that is being written back.

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
---
 fs/f2fs/f2fs.h    |    2 ++
 fs/f2fs/gc.c      |    5 +----
 fs/f2fs/inode.c   |    3 ++-
 fs/f2fs/segment.c |    9 +++++++++
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 467d42d..064d3f9 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1013,6 +1013,8 @@ void allocate_new_segments(struct f2fs_sb_info *);
 struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
 struct bio *f2fs_bio_alloc(struct block_device *, int);
 void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
+void f2fs_wait_on_page_writeback(struct f2fs_sb_info *sbi,
+		struct page *page, enum page_type type, bool sync);
 void write_meta_page(struct f2fs_sb_info *, struct page *);
 void write_node_page(struct f2fs_sb_info *, struct page *, unsigned int,
 					block_t, block_t *);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 35f9b1a..acfa411 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -508,10 +508,7 @@ static void move_data_page(struct inode *inode, struct page *page, int gc_type)
 	} else {
 		struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 
-		if (PageWriteback(page)) {
-			f2fs_submit_bio(sbi, DATA, true);
-			wait_on_page_writeback(page);
-		}
+		f2fs_wait_on_page_writeback(sbi, page, DATA, true);
 
 		if (clear_page_dirty_for_io(page) &&
 			S_ISDIR(inode->i_mode)) {
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 2b2d45d1..d42b85b 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -148,10 +148,11 @@ bad_inode:
 
 void update_inode(struct inode *inode, struct page *node_page)
 {
+	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 	struct f2fs_node *rn;
 	struct f2fs_inode *ri;
 
-	wait_on_page_writeback(node_page);
+	f2fs_wait_on_page_writeback(sbi, node_page, NODE, false);
 
 	rn = page_address(node_page);
 	ri = &(rn->i);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a86d125..7056cc0 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -702,6 +702,15 @@ alloc_new:
 	trace_f2fs_submit_write_page(page, blk_addr, type);
 }
 
+void f2fs_wait_on_page_writeback(struct f2fs_sb_info *sbi,
+		struct page *page, enum page_type type, bool sync)
+{
+	if (PageWriteback(page)) {
+		f2fs_submit_bio(sbi, type, sync);
+		wait_on_page_writeback(page);
+	}
+}
+
 static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
-- 
1.7.9.5


------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent 
caught up. So what steps can you take to put your SQL databases under 
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] f2fs: fix a deadlock in fsync
  2013-08-05 12:02 [PATCH] f2fs: fix a deadlock in fsync Jin Xu
@ 2013-08-06 12:46 ` Jaegeuk Kim
  2013-08-07  4:23   ` Jim Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Jaegeuk Kim @ 2013-08-06 12:46 UTC (permalink / raw)
  To: Jin Xu; +Cc: linux-fsdevel, linux-kernel, linux-f2fs-devel

Hi, Jin,

IMO, this patch tries to fix the deadlock condition on
f2fs_write_data_pages.

I think the errorneous scenario is something like this.
When there remains only one fs_lock during the checkpoint procedure,
f2fs_write_data_pages successfully gets the last one at the moment.
Then, other operations like sync and writeback thread are definitely
blocked too.

Meanwhile, in the flow of f2fs_write_data_pages, it is able to wait on
writebacked node page, which is what you decribed.

If you indicated this scenario correctly, as I examined the flow again,
I found one more case, __set_data_blkaddr, in addition to the
update_inode. And, I can clean up another minor flow too.

Please check the below patch.
Thanks,

----> 

>From a9c62162ea89c9b6b52d39d6db3f8f27c4d2ce5c Mon Sep 17 00:00:00 2001
From: Jin Xu <jinuxstyle@gmail.com>
Date: Mon, 5 Aug 2013 20:02:04 +0800
Subject: [PATCH] f2fs: fix a deadlock in fsync
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net

This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on a same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
 ...
 - f2fs_write_data_pages
 ...
  - update_extent_cache
  ...
   - update_inode
    - wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
 - f2fs_write_data_pages
  - mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on a inode page that
has
been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else could be able to submit the cached bio to block layer for
writeback. This is because the fsync thread already hold a sbi->fs_lock
and
the sbi->writepages lock, causing the bdi thread being blocked when
attempt
to write data pages for the same inode. At the same time, f2fs_gc thread
does not notice the situation and could not help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on a
inode page
that is being written back.

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
---
 fs/f2fs/data.c    |  2 +-
 fs/f2fs/f2fs.h    |  3 ++-
 fs/f2fs/gc.c      |  8 ++------
 fs/f2fs/inode.c   |  2 +-
 fs/f2fs/segment.c | 10 ++++++++++
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f458883..a7eb529 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -37,7 +37,7 @@ static void __set_data_blkaddr(struct dnode_of_data
*dn, block_t new_addr)
 	struct page *node_page = dn->node_page;
 	unsigned int ofs_in_node = dn->ofs_in_node;
 
-	wait_on_page_writeback(node_page);
+	f2fs_wait_on_page_writeback(node_page, NODE, false);
 
 	rn = F2FS_NODE(node_page);
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 63813be..13db10b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1023,7 +1023,8 @@ int npages_for_summary_flush(struct f2fs_sb_info
*);
 void allocate_new_segments(struct f2fs_sb_info *);
 struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
 struct bio *f2fs_bio_alloc(struct block_device *, int);
-void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
+void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool);
+void f2fs_wait_on_page_writeback(struct page *, enum page_type, bool);
 void write_meta_page(struct f2fs_sb_info *, struct page *);
 void write_node_page(struct f2fs_sb_info *, struct page *, unsigned
int,
 					block_t, block_t *);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d286d8b..e6b3ffd 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -422,8 +422,7 @@ next_step:
 
 		/* set page dirty and write it */
 		if (gc_type == FG_GC) {
-			f2fs_submit_bio(sbi, NODE, true);
-			wait_on_page_writeback(node_page);
+			f2fs_wait_on_page_writeback(node_page, NODE, true);
 			set_page_dirty(node_page);
 		} else {
 			if (!PageWriteback(node_page))
@@ -523,10 +522,7 @@ static void move_data_page(struct inode *inode,
struct page *page, int gc_type)
 	} else {
 		struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 
-		if (PageWriteback(page)) {
-			f2fs_submit_bio(sbi, DATA, true);
-			wait_on_page_writeback(page);
-		}
+		f2fs_wait_on_page_writeback(page, DATA, true);
 
 		if (clear_page_dirty_for_io(page) &&
 			S_ISDIR(inode->i_mode)) {
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index debf743..9ab81e7 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -151,7 +151,7 @@ void update_inode(struct inode *inode, struct page
*node_page)
 	struct f2fs_node *rn;
 	struct f2fs_inode *ri;
 
-	wait_on_page_writeback(node_page);
+	f2fs_wait_on_page_writeback(node_page, NODE, false);
 
 	rn = F2FS_NODE(node_page);
 	ri = &(rn->i);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9b74ae2..68e344f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -705,6 +705,16 @@ retry:
 	trace_f2fs_submit_write_page(page, blk_addr, type);
 }
 
+void f2fs_wait_on_page_writeback(struct page *page,
+				enum page_type type, bool sync)
+{
+	struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
+	if (PageWriteback(page)) {
+		f2fs_submit_bio(sbi, type, sync);
+		wait_on_page_writeback(page);
+	}
+}
+
 static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
-- 
1.8.3.1.437.g0dbd812



-- 
Jaegeuk Kim
Samsung


------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent 
caught up. So what steps can you take to put your SQL databases under 
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] f2fs: fix a deadlock in fsync
  2013-08-06 12:46 ` Jaegeuk Kim
@ 2013-08-07  4:23   ` Jim Xu
  2013-08-07  9:43     ` Jaegeuk Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Xu @ 2013-08-07  4:23 UTC (permalink / raw)
  To: jaegeuk.kim; +Cc: linux-fsdevel, linux-kernel, linux-f2fs-devel

Hi Kim,

The patch sounds good to me.

BTW, as in the deadlock situation I noticed, I did not find any thread was doing checkpoint at that moment. The bdi writeback thread and the fsync user thread was not. Neither was the background f2fs_gc thread because there had enough free segments. Meanwhile, another user thread was blocked in the f2fs_sync_file->sync_node_pages, which, I think, is not a factor contributing to the deadlock.

Thanks,
Jin

On 2013-8-6, at 20:46, Jaegeuk Kim <jaegeuk.kim@samsung.com> wrote:

> Hi, Jin,
> 
> IMO, this patch tries to fix the deadlock condition on
> f2fs_write_data_pages.
> 
> I think the errorneous scenario is something like this.
> When there remains only one fs_lock during the checkpoint procedure,
> f2fs_write_data_pages successfully gets the last one at the moment.
> Then, other operations like sync and writeback thread are definitely
> blocked too.
> 
> Meanwhile, in the flow of f2fs_write_data_pages, it is able to wait on
> writebacked node page, which is what you decribed.
> 
> If you indicated this scenario correctly, as I examined the flow again,
> I found one more case, __set_data_blkaddr, in addition to the
> update_inode. And, I can clean up another minor flow too.
> 
> Please check the below patch.
> Thanks,
> 
> ----> 
> 
> From a9c62162ea89c9b6b52d39d6db3f8f27c4d2ce5c Mon Sep 17 00:00:00 2001
> From: Jin Xu <jinuxstyle@gmail.com>
> Date: Mon, 5 Aug 2013 20:02:04 +0800
> Subject: [PATCH] f2fs: fix a deadlock in fsync
> Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
> linux-f2fs-devel@lists.sourceforge.net
> 
> This patch fixes a deadlock bug that occurs quite often when there are
> concurrent write and fsync on a same file.
> 
> Following is the simplified call trace when tasks get hung.
> 
> fsync thread:
> - f2fs_sync_file
> ...
> - f2fs_write_data_pages
> ...
>  - update_extent_cache
>  ...
>   - update_inode
>    - wait_on_page_writeback
> 
> bdi writeback thread
> - __writeback_single_inode
> - f2fs_write_data_pages
>  - mutex_lock(sbi->writepages)
> 
> The deadlock happens when the fsync thread waits on a inode page that
> has
> been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
> no one else could be able to submit the cached bio to block layer for
> writeback. This is because the fsync thread already hold a sbi->fs_lock
> and
> the sbi->writepages lock, causing the bdi thread being blocked when
> attempt
> to write data pages for the same inode. At the same time, f2fs_gc thread
> does not notice the situation and could not help. Even the sync syscall
> gets blocked.
> 
> To fix it, we could submit the cached bio first before waiting on a
> inode page
> that is being written back.
> 
> Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
> ---
> fs/f2fs/data.c    |  2 +-
> fs/f2fs/f2fs.h    |  3 ++-
> fs/f2fs/gc.c      |  8 ++------
> fs/f2fs/inode.c   |  2 +-
> fs/f2fs/segment.c | 10 ++++++++++
> 5 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index f458883..a7eb529 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -37,7 +37,7 @@ static void __set_data_blkaddr(struct dnode_of_data
> *dn, block_t new_addr)
>    struct page *node_page = dn->node_page;
>    unsigned int ofs_in_node = dn->ofs_in_node;
> 
> -    wait_on_page_writeback(node_page);
> +    f2fs_wait_on_page_writeback(node_page, NODE, false);
> 
>    rn = F2FS_NODE(node_page);
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 63813be..13db10b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1023,7 +1023,8 @@ int npages_for_summary_flush(struct f2fs_sb_info
> *);
> void allocate_new_segments(struct f2fs_sb_info *);
> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
> struct bio *f2fs_bio_alloc(struct block_device *, int);
> -void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
> +void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool);
> +void f2fs_wait_on_page_writeback(struct page *, enum page_type, bool);
> void write_meta_page(struct f2fs_sb_info *, struct page *);
> void write_node_page(struct f2fs_sb_info *, struct page *, unsigned
> int,
>                    block_t, block_t *);
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index d286d8b..e6b3ffd 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -422,8 +422,7 @@ next_step:
> 
>        /* set page dirty and write it */
>        if (gc_type == FG_GC) {
> -            f2fs_submit_bio(sbi, NODE, true);
> -            wait_on_page_writeback(node_page);
> +            f2fs_wait_on_page_writeback(node_page, NODE, true);
>            set_page_dirty(node_page);
>        } else {
>            if (!PageWriteback(node_page))
> @@ -523,10 +522,7 @@ static void move_data_page(struct inode *inode,
> struct page *page, int gc_type)
>    } else {
>        struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
> 
> -        if (PageWriteback(page)) {
> -            f2fs_submit_bio(sbi, DATA, true);
> -            wait_on_page_writeback(page);
> -        }
> +        f2fs_wait_on_page_writeback(page, DATA, true);
> 
>        if (clear_page_dirty_for_io(page) &&
>            S_ISDIR(inode->i_mode)) {
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index debf743..9ab81e7 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -151,7 +151,7 @@ void update_inode(struct inode *inode, struct page
> *node_page)
>    struct f2fs_node *rn;
>    struct f2fs_inode *ri;
> 
> -    wait_on_page_writeback(node_page);
> +    f2fs_wait_on_page_writeback(node_page, NODE, false);
> 
>    rn = F2FS_NODE(node_page);
>    ri = &(rn->i);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9b74ae2..68e344f 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -705,6 +705,16 @@ retry:
>    trace_f2fs_submit_write_page(page, blk_addr, type);
> }
> 
> +void f2fs_wait_on_page_writeback(struct page *page,
> +                enum page_type type, bool sync)
> +{
> +    struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
> +    if (PageWriteback(page)) {
> +        f2fs_submit_bio(sbi, type, sync);
> +        wait_on_page_writeback(page);
> +    }
> +}
> +
> static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
> {
>    struct curseg_info *curseg = CURSEG_I(sbi, type);
> -- 
> 1.8.3.1.437.g0dbd812
> 
> 
> 
> -- 
> Jaegeuk Kim
> Samsung
> 

------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] f2fs: fix a deadlock in fsync
  2013-08-07  4:23   ` Jim Xu
@ 2013-08-07  9:43     ` Jaegeuk Kim
  0 siblings, 0 replies; 4+ messages in thread
From: Jaegeuk Kim @ 2013-08-07  9:43 UTC (permalink / raw)
  To: Jim Xu; +Cc: linux-fsdevel, linux-kernel, linux-f2fs-devel

Hi Jin,

2013-08-07 (수), 12:23 +0800, Jim Xu:
> Hi Kim,
> 
> The patch sounds good to me.
> 
> BTW, as in the deadlock situation I noticed, I did not find any thread was doing checkpoint at that moment. The bdi writeback thread and the fsync user thread was not. Neither was the background f2fs_gc thread because there had enough free segments. Meanwhile, another user thread was blocked in the f2fs_sync_file->sync_node_pages, which, I think, is not a factor contributing to the deadlock.

I indicated one possible scenario.
As you described, many fsync calls can also consume all the fs_lock too.
Thanks,

-- 
Jaegeuk Kim
Samsung



------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-08-07  9:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-05 12:02 [PATCH] f2fs: fix a deadlock in fsync Jin Xu
2013-08-06 12:46 ` Jaegeuk Kim
2013-08-07  4:23   ` Jim Xu
2013-08-07  9:43     ` Jaegeuk Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).