All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-08 11:38 ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-08 11:38 UTC (permalink / raw)
  To: Jaegeuk Kim, Changman Lee
  Cc: linux-f2fs-devel, linux-kernel, Huang Ying, huang.ying.caritas

For fsync, if the nid of a non-inode dnode < nid of inode and the
inode is not checkpointed.  The non-inode dnode may be written before
inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
recovery fail.

Usually, inode will be allocated before non-inode dnode, so the nid of
inode < nid of non-inode dnode.  But it is possible for the reverse.
For example, because of alloc_nid_failed.

This is fixed via ignoring non-inode dnode before inode dnode in
find_fsync_dnodes.

The patch was tested via allocating nid reversely via a debugging
patch, that is, from big number to small number.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
---
 fs/f2fs/recovery.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
 			if (IS_INODE(page) && is_dent_dnode(page))
 				set_inode_flag(F2FS_I(entry->inode),
 							FI_INC_LINK);
-		} else {
-			if (IS_INODE(page) && is_dent_dnode(page)) {
+		} else if (IS_INODE(page)) {
+			if (is_dent_dnode(page)) {
 				err = recover_inode_page(sbi, page);
 				if (err)
 					break;
@@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
 				break;
 			}
 			list_add_tail(&entry->list, head);
-		}
+		} else
+			goto next;
 		entry->blkaddr = blkaddr;
 
 		err = recover_inode(entry->inode, page);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-08 11:38 ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-08 11:38 UTC (permalink / raw)
  To: Jaegeuk Kim, Changman Lee
  Cc: Huang Ying, huang.ying.caritas, linux-kernel, linux-f2fs-devel

For fsync, if the nid of a non-inode dnode < nid of inode and the
inode is not checkpointed.  The non-inode dnode may be written before
inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
recovery fail.

Usually, inode will be allocated before non-inode dnode, so the nid of
inode < nid of non-inode dnode.  But it is possible for the reverse.
For example, because of alloc_nid_failed.

This is fixed via ignoring non-inode dnode before inode dnode in
find_fsync_dnodes.

The patch was tested via allocating nid reversely via a debugging
patch, that is, from big number to small number.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
---
 fs/f2fs/recovery.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
 			if (IS_INODE(page) && is_dent_dnode(page))
 				set_inode_flag(F2FS_I(entry->inode),
 							FI_INC_LINK);
-		} else {
-			if (IS_INODE(page) && is_dent_dnode(page)) {
+		} else if (IS_INODE(page)) {
+			if (is_dent_dnode(page)) {
 				err = recover_inode_page(sbi, page);
 				if (err)
 					break;
@@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
 				break;
 			}
 			list_add_tail(&entry->list, head);
-		}
+		} else
+			goto next;
 		entry->blkaddr = blkaddr;
 
 		err = recover_inode(entry->inode, page);

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-08 11:38 ` Huang Ying
@ 2014-09-09  5:23   ` Jaegeuk Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-09  5:23 UTC (permalink / raw)
  To: Huang Ying
  Cc: Changman Lee, linux-f2fs-devel, linux-kernel, huang.ying.caritas

Hi Huang,

On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> For fsync, if the nid of a non-inode dnode < nid of inode and the
> inode is not checkpointed.  The non-inode dnode may be written before
> inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> recovery fail.
> 
> Usually, inode will be allocated before non-inode dnode, so the nid of
> inode < nid of non-inode dnode.  But it is possible for the reverse.
> For example, because of alloc_nid_failed.
> 
> This is fixed via ignoring non-inode dnode before inode dnode in
> find_fsync_dnodes.
> 
> The patch was tested via allocating nid reversely via a debugging
> patch, that is, from big number to small number.
> 
> Signed-off-by: Huang, Ying <ying.huang@intel.com>
> ---
>  fs/f2fs/recovery.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
>  			if (IS_INODE(page) && is_dent_dnode(page))
>  				set_inode_flag(F2FS_I(entry->inode),
>  							FI_INC_LINK);
> -		} else {
> -			if (IS_INODE(page) && is_dent_dnode(page)) {

If this is not inode block, we should add this inode to recover its data blocks.
Rather than this tweak, if iget is failed, we'd better go to next instead of
break.
Can you test that?

> +		} else if (IS_INODE(page)) {
> +			if (is_dent_dnode(page)) {
>  				err = recover_inode_page(sbi, page);
>  				if (err)
>  					break;
> @@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> -		}
> +		} else
> +			goto next;
>  		entry->blkaddr = blkaddr;
>  
>  		err = recover_inode(entry->inode, page);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-09  5:23   ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-09  5:23 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang.ying.caritas, linux-kernel, linux-f2fs-devel

Hi Huang,

On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> For fsync, if the nid of a non-inode dnode < nid of inode and the
> inode is not checkpointed.  The non-inode dnode may be written before
> inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> recovery fail.
> 
> Usually, inode will be allocated before non-inode dnode, so the nid of
> inode < nid of non-inode dnode.  But it is possible for the reverse.
> For example, because of alloc_nid_failed.
> 
> This is fixed via ignoring non-inode dnode before inode dnode in
> find_fsync_dnodes.
> 
> The patch was tested via allocating nid reversely via a debugging
> patch, that is, from big number to small number.
> 
> Signed-off-by: Huang, Ying <ying.huang@intel.com>
> ---
>  fs/f2fs/recovery.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
>  			if (IS_INODE(page) && is_dent_dnode(page))
>  				set_inode_flag(F2FS_I(entry->inode),
>  							FI_INC_LINK);
> -		} else {
> -			if (IS_INODE(page) && is_dent_dnode(page)) {

If this is not inode block, we should add this inode to recover its data blocks.
Rather than this tweak, if iget is failed, we'd better go to next instead of
break.
Can you test that?

> +		} else if (IS_INODE(page)) {
> +			if (is_dent_dnode(page)) {
>  				err = recover_inode_page(sbi, page);
>  				if (err)
>  					break;
> @@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> -		}
> +		} else
> +			goto next;
>  		entry->blkaddr = blkaddr;
>  
>  		err = recover_inode(entry->inode, page);

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-09  5:23   ` Jaegeuk Kim
  (?)
@ 2014-09-09  5:39   ` Huang Ying
  2014-09-09  7:09       ` Jaegeuk Kim
  -1 siblings, 1 reply; 31+ messages in thread
From: Huang Ying @ 2014-09-09  5:39 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: Changman Lee, linux-f2fs-devel, linux-kernel, huang.ying.caritas

On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> Hi Huang,
> 
> On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > inode is not checkpointed.  The non-inode dnode may be written before
> > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > recovery fail.
> > 
> > Usually, inode will be allocated before non-inode dnode, so the nid of
> > inode < nid of non-inode dnode.  But it is possible for the reverse.
> > For example, because of alloc_nid_failed.
> > 
> > This is fixed via ignoring non-inode dnode before inode dnode in
> > find_fsync_dnodes.
> > 
> > The patch was tested via allocating nid reversely via a debugging
> > patch, that is, from big number to small number.
> > 
> > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > ---
> >  fs/f2fs/recovery.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> >  			if (IS_INODE(page) && is_dent_dnode(page))
> >  				set_inode_flag(F2FS_I(entry->inode),
> >  							FI_INC_LINK);
> > -		} else {
> > -			if (IS_INODE(page) && is_dent_dnode(page)) {
> 
> If this is not inode block, we should add this inode to recover its data blocks.

Is it possible that there is only non-inode dnode but no inode when
find_fsync_dnodes checking dnodes?  Per my understanding, any changes to
file will cause inode page dirty (for example, mtime changed), so that
we will write inode block.  Is it right?  If so, the solution in this
patch should work too.

Best Regards,
Huang, Ying

> Rather than this tweak, if iget is failed, we'd better go to next instead of
> break.
> Can you test that?
> 
> > +		} else if (IS_INODE(page)) {
> > +			if (is_dent_dnode(page)) {
> >  				err = recover_inode_page(sbi, page);
> >  				if (err)
> >  					break;
> > @@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > -		}
> > +		} else
> > +			goto next;
> >  		entry->blkaddr = blkaddr;
> >  
> >  		err = recover_inode(entry->inode, page);



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-09  5:39   ` Huang Ying
@ 2014-09-09  7:09       ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-09  7:09 UTC (permalink / raw)
  To: Huang Ying
  Cc: Changman Lee, linux-f2fs-devel, linux-kernel, huang.ying.caritas

Hi,

On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > Hi Huang,
> > 
> > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > inode is not checkpointed.  The non-inode dnode may be written before
> > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > recovery fail.
> > > 
> > > Usually, inode will be allocated before non-inode dnode, so the nid of
> > > inode < nid of non-inode dnode.  But it is possible for the reverse.
> > > For example, because of alloc_nid_failed.
> > > 
> > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > find_fsync_dnodes.
> > > 
> > > The patch was tested via allocating nid reversely via a debugging
> > > patch, that is, from big number to small number.
> > > 
> > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > ---
> > >  fs/f2fs/recovery.c |    7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > --- a/fs/f2fs/recovery.c
> > > +++ b/fs/f2fs/recovery.c
> > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > >  			if (IS_INODE(page) && is_dent_dnode(page))
> > >  				set_inode_flag(F2FS_I(entry->inode),
> > >  							FI_INC_LINK);
> > > -		} else {
> > > -			if (IS_INODE(page) && is_dent_dnode(page)) {
> > 
> > If this is not inode block, we should add this inode to recover its data blocks.
> 
> Is it possible that there is only non-inode dnode but no inode when
> find_fsync_dnodes checking dnodes?  Per my understanding, any changes to
> file will cause inode page dirty (for example, mtime changed), so that
> we will write inode block.  Is it right?  If so, the solution in this
> patch should work too.

Your description says that f2fs_iget will fail, which causes the recovery fail.
So, I thought it would be better to handle the f2fs_iget failure directly.

In addition, we cannot guarantee the write order of dnode and inode.
For exmaple,
1. the inode is written by flusher or kswapd, then,
2. f2fs_sync_file writes its dnode.

In that case, we can get only non-inode dnode in the node chain, since the inode
has not fsync_mark.

Thanks,

> 
> Best Regards,
> Huang, Ying
> 
> > Rather than this tweak, if iget is failed, we'd better go to next instead of
> > break.
> > Can you test that?
> > 
> > > +		} else if (IS_INODE(page)) {
> > > +			if (is_dent_dnode(page)) {
> > >  				err = recover_inode_page(sbi, page);
> > >  				if (err)
> > >  					break;
> > > @@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
> > >  				break;
> > >  			}
> > >  			list_add_tail(&entry->list, head);
> > > -		}
> > > +		} else
> > > +			goto next;
> > >  		entry->blkaddr = blkaddr;
> > >  
> > >  		err = recover_inode(entry->inode, page);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-09  7:09       ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-09  7:09 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang.ying.caritas, linux-kernel, linux-f2fs-devel

Hi,

On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > Hi Huang,
> > 
> > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > inode is not checkpointed.  The non-inode dnode may be written before
> > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > recovery fail.
> > > 
> > > Usually, inode will be allocated before non-inode dnode, so the nid of
> > > inode < nid of non-inode dnode.  But it is possible for the reverse.
> > > For example, because of alloc_nid_failed.
> > > 
> > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > find_fsync_dnodes.
> > > 
> > > The patch was tested via allocating nid reversely via a debugging
> > > patch, that is, from big number to small number.
> > > 
> > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > ---
> > >  fs/f2fs/recovery.c |    7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > --- a/fs/f2fs/recovery.c
> > > +++ b/fs/f2fs/recovery.c
> > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > >  			if (IS_INODE(page) && is_dent_dnode(page))
> > >  				set_inode_flag(F2FS_I(entry->inode),
> > >  							FI_INC_LINK);
> > > -		} else {
> > > -			if (IS_INODE(page) && is_dent_dnode(page)) {
> > 
> > If this is not inode block, we should add this inode to recover its data blocks.
> 
> Is it possible that there is only non-inode dnode but no inode when
> find_fsync_dnodes checking dnodes?  Per my understanding, any changes to
> file will cause inode page dirty (for example, mtime changed), so that
> we will write inode block.  Is it right?  If so, the solution in this
> patch should work too.

Your description says that f2fs_iget will fail, which causes the recovery fail.
So, I thought it would be better to handle the f2fs_iget failure directly.

In addition, we cannot guarantee the write order of dnode and inode.
For exmaple,
1. the inode is written by flusher or kswapd, then,
2. f2fs_sync_file writes its dnode.

In that case, we can get only non-inode dnode in the node chain, since the inode
has not fsync_mark.

Thanks,

> 
> Best Regards,
> Huang, Ying
> 
> > Rather than this tweak, if iget is failed, we'd better go to next instead of
> > break.
> > Can you test that?
> > 
> > > +		} else if (IS_INODE(page)) {
> > > +			if (is_dent_dnode(page)) {
> > >  				err = recover_inode_page(sbi, page);
> > >  				if (err)
> > >  					break;
> > > @@ -193,7 +193,8 @@ static int find_fsync_dnodes(struct f2fs
> > >  				break;
> > >  			}
> > >  			list_add_tail(&entry->list, head);
> > > -		}
> > > +		} else
> > > +			goto next;
> > >  		entry->blkaddr = blkaddr;
> > >  
> > >  		err = recover_inode(entry->inode, page);

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
       [not found]       ` <CAC=cRTMA9AqmHjQqK3=5pAs8yqi25Rzmz8MaZ8=oTDaxHAXU+A@mail.gmail.com>
@ 2014-09-10  7:21           ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-10  7:21 UTC (permalink / raw)
  To: huang ying; +Cc: Huang Ying, Changman Lee, linux-f2fs-devel, LKML

On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> 
> > Hi,
> >
> > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > Hi Huang,
> > > >
> > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > inode is not checkpointed.  The non-inode dnode may be written before
> > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > recovery fail.
> > > > >
> > > > > Usually, inode will be allocated before non-inode dnode, so the nid
> > of
> > > > > inode < nid of non-inode dnode.  But it is possible for the reverse.
> > > > > For example, because of alloc_nid_failed.
> > > > >
> > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > find_fsync_dnodes.
> > > > >
> > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > patch, that is, from big number to small number.
> > > > >
> > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > ---
> > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > >
> > > > > --- a/fs/f2fs/recovery.c
> > > > > +++ b/fs/f2fs/recovery.c
> > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > >                                                   FI_INC_LINK);
> > > > > -         } else {
> > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > >
> > > > If this is not inode block, we should add this inode to recover its
> > data blocks.
> > >
> > > Is it possible that there is only non-inode dnode but no inode when
> > > find_fsync_dnodes checking dnodes?  Per my understanding, any changes to
> > > file will cause inode page dirty (for example, mtime changed), so that
> > > we will write inode block.  Is it right?  If so, the solution in this
> > > patch should work too.
> >
> > Your description says that f2fs_iget will fail, which causes the recovery
> > fail.
> > So, I thought it would be better to handle the f2fs_iget failure directly.
> >
> 
> Yes.  That is another way to fix the issue.
> 
> 
> > In addition, we cannot guarantee the write order of dnode and inode.
> > For exmaple,
> > 1. the inode is written by flusher or kswapd, then,
> > 2. f2fs_sync_file writes its dnode.
> >
> > In that case, we can get only non-inode dnode in the node chain, since the
> > inode
> > has not fsync_mark.
> >
> 
> I think your solution is better here, but does not fix all scenarios.  If
> the inode is checkpointed, the file can be recovered, although the inode
> information may be not up to date.  But if the inode is not checkpointed,
> f2fs_iget will fail too and recover will fail.

Ok, let me consider your scenarios.

Term: F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
 -> Lose the latest inode(x). Need to fix.

2. inode(x) | CP | dnode(F) | inode(x)
 -> Impossible, but recover latest dnode(F)

3. CP | inode(x) | dnode(F)
 -> Need to write inode(DF) in f2fs_sync_file.

4. CP | dnode(F) | inode(DF)
 -> If f2fs_iget fails, then goto next.

5. CP | dnode(F) | inode(x)
 -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
    Drop this dnode(F).

Indeed, there were some missing scenarios.

So, how about this patch?

>From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Wed, 10 Sep 2014 00:16:34 -0700
Subject: [PATCH] f2fs: fix roll-forward missing scenarios

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Impossible, but recover latest dnode(F)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
   Drop this dnode(F).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c     | 10 ++++++++
 fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5cde363..2660af2 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 			up_write(&fi->i_sem);
 		}
 	} else {
+		/*
+		 * CP | inode(x) | dnode(F)
+		 * We need to remain inode(DF) for roll-forward recovery.
+		 */
+		if (fsync_mark_done(sbi, inode->i_ino) &&
+			!is_checkpointed_node(sbi, F2FS_I(inode)->i_pino)) {
+			mark_inode_dirty_sync(inode);
+			f2fs_write_inode(inode, NULL);
+		}
+
 		/* if there is no written node page, write its inode page */
 		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
 			if (fsync_mark_done(sbi, inode->i_ino))
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6c5a74a..8e69b9d 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -14,6 +14,36 @@
 #include "node.h"
 #include "segment.h"
 
+/*
+ * Roll forward recovery scenarios.
+ *
+ * [Term] F: fsync_mark, D: dentry_mark
+ *
+ * 1. inode(x) | CP | inode(x) | dnode(F)
+ * -> Update the latest inode(x).
+ *
+ * 2. inode(x) | CP | inode(F) | dnode(F)
+ * -> No problem.
+ *
+ * 3. inode(x) | CP | dnode(F) | inode(x)
+ * -> Impossible, but recover latest dnode(F)
+ *
+ * 4. inode(x) | CP | dnode(F) | inode(F)
+ * -> No problem.
+ *
+ * 5. CP | inode(x) | dnode(F)
+ * -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
+ *
+ * 6. CP | inode(DF) | dnode(F)
+ * -> No problem.
+ *
+ * 7. CP | dnode(F) | inode(DF)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *
+ * 8. CP | dnode(F) | inode(x)
+ * -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
+ *    Drop this dnode(F).
+ */
 static struct kmem_cache *fsync_entry_slab;
 
 bool space_for_roll_forward(struct f2fs_sb_info *sbi)
@@ -110,27 +140,32 @@ out:
 	return err;
 }
 
-static int recover_inode(struct inode *inode, struct page *node_page)
+static void __recover_inode(struct inode *inode, struct page *page)
 {
-	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
+	struct f2fs_inode *raw = F2FS_INODE(page);
+
+	inode->i_mode = le16_to_cpu(raw->i_mode);
+	i_size_write(inode, le64_to_cpu(raw->i_size));
+	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
+	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
+	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+}
 
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
 	if (!IS_INODE(node_page))
 		return 0;
 
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
-	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
-	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
-	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+	__recover_inode(inode, node_page);
 
 	if (is_dent_dnode(node_page))
 		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), raw_inode->i_name);
+			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
 	return 0;
 }
 
@@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 				break;
 			}
 
+			/*
+			 * CP | dnode(F) | inode(DF)
+			 * For this case, we should not give up now.
+			 */
 			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (PTR_ERR(entry->inode) == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
@@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 * In this case, we can lose the latest inode(x).
+		 * So, call __recover_inode for the inode update.
+		 */
+		if (IS_INODE(page))
+			__recover_inode(entry->inode, page);
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-10  7:21           ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-10  7:21 UTC (permalink / raw)
  To: huang ying; +Cc: linux-f2fs-devel, LKML, Huang Ying

On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> 
> > Hi,
> >
> > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > Hi Huang,
> > > >
> > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > inode is not checkpointed.  The non-inode dnode may be written before
> > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > recovery fail.
> > > > >
> > > > > Usually, inode will be allocated before non-inode dnode, so the nid
> > of
> > > > > inode < nid of non-inode dnode.  But it is possible for the reverse.
> > > > > For example, because of alloc_nid_failed.
> > > > >
> > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > find_fsync_dnodes.
> > > > >
> > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > patch, that is, from big number to small number.
> > > > >
> > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > ---
> > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > >
> > > > > --- a/fs/f2fs/recovery.c
> > > > > +++ b/fs/f2fs/recovery.c
> > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > >                                                   FI_INC_LINK);
> > > > > -         } else {
> > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > >
> > > > If this is not inode block, we should add this inode to recover its
> > data blocks.
> > >
> > > Is it possible that there is only non-inode dnode but no inode when
> > > find_fsync_dnodes checking dnodes?  Per my understanding, any changes to
> > > file will cause inode page dirty (for example, mtime changed), so that
> > > we will write inode block.  Is it right?  If so, the solution in this
> > > patch should work too.
> >
> > Your description says that f2fs_iget will fail, which causes the recovery
> > fail.
> > So, I thought it would be better to handle the f2fs_iget failure directly.
> >
> 
> Yes.  That is another way to fix the issue.
> 
> 
> > In addition, we cannot guarantee the write order of dnode and inode.
> > For exmaple,
> > 1. the inode is written by flusher or kswapd, then,
> > 2. f2fs_sync_file writes its dnode.
> >
> > In that case, we can get only non-inode dnode in the node chain, since the
> > inode
> > has not fsync_mark.
> >
> 
> I think your solution is better here, but does not fix all scenarios.  If
> the inode is checkpointed, the file can be recovered, although the inode
> information may be not up to date.  But if the inode is not checkpointed,
> f2fs_iget will fail too and recover will fail.

Ok, let me consider your scenarios.

Term: F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
 -> Lose the latest inode(x). Need to fix.

2. inode(x) | CP | dnode(F) | inode(x)
 -> Impossible, but recover latest dnode(F)

3. CP | inode(x) | dnode(F)
 -> Need to write inode(DF) in f2fs_sync_file.

4. CP | dnode(F) | inode(DF)
 -> If f2fs_iget fails, then goto next.

5. CP | dnode(F) | inode(x)
 -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
    Drop this dnode(F).

Indeed, there were some missing scenarios.

So, how about this patch?

>From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Wed, 10 Sep 2014 00:16:34 -0700
Subject: [PATCH] f2fs: fix roll-forward missing scenarios

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Impossible, but recover latest dnode(F)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
   Drop this dnode(F).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c     | 10 ++++++++
 fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5cde363..2660af2 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 			up_write(&fi->i_sem);
 		}
 	} else {
+		/*
+		 * CP | inode(x) | dnode(F)
+		 * We need to remain inode(DF) for roll-forward recovery.
+		 */
+		if (fsync_mark_done(sbi, inode->i_ino) &&
+			!is_checkpointed_node(sbi, F2FS_I(inode)->i_pino)) {
+			mark_inode_dirty_sync(inode);
+			f2fs_write_inode(inode, NULL);
+		}
+
 		/* if there is no written node page, write its inode page */
 		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
 			if (fsync_mark_done(sbi, inode->i_ino))
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6c5a74a..8e69b9d 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -14,6 +14,36 @@
 #include "node.h"
 #include "segment.h"
 
+/*
+ * Roll forward recovery scenarios.
+ *
+ * [Term] F: fsync_mark, D: dentry_mark
+ *
+ * 1. inode(x) | CP | inode(x) | dnode(F)
+ * -> Update the latest inode(x).
+ *
+ * 2. inode(x) | CP | inode(F) | dnode(F)
+ * -> No problem.
+ *
+ * 3. inode(x) | CP | dnode(F) | inode(x)
+ * -> Impossible, but recover latest dnode(F)
+ *
+ * 4. inode(x) | CP | dnode(F) | inode(F)
+ * -> No problem.
+ *
+ * 5. CP | inode(x) | dnode(F)
+ * -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
+ *
+ * 6. CP | inode(DF) | dnode(F)
+ * -> No problem.
+ *
+ * 7. CP | dnode(F) | inode(DF)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *
+ * 8. CP | dnode(F) | inode(x)
+ * -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
+ *    Drop this dnode(F).
+ */
 static struct kmem_cache *fsync_entry_slab;
 
 bool space_for_roll_forward(struct f2fs_sb_info *sbi)
@@ -110,27 +140,32 @@ out:
 	return err;
 }
 
-static int recover_inode(struct inode *inode, struct page *node_page)
+static void __recover_inode(struct inode *inode, struct page *page)
 {
-	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
+	struct f2fs_inode *raw = F2FS_INODE(page);
+
+	inode->i_mode = le16_to_cpu(raw->i_mode);
+	i_size_write(inode, le64_to_cpu(raw->i_size));
+	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
+	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
+	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+}
 
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
 	if (!IS_INODE(node_page))
 		return 0;
 
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
-	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
-	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
-	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+	__recover_inode(inode, node_page);
 
 	if (is_dent_dnode(node_page))
 		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), raw_inode->i_name);
+			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
 	return 0;
 }
 
@@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 				break;
 			}
 
+			/*
+			 * CP | dnode(F) | inode(DF)
+			 * For this case, we should not give up now.
+			 */
 			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (PTR_ERR(entry->inode) == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
@@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 * In this case, we can lose the latest inode(x).
+		 * So, call __recover_inode for the inode update.
+		 */
+		if (IS_INODE(page))
+			__recover_inode(entry->inode, page);
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
-- 
1.8.5.2 (Apple Git-48)


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
       [not found]           ` <CAC=cRTMGJfD_SZ=bE8pi5U6n5W1MF17emLC3VZDbpLSQtbNcKg@mail.gmail.com>
@ 2014-09-11  5:37               ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-11  5:37 UTC (permalink / raw)
  To: huang ying; +Cc: Huang Ying, Changman Lee, linux-f2fs-devel, LKML

On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> 
> > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > Hi Huang,
> > > > > >
> > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > before
> > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > recovery fail.
> > > > > > >
> > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > nid
> > > > of
> > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > reverse.
> > > > > > > For example, because of alloc_nid_failed.
> > > > > > >
> > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > find_fsync_dnodes.
> > > > > > >
> > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > patch, that is, from big number to small number.
> > > > > > >
> > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > ---
> > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > >                                                   FI_INC_LINK);
> > > > > > > -         } else {
> > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > >
> > > > > > If this is not inode block, we should add this inode to recover its
> > > > data blocks.
> > > > >
> > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > changes to
> > > > > file will cause inode page dirty (for example, mtime changed), so
> > that
> > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > patch should work too.
> > > >
> > > > Your description says that f2fs_iget will fail, which causes the
> > recovery
> > > > fail.
> > > > So, I thought it would be better to handle the f2fs_iget failure
> > directly.
> > > >
> > >
> > > Yes.  That is another way to fix the issue.
> > >
> > >
> > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > For exmaple,
> > > > 1. the inode is written by flusher or kswapd, then,
> > > > 2. f2fs_sync_file writes its dnode.
> > > >
> > > > In that case, we can get only non-inode dnode in the node chain, since
> > the
> > > > inode
> > > > has not fsync_mark.
> > > >
> > >
> > > I think your solution is better here, but does not fix all scenarios.  If
> > > the inode is checkpointed, the file can be recovered, although the inode
> > > information may be not up to date.  But if the inode is not checkpointed,
> > > f2fs_iget will fail too and recover will fail.
> >
> > Ok, let me consider your scenarios.
> >
> > Term: F: fsync_mark, D: dentry_mark
> >
> > 1. inode(x) | CP | inode(x) | dnode(F)
> >  -> Lose the latest inode(x). Need to fix.
> >
> > 2. inode(x) | CP | dnode(F) | inode(x)
> >  -> Impossible, but recover latest dnode(F)
> >
> > 3. CP | inode(x) | dnode(F)
> >  -> Need to write inode(DF) in f2fs_sync_file.
> >
> > 4. CP | dnode(F) | inode(DF)
> >  -> If f2fs_iget fails, then goto next.
> >
> > 5. CP | dnode(F) | inode(x)
> >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > scenario.
> >     Drop this dnode(F).
> >
> > Indeed, there were some missing scenarios.
> >
> > So, how about this patch?
> >
> > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> >
> > We can summarize the roll forward recovery scenarios as follows.
> >
> > [Term] F: fsync_mark, D: dentry_mark
> >
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> >
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> >
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F)
> >
> 
> I think this is possible.  If f2fs_sync_file runs concurrently with
> writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).

If the inode(x) was written, f2fs_sync_file will do write the inode again with
fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.

In f2fs_sync_file,
	...
	while (!sync_node_pages(sbi, ino, &wbc)) {
		if (fsync_mark_done(sbi, ino))
			goto out;
		mark_inode_dirty_sync(inode);
		ret = f2fs_write_inode(inode, NULL);
		if (ret)
			goto out;
	}
	...

> 
> 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> >
> > 5. CP | inode(x) | dnode(F)
> > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> >
> 
> I think this is possible.  If inode(x) is written by writeback and then
> dnode(F) is written by f2fs_sync_file.

-> The inode(DF) was missing. Should drop this dnode(F).
Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
This is fixed by the below updated patch.

> 
> 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> >
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> >    Drop this dnode(F).
> >
> 
> I think this is possible.  as 3.

ditto.

> 
> 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 10 ++++++++
> >  fs/f2fs/recovery.c | 70
> > +++++++++++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 69 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 5cde363..2660af2 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > loff_t end, int datasync)
> >                         up_write(&fi->i_sem);
> >                 }
> >         } else {
> > +               /*
> > +                * CP | inode(x) | dnode(F)
> > +                * We need to remain inode(DF) for roll-forward recovery.
> > +                */
> > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> >
> 
> How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> &wbc).  Where is the fsync mark set?
> 
> 
> > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > {
> >
> 
> if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> checkpoint.  Right?

Yes, whole conditions were wrong.
Please, refer the following patch.
Thanks,

>From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Wed, 10 Sep 2014 00:16:34 -0700
Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Impossible, but recover latest dnode(F).

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c     | 10 ++++++++
 fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e7681c3..e2a2f38 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 			up_write(&fi->i_sem);
 		}
 	} else {
+		/*
+		 * CP | inode(x) | dnode(F)
+		 * We need to remain inode(DF) for roll-forward recovery.
+		 */
+		if (!fsync_mark_done(sbi, inode->i_ino) &&
+			!is_checkpointed_node(sbi, inode->i_ino)) {
+			mark_inode_dirty_sync(inode);
+			f2fs_write_inode(inode, NULL);
+		}
+
 		/* if there is no written node page, write its inode page */
 		while (!sync_node_pages(sbi, ino, &wbc)) {
 			if (fsync_mark_done(sbi, ino))
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6c5a74a..eeb3785 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -14,6 +14,36 @@
 #include "node.h"
 #include "segment.h"
 
+/*
+ * Roll forward recovery scenarios.
+ *
+ * [Term] F: fsync_mark, D: dentry_mark
+ *
+ * 1. inode(x) | CP | inode(x) | dnode(F)
+ * -> Update the latest inode(x).
+ *
+ * 2. inode(x) | CP | inode(F) | dnode(F)
+ * -> No problem.
+ *
+ * 3. inode(x) | CP | dnode(F) | inode(x)
+ * -> Impossible, but recover latest dnode(F)
+ *
+ * 4. inode(x) | CP | dnode(F) | inode(F)
+ * -> No problem.
+ *
+ * 5. CP | inode(x) | dnode(F)
+ * -> The inode(DF) was missing. Should drop this dnode(F).
+ *
+ * 6. CP | inode(DF) | dnode(F)
+ * -> No problem.
+ *
+ * 7. CP | dnode(F) | inode(DF)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *
+ * 8. CP | dnode(F) | inode(x)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *    But it will fail due to no inode(DF).
+ */
 static struct kmem_cache *fsync_entry_slab;
 
 bool space_for_roll_forward(struct f2fs_sb_info *sbi)
@@ -110,27 +140,32 @@ out:
 	return err;
 }
 
-static int recover_inode(struct inode *inode, struct page *node_page)
+static void __recover_inode(struct inode *inode, struct page *page)
 {
-	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
+	struct f2fs_inode *raw = F2FS_INODE(page);
+
+	inode->i_mode = le16_to_cpu(raw->i_mode);
+	i_size_write(inode, le64_to_cpu(raw->i_size));
+	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
+	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
+	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+}
 
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
 	if (!IS_INODE(node_page))
 		return 0;
 
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
-	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
-	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
-	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+	__recover_inode(inode, node_page);
 
 	if (is_dent_dnode(node_page))
 		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), raw_inode->i_name);
+			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
 	return 0;
 }
 
@@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 				break;
 			}
 
+			/*
+			 * CP | dnode(F) | inode(DF)
+			 * For this case, we should not give up now.
+			 */
 			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (err == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
@@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 * In this case, we can lose the latest inode(x).
+		 * So, call __recover_inode for the inode update.
+		 */
+		if (IS_INODE(page))
+			__recover_inode(entry->inode, page);
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-11  5:37               ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-11  5:37 UTC (permalink / raw)
  To: huang ying; +Cc: linux-f2fs-devel, LKML, Huang Ying

On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> 
> > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > Hi Huang,
> > > > > >
> > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > before
> > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > recovery fail.
> > > > > > >
> > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > nid
> > > > of
> > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > reverse.
> > > > > > > For example, because of alloc_nid_failed.
> > > > > > >
> > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > find_fsync_dnodes.
> > > > > > >
> > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > patch, that is, from big number to small number.
> > > > > > >
> > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > ---
> > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > >                                                   FI_INC_LINK);
> > > > > > > -         } else {
> > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > >
> > > > > > If this is not inode block, we should add this inode to recover its
> > > > data blocks.
> > > > >
> > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > changes to
> > > > > file will cause inode page dirty (for example, mtime changed), so
> > that
> > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > patch should work too.
> > > >
> > > > Your description says that f2fs_iget will fail, which causes the
> > recovery
> > > > fail.
> > > > So, I thought it would be better to handle the f2fs_iget failure
> > directly.
> > > >
> > >
> > > Yes.  That is another way to fix the issue.
> > >
> > >
> > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > For exmaple,
> > > > 1. the inode is written by flusher or kswapd, then,
> > > > 2. f2fs_sync_file writes its dnode.
> > > >
> > > > In that case, we can get only non-inode dnode in the node chain, since
> > the
> > > > inode
> > > > has not fsync_mark.
> > > >
> > >
> > > I think your solution is better here, but does not fix all scenarios.  If
> > > the inode is checkpointed, the file can be recovered, although the inode
> > > information may be not up to date.  But if the inode is not checkpointed,
> > > f2fs_iget will fail too and recover will fail.
> >
> > Ok, let me consider your scenarios.
> >
> > Term: F: fsync_mark, D: dentry_mark
> >
> > 1. inode(x) | CP | inode(x) | dnode(F)
> >  -> Lose the latest inode(x). Need to fix.
> >
> > 2. inode(x) | CP | dnode(F) | inode(x)
> >  -> Impossible, but recover latest dnode(F)
> >
> > 3. CP | inode(x) | dnode(F)
> >  -> Need to write inode(DF) in f2fs_sync_file.
> >
> > 4. CP | dnode(F) | inode(DF)
> >  -> If f2fs_iget fails, then goto next.
> >
> > 5. CP | dnode(F) | inode(x)
> >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > scenario.
> >     Drop this dnode(F).
> >
> > Indeed, there were some missing scenarios.
> >
> > So, how about this patch?
> >
> > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> >
> > We can summarize the roll forward recovery scenarios as follows.
> >
> > [Term] F: fsync_mark, D: dentry_mark
> >
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> >
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> >
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F)
> >
> 
> I think this is possible.  If f2fs_sync_file runs concurrently with
> writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).

If the inode(x) was written, f2fs_sync_file will do write the inode again with
fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.

In f2fs_sync_file,
	...
	while (!sync_node_pages(sbi, ino, &wbc)) {
		if (fsync_mark_done(sbi, ino))
			goto out;
		mark_inode_dirty_sync(inode);
		ret = f2fs_write_inode(inode, NULL);
		if (ret)
			goto out;
	}
	...

> 
> 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> >
> > 5. CP | inode(x) | dnode(F)
> > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> >
> 
> I think this is possible.  If inode(x) is written by writeback and then
> dnode(F) is written by f2fs_sync_file.

-> The inode(DF) was missing. Should drop this dnode(F).
Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
This is fixed by the below updated patch.

> 
> 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> >
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> >    Drop this dnode(F).
> >
> 
> I think this is possible.  as 3.

ditto.

> 
> 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 10 ++++++++
> >  fs/f2fs/recovery.c | 70
> > +++++++++++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 69 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 5cde363..2660af2 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > loff_t end, int datasync)
> >                         up_write(&fi->i_sem);
> >                 }
> >         } else {
> > +               /*
> > +                * CP | inode(x) | dnode(F)
> > +                * We need to remain inode(DF) for roll-forward recovery.
> > +                */
> > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> >
> 
> How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> &wbc).  Where is the fsync mark set?
> 
> 
> > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > {
> >
> 
> if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> checkpoint.  Right?

Yes, whole conditions were wrong.
Please, refer the following patch.
Thanks,

>From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Wed, 10 Sep 2014 00:16:34 -0700
Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Impossible, but recover latest dnode(F).

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c     | 10 ++++++++
 fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e7681c3..e2a2f38 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 			up_write(&fi->i_sem);
 		}
 	} else {
+		/*
+		 * CP | inode(x) | dnode(F)
+		 * We need to remain inode(DF) for roll-forward recovery.
+		 */
+		if (!fsync_mark_done(sbi, inode->i_ino) &&
+			!is_checkpointed_node(sbi, inode->i_ino)) {
+			mark_inode_dirty_sync(inode);
+			f2fs_write_inode(inode, NULL);
+		}
+
 		/* if there is no written node page, write its inode page */
 		while (!sync_node_pages(sbi, ino, &wbc)) {
 			if (fsync_mark_done(sbi, ino))
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6c5a74a..eeb3785 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -14,6 +14,36 @@
 #include "node.h"
 #include "segment.h"
 
+/*
+ * Roll forward recovery scenarios.
+ *
+ * [Term] F: fsync_mark, D: dentry_mark
+ *
+ * 1. inode(x) | CP | inode(x) | dnode(F)
+ * -> Update the latest inode(x).
+ *
+ * 2. inode(x) | CP | inode(F) | dnode(F)
+ * -> No problem.
+ *
+ * 3. inode(x) | CP | dnode(F) | inode(x)
+ * -> Impossible, but recover latest dnode(F)
+ *
+ * 4. inode(x) | CP | dnode(F) | inode(F)
+ * -> No problem.
+ *
+ * 5. CP | inode(x) | dnode(F)
+ * -> The inode(DF) was missing. Should drop this dnode(F).
+ *
+ * 6. CP | inode(DF) | dnode(F)
+ * -> No problem.
+ *
+ * 7. CP | dnode(F) | inode(DF)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *
+ * 8. CP | dnode(F) | inode(x)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *    But it will fail due to no inode(DF).
+ */
 static struct kmem_cache *fsync_entry_slab;
 
 bool space_for_roll_forward(struct f2fs_sb_info *sbi)
@@ -110,27 +140,32 @@ out:
 	return err;
 }
 
-static int recover_inode(struct inode *inode, struct page *node_page)
+static void __recover_inode(struct inode *inode, struct page *page)
 {
-	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
+	struct f2fs_inode *raw = F2FS_INODE(page);
+
+	inode->i_mode = le16_to_cpu(raw->i_mode);
+	i_size_write(inode, le64_to_cpu(raw->i_size));
+	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
+	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
+	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+}
 
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
 	if (!IS_INODE(node_page))
 		return 0;
 
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
-	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
-	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
-	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+	__recover_inode(inode, node_page);
 
 	if (is_dent_dnode(node_page))
 		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), raw_inode->i_name);
+			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
 	return 0;
 }
 
@@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 				break;
 			}
 
+			/*
+			 * CP | dnode(F) | inode(DF)
+			 * For this case, we should not give up now.
+			 */
 			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (err == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
@@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 * In this case, we can lose the latest inode(x).
+		 * So, call __recover_inode for the inode update.
+		 */
+		if (IS_INODE(page))
+			__recover_inode(entry->inode, page);
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
-- 
1.8.5.2 (Apple Git-48)


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-11  5:37               ` Jaegeuk Kim
@ 2014-09-11 10:47                 ` Chao Yu
  -1 siblings, 0 replies; 31+ messages in thread
From: Chao Yu @ 2014-09-11 10:47 UTC (permalink / raw)
  To: 'Jaegeuk Kim', 'huang ying'
  Cc: linux-f2fs-devel, 'LKML', 'Huang Ying'

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Thursday, September 11, 2014 1:37 PM
> To: huang ying
> Cc: linux-f2fs-devel@lists.sourceforge.net; LKML; Huang Ying
> Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
> 
> On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > Hi Huang,
> > > > > > >
> > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > before
> > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > recovery fail.
> > > > > > > >
> > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > nid
> > > > > of
> > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > reverse.
> > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > >
> > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > find_fsync_dnodes.
> > > > > > > >
> > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > patch, that is, from big number to small number.
> > > > > > > >
> > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > ---
> > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > -         } else {
> > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > >
> > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > data blocks.
> > > > > >
> > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > changes to
> > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > that
> > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > patch should work too.
> > > > >
> > > > > Your description says that f2fs_iget will fail, which causes the
> > > recovery
> > > > > fail.
> > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > directly.
> > > > >
> > > >
> > > > Yes.  That is another way to fix the issue.
> > > >
> > > >
> > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > For exmaple,
> > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > 2. f2fs_sync_file writes its dnode.
> > > > >
> > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > the
> > > > > inode
> > > > > has not fsync_mark.
> > > > >
> > > >
> > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > f2fs_iget will fail too and recover will fail.
> > >
> > > Ok, let me consider your scenarios.
> > >
> > > Term: F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > >  -> Lose the latest inode(x). Need to fix.
> > >
> > > 2. inode(x) | CP | dnode(F) | inode(x)
> > >  -> Impossible, but recover latest dnode(F)
> > >
> > > 3. CP | inode(x) | dnode(F)
> > >  -> Need to write inode(DF) in f2fs_sync_file.
> > >
> > > 4. CP | dnode(F) | inode(DF)
> > >  -> If f2fs_iget fails, then goto next.
> > >
> > > 5. CP | dnode(F) | inode(x)
> > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > scenario.
> > >     Drop this dnode(F).
> > >
> > > Indeed, there were some missing scenarios.
> > >
> > > So, how about this patch?
> > >
> > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > >
> > > We can summarize the roll forward recovery scenarios as follows.
> > >
> > > [Term] F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > >
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > >
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> >
> > I think this is possible.  If f2fs_sync_file runs concurrently with
> > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> 
> If the inode(x) was written, f2fs_sync_file will do write the inode again with
> fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> 
> In f2fs_sync_file,
> 	...
> 	while (!sync_node_pages(sbi, ino, &wbc)) {
> 		if (fsync_mark_done(sbi, ino))
> 			goto out;
> 		mark_inode_dirty_sync(inode);
> 		ret = f2fs_write_inode(inode, NULL);
> 		if (ret)
> 			goto out;
> 	}
> 	...
> 
> >
> >
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > >
> > > 5. CP | inode(x) | dnode(F)
> > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > >
> >
> > I think this is possible.  If inode(x) is written by writeback and then
> > dnode(F) is written by f2fs_sync_file.
> 
> -> The inode(DF) was missing. Should drop this dnode(F).
> Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> This is fixed by the below updated patch.
> 
> >
> >
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > >
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > >    Drop this dnode(F).
> > >
> >
> > I think this is possible.  as 3.
> 
> ditto.
> 
> >
> >
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 10 ++++++++
> > >  fs/f2fs/recovery.c | 70
> > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 5cde363..2660af2 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > loff_t end, int datasync)
> > >                         up_write(&fi->i_sem);
> > >                 }
> > >         } else {
> > > +               /*
> > > +                * CP | inode(x) | dnode(F)
> > > +                * We need to remain inode(DF) for roll-forward recovery.
> > > +                */
> > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > >
> >
> > How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> > &wbc).  Where is the fsync mark set?
> >
> >
> > > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > {
> > >
> >
> > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > checkpoint.  Right?
> 
> Yes, whole conditions were wrong.
> Please, refer the following patch.
> Thanks,
> 
> >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> 
> We can summarize the roll forward recovery scenarios as follows.
> 
> [Term] F: fsync_mark, D: dentry_mark
> 
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
> 
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
> 
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Impossible, but recover latest dnode(F).
> 
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
> 
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
> 
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
> 
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> 
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>    But it will fail due to no inode(DF).
> 
> So, this patch adds some missing points such as #1, #5, #7, and #8.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/file.c     | 10 ++++++++
>  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 69 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..e2a2f38 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int
> datasync)
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> +		/*
> +		 * CP | inode(x) | dnode(F)
> +		 * We need to remain inode(DF) for roll-forward recovery.
> +		 */
> +		if (!fsync_mark_done(sbi, inode->i_ino) &&
> +			!is_checkpointed_node(sbi, inode->i_ino)) {
> +			mark_inode_dirty_sync(inode);
> +			f2fs_write_inode(inode, NULL);
> +		}

This un-checkpointed non-fsynced inode page might be writeback by
kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.

So how about modifying as below:

		while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
			(!fsync_mark_done(sbi, inode->i_ino) &&
			!is_checkpointed_node(sbi, inode->i_ino))) {

to guarantee the inode(DF) is be written out to disk?

Thanks,
Yu

> +
>  		/* if there is no written node page, write its inode page */
>  		while (!sync_node_pages(sbi, ino, &wbc)) {
>  			if (fsync_mark_done(sbi, ino))
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..eeb3785 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
>  #include "node.h"
>  #include "segment.h"
> 
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Impossible, but recover latest dnode(F)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *    But it will fail due to no inode(DF).
> + */
>  static struct kmem_cache *fsync_entry_slab;
> 
>  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
>  	return err;
>  }
> 
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
>  {
> -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> +	struct f2fs_inode *raw = F2FS_INODE(page);
> +
> +	inode->i_mode = le16_to_cpu(raw->i_mode);
> +	i_size_write(inode, le64_to_cpu(raw->i_size));
> +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
> 
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
>  	if (!IS_INODE(node_page))
>  		return 0;
> 
> -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> +	__recover_inode(inode, node_page);
> 
>  	if (is_dent_dnode(node_page))
>  		return recover_dentry(node_page, inode);
> 
>  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> -			ino_of_node(node_page), raw_inode->i_name);
> +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
>  	return 0;
>  }
> 
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  				break;
>  			}
> 
> +			/*
> +			 * CP | dnode(F) | inode(DF)
> +			 * For this case, we should not give up now.
> +			 */
>  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 * In this case, we can lose the latest inode(x).
> +		 * So, call __recover_inode for the inode update.
> +		 */
> +		if (IS_INODE(page))
> +			__recover_inode(entry->inode, page);
> 
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
>  		if (err)
> --
> 1.8.5.2 (Apple Git-48)
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-11 10:47                 ` Chao Yu
  0 siblings, 0 replies; 31+ messages in thread
From: Chao Yu @ 2014-09-11 10:47 UTC (permalink / raw)
  To: 'Jaegeuk Kim', 'huang ying'
  Cc: 'Huang Ying', 'LKML', linux-f2fs-devel

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Thursday, September 11, 2014 1:37 PM
> To: huang ying
> Cc: linux-f2fs-devel@lists.sourceforge.net; LKML; Huang Ying
> Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
> 
> On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > Hi Huang,
> > > > > > >
> > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > before
> > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > recovery fail.
> > > > > > > >
> > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > nid
> > > > > of
> > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > reverse.
> > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > >
> > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > find_fsync_dnodes.
> > > > > > > >
> > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > patch, that is, from big number to small number.
> > > > > > > >
> > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > ---
> > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > -         } else {
> > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > >
> > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > data blocks.
> > > > > >
> > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > changes to
> > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > that
> > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > patch should work too.
> > > > >
> > > > > Your description says that f2fs_iget will fail, which causes the
> > > recovery
> > > > > fail.
> > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > directly.
> > > > >
> > > >
> > > > Yes.  That is another way to fix the issue.
> > > >
> > > >
> > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > For exmaple,
> > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > 2. f2fs_sync_file writes its dnode.
> > > > >
> > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > the
> > > > > inode
> > > > > has not fsync_mark.
> > > > >
> > > >
> > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > f2fs_iget will fail too and recover will fail.
> > >
> > > Ok, let me consider your scenarios.
> > >
> > > Term: F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > >  -> Lose the latest inode(x). Need to fix.
> > >
> > > 2. inode(x) | CP | dnode(F) | inode(x)
> > >  -> Impossible, but recover latest dnode(F)
> > >
> > > 3. CP | inode(x) | dnode(F)
> > >  -> Need to write inode(DF) in f2fs_sync_file.
> > >
> > > 4. CP | dnode(F) | inode(DF)
> > >  -> If f2fs_iget fails, then goto next.
> > >
> > > 5. CP | dnode(F) | inode(x)
> > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > scenario.
> > >     Drop this dnode(F).
> > >
> > > Indeed, there were some missing scenarios.
> > >
> > > So, how about this patch?
> > >
> > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > >
> > > We can summarize the roll forward recovery scenarios as follows.
> > >
> > > [Term] F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > >
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > >
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> >
> > I think this is possible.  If f2fs_sync_file runs concurrently with
> > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> 
> If the inode(x) was written, f2fs_sync_file will do write the inode again with
> fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> 
> In f2fs_sync_file,
> 	...
> 	while (!sync_node_pages(sbi, ino, &wbc)) {
> 		if (fsync_mark_done(sbi, ino))
> 			goto out;
> 		mark_inode_dirty_sync(inode);
> 		ret = f2fs_write_inode(inode, NULL);
> 		if (ret)
> 			goto out;
> 	}
> 	...
> 
> >
> >
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > >
> > > 5. CP | inode(x) | dnode(F)
> > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > >
> >
> > I think this is possible.  If inode(x) is written by writeback and then
> > dnode(F) is written by f2fs_sync_file.
> 
> -> The inode(DF) was missing. Should drop this dnode(F).
> Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> This is fixed by the below updated patch.
> 
> >
> >
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > >
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > >    Drop this dnode(F).
> > >
> >
> > I think this is possible.  as 3.
> 
> ditto.
> 
> >
> >
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 10 ++++++++
> > >  fs/f2fs/recovery.c | 70
> > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 5cde363..2660af2 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > loff_t end, int datasync)
> > >                         up_write(&fi->i_sem);
> > >                 }
> > >         } else {
> > > +               /*
> > > +                * CP | inode(x) | dnode(F)
> > > +                * We need to remain inode(DF) for roll-forward recovery.
> > > +                */
> > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > >
> >
> > How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> > &wbc).  Where is the fsync mark set?
> >
> >
> > > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > {
> > >
> >
> > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > checkpoint.  Right?
> 
> Yes, whole conditions were wrong.
> Please, refer the following patch.
> Thanks,
> 
> >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> 
> We can summarize the roll forward recovery scenarios as follows.
> 
> [Term] F: fsync_mark, D: dentry_mark
> 
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
> 
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
> 
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Impossible, but recover latest dnode(F).
> 
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
> 
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
> 
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
> 
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> 
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>    But it will fail due to no inode(DF).
> 
> So, this patch adds some missing points such as #1, #5, #7, and #8.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/file.c     | 10 ++++++++
>  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 69 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..e2a2f38 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int
> datasync)
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> +		/*
> +		 * CP | inode(x) | dnode(F)
> +		 * We need to remain inode(DF) for roll-forward recovery.
> +		 */
> +		if (!fsync_mark_done(sbi, inode->i_ino) &&
> +			!is_checkpointed_node(sbi, inode->i_ino)) {
> +			mark_inode_dirty_sync(inode);
> +			f2fs_write_inode(inode, NULL);
> +		}

This un-checkpointed non-fsynced inode page might be writeback by
kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.

So how about modifying as below:

		while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
			(!fsync_mark_done(sbi, inode->i_ino) &&
			!is_checkpointed_node(sbi, inode->i_ino))) {

to guarantee the inode(DF) is be written out to disk?

Thanks,
Yu

> +
>  		/* if there is no written node page, write its inode page */
>  		while (!sync_node_pages(sbi, ino, &wbc)) {
>  			if (fsync_mark_done(sbi, ino))
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..eeb3785 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
>  #include "node.h"
>  #include "segment.h"
> 
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Impossible, but recover latest dnode(F)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *    But it will fail due to no inode(DF).
> + */
>  static struct kmem_cache *fsync_entry_slab;
> 
>  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
>  	return err;
>  }
> 
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
>  {
> -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> +	struct f2fs_inode *raw = F2FS_INODE(page);
> +
> +	inode->i_mode = le16_to_cpu(raw->i_mode);
> +	i_size_write(inode, le64_to_cpu(raw->i_size));
> +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
> 
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
>  	if (!IS_INODE(node_page))
>  		return 0;
> 
> -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> +	__recover_inode(inode, node_page);
> 
>  	if (is_dent_dnode(node_page))
>  		return recover_dentry(node_page, inode);
> 
>  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> -			ino_of_node(node_page), raw_inode->i_name);
> +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
>  	return 0;
>  }
> 
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
>  				break;
>  			}
> 
> +			/*
> +			 * CP | dnode(F) | inode(DF)
> +			 * For this case, we should not give up now.
> +			 */
>  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 * In this case, we can lose the latest inode(x).
> +		 * So, call __recover_inode for the inode update.
> +		 */
> +		if (IS_INODE(page))
> +			__recover_inode(entry->inode, page);
> 
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
>  		if (err)
> --
> 1.8.5.2 (Apple Git-48)
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-11  5:37               ` Jaegeuk Kim
@ 2014-09-11 12:25                 ` Huang Ying
  -1 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-11 12:25 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML


On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > 
> > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > Hi Huang,
> > > > > > >
> > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > before
> > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > recovery fail.
> > > > > > > >
> > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > nid
> > > > > of
> > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > reverse.
> > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > >
> > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > find_fsync_dnodes.
> > > > > > > >
> > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > patch, that is, from big number to small number.
> > > > > > > >
> > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > ---
> > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > -         } else {
> > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > >
> > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > data blocks.
> > > > > >
> > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > changes to
> > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > that
> > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > patch should work too.
> > > > >
> > > > > Your description says that f2fs_iget will fail, which causes the
> > > recovery
> > > > > fail.
> > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > directly.
> > > > >
> > > >
> > > > Yes.  That is another way to fix the issue.
> > > >
> > > >
> > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > For exmaple,
> > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > 2. f2fs_sync_file writes its dnode.
> > > > >
> > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > the
> > > > > inode
> > > > > has not fsync_mark.
> > > > >
> > > >
> > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > f2fs_iget will fail too and recover will fail.
> > >
> > > Ok, let me consider your scenarios.
> > >
> > > Term: F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > >  -> Lose the latest inode(x). Need to fix.
> > >
> > > 2. inode(x) | CP | dnode(F) | inode(x)
> > >  -> Impossible, but recover latest dnode(F)
> > >
> > > 3. CP | inode(x) | dnode(F)
> > >  -> Need to write inode(DF) in f2fs_sync_file.
> > >
> > > 4. CP | dnode(F) | inode(DF)
> > >  -> If f2fs_iget fails, then goto next.
> > >
> > > 5. CP | dnode(F) | inode(x)
> > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > scenario.
> > >     Drop this dnode(F).
> > >
> > > Indeed, there were some missing scenarios.
> > >
> > > So, how about this patch?
> > >
> > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > >
> > > We can summarize the roll forward recovery scenarios as follows.
> > >
> > > [Term] F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > >
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > >
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> > 
> > I think this is possible.  If f2fs_sync_file runs concurrently with
> > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> 
> If the inode(x) was written, f2fs_sync_file will do write the inode again with
> fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> 
> In f2fs_sync_file,
> 	...
> 	while (!sync_node_pages(sbi, ino, &wbc)) {
> 		if (fsync_mark_done(sbi, ino))
> 			goto out;
> 		mark_inode_dirty_sync(inode);
> 		ret = f2fs_write_inode(inode, NULL);
> 		if (ret)
> 			goto out;
> 	}
> 	...

Is the following situation possible?

f2fs_sync_file				<writeback>
  sync_node_pages			  f2fs_write_node_pages
    write dnode(F)			    sync_node_pages
					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */


That is, f2fs_sync_file run parallel with <writeback>.  The
sync_node_pages above will return 1, because dnode(F) is written.
inode(x) is written by <writeback> path.  And because
PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
called by f2fs_sync_file does not write inode(F).

Best Regards,
Huang, Ying

> > 
> > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > >
> > > 5. CP | inode(x) | dnode(F)
> > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > >
> > 
> > I think this is possible.  If inode(x) is written by writeback and then
> > dnode(F) is written by f2fs_sync_file.
> 
> -> The inode(DF) was missing. Should drop this dnode(F).
> Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> This is fixed by the below updated patch.
> 
> > 
> > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > >
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > >    Drop this dnode(F).
> > >
> > 
> > I think this is possible.  as 3.
> 
> ditto.
> 
> > 
> > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 10 ++++++++
> > >  fs/f2fs/recovery.c | 70
> > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 5cde363..2660af2 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > loff_t end, int datasync)
> > >                         up_write(&fi->i_sem);
> > >                 }
> > >         } else {
> > > +               /*
> > > +                * CP | inode(x) | dnode(F)
> > > +                * We need to remain inode(DF) for roll-forward recovery.
> > > +                */
> > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > >
> > 
> > How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> > &wbc).  Where is the fsync mark set?
> > 
> > 
> > > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > {
> > >
> > 
> > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > checkpoint.  Right?
> 
> Yes, whole conditions were wrong.
> Please, refer the following patch.
> Thanks,
> 
> From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> 
> We can summarize the roll forward recovery scenarios as follows.
> 
> [Term] F: fsync_mark, D: dentry_mark
> 
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
> 
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
> 
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Impossible, but recover latest dnode(F).
> 
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
> 
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
> 
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
> 
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> 
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>    But it will fail due to no inode(DF).
> 
> So, this patch adds some missing points such as #1, #5, #7, and #8.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/file.c     | 10 ++++++++
>  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 69 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..e2a2f38 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> +		/*
> +		 * CP | inode(x) | dnode(F)
> +		 * We need to remain inode(DF) for roll-forward recovery.
> +		 */
> +		if (!fsync_mark_done(sbi, inode->i_ino) &&
> +			!is_checkpointed_node(sbi, inode->i_ino)) {
> +			mark_inode_dirty_sync(inode);
> +			f2fs_write_inode(inode, NULL);
> +		}
> +
>  		/* if there is no written node page, write its inode page */
>  		while (!sync_node_pages(sbi, ino, &wbc)) {
>  			if (fsync_mark_done(sbi, ino))
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..eeb3785 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
>  #include "node.h"
>  #include "segment.h"
>  
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Impossible, but recover latest dnode(F)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *    But it will fail due to no inode(DF).
> + */
>  static struct kmem_cache *fsync_entry_slab;
>  
>  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
>  	return err;
>  }
>  
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
>  {
> -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> +	struct f2fs_inode *raw = F2FS_INODE(page);
> +
> +	inode->i_mode = le16_to_cpu(raw->i_mode);
> +	i_size_write(inode, le64_to_cpu(raw->i_size));
> +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
>  
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
>  	if (!IS_INODE(node_page))
>  		return 0;
>  
> -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> +	__recover_inode(inode, node_page);
>  
>  	if (is_dent_dnode(node_page))
>  		return recover_dentry(node_page, inode);
>  
>  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> -			ino_of_node(node_page), raw_inode->i_name);
> +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
>  	return 0;
>  }
>  
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
>  				break;
>  			}
>  
> +			/*
> +			 * CP | dnode(F) | inode(DF)
> +			 * For this case, we should not give up now.
> +			 */
>  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 * In this case, we can lose the latest inode(x).
> +		 * So, call __recover_inode for the inode update.
> +		 */
> +		if (IS_INODE(page))
> +			__recover_inode(entry->inode, page);
>  
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
>  		if (err)




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-11 12:25                 ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-11 12:25 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, LKML, linux-f2fs-devel


On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > 
> > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > Hi Huang,
> > > > > > >
> > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > before
> > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > recovery fail.
> > > > > > > >
> > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > nid
> > > > > of
> > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > reverse.
> > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > >
> > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > find_fsync_dnodes.
> > > > > > > >
> > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > patch, that is, from big number to small number.
> > > > > > > >
> > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > ---
> > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > -         } else {
> > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > >
> > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > data blocks.
> > > > > >
> > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > changes to
> > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > that
> > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > patch should work too.
> > > > >
> > > > > Your description says that f2fs_iget will fail, which causes the
> > > recovery
> > > > > fail.
> > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > directly.
> > > > >
> > > >
> > > > Yes.  That is another way to fix the issue.
> > > >
> > > >
> > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > For exmaple,
> > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > 2. f2fs_sync_file writes its dnode.
> > > > >
> > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > the
> > > > > inode
> > > > > has not fsync_mark.
> > > > >
> > > >
> > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > f2fs_iget will fail too and recover will fail.
> > >
> > > Ok, let me consider your scenarios.
> > >
> > > Term: F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > >  -> Lose the latest inode(x). Need to fix.
> > >
> > > 2. inode(x) | CP | dnode(F) | inode(x)
> > >  -> Impossible, but recover latest dnode(F)
> > >
> > > 3. CP | inode(x) | dnode(F)
> > >  -> Need to write inode(DF) in f2fs_sync_file.
> > >
> > > 4. CP | dnode(F) | inode(DF)
> > >  -> If f2fs_iget fails, then goto next.
> > >
> > > 5. CP | dnode(F) | inode(x)
> > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > scenario.
> > >     Drop this dnode(F).
> > >
> > > Indeed, there were some missing scenarios.
> > >
> > > So, how about this patch?
> > >
> > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > >
> > > We can summarize the roll forward recovery scenarios as follows.
> > >
> > > [Term] F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > >
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > >
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> > 
> > I think this is possible.  If f2fs_sync_file runs concurrently with
> > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> 
> If the inode(x) was written, f2fs_sync_file will do write the inode again with
> fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> 
> In f2fs_sync_file,
> 	...
> 	while (!sync_node_pages(sbi, ino, &wbc)) {
> 		if (fsync_mark_done(sbi, ino))
> 			goto out;
> 		mark_inode_dirty_sync(inode);
> 		ret = f2fs_write_inode(inode, NULL);
> 		if (ret)
> 			goto out;
> 	}
> 	...

Is the following situation possible?

f2fs_sync_file				<writeback>
  sync_node_pages			  f2fs_write_node_pages
    write dnode(F)			    sync_node_pages
					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */


That is, f2fs_sync_file run parallel with <writeback>.  The
sync_node_pages above will return 1, because dnode(F) is written.
inode(x) is written by <writeback> path.  And because
PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
called by f2fs_sync_file does not write inode(F).

Best Regards,
Huang, Ying

> > 
> > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > >
> > > 5. CP | inode(x) | dnode(F)
> > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > >
> > 
> > I think this is possible.  If inode(x) is written by writeback and then
> > dnode(F) is written by f2fs_sync_file.
> 
> -> The inode(DF) was missing. Should drop this dnode(F).
> Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> This is fixed by the below updated patch.
> 
> > 
> > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > >
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > >    Drop this dnode(F).
> > >
> > 
> > I think this is possible.  as 3.
> 
> ditto.
> 
> > 
> > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 10 ++++++++
> > >  fs/f2fs/recovery.c | 70
> > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 5cde363..2660af2 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > loff_t end, int datasync)
> > >                         up_write(&fi->i_sem);
> > >                 }
> > >         } else {
> > > +               /*
> > > +                * CP | inode(x) | dnode(F)
> > > +                * We need to remain inode(DF) for roll-forward recovery.
> > > +                */
> > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > >
> > 
> > How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> > &wbc).  Where is the fsync mark set?
> > 
> > 
> > > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > {
> > >
> > 
> > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > checkpoint.  Right?
> 
> Yes, whole conditions were wrong.
> Please, refer the following patch.
> Thanks,
> 
> From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> 
> We can summarize the roll forward recovery scenarios as follows.
> 
> [Term] F: fsync_mark, D: dentry_mark
> 
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
> 
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
> 
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Impossible, but recover latest dnode(F).
> 
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
> 
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
> 
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
> 
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> 
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>    But it will fail due to no inode(DF).
> 
> So, this patch adds some missing points such as #1, #5, #7, and #8.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/file.c     | 10 ++++++++
>  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 69 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..e2a2f38 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> +		/*
> +		 * CP | inode(x) | dnode(F)
> +		 * We need to remain inode(DF) for roll-forward recovery.
> +		 */
> +		if (!fsync_mark_done(sbi, inode->i_ino) &&
> +			!is_checkpointed_node(sbi, inode->i_ino)) {
> +			mark_inode_dirty_sync(inode);
> +			f2fs_write_inode(inode, NULL);
> +		}
> +
>  		/* if there is no written node page, write its inode page */
>  		while (!sync_node_pages(sbi, ino, &wbc)) {
>  			if (fsync_mark_done(sbi, ino))
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..eeb3785 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
>  #include "node.h"
>  #include "segment.h"
>  
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Impossible, but recover latest dnode(F)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *    But it will fail due to no inode(DF).
> + */
>  static struct kmem_cache *fsync_entry_slab;
>  
>  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
>  	return err;
>  }
>  
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
>  {
> -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> +	struct f2fs_inode *raw = F2FS_INODE(page);
> +
> +	inode->i_mode = le16_to_cpu(raw->i_mode);
> +	i_size_write(inode, le64_to_cpu(raw->i_size));
> +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
>  
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
>  	if (!IS_INODE(node_page))
>  		return 0;
>  
> -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> +	__recover_inode(inode, node_page);
>  
>  	if (is_dent_dnode(node_page))
>  		return recover_dentry(node_page, inode);
>  
>  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> -			ino_of_node(node_page), raw_inode->i_name);
> +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
>  	return 0;
>  }
>  
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
>  				break;
>  			}
>  
> +			/*
> +			 * CP | dnode(F) | inode(DF)
> +			 * For this case, we should not give up now.
> +			 */
>  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 * In this case, we can lose the latest inode(x).
> +		 * So, call __recover_inode for the inode update.
> +		 */
> +		if (IS_INODE(page))
> +			__recover_inode(entry->inode, page);
>  
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
>  		if (err)




------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-11 10:47                 ` Chao Yu
@ 2014-09-11 12:31                   ` Huang Ying
  -1 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-11 12:31 UTC (permalink / raw)
  To: Chao Yu
  Cc: 'Jaegeuk Kim', 'huang ying',
	linux-f2fs-devel, 'LKML'

On Thu, 2014-09-11 at 18:47 +0800, Chao Yu wrote:
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Thursday, September 11, 2014 1:37 PM
> > To: huang ying
> > Cc: linux-f2fs-devel@lists.sourceforge.net; LKML; Huang Ying
> > Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
> > 
> > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > >
> > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > Hi Huang,
> > > > > > > >
> > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > before
> > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > recovery fail.
> > > > > > > > >
> > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > nid
> > > > > > of
> > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > reverse.
> > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > >
> > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > find_fsync_dnodes.
> > > > > > > > >
> > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > ---
> > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > -         } else {
> > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > >
> > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > data blocks.
> > > > > > >
> > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > changes to
> > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > that
> > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > patch should work too.
> > > > > >
> > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > recovery
> > > > > > fail.
> > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > directly.
> > > > > >
> > > > >
> > > > > Yes.  That is another way to fix the issue.
> > > > >
> > > > >
> > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > For exmaple,
> > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > >
> > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > the
> > > > > > inode
> > > > > > has not fsync_mark.
> > > > > >
> > > > >
> > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > f2fs_iget will fail too and recover will fail.
> > > >
> > > > Ok, let me consider your scenarios.
> > > >
> > > > Term: F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > >  -> Lose the latest inode(x). Need to fix.
> > > >
> > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > >  -> Impossible, but recover latest dnode(F)
> > > >
> > > > 3. CP | inode(x) | dnode(F)
> > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > >
> > > > 4. CP | dnode(F) | inode(DF)
> > > >  -> If f2fs_iget fails, then goto next.
> > > >
> > > > 5. CP | dnode(F) | inode(x)
> > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > scenario.
> > > >     Drop this dnode(F).
> > > >
> > > > Indeed, there were some missing scenarios.
> > > >
> > > > So, how about this patch?
> > > >
> > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > >
> > > > We can summarize the roll forward recovery scenarios as follows.
> > > >
> > > > [Term] F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > -> Update the latest inode(x).
> > > >
> > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > -> Impossible, but recover latest dnode(F)
> > > >
> > >
> > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > 
> > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > 
> > In f2fs_sync_file,
> > 	...
> > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > 		if (fsync_mark_done(sbi, ino))
> > 			goto out;
> > 		mark_inode_dirty_sync(inode);
> > 		ret = f2fs_write_inode(inode, NULL);
> > 		if (ret)
> > 			goto out;
> > 	}
> > 	...
> > 
> > >
> > >
> > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > -> No problem.
> > > >
> > > > 5. CP | inode(x) | dnode(F)
> > > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > > >
> > >
> > > I think this is possible.  If inode(x) is written by writeback and then
> > > dnode(F) is written by f2fs_sync_file.
> > 
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> > This is fixed by the below updated patch.
> > 
> > >
> > >
> > > > 6. CP | inode(DF) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 7. CP | dnode(F) | inode(DF)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > >
> > > > 8. CP | dnode(F) | inode(x)
> > > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > > >    Drop this dnode(F).
> > > >
> > >
> > > I think this is possible.  as 3.
> > 
> > ditto.
> > 
> > >
> > >
> > > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > >
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >  fs/f2fs/file.c     | 10 ++++++++
> > > >  fs/f2fs/recovery.c | 70
> > > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index 5cde363..2660af2 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > > loff_t end, int datasync)
> > > >                         up_write(&fi->i_sem);
> > > >                 }
> > > >         } else {
> > > > +               /*
> > > > +                * CP | inode(x) | dnode(F)
> > > > +                * We need to remain inode(DF) for roll-forward recovery.
> > > > +                */
> > > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > > >
> > >
> > > How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> > > &wbc).  Where is the fsync mark set?
> > >
> > >
> > > > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > > {
> > > >
> > >
> > > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > > checkpoint.  Right?
> > 
> > Yes, whole conditions were wrong.
> > Please, refer the following patch.
> > Thanks,
> > 
> > >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F).
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 10 ++++++++
> >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 69 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..e2a2f38 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int
> > datasync)
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > +		/*
> > +		 * CP | inode(x) | dnode(F)
> > +		 * We need to remain inode(DF) for roll-forward recovery.
> > +		 */
> > +		if (!fsync_mark_done(sbi, inode->i_ino) &&
> > +			!is_checkpointed_node(sbi, inode->i_ino)) {
> > +			mark_inode_dirty_sync(inode);
> > +			f2fs_write_inode(inode, NULL);
> > +		}
> 
> This un-checkpointed non-fsynced inode page might be writeback by
> kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.
> 
> So how about modifying as below:
> 
> 		while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
> 			(!fsync_mark_done(sbi, inode->i_ino) &&
> 			!is_checkpointed_node(sbi, inode->i_ino))) {
> 
> to guarantee the inode(DF) is be written out to disk?

If sync_node_pages write dnode(F), the fsync_done flag for inode will be
set if my understanding of code were correct.  In set_node_addr:

        /* update fsync_mark if its inode nat entry is still alive */
        e = __lookup_nat_cache(nm_i, ni->ino);
        if (e)
                e->fsync_done = fsync_done;

Best Regards,
Huang, Ying

> Thanks,
> Yu
> 
> > +
> >  		/* if there is no written node page, write its inode page */
> >  		while (!sync_node_pages(sbi, ino, &wbc)) {
> >  			if (fsync_mark_done(sbi, ino))
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 6c5a74a..eeb3785 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -14,6 +14,36 @@
> >  #include "node.h"
> >  #include "segment.h"
> > 
> > +/*
> > + * Roll forward recovery scenarios.
> > + *
> > + * [Term] F: fsync_mark, D: dentry_mark
> > + *
> > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > + * -> Update the latest inode(x).
> > + *
> > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > + * -> Impossible, but recover latest dnode(F)
> > + *
> > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > + * -> No problem.
> > + *
> > + * 5. CP | inode(x) | dnode(F)
> > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > + *
> > + * 6. CP | inode(DF) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 7. CP | dnode(F) | inode(DF)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *
> > + * 8. CP | dnode(F) | inode(x)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *    But it will fail due to no inode(DF).
> > + */
> >  static struct kmem_cache *fsync_entry_slab;
> > 
> >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > @@ -110,27 +140,32 @@ out:
> >  	return err;
> >  }
> > 
> > -static int recover_inode(struct inode *inode, struct page *node_page)
> > +static void __recover_inode(struct inode *inode, struct page *page)
> >  {
> > -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > +	struct f2fs_inode *raw = F2FS_INODE(page);
> > +
> > +	inode->i_mode = le16_to_cpu(raw->i_mode);
> > +	i_size_write(inode, le64_to_cpu(raw->i_size));
> > +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +}
> > 
> > +static int recover_inode(struct inode *inode, struct page *node_page)
> > +{
> >  	if (!IS_INODE(node_page))
> >  		return 0;
> > 
> > -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > +	__recover_inode(inode, node_page);
> > 
> >  	if (is_dent_dnode(node_page))
> >  		return recover_dentry(node_page, inode);
> > 
> >  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > -			ino_of_node(node_page), raw_inode->i_name);
> > +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> >  	return 0;
> >  }
> > 
> > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> > *head)
> >  				break;
> >  			}
> > 
> > +			/*
> > +			 * CP | dnode(F) | inode(DF)
> > +			 * For this case, we should not give up now.
> > +			 */
> >  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> >  			if (IS_ERR(entry->inode)) {
> >  				err = PTR_ERR(entry->inode);
> >  				kmem_cache_free(fsync_entry_slab, entry);
> > +				if (err == -ENOENT)
> > +					goto next;
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> >  		entry = get_fsync_inode(head, ino_of_node(page));
> >  		if (!entry)
> >  			goto next;
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 * In this case, we can lose the latest inode(x).
> > +		 * So, call __recover_inode for the inode update.
> > +		 */
> > +		if (IS_INODE(page))
> > +			__recover_inode(entry->inode, page);
> > 
> >  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> >  		if (err)
> > --
> > 1.8.5.2 (Apple Git-48)
> > 
> > 
> > ------------------------------------------------------------------------------
> > Want excitement?
> > Manually upgrade your production database.
> > When you want reliability, choose Perforce
> > Perforce version control. Predictably reliable.
> > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-11 12:31                   ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-11 12:31 UTC (permalink / raw)
  To: Chao Yu
  Cc: 'Jaegeuk Kim', 'huang ying', 'LKML',
	linux-f2fs-devel

On Thu, 2014-09-11 at 18:47 +0800, Chao Yu wrote:
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Thursday, September 11, 2014 1:37 PM
> > To: huang ying
> > Cc: linux-f2fs-devel@lists.sourceforge.net; LKML; Huang Ying
> > Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
> > 
> > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > >
> > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > Hi Huang,
> > > > > > > >
> > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > before
> > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > recovery fail.
> > > > > > > > >
> > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > nid
> > > > > > of
> > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > reverse.
> > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > >
> > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > find_fsync_dnodes.
> > > > > > > > >
> > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > ---
> > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > -         } else {
> > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > >
> > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > data blocks.
> > > > > > >
> > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > changes to
> > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > that
> > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > patch should work too.
> > > > > >
> > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > recovery
> > > > > > fail.
> > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > directly.
> > > > > >
> > > > >
> > > > > Yes.  That is another way to fix the issue.
> > > > >
> > > > >
> > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > For exmaple,
> > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > >
> > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > the
> > > > > > inode
> > > > > > has not fsync_mark.
> > > > > >
> > > > >
> > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > f2fs_iget will fail too and recover will fail.
> > > >
> > > > Ok, let me consider your scenarios.
> > > >
> > > > Term: F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > >  -> Lose the latest inode(x). Need to fix.
> > > >
> > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > >  -> Impossible, but recover latest dnode(F)
> > > >
> > > > 3. CP | inode(x) | dnode(F)
> > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > >
> > > > 4. CP | dnode(F) | inode(DF)
> > > >  -> If f2fs_iget fails, then goto next.
> > > >
> > > > 5. CP | dnode(F) | inode(x)
> > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > scenario.
> > > >     Drop this dnode(F).
> > > >
> > > > Indeed, there were some missing scenarios.
> > > >
> > > > So, how about this patch?
> > > >
> > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > >
> > > > We can summarize the roll forward recovery scenarios as follows.
> > > >
> > > > [Term] F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > -> Update the latest inode(x).
> > > >
> > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > -> Impossible, but recover latest dnode(F)
> > > >
> > >
> > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > 
> > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > 
> > In f2fs_sync_file,
> > 	...
> > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > 		if (fsync_mark_done(sbi, ino))
> > 			goto out;
> > 		mark_inode_dirty_sync(inode);
> > 		ret = f2fs_write_inode(inode, NULL);
> > 		if (ret)
> > 			goto out;
> > 	}
> > 	...
> > 
> > >
> > >
> > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > -> No problem.
> > > >
> > > > 5. CP | inode(x) | dnode(F)
> > > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > > >
> > >
> > > I think this is possible.  If inode(x) is written by writeback and then
> > > dnode(F) is written by f2fs_sync_file.
> > 
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> > This is fixed by the below updated patch.
> > 
> > >
> > >
> > > > 6. CP | inode(DF) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 7. CP | dnode(F) | inode(DF)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > >
> > > > 8. CP | dnode(F) | inode(x)
> > > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > > >    Drop this dnode(F).
> > > >
> > >
> > > I think this is possible.  as 3.
> > 
> > ditto.
> > 
> > >
> > >
> > > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > >
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >  fs/f2fs/file.c     | 10 ++++++++
> > > >  fs/f2fs/recovery.c | 70
> > > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index 5cde363..2660af2 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > > loff_t end, int datasync)
> > > >                         up_write(&fi->i_sem);
> > > >                 }
> > > >         } else {
> > > > +               /*
> > > > +                * CP | inode(x) | dnode(F)
> > > > +                * We need to remain inode(DF) for roll-forward recovery.
> > > > +                */
> > > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > > >
> > >
> > > How can this happen?  We have not called sync_node_pages(sbi, inode->i_ino,
> > > &wbc).  Where is the fsync mark set?
> > >
> > >
> > > > +                       !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > > {
> > > >
> > >
> > > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > > checkpoint.  Right?
> > 
> > Yes, whole conditions were wrong.
> > Please, refer the following patch.
> > Thanks,
> > 
> > >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F).
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 10 ++++++++
> >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 69 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..e2a2f38 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int
> > datasync)
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > +		/*
> > +		 * CP | inode(x) | dnode(F)
> > +		 * We need to remain inode(DF) for roll-forward recovery.
> > +		 */
> > +		if (!fsync_mark_done(sbi, inode->i_ino) &&
> > +			!is_checkpointed_node(sbi, inode->i_ino)) {
> > +			mark_inode_dirty_sync(inode);
> > +			f2fs_write_inode(inode, NULL);
> > +		}
> 
> This un-checkpointed non-fsynced inode page might be writeback by
> kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.
> 
> So how about modifying as below:
> 
> 		while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
> 			(!fsync_mark_done(sbi, inode->i_ino) &&
> 			!is_checkpointed_node(sbi, inode->i_ino))) {
> 
> to guarantee the inode(DF) is be written out to disk?

If sync_node_pages write dnode(F), the fsync_done flag for inode will be
set if my understanding of code were correct.  In set_node_addr:

        /* update fsync_mark if its inode nat entry is still alive */
        e = __lookup_nat_cache(nm_i, ni->ino);
        if (e)
                e->fsync_done = fsync_done;

Best Regards,
Huang, Ying

> Thanks,
> Yu
> 
> > +
> >  		/* if there is no written node page, write its inode page */
> >  		while (!sync_node_pages(sbi, ino, &wbc)) {
> >  			if (fsync_mark_done(sbi, ino))
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 6c5a74a..eeb3785 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -14,6 +14,36 @@
> >  #include "node.h"
> >  #include "segment.h"
> > 
> > +/*
> > + * Roll forward recovery scenarios.
> > + *
> > + * [Term] F: fsync_mark, D: dentry_mark
> > + *
> > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > + * -> Update the latest inode(x).
> > + *
> > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > + * -> Impossible, but recover latest dnode(F)
> > + *
> > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > + * -> No problem.
> > + *
> > + * 5. CP | inode(x) | dnode(F)
> > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > + *
> > + * 6. CP | inode(DF) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 7. CP | dnode(F) | inode(DF)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *
> > + * 8. CP | dnode(F) | inode(x)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *    But it will fail due to no inode(DF).
> > + */
> >  static struct kmem_cache *fsync_entry_slab;
> > 
> >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > @@ -110,27 +140,32 @@ out:
> >  	return err;
> >  }
> > 
> > -static int recover_inode(struct inode *inode, struct page *node_page)
> > +static void __recover_inode(struct inode *inode, struct page *page)
> >  {
> > -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > +	struct f2fs_inode *raw = F2FS_INODE(page);
> > +
> > +	inode->i_mode = le16_to_cpu(raw->i_mode);
> > +	i_size_write(inode, le64_to_cpu(raw->i_size));
> > +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +}
> > 
> > +static int recover_inode(struct inode *inode, struct page *node_page)
> > +{
> >  	if (!IS_INODE(node_page))
> >  		return 0;
> > 
> > -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > +	__recover_inode(inode, node_page);
> > 
> >  	if (is_dent_dnode(node_page))
> >  		return recover_dentry(node_page, inode);
> > 
> >  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > -			ino_of_node(node_page), raw_inode->i_name);
> > +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> >  	return 0;
> >  }
> > 
> > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> > *head)
> >  				break;
> >  			}
> > 
> > +			/*
> > +			 * CP | dnode(F) | inode(DF)
> > +			 * For this case, we should not give up now.
> > +			 */
> >  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> >  			if (IS_ERR(entry->inode)) {
> >  				err = PTR_ERR(entry->inode);
> >  				kmem_cache_free(fsync_entry_slab, entry);
> > +				if (err == -ENOENT)
> > +					goto next;
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> >  		entry = get_fsync_inode(head, ino_of_node(page));
> >  		if (!entry)
> >  			goto next;
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 * In this case, we can lose the latest inode(x).
> > +		 * So, call __recover_inode for the inode update.
> > +		 */
> > +		if (IS_INODE(page))
> > +			__recover_inode(entry->inode, page);
> > 
> >  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> >  		if (err)
> > --
> > 1.8.5.2 (Apple Git-48)
> > 
> > 
> > ------------------------------------------------------------------------------
> > Want excitement?
> > Manually upgrade your production database.
> > When you want reliability, choose Perforce
> > Perforce version control. Predictably reliable.
> > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> 



------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-11 12:25                 ` Huang Ying
  (?)
@ 2014-09-12  5:13                 ` Jaegeuk Kim
  2014-09-12  7:34                     ` Huang Ying
  -1 siblings, 1 reply; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-12  5:13 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> 
> On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > 
> > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > Hi Huang,
> > > > > > > >
> > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > before
> > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > recovery fail.
> > > > > > > > >
> > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > nid
> > > > > > of
> > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > reverse.
> > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > >
> > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > find_fsync_dnodes.
> > > > > > > > >
> > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > ---
> > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > -         } else {
> > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > >
> > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > data blocks.
> > > > > > >
> > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > changes to
> > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > that
> > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > patch should work too.
> > > > > >
> > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > recovery
> > > > > > fail.
> > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > directly.
> > > > > >
> > > > >
> > > > > Yes.  That is another way to fix the issue.
> > > > >
> > > > >
> > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > For exmaple,
> > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > >
> > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > the
> > > > > > inode
> > > > > > has not fsync_mark.
> > > > > >
> > > > >
> > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > f2fs_iget will fail too and recover will fail.
> > > >
> > > > Ok, let me consider your scenarios.
> > > >
> > > > Term: F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > >  -> Lose the latest inode(x). Need to fix.
> > > >
> > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > >  -> Impossible, but recover latest dnode(F)
> > > >
> > > > 3. CP | inode(x) | dnode(F)
> > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > >
> > > > 4. CP | dnode(F) | inode(DF)
> > > >  -> If f2fs_iget fails, then goto next.
> > > >
> > > > 5. CP | dnode(F) | inode(x)
> > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > scenario.
> > > >     Drop this dnode(F).
> > > >
> > > > Indeed, there were some missing scenarios.
> > > >
> > > > So, how about this patch?
> > > >
> > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > >
> > > > We can summarize the roll forward recovery scenarios as follows.
> > > >
> > > > [Term] F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > -> Update the latest inode(x).
> > > >
> > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > -> Impossible, but recover latest dnode(F)
> > > >
> > > 
> > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > 
> > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > 
> > In f2fs_sync_file,
> > 	...
> > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > 		if (fsync_mark_done(sbi, ino))
> > 			goto out;
> > 		mark_inode_dirty_sync(inode);
> > 		ret = f2fs_write_inode(inode, NULL);
> > 		if (ret)
> > 			goto out;
> > 	}
> > 	...
> 
> Is the following situation possible?
> 
> f2fs_sync_file				<writeback>
>   sync_node_pages			  f2fs_write_node_pages
>     write dnode(F)			    sync_node_pages
> 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> 
> 
> That is, f2fs_sync_file run parallel with <writeback>.  The
> sync_node_pages above will return 1, because dnode(F) is written.
> inode(x) is written by <writeback> path.  And because
> PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> called by f2fs_sync_file does not write inode(F).

I think Chao's comment would work.
How about this patch?

>From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Wed, 10 Sep 2014 00:16:34 -0700
Subject: [PATCH] f2fs: fix roll-forward missing scenarios

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/file.c     | 20 ++++++++++++----
 fs/f2fs/node.c     | 11 ++++++++-
 fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e7681c3..70f5d4b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 			up_write(&fi->i_sem);
 		}
 	} else {
-		/* if there is no written node page, write its inode page */
-		while (!sync_node_pages(sbi, ino, &wbc)) {
-			if (fsync_mark_done(sbi, ino))
-				goto out;
+sync_nodes:
+		sync_node_pages(sbi, ino, &wbc);
+
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 *  -> ok
+		 * inode(x) | CP | dnode(F) | inode(x)
+		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
+		 * CP | inode(x) | dnode(F)
+		 *  -> CP | inode(x) | dnode(F) | inode(DF)
+		 * CP | dnode(F) | inode(x)
+		 *  -> CP | dnode(F) | inode(x) | inode(DF)
+		 */
+		if (!fsync_mark_done(sbi, ino)) {
 			mark_inode_dirty_sync(inode);
 			ret = f2fs_write_inode(inode, NULL);
 			if (ret)
 				goto out;
+			goto sync_nodes;
 		}
+
 		ret = wait_on_node_pages_writeback(sbi, ino);
 		if (ret)
 			goto out;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b32eb56..653aa71 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -248,8 +248,17 @@ retry:
 
 	/* update fsync_mark if its inode nat entry is still alive */
 	e = __lookup_nat_cache(nm_i, ni->ino);
-	if (e)
+	if (e) {
+		/*
+		 * CP | inode(x) | dnode(F)
+		 *  -> CP | inode(x) | dnode(F) | inode(DF)
+		 */
+		if (!e->checkpointed && !e->fsync_done &&
+				ni->ino != ni->nid && fsync_done)
+			goto skip;
 		e->fsync_done = fsync_done;
+	}
+skip:
 	write_unlock(&nm_i->nat_tree_lock);
 }
 
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 6c5a74a..3736728 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -14,6 +14,36 @@
 #include "node.h"
 #include "segment.h"
 
+/*
+ * Roll forward recovery scenarios.
+ *
+ * [Term] F: fsync_mark, D: dentry_mark
+ *
+ * 1. inode(x) | CP | inode(x) | dnode(F)
+ * -> Update the latest inode(x).
+ *
+ * 2. inode(x) | CP | inode(F) | dnode(F)
+ * -> No problem.
+ *
+ * 3. inode(x) | CP | dnode(F) | inode(x)
+ * -> Recover to the latest dnode(F), and drop the last inode(x)
+ *
+ * 4. inode(x) | CP | dnode(F) | inode(F)
+ * -> No problem.
+ *
+ * 5. CP | inode(x) | dnode(F)
+ * -> The inode(DF) was missing. Should drop this dnode(F).
+ *
+ * 6. CP | inode(DF) | dnode(F)
+ * -> No problem.
+ *
+ * 7. CP | dnode(F) | inode(DF)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *
+ * 8. CP | dnode(F) | inode(x)
+ * -> If f2fs_iget fails, then goto next to find inode(DF).
+ *    But it will fail due to no inode(DF).
+ */
 static struct kmem_cache *fsync_entry_slab;
 
 bool space_for_roll_forward(struct f2fs_sb_info *sbi)
@@ -110,27 +140,32 @@ out:
 	return err;
 }
 
-static int recover_inode(struct inode *inode, struct page *node_page)
+static void __recover_inode(struct inode *inode, struct page *page)
 {
-	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
+	struct f2fs_inode *raw = F2FS_INODE(page);
+
+	inode->i_mode = le16_to_cpu(raw->i_mode);
+	i_size_write(inode, le64_to_cpu(raw->i_size));
+	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
+	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
+	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
+	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
+}
 
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
 	if (!IS_INODE(node_page))
 		return 0;
 
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
-	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
-	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
-	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+	__recover_inode(inode, node_page);
 
 	if (is_dent_dnode(node_page))
 		return recover_dentry(node_page, inode);
 
 	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
-			ino_of_node(node_page), raw_inode->i_name);
+			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
 	return 0;
 }
 
@@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
 				break;
 			}
 
+			/*
+			 * CP | dnode(F) | inode(DF)
+			 * For this case, we should not give up now.
+			 */
 			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (err == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
@@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
 		entry = get_fsync_inode(head, ino_of_node(page));
 		if (!entry)
 			goto next;
+		/*
+		 * inode(x) | CP | inode(x) | dnode(F)
+		 * In this case, we can lose the latest inode(x).
+		 * So, call __recover_inode for the inode update.
+		 */
+		if (IS_INODE(page))
+			__recover_inode(entry->inode, page);
 
 		err = do_recover_data(sbi, entry->inode, page, blkaddr);
 		if (err)
-- 
1.8.5.2 (Apple Git-48)


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-12  5:13                 ` Jaegeuk Kim
@ 2014-09-12  7:34                     ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-12  7:34 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > 
> > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > 
> > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > Hi Huang,
> > > > > > > > >
> > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > before
> > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > recovery fail.
> > > > > > > > > >
> > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > nid
> > > > > > > of
> > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > reverse.
> > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > >
> > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > >
> > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > ---
> > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > >
> > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > -         } else {
> > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > >
> > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > data blocks.
> > > > > > > >
> > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > changes to
> > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > that
> > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > patch should work too.
> > > > > > >
> > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > recovery
> > > > > > > fail.
> > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > directly.
> > > > > > >
> > > > > >
> > > > > > Yes.  That is another way to fix the issue.
> > > > > >
> > > > > >
> > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > For exmaple,
> > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > >
> > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > the
> > > > > > > inode
> > > > > > > has not fsync_mark.
> > > > > > >
> > > > > >
> > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > f2fs_iget will fail too and recover will fail.
> > > > >
> > > > > Ok, let me consider your scenarios.
> > > > >
> > > > > Term: F: fsync_mark, D: dentry_mark
> > > > >
> > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > >  -> Lose the latest inode(x). Need to fix.
> > > > >
> > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > >  -> Impossible, but recover latest dnode(F)
> > > > >
> > > > > 3. CP | inode(x) | dnode(F)
> > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > >
> > > > > 4. CP | dnode(F) | inode(DF)
> > > > >  -> If f2fs_iget fails, then goto next.
> > > > >
> > > > > 5. CP | dnode(F) | inode(x)
> > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > scenario.
> > > > >     Drop this dnode(F).
> > > > >
> > > > > Indeed, there were some missing scenarios.
> > > > >
> > > > > So, how about this patch?
> > > > >
> > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > >
> > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > >
> > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > >
> > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > -> Update the latest inode(x).
> > > > >
> > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > -> No problem.
> > > > >
> > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > -> Impossible, but recover latest dnode(F)
> > > > >
> > > > 
> > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > 
> > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > 
> > > In f2fs_sync_file,
> > > 	...
> > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > 		if (fsync_mark_done(sbi, ino))
> > > 			goto out;
> > > 		mark_inode_dirty_sync(inode);
> > > 		ret = f2fs_write_inode(inode, NULL);
> > > 		if (ret)
> > > 			goto out;
> > > 	}
> > > 	...
> > 
> > Is the following situation possible?
> > 
> > f2fs_sync_file				<writeback>
> >   sync_node_pages			  f2fs_write_node_pages
> >     write dnode(F)			    sync_node_pages
> > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > 
> > 
> > That is, f2fs_sync_file run parallel with <writeback>.  The
> > sync_node_pages above will return 1, because dnode(F) is written.
> > inode(x) is written by <writeback> path.  And because
> > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > called by f2fs_sync_file does not write inode(F).
> 
> I think Chao's comment would work.
> How about this patch?
> 
> From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> 
> We can summarize the roll forward recovery scenarios as follows.
> 
> [Term] F: fsync_mark, D: dentry_mark
> 
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
> 
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
> 
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Recover to the latest dnode(F), and drop the last inode(x)
> 
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
> 
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
> 
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
> 
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> 
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>    But it will fail due to no inode(DF).
> 
> So, this patch adds some missing points such as #1, #5, #7, and #8.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/file.c     | 20 ++++++++++++----
>  fs/f2fs/node.c     | 11 ++++++++-
>  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 85 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..70f5d4b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> -		/* if there is no written node page, write its inode page */
> -		while (!sync_node_pages(sbi, ino, &wbc)) {
> -			if (fsync_mark_done(sbi, ino))
> -				goto out;
> +sync_nodes:
> +		sync_node_pages(sbi, ino, &wbc);
> +
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 *  -> ok

Is it acceptable that we turn this to:

inode(x) | CPU | inode (x) | dnode (F) | inode(F)

> +		 * inode(x) | CP | dnode(F) | inode(x)
> +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> +		 * CP | inode(x) | dnode(F)
> +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> +		 * CP | dnode(F) | inode(x)
> +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> +		 */
> +		if (!fsync_mark_done(sbi, ino)) {
>  			mark_inode_dirty_sync(inode);
>  			ret = f2fs_write_inode(inode, NULL);
>  			if (ret)
>  				goto out;
> +			goto sync_nodes;
>  		}
> +
>  		ret = wait_on_node_pages_writeback(sbi, ino);
>  		if (ret)
>  			goto out;
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index b32eb56..653aa71 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -248,8 +248,17 @@ retry:
>  
>  	/* update fsync_mark if its inode nat entry is still alive */
>  	e = __lookup_nat_cache(nm_i, ni->ino);
> -	if (e)
> +	if (e) {
> +		/*
> +		 * CP | inode(x) | dnode(F)
> +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> +		 */
> +		if (!e->checkpointed && !e->fsync_done &&
> +				ni->ino != ni->nid && fsync_done)
> +			goto skip;
>  		e->fsync_done = fsync_done;
> +	}
> +skip:
>  	write_unlock(&nm_i->nat_tree_lock);
>  }

I don't understand why we need so complex logic?  Why not just let
e->fsync_done reflect just latest is_fsync_dnode(page)?

It appears that in f2fs_sync_file, what we need is just whether inode
page has fsync mark or not.

Best Regards,
Huang, Ying
 
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..3736728 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
>  #include "node.h"
>  #include "segment.h"
>  
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Recover to the latest dnode(F), and drop the last inode(x)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *    But it will fail due to no inode(DF).
> + */
>  static struct kmem_cache *fsync_entry_slab;
>  
>  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
>  	return err;
>  }
>  
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
>  {
> -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> +	struct f2fs_inode *raw = F2FS_INODE(page);
> +
> +	inode->i_mode = le16_to_cpu(raw->i_mode);
> +	i_size_write(inode, le64_to_cpu(raw->i_size));
> +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
>  
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
>  	if (!IS_INODE(node_page))
>  		return 0;
>  
> -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> +	__recover_inode(inode, node_page);
>  
>  	if (is_dent_dnode(node_page))
>  		return recover_dentry(node_page, inode);
>  
>  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> -			ino_of_node(node_page), raw_inode->i_name);
> +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
>  	return 0;
>  }
>  
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
>  				break;
>  			}
>  
> +			/*
> +			 * CP | dnode(F) | inode(DF)
> +			 * For this case, we should not give up now.
> +			 */
>  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 * In this case, we can lose the latest inode(x).
> +		 * So, call __recover_inode for the inode update.
> +		 */
> +		if (IS_INODE(page))
> +			__recover_inode(entry->inode, page);
>  
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
>  		if (err)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-12  7:34                     ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-12  7:34 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, LKML, linux-f2fs-devel

On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > 
> > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > 
> > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > Hi Huang,
> > > > > > > > >
> > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > before
> > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > recovery fail.
> > > > > > > > > >
> > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > nid
> > > > > > > of
> > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > reverse.
> > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > >
> > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > >
> > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > ---
> > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > >
> > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > -         } else {
> > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > >
> > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > data blocks.
> > > > > > > >
> > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > changes to
> > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > that
> > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > patch should work too.
> > > > > > >
> > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > recovery
> > > > > > > fail.
> > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > directly.
> > > > > > >
> > > > > >
> > > > > > Yes.  That is another way to fix the issue.
> > > > > >
> > > > > >
> > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > For exmaple,
> > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > >
> > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > the
> > > > > > > inode
> > > > > > > has not fsync_mark.
> > > > > > >
> > > > > >
> > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > f2fs_iget will fail too and recover will fail.
> > > > >
> > > > > Ok, let me consider your scenarios.
> > > > >
> > > > > Term: F: fsync_mark, D: dentry_mark
> > > > >
> > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > >  -> Lose the latest inode(x). Need to fix.
> > > > >
> > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > >  -> Impossible, but recover latest dnode(F)
> > > > >
> > > > > 3. CP | inode(x) | dnode(F)
> > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > >
> > > > > 4. CP | dnode(F) | inode(DF)
> > > > >  -> If f2fs_iget fails, then goto next.
> > > > >
> > > > > 5. CP | dnode(F) | inode(x)
> > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > scenario.
> > > > >     Drop this dnode(F).
> > > > >
> > > > > Indeed, there were some missing scenarios.
> > > > >
> > > > > So, how about this patch?
> > > > >
> > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > >
> > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > >
> > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > >
> > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > -> Update the latest inode(x).
> > > > >
> > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > -> No problem.
> > > > >
> > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > -> Impossible, but recover latest dnode(F)
> > > > >
> > > > 
> > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > 
> > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > 
> > > In f2fs_sync_file,
> > > 	...
> > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > 		if (fsync_mark_done(sbi, ino))
> > > 			goto out;
> > > 		mark_inode_dirty_sync(inode);
> > > 		ret = f2fs_write_inode(inode, NULL);
> > > 		if (ret)
> > > 			goto out;
> > > 	}
> > > 	...
> > 
> > Is the following situation possible?
> > 
> > f2fs_sync_file				<writeback>
> >   sync_node_pages			  f2fs_write_node_pages
> >     write dnode(F)			    sync_node_pages
> > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > 
> > 
> > That is, f2fs_sync_file run parallel with <writeback>.  The
> > sync_node_pages above will return 1, because dnode(F) is written.
> > inode(x) is written by <writeback> path.  And because
> > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > called by f2fs_sync_file does not write inode(F).
> 
> I think Chao's comment would work.
> How about this patch?
> 
> From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> 
> We can summarize the roll forward recovery scenarios as follows.
> 
> [Term] F: fsync_mark, D: dentry_mark
> 
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
> 
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
> 
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Recover to the latest dnode(F), and drop the last inode(x)
> 
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
> 
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
> 
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
> 
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> 
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>    But it will fail due to no inode(DF).
> 
> So, this patch adds some missing points such as #1, #5, #7, and #8.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/file.c     | 20 ++++++++++++----
>  fs/f2fs/node.c     | 11 ++++++++-
>  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 85 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..70f5d4b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> -		/* if there is no written node page, write its inode page */
> -		while (!sync_node_pages(sbi, ino, &wbc)) {
> -			if (fsync_mark_done(sbi, ino))
> -				goto out;
> +sync_nodes:
> +		sync_node_pages(sbi, ino, &wbc);
> +
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 *  -> ok

Is it acceptable that we turn this to:

inode(x) | CPU | inode (x) | dnode (F) | inode(F)

> +		 * inode(x) | CP | dnode(F) | inode(x)
> +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> +		 * CP | inode(x) | dnode(F)
> +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> +		 * CP | dnode(F) | inode(x)
> +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> +		 */
> +		if (!fsync_mark_done(sbi, ino)) {
>  			mark_inode_dirty_sync(inode);
>  			ret = f2fs_write_inode(inode, NULL);
>  			if (ret)
>  				goto out;
> +			goto sync_nodes;
>  		}
> +
>  		ret = wait_on_node_pages_writeback(sbi, ino);
>  		if (ret)
>  			goto out;
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index b32eb56..653aa71 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -248,8 +248,17 @@ retry:
>  
>  	/* update fsync_mark if its inode nat entry is still alive */
>  	e = __lookup_nat_cache(nm_i, ni->ino);
> -	if (e)
> +	if (e) {
> +		/*
> +		 * CP | inode(x) | dnode(F)
> +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> +		 */
> +		if (!e->checkpointed && !e->fsync_done &&
> +				ni->ino != ni->nid && fsync_done)
> +			goto skip;
>  		e->fsync_done = fsync_done;
> +	}
> +skip:
>  	write_unlock(&nm_i->nat_tree_lock);
>  }

I don't understand why we need so complex logic?  Why not just let
e->fsync_done reflect just latest is_fsync_dnode(page)?

It appears that in f2fs_sync_file, what we need is just whether inode
page has fsync mark or not.

Best Regards,
Huang, Ying
 
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..3736728 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
>  #include "node.h"
>  #include "segment.h"
>  
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Recover to the latest dnode(F), and drop the last inode(x)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *    But it will fail due to no inode(DF).
> + */
>  static struct kmem_cache *fsync_entry_slab;
>  
>  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
>  	return err;
>  }
>  
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
>  {
> -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> +	struct f2fs_inode *raw = F2FS_INODE(page);
> +
> +	inode->i_mode = le16_to_cpu(raw->i_mode);
> +	i_size_write(inode, le64_to_cpu(raw->i_size));
> +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
>  
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
>  	if (!IS_INODE(node_page))
>  		return 0;
>  
> -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> +	__recover_inode(inode, node_page);
>  
>  	if (is_dent_dnode(node_page))
>  		return recover_dentry(node_page, inode);
>  
>  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> -			ino_of_node(node_page), raw_inode->i_name);
> +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
>  	return 0;
>  }
>  
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
>  				break;
>  			}
>  
> +			/*
> +			 * CP | dnode(F) | inode(DF)
> +			 * For this case, we should not give up now.
> +			 */
>  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
>  		entry = get_fsync_inode(head, ino_of_node(page));
>  		if (!entry)
>  			goto next;
> +		/*
> +		 * inode(x) | CP | inode(x) | dnode(F)
> +		 * In this case, we can lose the latest inode(x).
> +		 * So, call __recover_inode for the inode update.
> +		 */
> +		if (IS_INODE(page))
> +			__recover_inode(entry->inode, page);
>  
>  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
>  		if (err)



------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-12  7:34                     ` Huang Ying
@ 2014-09-13 14:23                       ` Huang Ying
  -1 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-13 14:23 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > 
> > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > 
> > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > Hi Huang,
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > before
> > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > recovery fail.
> > > > > > > > > > >
> > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > nid
> > > > > > > > of
> > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > reverse.
> > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > >
> > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > >
> > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > -         } else {
> > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > >
> > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > data blocks.
> > > > > > > > >
> > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > changes to
> > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > that
> > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > patch should work too.
> > > > > > > >
> > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > recovery
> > > > > > > > fail.
> > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > directly.
> > > > > > > >
> > > > > > >
> > > > > > > Yes.  That is another way to fix the issue.
> > > > > > >
> > > > > > >
> > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > For exmaple,
> > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > >
> > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > the
> > > > > > > > inode
> > > > > > > > has not fsync_mark.
> > > > > > > >
> > > > > > >
> > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > >
> > > > > > Ok, let me consider your scenarios.
> > > > > >
> > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > >
> > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > > 3. CP | inode(x) | dnode(F)
> > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > >
> > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > >
> > > > > > 5. CP | dnode(F) | inode(x)
> > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > scenario.
> > > > > >     Drop this dnode(F).
> > > > > >
> > > > > > Indeed, there were some missing scenarios.
> > > > > >
> > > > > > So, how about this patch?
> > > > > >
> > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > >
> > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > >
> > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > -> Update the latest inode(x).
> > > > > >
> > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > -> No problem.
> > > > > >
> > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > 
> > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > 
> > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > 
> > > > In f2fs_sync_file,
> > > > 	...
> > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > 		if (fsync_mark_done(sbi, ino))
> > > > 			goto out;
> > > > 		mark_inode_dirty_sync(inode);
> > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > 		if (ret)
> > > > 			goto out;
> > > > 	}
> > > > 	...
> > > 
> > > Is the following situation possible?
> > > 
> > > f2fs_sync_file				<writeback>
> > >   sync_node_pages			  f2fs_write_node_pages
> > >     write dnode(F)			    sync_node_pages
> > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > 
> > > 
> > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > sync_node_pages above will return 1, because dnode(F) is written.
> > > inode(x) is written by <writeback> path.  And because
> > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > called by f2fs_sync_file does not write inode(F).
> > 
> > I think Chao's comment would work.
> > How about this patch?
> > 
> > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Recover to the latest dnode(F), and drop the last inode(x)
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 20 ++++++++++++----
> >  fs/f2fs/node.c     | 11 ++++++++-
> >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 85 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..70f5d4b 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > -		/* if there is no written node page, write its inode page */
> > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > -			if (fsync_mark_done(sbi, ino))
> > -				goto out;
> > +sync_nodes:
> > +		sync_node_pages(sbi, ino, &wbc);
> > +
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 *  -> ok
> 
> Is it acceptable that we turn this to:
> 
> inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> 
> > +		 * inode(x) | CP | dnode(F) | inode(x)
> > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 * CP | dnode(F) | inode(x)
> > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > +		 */
> > +		if (!fsync_mark_done(sbi, ino)) {
> >  			mark_inode_dirty_sync(inode);
> >  			ret = f2fs_write_inode(inode, NULL);
> >  			if (ret)
> >  				goto out;
> > +			goto sync_nodes;
> >  		}
> > +
> >  		ret = wait_on_node_pages_writeback(sbi, ino);
> >  		if (ret)
> >  			goto out;
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index b32eb56..653aa71 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -248,8 +248,17 @@ retry:
> >  
> >  	/* update fsync_mark if its inode nat entry is still alive */
> >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > -	if (e)
> > +	if (e) {
> > +		/*
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 */
> > +		if (!e->checkpointed && !e->fsync_done &&
> > +				ni->ino != ni->nid && fsync_done)
> > +			goto skip;
> >  		e->fsync_done = fsync_done;
> > +	}
> > +skip:
> >  	write_unlock(&nm_i->nat_tree_lock);
> >  }
> 
> I don't understand why we need so complex logic?  Why not just let
> e->fsync_done reflect just latest is_fsync_dnode(page)?
> 
> It appears that in f2fs_sync_file, what we need is just whether inode
> page has fsync mark or not.

Oh, I see.  The e->fsync_done indicates whether latest node of the inode
has fsync_mark.  That can be used to deal with:

CP | inode(DF) | dnode(x)

But I still think it is possible to make e->fsync_done simple via
introduce another simple e->node_fsync_done to indicate whether the node
itself has fsync_mark.  What do you think about the solution in the
below patch?  Just build tested.

IMHO, it is easier to be understand.  And in effect, it is almost same,
because it is hard to distinguish between

CP | inode(x) | dnode(F)

and

inode(x) | CP | inode(x) | dnode(F)

in your solution too.

Best Regards,
Huang, Ying

----------------------------------------------------->
---
 fs/f2fs/f2fs.h     |    1 +
 fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
 fs/f2fs/node.c     |   17 ++++++++++++++++-
 fs/f2fs/node.h     |    4 +++-
 fs/f2fs/recovery.c |    2 ++
 5 files changed, 44 insertions(+), 6 deletions(-)

--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
 			up_write(&fi->i_sem);
 		}
 	} else {
-		/* if there is no written node page, write its inode page */
-		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
-			if (fsync_mark_done(sbi, inode->i_ino))
-				goto out;
+		for (;;) {
+			sync_node_pages(sbi, inode->i_ino, &wbc);
+			/*
+			 * inode(x) | CP | dnode(F)
+			 * -> ok
+			 * inode(x) | CP | inode(x) | dnode(x)
+			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
+			 * inode(x) | CP | inode(x) | dnode(F)
+			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
+			 * CP | inode(x) | dnode(x)
+			 * -> CP | inode(x) | dnode(x) | inode(DF)
+			 * CP | inode(x) | dnode(F)
+			 * -> CP | inode(x) | dnode(F) | inode(DF)
+			 * CP | dnode(F) | inode(x)
+			 * -> CP | dnode(F) | inode(x) | inode(DF)
+			 */
+			if ((is_checkpointed_node(sbi, inode->i_ino) &&
+			     fsync_mark_done(sbi, inode->i_ino)) ||
+			    (!is_checkpointed_node(sbi, inode->i_ino) &&
+			     node_fsync_mark_done(sbi, inode->i_ino) &&
+			     fsync_mark_done(sbi, inode->i_ino)))
+				break;
 			mark_inode_dirty_sync(inode);
 			ret = f2fs_write_inode(inode, NULL);
 			if (ret)
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
 	return is_cp;
 }
 
+bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
+{
+	struct f2fs_nm_info *nm_i = NM_I(sbi);
+	struct nat_entry *e;
+	bool node_fsync_done = false;
+
+	read_lock(&nm_i->nat_tree_lock);
+	e = __lookup_nat_cache(nm_i, nid);
+	if (e)
+		node_fsync_done = e->node_fsync_done;
+	read_unlock(&nm_i->nat_tree_lock);
+	return node_fsync_done;
+}
+
 bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
 {
 	struct f2fs_nm_info *nm_i = NM_I(sbi);
@@ -246,7 +260,8 @@ retry:
 	nat_set_blkaddr(e, new_blkaddr);
 	__set_nat_cache_dirty(nm_i, e);
 
-	/* update fsync_mark if its inode nat entry is still alive */
+	e->node_fsync_done = fsync_done;
+	/* update fsync_done if its inode nat entry is still alive */
 	e = __lookup_nat_cache(nm_i, ni->ino);
 	if (e)
 		e->fsync_done = fsync_done;
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -42,7 +42,9 @@ struct node_info {
 struct nat_entry {
 	struct list_head list;	/* for clean or dirty nat list */
 	bool checkpointed;	/* whether it is checkpointed or not */
-	bool fsync_done;	/* whether the latest node has fsync mark */
+	bool node_fsync_done;	/* whether the node has fsync mark */
+	bool fsync_done;	/* whether the latest node of the
+				 * inode has fsync mark */
 	struct node_info ni;	/* in-memory node information */
 };
 
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (err == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1220,6 +1220,7 @@ struct node_info;
 
 bool available_free_memory(struct f2fs_sb_info *, int);
 int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
+bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
 bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
 void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
 void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-13 14:23                       ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-13 14:23 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, LKML, linux-f2fs-devel

On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > 
> > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > 
> > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > Hi Huang,
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > before
> > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > recovery fail.
> > > > > > > > > > >
> > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > nid
> > > > > > > > of
> > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > reverse.
> > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > >
> > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > >
> > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > -         } else {
> > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > >
> > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > data blocks.
> > > > > > > > >
> > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > changes to
> > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > that
> > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > patch should work too.
> > > > > > > >
> > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > recovery
> > > > > > > > fail.
> > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > directly.
> > > > > > > >
> > > > > > >
> > > > > > > Yes.  That is another way to fix the issue.
> > > > > > >
> > > > > > >
> > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > For exmaple,
> > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > >
> > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > the
> > > > > > > > inode
> > > > > > > > has not fsync_mark.
> > > > > > > >
> > > > > > >
> > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > >
> > > > > > Ok, let me consider your scenarios.
> > > > > >
> > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > >
> > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > > 3. CP | inode(x) | dnode(F)
> > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > >
> > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > >
> > > > > > 5. CP | dnode(F) | inode(x)
> > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > scenario.
> > > > > >     Drop this dnode(F).
> > > > > >
> > > > > > Indeed, there were some missing scenarios.
> > > > > >
> > > > > > So, how about this patch?
> > > > > >
> > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > >
> > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > >
> > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > -> Update the latest inode(x).
> > > > > >
> > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > -> No problem.
> > > > > >
> > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > 
> > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > 
> > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > 
> > > > In f2fs_sync_file,
> > > > 	...
> > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > 		if (fsync_mark_done(sbi, ino))
> > > > 			goto out;
> > > > 		mark_inode_dirty_sync(inode);
> > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > 		if (ret)
> > > > 			goto out;
> > > > 	}
> > > > 	...
> > > 
> > > Is the following situation possible?
> > > 
> > > f2fs_sync_file				<writeback>
> > >   sync_node_pages			  f2fs_write_node_pages
> > >     write dnode(F)			    sync_node_pages
> > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > 
> > > 
> > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > sync_node_pages above will return 1, because dnode(F) is written.
> > > inode(x) is written by <writeback> path.  And because
> > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > called by f2fs_sync_file does not write inode(F).
> > 
> > I think Chao's comment would work.
> > How about this patch?
> > 
> > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Recover to the latest dnode(F), and drop the last inode(x)
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 20 ++++++++++++----
> >  fs/f2fs/node.c     | 11 ++++++++-
> >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 85 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..70f5d4b 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > -		/* if there is no written node page, write its inode page */
> > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > -			if (fsync_mark_done(sbi, ino))
> > -				goto out;
> > +sync_nodes:
> > +		sync_node_pages(sbi, ino, &wbc);
> > +
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 *  -> ok
> 
> Is it acceptable that we turn this to:
> 
> inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> 
> > +		 * inode(x) | CP | dnode(F) | inode(x)
> > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 * CP | dnode(F) | inode(x)
> > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > +		 */
> > +		if (!fsync_mark_done(sbi, ino)) {
> >  			mark_inode_dirty_sync(inode);
> >  			ret = f2fs_write_inode(inode, NULL);
> >  			if (ret)
> >  				goto out;
> > +			goto sync_nodes;
> >  		}
> > +
> >  		ret = wait_on_node_pages_writeback(sbi, ino);
> >  		if (ret)
> >  			goto out;
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index b32eb56..653aa71 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -248,8 +248,17 @@ retry:
> >  
> >  	/* update fsync_mark if its inode nat entry is still alive */
> >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > -	if (e)
> > +	if (e) {
> > +		/*
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 */
> > +		if (!e->checkpointed && !e->fsync_done &&
> > +				ni->ino != ni->nid && fsync_done)
> > +			goto skip;
> >  		e->fsync_done = fsync_done;
> > +	}
> > +skip:
> >  	write_unlock(&nm_i->nat_tree_lock);
> >  }
> 
> I don't understand why we need so complex logic?  Why not just let
> e->fsync_done reflect just latest is_fsync_dnode(page)?
> 
> It appears that in f2fs_sync_file, what we need is just whether inode
> page has fsync mark or not.

Oh, I see.  The e->fsync_done indicates whether latest node of the inode
has fsync_mark.  That can be used to deal with:

CP | inode(DF) | dnode(x)

But I still think it is possible to make e->fsync_done simple via
introduce another simple e->node_fsync_done to indicate whether the node
itself has fsync_mark.  What do you think about the solution in the
below patch?  Just build tested.

IMHO, it is easier to be understand.  And in effect, it is almost same,
because it is hard to distinguish between

CP | inode(x) | dnode(F)

and

inode(x) | CP | inode(x) | dnode(F)

in your solution too.

Best Regards,
Huang, Ying

----------------------------------------------------->
---
 fs/f2fs/f2fs.h     |    1 +
 fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
 fs/f2fs/node.c     |   17 ++++++++++++++++-
 fs/f2fs/node.h     |    4 +++-
 fs/f2fs/recovery.c |    2 ++
 5 files changed, 44 insertions(+), 6 deletions(-)

--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
 			up_write(&fi->i_sem);
 		}
 	} else {
-		/* if there is no written node page, write its inode page */
-		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
-			if (fsync_mark_done(sbi, inode->i_ino))
-				goto out;
+		for (;;) {
+			sync_node_pages(sbi, inode->i_ino, &wbc);
+			/*
+			 * inode(x) | CP | dnode(F)
+			 * -> ok
+			 * inode(x) | CP | inode(x) | dnode(x)
+			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
+			 * inode(x) | CP | inode(x) | dnode(F)
+			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
+			 * CP | inode(x) | dnode(x)
+			 * -> CP | inode(x) | dnode(x) | inode(DF)
+			 * CP | inode(x) | dnode(F)
+			 * -> CP | inode(x) | dnode(F) | inode(DF)
+			 * CP | dnode(F) | inode(x)
+			 * -> CP | dnode(F) | inode(x) | inode(DF)
+			 */
+			if ((is_checkpointed_node(sbi, inode->i_ino) &&
+			     fsync_mark_done(sbi, inode->i_ino)) ||
+			    (!is_checkpointed_node(sbi, inode->i_ino) &&
+			     node_fsync_mark_done(sbi, inode->i_ino) &&
+			     fsync_mark_done(sbi, inode->i_ino)))
+				break;
 			mark_inode_dirty_sync(inode);
 			ret = f2fs_write_inode(inode, NULL);
 			if (ret)
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
 	return is_cp;
 }
 
+bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
+{
+	struct f2fs_nm_info *nm_i = NM_I(sbi);
+	struct nat_entry *e;
+	bool node_fsync_done = false;
+
+	read_lock(&nm_i->nat_tree_lock);
+	e = __lookup_nat_cache(nm_i, nid);
+	if (e)
+		node_fsync_done = e->node_fsync_done;
+	read_unlock(&nm_i->nat_tree_lock);
+	return node_fsync_done;
+}
+
 bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
 {
 	struct f2fs_nm_info *nm_i = NM_I(sbi);
@@ -246,7 +260,8 @@ retry:
 	nat_set_blkaddr(e, new_blkaddr);
 	__set_nat_cache_dirty(nm_i, e);
 
-	/* update fsync_mark if its inode nat entry is still alive */
+	e->node_fsync_done = fsync_done;
+	/* update fsync_done if its inode nat entry is still alive */
 	e = __lookup_nat_cache(nm_i, ni->ino);
 	if (e)
 		e->fsync_done = fsync_done;
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -42,7 +42,9 @@ struct node_info {
 struct nat_entry {
 	struct list_head list;	/* for clean or dirty nat list */
 	bool checkpointed;	/* whether it is checkpointed or not */
-	bool fsync_done;	/* whether the latest node has fsync mark */
+	bool node_fsync_done;	/* whether the node has fsync mark */
+	bool fsync_done;	/* whether the latest node of the
+				 * inode has fsync mark */
 	struct node_info ni;	/* in-memory node information */
 };
 
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
 			if (IS_ERR(entry->inode)) {
 				err = PTR_ERR(entry->inode);
 				kmem_cache_free(fsync_entry_slab, entry);
+				if (err == -ENOENT)
+					goto next;
 				break;
 			}
 			list_add_tail(&entry->list, head);
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1220,6 +1220,7 @@ struct node_info;
 
 bool available_free_memory(struct f2fs_sb_info *, int);
 int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
+bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
 bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
 void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
 void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);



------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-12  7:34                     ` Huang Ying
@ 2014-09-14  7:38                       ` Jaegeuk Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-14  7:38 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Fri, Sep 12, 2014 at 03:34:48PM +0800, Huang Ying wrote:
> On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > 
> > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > 
> > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > Hi Huang,
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > before
> > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > recovery fail.
> > > > > > > > > > >
> > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > nid
> > > > > > > > of
> > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > reverse.
> > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > >
> > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > >
> > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > -         } else {
> > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > >
> > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > data blocks.
> > > > > > > > >
> > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > changes to
> > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > that
> > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > patch should work too.
> > > > > > > >
> > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > recovery
> > > > > > > > fail.
> > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > directly.
> > > > > > > >
> > > > > > >
> > > > > > > Yes.  That is another way to fix the issue.
> > > > > > >
> > > > > > >
> > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > For exmaple,
> > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > >
> > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > the
> > > > > > > > inode
> > > > > > > > has not fsync_mark.
> > > > > > > >
> > > > > > >
> > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > >
> > > > > > Ok, let me consider your scenarios.
> > > > > >
> > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > >
> > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > > 3. CP | inode(x) | dnode(F)
> > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > >
> > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > >
> > > > > > 5. CP | dnode(F) | inode(x)
> > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > scenario.
> > > > > >     Drop this dnode(F).
> > > > > >
> > > > > > Indeed, there were some missing scenarios.
> > > > > >
> > > > > > So, how about this patch?
> > > > > >
> > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > >
> > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > >
> > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > -> Update the latest inode(x).
> > > > > >
> > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > -> No problem.
> > > > > >
> > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > 
> > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > 
> > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > 
> > > > In f2fs_sync_file,
> > > > 	...
> > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > 		if (fsync_mark_done(sbi, ino))
> > > > 			goto out;
> > > > 		mark_inode_dirty_sync(inode);
> > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > 		if (ret)
> > > > 			goto out;
> > > > 	}
> > > > 	...
> > > 
> > > Is the following situation possible?
> > > 
> > > f2fs_sync_file				<writeback>
> > >   sync_node_pages			  f2fs_write_node_pages
> > >     write dnode(F)			    sync_node_pages
> > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > 
> > > 
> > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > sync_node_pages above will return 1, because dnode(F) is written.
> > > inode(x) is written by <writeback> path.  And because
> > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > called by f2fs_sync_file does not write inode(F).
> > 
> > I think Chao's comment would work.
> > How about this patch?
> > 
> > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Recover to the latest dnode(F), and drop the last inode(x)
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 20 ++++++++++++----
> >  fs/f2fs/node.c     | 11 ++++++++-
> >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 85 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..70f5d4b 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > -		/* if there is no written node page, write its inode page */
> > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > -			if (fsync_mark_done(sbi, ino))
> > -				goto out;
> > +sync_nodes:
> > +		sync_node_pages(sbi, ino, &wbc);
> > +
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 *  -> ok
> 
> Is it acceptable that we turn this to:
> 
> inode(x) | CPU | inode (x) | dnode (F) | inode(F)

Why do we have to do writing additonal one block unnecessarily?
We got inode(x) for the recovery.

> 
> > +		 * inode(x) | CP | dnode(F) | inode(x)
> > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 * CP | dnode(F) | inode(x)
> > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > +		 */
> > +		if (!fsync_mark_done(sbi, ino)) {
> >  			mark_inode_dirty_sync(inode);
> >  			ret = f2fs_write_inode(inode, NULL);
> >  			if (ret)
> >  				goto out;
> > +			goto sync_nodes;
> >  		}
> > +
> >  		ret = wait_on_node_pages_writeback(sbi, ino);
> >  		if (ret)
> >  			goto out;
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index b32eb56..653aa71 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -248,8 +248,17 @@ retry:
> >  
> >  	/* update fsync_mark if its inode nat entry is still alive */
> >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > -	if (e)
> > +	if (e) {
> > +		/*
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 */
> > +		if (!e->checkpointed && !e->fsync_done &&
> > +				ni->ino != ni->nid && fsync_done)
> > +			goto skip;
> >  		e->fsync_done = fsync_done;
> > +	}
> > +skip:
> >  	write_unlock(&nm_i->nat_tree_lock);
> >  }
> 
> I don't understand why we need so complex logic?  Why not just let
> e->fsync_done reflect just latest is_fsync_dnode(page)?
> 
> It appears that in f2fs_sync_file, what we need is just whether inode
> page has fsync mark or not.
> 
> Best Regards,
> Huang, Ying
>  
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 6c5a74a..3736728 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -14,6 +14,36 @@
> >  #include "node.h"
> >  #include "segment.h"
> >  
> > +/*
> > + * Roll forward recovery scenarios.
> > + *
> > + * [Term] F: fsync_mark, D: dentry_mark
> > + *
> > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > + * -> Update the latest inode(x).
> > + *
> > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > + * -> Recover to the latest dnode(F), and drop the last inode(x)
> > + *
> > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > + * -> No problem.
> > + *
> > + * 5. CP | inode(x) | dnode(F)
> > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > + *
> > + * 6. CP | inode(DF) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 7. CP | dnode(F) | inode(DF)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *
> > + * 8. CP | dnode(F) | inode(x)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *    But it will fail due to no inode(DF).
> > + */
> >  static struct kmem_cache *fsync_entry_slab;
> >  
> >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > @@ -110,27 +140,32 @@ out:
> >  	return err;
> >  }
> >  
> > -static int recover_inode(struct inode *inode, struct page *node_page)
> > +static void __recover_inode(struct inode *inode, struct page *page)
> >  {
> > -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > +	struct f2fs_inode *raw = F2FS_INODE(page);
> > +
> > +	inode->i_mode = le16_to_cpu(raw->i_mode);
> > +	i_size_write(inode, le64_to_cpu(raw->i_size));
> > +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +}
> >  
> > +static int recover_inode(struct inode *inode, struct page *node_page)
> > +{
> >  	if (!IS_INODE(node_page))
> >  		return 0;
> >  
> > -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > +	__recover_inode(inode, node_page);
> >  
> >  	if (is_dent_dnode(node_page))
> >  		return recover_dentry(node_page, inode);
> >  
> >  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > -			ino_of_node(node_page), raw_inode->i_name);
> > +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> >  	return 0;
> >  }
> >  
> > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
> >  				break;
> >  			}
> >  
> > +			/*
> > +			 * CP | dnode(F) | inode(DF)
> > +			 * For this case, we should not give up now.
> > +			 */
> >  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> >  			if (IS_ERR(entry->inode)) {
> >  				err = PTR_ERR(entry->inode);
> >  				kmem_cache_free(fsync_entry_slab, entry);
> > +				if (err == -ENOENT)
> > +					goto next;
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> >  		entry = get_fsync_inode(head, ino_of_node(page));
> >  		if (!entry)
> >  			goto next;
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 * In this case, we can lose the latest inode(x).
> > +		 * So, call __recover_inode for the inode update.
> > +		 */
> > +		if (IS_INODE(page))
> > +			__recover_inode(entry->inode, page);
> >  
> >  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> >  		if (err)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-14  7:38                       ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-14  7:38 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang ying, LKML, linux-f2fs-devel

On Fri, Sep 12, 2014 at 03:34:48PM +0800, Huang Ying wrote:
> On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > 
> > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > 
> > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > Hi Huang,
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > before
> > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > recovery fail.
> > > > > > > > > > >
> > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > nid
> > > > > > > > of
> > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > reverse.
> > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > >
> > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > >
> > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > -         } else {
> > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > >
> > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > data blocks.
> > > > > > > > >
> > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > changes to
> > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > that
> > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > patch should work too.
> > > > > > > >
> > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > recovery
> > > > > > > > fail.
> > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > directly.
> > > > > > > >
> > > > > > >
> > > > > > > Yes.  That is another way to fix the issue.
> > > > > > >
> > > > > > >
> > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > For exmaple,
> > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > >
> > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > the
> > > > > > > > inode
> > > > > > > > has not fsync_mark.
> > > > > > > >
> > > > > > >
> > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > >
> > > > > > Ok, let me consider your scenarios.
> > > > > >
> > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > >
> > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > > 3. CP | inode(x) | dnode(F)
> > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > >
> > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > >
> > > > > > 5. CP | dnode(F) | inode(x)
> > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > scenario.
> > > > > >     Drop this dnode(F).
> > > > > >
> > > > > > Indeed, there were some missing scenarios.
> > > > > >
> > > > > > So, how about this patch?
> > > > > >
> > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > >
> > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > >
> > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > >
> > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > -> Update the latest inode(x).
> > > > > >
> > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > -> No problem.
> > > > > >
> > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > -> Impossible, but recover latest dnode(F)
> > > > > >
> > > > > 
> > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > 
> > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > 
> > > > In f2fs_sync_file,
> > > > 	...
> > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > 		if (fsync_mark_done(sbi, ino))
> > > > 			goto out;
> > > > 		mark_inode_dirty_sync(inode);
> > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > 		if (ret)
> > > > 			goto out;
> > > > 	}
> > > > 	...
> > > 
> > > Is the following situation possible?
> > > 
> > > f2fs_sync_file				<writeback>
> > >   sync_node_pages			  f2fs_write_node_pages
> > >     write dnode(F)			    sync_node_pages
> > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > 
> > > 
> > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > sync_node_pages above will return 1, because dnode(F) is written.
> > > inode(x) is written by <writeback> path.  And because
> > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > called by f2fs_sync_file does not write inode(F).
> > 
> > I think Chao's comment would work.
> > How about this patch?
> > 
> > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Recover to the latest dnode(F), and drop the last inode(x)
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 20 ++++++++++++----
> >  fs/f2fs/node.c     | 11 ++++++++-
> >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 85 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..70f5d4b 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > -		/* if there is no written node page, write its inode page */
> > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > -			if (fsync_mark_done(sbi, ino))
> > -				goto out;
> > +sync_nodes:
> > +		sync_node_pages(sbi, ino, &wbc);
> > +
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 *  -> ok
> 
> Is it acceptable that we turn this to:
> 
> inode(x) | CPU | inode (x) | dnode (F) | inode(F)

Why do we have to do writing additonal one block unnecessarily?
We got inode(x) for the recovery.

> 
> > +		 * inode(x) | CP | dnode(F) | inode(x)
> > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 * CP | dnode(F) | inode(x)
> > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > +		 */
> > +		if (!fsync_mark_done(sbi, ino)) {
> >  			mark_inode_dirty_sync(inode);
> >  			ret = f2fs_write_inode(inode, NULL);
> >  			if (ret)
> >  				goto out;
> > +			goto sync_nodes;
> >  		}
> > +
> >  		ret = wait_on_node_pages_writeback(sbi, ino);
> >  		if (ret)
> >  			goto out;
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index b32eb56..653aa71 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -248,8 +248,17 @@ retry:
> >  
> >  	/* update fsync_mark if its inode nat entry is still alive */
> >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > -	if (e)
> > +	if (e) {
> > +		/*
> > +		 * CP | inode(x) | dnode(F)
> > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > +		 */
> > +		if (!e->checkpointed && !e->fsync_done &&
> > +				ni->ino != ni->nid && fsync_done)
> > +			goto skip;
> >  		e->fsync_done = fsync_done;
> > +	}
> > +skip:
> >  	write_unlock(&nm_i->nat_tree_lock);
> >  }
> 
> I don't understand why we need so complex logic?  Why not just let
> e->fsync_done reflect just latest is_fsync_dnode(page)?
> 
> It appears that in f2fs_sync_file, what we need is just whether inode
> page has fsync mark or not.
> 
> Best Regards,
> Huang, Ying
>  
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 6c5a74a..3736728 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -14,6 +14,36 @@
> >  #include "node.h"
> >  #include "segment.h"
> >  
> > +/*
> > + * Roll forward recovery scenarios.
> > + *
> > + * [Term] F: fsync_mark, D: dentry_mark
> > + *
> > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > + * -> Update the latest inode(x).
> > + *
> > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > + * -> Recover to the latest dnode(F), and drop the last inode(x)
> > + *
> > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > + * -> No problem.
> > + *
> > + * 5. CP | inode(x) | dnode(F)
> > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > + *
> > + * 6. CP | inode(DF) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 7. CP | dnode(F) | inode(DF)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *
> > + * 8. CP | dnode(F) | inode(x)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *    But it will fail due to no inode(DF).
> > + */
> >  static struct kmem_cache *fsync_entry_slab;
> >  
> >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > @@ -110,27 +140,32 @@ out:
> >  	return err;
> >  }
> >  
> > -static int recover_inode(struct inode *inode, struct page *node_page)
> > +static void __recover_inode(struct inode *inode, struct page *page)
> >  {
> > -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > +	struct f2fs_inode *raw = F2FS_INODE(page);
> > +
> > +	inode->i_mode = le16_to_cpu(raw->i_mode);
> > +	i_size_write(inode, le64_to_cpu(raw->i_size));
> > +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +}
> >  
> > +static int recover_inode(struct inode *inode, struct page *node_page)
> > +{
> >  	if (!IS_INODE(node_page))
> >  		return 0;
> >  
> > -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > +	__recover_inode(inode, node_page);
> >  
> >  	if (is_dent_dnode(node_page))
> >  		return recover_dentry(node_page, inode);
> >  
> >  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > -			ino_of_node(node_page), raw_inode->i_name);
> > +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> >  	return 0;
> >  }
> >  
> > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
> >  				break;
> >  			}
> >  
> > +			/*
> > +			 * CP | dnode(F) | inode(DF)
> > +			 * For this case, we should not give up now.
> > +			 */
> >  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> >  			if (IS_ERR(entry->inode)) {
> >  				err = PTR_ERR(entry->inode);
> >  				kmem_cache_free(fsync_entry_slab, entry);
> > +				if (err == -ENOENT)
> > +					goto next;
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> >  		entry = get_fsync_inode(head, ino_of_node(page));
> >  		if (!entry)
> >  			goto next;
> > +		/*
> > +		 * inode(x) | CP | inode(x) | dnode(F)
> > +		 * In this case, we can lose the latest inode(x).
> > +		 * So, call __recover_inode for the inode update.
> > +		 */
> > +		if (IS_INODE(page))
> > +			__recover_inode(entry->inode, page);
> >  
> >  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> >  		if (err)

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-13 14:23                       ` Huang Ying
@ 2014-09-14  7:48                         ` Jaegeuk Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-14  7:48 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote:
> On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > 
> > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > 
> > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > Hi Huang,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > before
> > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > recovery fail.
> > > > > > > > > > > >
> > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > nid
> > > > > > > > > of
> > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > reverse.
> > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > >
> > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > >
> > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > -         } else {
> > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > >
> > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > data blocks.
> > > > > > > > > >
> > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > changes to
> > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > that
> > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > patch should work too.
> > > > > > > > >
> > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > recovery
> > > > > > > > > fail.
> > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > directly.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > >
> > > > > > > >
> > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > For exmaple,
> > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > >
> > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > the
> > > > > > > > > inode
> > > > > > > > > has not fsync_mark.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > >
> > > > > > > Ok, let me consider your scenarios.
> > > > > > >
> > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > >
> > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > >
> > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > >
> > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > scenario.
> > > > > > >     Drop this dnode(F).
> > > > > > >
> > > > > > > Indeed, there were some missing scenarios.
> > > > > > >
> > > > > > > So, how about this patch?
> > > > > > >
> > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > >
> > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > >
> > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > -> Update the latest inode(x).
> > > > > > >
> > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > -> No problem.
> > > > > > >
> > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > 
> > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > 
> > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > 
> > > > > In f2fs_sync_file,
> > > > > 	...
> > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > 			goto out;
> > > > > 		mark_inode_dirty_sync(inode);
> > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > 		if (ret)
> > > > > 			goto out;
> > > > > 	}
> > > > > 	...
> > > > 
> > > > Is the following situation possible?
> > > > 
> > > > f2fs_sync_file				<writeback>
> > > >   sync_node_pages			  f2fs_write_node_pages
> > > >     write dnode(F)			    sync_node_pages
> > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > 
> > > > 
> > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > inode(x) is written by <writeback> path.  And because
> > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > called by f2fs_sync_file does not write inode(F).
> > > 
> > > I think Chao's comment would work.
> > > How about this patch?
> > > 
> > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > 
> > > We can summarize the roll forward recovery scenarios as follows.
> > > 
> > > [Term] F: fsync_mark, D: dentry_mark
> > > 
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > > 
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > > 
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > > 
> > > 5. CP | inode(x) | dnode(F)
> > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > > 
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > 
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >    But it will fail due to no inode(DF).
> > > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > >  fs/f2fs/node.c     | 11 ++++++++-
> > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index e7681c3..70f5d4b 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > >  			up_write(&fi->i_sem);
> > >  		}
> > >  	} else {
> > > -		/* if there is no written node page, write its inode page */
> > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > -			if (fsync_mark_done(sbi, ino))
> > > -				goto out;
> > > +sync_nodes:
> > > +		sync_node_pages(sbi, ino, &wbc);
> > > +
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 *  -> ok
> > 
> > Is it acceptable that we turn this to:
> > 
> > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> > 
> > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 * CP | dnode(F) | inode(x)
> > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > +		 */
> > > +		if (!fsync_mark_done(sbi, ino)) {
> > >  			mark_inode_dirty_sync(inode);
> > >  			ret = f2fs_write_inode(inode, NULL);
> > >  			if (ret)
> > >  				goto out;
> > > +			goto sync_nodes;
> > >  		}
> > > +
> > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > >  		if (ret)
> > >  			goto out;
> > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > index b32eb56..653aa71 100644
> > > --- a/fs/f2fs/node.c
> > > +++ b/fs/f2fs/node.c
> > > @@ -248,8 +248,17 @@ retry:
> > >  
> > >  	/* update fsync_mark if its inode nat entry is still alive */
> > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > -	if (e)
> > > +	if (e) {
> > > +		/*
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 */
> > > +		if (!e->checkpointed && !e->fsync_done &&
> > > +				ni->ino != ni->nid && fsync_done)
> > > +			goto skip;
> > >  		e->fsync_done = fsync_done;
> > > +	}
> > > +skip:
> > >  	write_unlock(&nm_i->nat_tree_lock);
> > >  }
> > 
> > I don't understand why we need so complex logic?  Why not just let
> > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > 
> > It appears that in f2fs_sync_file, what we need is just whether inode
> > page has fsync mark or not.
> 
> Oh, I see.  The e->fsync_done indicates whether latest node of the inode
> has fsync_mark.  That can be used to deal with:
> 
> CP | inode(DF) | dnode(x)
> 
> But I still think it is possible to make e->fsync_done simple via
> introduce another simple e->node_fsync_done to indicate whether the node
> itself has fsync_mark.  What do you think about the solution in the
> below patch?  Just build tested.
> 
> IMHO, it is easier to be understand.  And in effect, it is almost same,
> because it is hard to distinguish between

I think you are showing another complexity with the same functionality.

As I mentioned before, this itself is a little bit corner case, so we need
at least such kind of combined conditions.
So, it doesn't make enough sense to add one more variable per each nat entry.

Thanks,

> 
> CP | inode(x) | dnode(F)
> 
> and
> 
> inode(x) | CP | inode(x) | dnode(F)
> 
> in your solution too.
> 
> Best Regards,
> Huang, Ying
> 
> ----------------------------------------------------->
> ---
>  fs/f2fs/f2fs.h     |    1 +
>  fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
>  fs/f2fs/node.c     |   17 ++++++++++++++++-
>  fs/f2fs/node.h     |    4 +++-
>  fs/f2fs/recovery.c |    2 ++
>  5 files changed, 44 insertions(+), 6 deletions(-)
> 
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> -		/* if there is no written node page, write its inode page */
> -		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
> -			if (fsync_mark_done(sbi, inode->i_ino))
> -				goto out;
> +		for (;;) {
> +			sync_node_pages(sbi, inode->i_ino, &wbc);
> +			/*
> +			 * inode(x) | CP | dnode(F)
> +			 * -> ok
> +			 * inode(x) | CP | inode(x) | dnode(x)
> +			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
> +			 * inode(x) | CP | inode(x) | dnode(F)
> +			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
> +			 * CP | inode(x) | dnode(x)
> +			 * -> CP | inode(x) | dnode(x) | inode(DF)
> +			 * CP | inode(x) | dnode(F)
> +			 * -> CP | inode(x) | dnode(F) | inode(DF)
> +			 * CP | dnode(F) | inode(x)
> +			 * -> CP | dnode(F) | inode(x) | inode(DF)
> +			 */
> +			if ((is_checkpointed_node(sbi, inode->i_ino) &&
> +			     fsync_mark_done(sbi, inode->i_ino)) ||
> +			    (!is_checkpointed_node(sbi, inode->i_ino) &&
> +			     node_fsync_mark_done(sbi, inode->i_ino) &&
> +			     fsync_mark_done(sbi, inode->i_ino)))
> +				break;
>  			mark_inode_dirty_sync(inode);
>  			ret = f2fs_write_inode(inode, NULL);
>  			if (ret)
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
>  	return is_cp;
>  }
>  
> +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> +{
> +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> +	struct nat_entry *e;
> +	bool node_fsync_done = false;
> +
> +	read_lock(&nm_i->nat_tree_lock);
> +	e = __lookup_nat_cache(nm_i, nid);
> +	if (e)
> +		node_fsync_done = e->node_fsync_done;
> +	read_unlock(&nm_i->nat_tree_lock);
> +	return node_fsync_done;
> +}
> +
>  bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
>  {
>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
> @@ -246,7 +260,8 @@ retry:
>  	nat_set_blkaddr(e, new_blkaddr);
>  	__set_nat_cache_dirty(nm_i, e);
>  
> -	/* update fsync_mark if its inode nat entry is still alive */
> +	e->node_fsync_done = fsync_done;
> +	/* update fsync_done if its inode nat entry is still alive */
>  	e = __lookup_nat_cache(nm_i, ni->ino);
>  	if (e)
>  		e->fsync_done = fsync_done;
> --- a/fs/f2fs/node.h
> +++ b/fs/f2fs/node.h
> @@ -42,7 +42,9 @@ struct node_info {
>  struct nat_entry {
>  	struct list_head list;	/* for clean or dirty nat list */
>  	bool checkpointed;	/* whether it is checkpointed or not */
> -	bool fsync_done;	/* whether the latest node has fsync mark */
> +	bool node_fsync_done;	/* whether the node has fsync mark */
> +	bool fsync_done;	/* whether the latest node of the
> +				 * inode has fsync mark */
>  	struct node_info ni;	/* in-memory node information */
>  };
>  
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1220,6 +1220,7 @@ struct node_info;
>  
>  bool available_free_memory(struct f2fs_sb_info *, int);
>  int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
>  bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
>  void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
>  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-14  7:48                         ` Jaegeuk Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-14  7:48 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang ying, LKML, linux-f2fs-devel

On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote:
> On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > 
> > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > 
> > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > Hi Huang,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > before
> > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > recovery fail.
> > > > > > > > > > > >
> > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > nid
> > > > > > > > > of
> > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > reverse.
> > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > >
> > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > >
> > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > -         } else {
> > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > >
> > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > data blocks.
> > > > > > > > > >
> > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > changes to
> > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > that
> > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > patch should work too.
> > > > > > > > >
> > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > recovery
> > > > > > > > > fail.
> > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > directly.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > >
> > > > > > > >
> > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > For exmaple,
> > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > >
> > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > the
> > > > > > > > > inode
> > > > > > > > > has not fsync_mark.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > >
> > > > > > > Ok, let me consider your scenarios.
> > > > > > >
> > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > >
> > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > >
> > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > >
> > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > scenario.
> > > > > > >     Drop this dnode(F).
> > > > > > >
> > > > > > > Indeed, there were some missing scenarios.
> > > > > > >
> > > > > > > So, how about this patch?
> > > > > > >
> > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > >
> > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > >
> > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > -> Update the latest inode(x).
> > > > > > >
> > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > -> No problem.
> > > > > > >
> > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > 
> > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > 
> > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > 
> > > > > In f2fs_sync_file,
> > > > > 	...
> > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > 			goto out;
> > > > > 		mark_inode_dirty_sync(inode);
> > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > 		if (ret)
> > > > > 			goto out;
> > > > > 	}
> > > > > 	...
> > > > 
> > > > Is the following situation possible?
> > > > 
> > > > f2fs_sync_file				<writeback>
> > > >   sync_node_pages			  f2fs_write_node_pages
> > > >     write dnode(F)			    sync_node_pages
> > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > 
> > > > 
> > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > inode(x) is written by <writeback> path.  And because
> > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > called by f2fs_sync_file does not write inode(F).
> > > 
> > > I think Chao's comment would work.
> > > How about this patch?
> > > 
> > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > 
> > > We can summarize the roll forward recovery scenarios as follows.
> > > 
> > > [Term] F: fsync_mark, D: dentry_mark
> > > 
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > > 
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > > 
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > > 
> > > 5. CP | inode(x) | dnode(F)
> > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > > 
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > 
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >    But it will fail due to no inode(DF).
> > > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > >  fs/f2fs/node.c     | 11 ++++++++-
> > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index e7681c3..70f5d4b 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > >  			up_write(&fi->i_sem);
> > >  		}
> > >  	} else {
> > > -		/* if there is no written node page, write its inode page */
> > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > -			if (fsync_mark_done(sbi, ino))
> > > -				goto out;
> > > +sync_nodes:
> > > +		sync_node_pages(sbi, ino, &wbc);
> > > +
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 *  -> ok
> > 
> > Is it acceptable that we turn this to:
> > 
> > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> > 
> > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 * CP | dnode(F) | inode(x)
> > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > +		 */
> > > +		if (!fsync_mark_done(sbi, ino)) {
> > >  			mark_inode_dirty_sync(inode);
> > >  			ret = f2fs_write_inode(inode, NULL);
> > >  			if (ret)
> > >  				goto out;
> > > +			goto sync_nodes;
> > >  		}
> > > +
> > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > >  		if (ret)
> > >  			goto out;
> > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > index b32eb56..653aa71 100644
> > > --- a/fs/f2fs/node.c
> > > +++ b/fs/f2fs/node.c
> > > @@ -248,8 +248,17 @@ retry:
> > >  
> > >  	/* update fsync_mark if its inode nat entry is still alive */
> > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > -	if (e)
> > > +	if (e) {
> > > +		/*
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 */
> > > +		if (!e->checkpointed && !e->fsync_done &&
> > > +				ni->ino != ni->nid && fsync_done)
> > > +			goto skip;
> > >  		e->fsync_done = fsync_done;
> > > +	}
> > > +skip:
> > >  	write_unlock(&nm_i->nat_tree_lock);
> > >  }
> > 
> > I don't understand why we need so complex logic?  Why not just let
> > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > 
> > It appears that in f2fs_sync_file, what we need is just whether inode
> > page has fsync mark or not.
> 
> Oh, I see.  The e->fsync_done indicates whether latest node of the inode
> has fsync_mark.  That can be used to deal with:
> 
> CP | inode(DF) | dnode(x)
> 
> But I still think it is possible to make e->fsync_done simple via
> introduce another simple e->node_fsync_done to indicate whether the node
> itself has fsync_mark.  What do you think about the solution in the
> below patch?  Just build tested.
> 
> IMHO, it is easier to be understand.  And in effect, it is almost same,
> because it is hard to distinguish between

I think you are showing another complexity with the same functionality.

As I mentioned before, this itself is a little bit corner case, so we need
at least such kind of combined conditions.
So, it doesn't make enough sense to add one more variable per each nat entry.

Thanks,

> 
> CP | inode(x) | dnode(F)
> 
> and
> 
> inode(x) | CP | inode(x) | dnode(F)
> 
> in your solution too.
> 
> Best Regards,
> Huang, Ying
> 
> ----------------------------------------------------->
> ---
>  fs/f2fs/f2fs.h     |    1 +
>  fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
>  fs/f2fs/node.c     |   17 ++++++++++++++++-
>  fs/f2fs/node.h     |    4 +++-
>  fs/f2fs/recovery.c |    2 ++
>  5 files changed, 44 insertions(+), 6 deletions(-)
> 
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> -		/* if there is no written node page, write its inode page */
> -		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
> -			if (fsync_mark_done(sbi, inode->i_ino))
> -				goto out;
> +		for (;;) {
> +			sync_node_pages(sbi, inode->i_ino, &wbc);
> +			/*
> +			 * inode(x) | CP | dnode(F)
> +			 * -> ok
> +			 * inode(x) | CP | inode(x) | dnode(x)
> +			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
> +			 * inode(x) | CP | inode(x) | dnode(F)
> +			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
> +			 * CP | inode(x) | dnode(x)
> +			 * -> CP | inode(x) | dnode(x) | inode(DF)
> +			 * CP | inode(x) | dnode(F)
> +			 * -> CP | inode(x) | dnode(F) | inode(DF)
> +			 * CP | dnode(F) | inode(x)
> +			 * -> CP | dnode(F) | inode(x) | inode(DF)
> +			 */
> +			if ((is_checkpointed_node(sbi, inode->i_ino) &&
> +			     fsync_mark_done(sbi, inode->i_ino)) ||
> +			    (!is_checkpointed_node(sbi, inode->i_ino) &&
> +			     node_fsync_mark_done(sbi, inode->i_ino) &&
> +			     fsync_mark_done(sbi, inode->i_ino)))
> +				break;
>  			mark_inode_dirty_sync(inode);
>  			ret = f2fs_write_inode(inode, NULL);
>  			if (ret)
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
>  	return is_cp;
>  }
>  
> +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> +{
> +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> +	struct nat_entry *e;
> +	bool node_fsync_done = false;
> +
> +	read_lock(&nm_i->nat_tree_lock);
> +	e = __lookup_nat_cache(nm_i, nid);
> +	if (e)
> +		node_fsync_done = e->node_fsync_done;
> +	read_unlock(&nm_i->nat_tree_lock);
> +	return node_fsync_done;
> +}
> +
>  bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
>  {
>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
> @@ -246,7 +260,8 @@ retry:
>  	nat_set_blkaddr(e, new_blkaddr);
>  	__set_nat_cache_dirty(nm_i, e);
>  
> -	/* update fsync_mark if its inode nat entry is still alive */
> +	e->node_fsync_done = fsync_done;
> +	/* update fsync_done if its inode nat entry is still alive */
>  	e = __lookup_nat_cache(nm_i, ni->ino);
>  	if (e)
>  		e->fsync_done = fsync_done;
> --- a/fs/f2fs/node.h
> +++ b/fs/f2fs/node.h
> @@ -42,7 +42,9 @@ struct node_info {
>  struct nat_entry {
>  	struct list_head list;	/* for clean or dirty nat list */
>  	bool checkpointed;	/* whether it is checkpointed or not */
> -	bool fsync_done;	/* whether the latest node has fsync mark */
> +	bool node_fsync_done;	/* whether the node has fsync mark */
> +	bool fsync_done;	/* whether the latest node of the
> +				 * inode has fsync mark */
>  	struct node_info ni;	/* in-memory node information */
>  };
>  
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1220,6 +1220,7 @@ struct node_info;
>  
>  bool available_free_memory(struct f2fs_sb_info *, int);
>  int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
>  bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
>  void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
>  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-14  7:38                       ` Jaegeuk Kim
@ 2014-09-15  1:32                         ` Huang Ying
  -1 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-15  1:32 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Sun, 2014-09-14 at 00:38 -0700, Jaegeuk Kim wrote:
> On Fri, Sep 12, 2014 at 03:34:48PM +0800, Huang Ying wrote:
> > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > 
> > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > 
> > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > Hi Huang,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > before
> > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > recovery fail.
> > > > > > > > > > > >
> > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > nid
> > > > > > > > > of
> > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > reverse.
> > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > >
> > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > >
> > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > -         } else {
> > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > >
> > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > data blocks.
> > > > > > > > > >
> > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > changes to
> > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > that
> > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > patch should work too.
> > > > > > > > >
> > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > recovery
> > > > > > > > > fail.
> > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > directly.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > >
> > > > > > > >
> > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > For exmaple,
> > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > >
> > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > the
> > > > > > > > > inode
> > > > > > > > > has not fsync_mark.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > >
> > > > > > > Ok, let me consider your scenarios.
> > > > > > >
> > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > >
> > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > >
> > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > >
> > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > scenario.
> > > > > > >     Drop this dnode(F).
> > > > > > >
> > > > > > > Indeed, there were some missing scenarios.
> > > > > > >
> > > > > > > So, how about this patch?
> > > > > > >
> > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > >
> > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > >
> > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > -> Update the latest inode(x).
> > > > > > >
> > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > -> No problem.
> > > > > > >
> > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > 
> > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > 
> > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > 
> > > > > In f2fs_sync_file,
> > > > > 	...
> > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > 			goto out;
> > > > > 		mark_inode_dirty_sync(inode);
> > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > 		if (ret)
> > > > > 			goto out;
> > > > > 	}
> > > > > 	...
> > > > 
> > > > Is the following situation possible?
> > > > 
> > > > f2fs_sync_file				<writeback>
> > > >   sync_node_pages			  f2fs_write_node_pages
> > > >     write dnode(F)			    sync_node_pages
> > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > 
> > > > 
> > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > inode(x) is written by <writeback> path.  And because
> > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > called by f2fs_sync_file does not write inode(F).
> > > 
> > > I think Chao's comment would work.
> > > How about this patch?
> > > 
> > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > 
> > > We can summarize the roll forward recovery scenarios as follows.
> > > 
> > > [Term] F: fsync_mark, D: dentry_mark
> > > 
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > > 
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > > 
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > > 
> > > 5. CP | inode(x) | dnode(F)
> > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > > 
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > 
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >    But it will fail due to no inode(DF).
> > > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > >  fs/f2fs/node.c     | 11 ++++++++-
> > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index e7681c3..70f5d4b 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > >  			up_write(&fi->i_sem);
> > >  		}
> > >  	} else {
> > > -		/* if there is no written node page, write its inode page */
> > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > -			if (fsync_mark_done(sbi, ino))
> > > -				goto out;
> > > +sync_nodes:
> > > +		sync_node_pages(sbi, ino, &wbc);
> > > +
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 *  -> ok
> > 
> > Is it acceptable that we turn this to:
> > 
> > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> 
> Why do we have to do writing additonal one block unnecessarily?
> We got inode(x) for the recovery.

If my understanding were correct, your code has the same effect.

In set_node_addr:

+               if (!e->checkpointed && !e->fsync_done &&
+                               ni->ino != ni->nid && fsync_done)
+                       goto skip;
                e->fsync_done = fsync_done;
+       }
+skip:
        write_unlock(&nm_i->nat_tree_lock);
 
For inode(x) | CP | inode(x) | dnode(F), when writing the second
inode(x):

	if (!e->checkpointed ...) is false
	e->fsync_done = false

When writing dnode(F):

	if (!e->checkpointed ...) is true
	e->fsync_done will be kept as false

Best Regards,
Huang, Ying

> > 
> > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 * CP | dnode(F) | inode(x)
> > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > +		 */
> > > +		if (!fsync_mark_done(sbi, ino)) {
> > >  			mark_inode_dirty_sync(inode);
> > >  			ret = f2fs_write_inode(inode, NULL);
> > >  			if (ret)
> > >  				goto out;
> > > +			goto sync_nodes;
> > >  		}
> > > +
> > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > >  		if (ret)
> > >  			goto out;
> > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > index b32eb56..653aa71 100644
> > > --- a/fs/f2fs/node.c
> > > +++ b/fs/f2fs/node.c
> > > @@ -248,8 +248,17 @@ retry:
> > >  
> > >  	/* update fsync_mark if its inode nat entry is still alive */
> > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > -	if (e)
> > > +	if (e) {
> > > +		/*
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 */
> > > +		if (!e->checkpointed && !e->fsync_done &&
> > > +				ni->ino != ni->nid && fsync_done)
> > > +			goto skip;
> > >  		e->fsync_done = fsync_done;
> > > +	}
> > > +skip:
> > >  	write_unlock(&nm_i->nat_tree_lock);
> > >  }
> > 
> > I don't understand why we need so complex logic?  Why not just let
> > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > 
> > It appears that in f2fs_sync_file, what we need is just whether inode
> > page has fsync mark or not.
> > 
> > Best Regards,
> > Huang, Ying
> >  
> > > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > > index 6c5a74a..3736728 100644
> > > --- a/fs/f2fs/recovery.c
> > > +++ b/fs/f2fs/recovery.c
> > > @@ -14,6 +14,36 @@
> > >  #include "node.h"
> > >  #include "segment.h"
> > >  
> > > +/*
> > > + * Roll forward recovery scenarios.
> > > + *
> > > + * [Term] F: fsync_mark, D: dentry_mark
> > > + *
> > > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > > + * -> Update the latest inode(x).
> > > + *
> > > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > > + * -> No problem.
> > > + *
> > > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > > + * -> Recover to the latest dnode(F), and drop the last inode(x)
> > > + *
> > > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > > + * -> No problem.
> > > + *
> > > + * 5. CP | inode(x) | dnode(F)
> > > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > > + *
> > > + * 6. CP | inode(DF) | dnode(F)
> > > + * -> No problem.
> > > + *
> > > + * 7. CP | dnode(F) | inode(DF)
> > > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > > + *
> > > + * 8. CP | dnode(F) | inode(x)
> > > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > > + *    But it will fail due to no inode(DF).
> > > + */
> > >  static struct kmem_cache *fsync_entry_slab;
> > >  
> > >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > > @@ -110,27 +140,32 @@ out:
> > >  	return err;
> > >  }
> > >  
> > > -static int recover_inode(struct inode *inode, struct page *node_page)
> > > +static void __recover_inode(struct inode *inode, struct page *page)
> > >  {
> > > -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > > +	struct f2fs_inode *raw = F2FS_INODE(page);
> > > +
> > > +	inode->i_mode = le16_to_cpu(raw->i_mode);
> > > +	i_size_write(inode, le64_to_cpu(raw->i_size));
> > > +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > > +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > > +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > > +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > > +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > > +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > > +}
> > >  
> > > +static int recover_inode(struct inode *inode, struct page *node_page)
> > > +{
> > >  	if (!IS_INODE(node_page))
> > >  		return 0;
> > >  
> > > -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > > -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > > -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > > -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > > -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > > -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > > -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > > -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > > +	__recover_inode(inode, node_page);
> > >  
> > >  	if (is_dent_dnode(node_page))
> > >  		return recover_dentry(node_page, inode);
> > >  
> > >  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > > -			ino_of_node(node_page), raw_inode->i_name);
> > > +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> > >  	return 0;
> > >  }
> > >  
> > > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
> > >  				break;
> > >  			}
> > >  
> > > +			/*
> > > +			 * CP | dnode(F) | inode(DF)
> > > +			 * For this case, we should not give up now.
> > > +			 */
> > >  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> > >  			if (IS_ERR(entry->inode)) {
> > >  				err = PTR_ERR(entry->inode);
> > >  				kmem_cache_free(fsync_entry_slab, entry);
> > > +				if (err == -ENOENT)
> > > +					goto next;
> > >  				break;
> > >  			}
> > >  			list_add_tail(&entry->list, head);
> > > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> > >  		entry = get_fsync_inode(head, ino_of_node(page));
> > >  		if (!entry)
> > >  			goto next;
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 * In this case, we can lose the latest inode(x).
> > > +		 * So, call __recover_inode for the inode update.
> > > +		 */
> > > +		if (IS_INODE(page))
> > > +			__recover_inode(entry->inode, page);
> > >  
> > >  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> > >  		if (err)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-15  1:32                         ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-15  1:32 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, LKML, linux-f2fs-devel

On Sun, 2014-09-14 at 00:38 -0700, Jaegeuk Kim wrote:
> On Fri, Sep 12, 2014 at 03:34:48PM +0800, Huang Ying wrote:
> > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > 
> > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > 
> > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > Hi Huang,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > before
> > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > recovery fail.
> > > > > > > > > > > >
> > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > nid
> > > > > > > > > of
> > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > reverse.
> > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > >
> > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > >
> > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > -         } else {
> > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > >
> > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > data blocks.
> > > > > > > > > >
> > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > changes to
> > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > that
> > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > patch should work too.
> > > > > > > > >
> > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > recovery
> > > > > > > > > fail.
> > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > directly.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > >
> > > > > > > >
> > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > For exmaple,
> > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > >
> > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > the
> > > > > > > > > inode
> > > > > > > > > has not fsync_mark.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > >
> > > > > > > Ok, let me consider your scenarios.
> > > > > > >
> > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > >
> > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > >
> > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > >
> > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > scenario.
> > > > > > >     Drop this dnode(F).
> > > > > > >
> > > > > > > Indeed, there were some missing scenarios.
> > > > > > >
> > > > > > > So, how about this patch?
> > > > > > >
> > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > >
> > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > >
> > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > -> Update the latest inode(x).
> > > > > > >
> > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > -> No problem.
> > > > > > >
> > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > 
> > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > 
> > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > 
> > > > > In f2fs_sync_file,
> > > > > 	...
> > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > 			goto out;
> > > > > 		mark_inode_dirty_sync(inode);
> > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > 		if (ret)
> > > > > 			goto out;
> > > > > 	}
> > > > > 	...
> > > > 
> > > > Is the following situation possible?
> > > > 
> > > > f2fs_sync_file				<writeback>
> > > >   sync_node_pages			  f2fs_write_node_pages
> > > >     write dnode(F)			    sync_node_pages
> > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > 
> > > > 
> > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > inode(x) is written by <writeback> path.  And because
> > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > called by f2fs_sync_file does not write inode(F).
> > > 
> > > I think Chao's comment would work.
> > > How about this patch?
> > > 
> > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > 
> > > We can summarize the roll forward recovery scenarios as follows.
> > > 
> > > [Term] F: fsync_mark, D: dentry_mark
> > > 
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > > 
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > > 
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > > 
> > > 5. CP | inode(x) | dnode(F)
> > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > > 
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > 
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >    But it will fail due to no inode(DF).
> > > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > >  fs/f2fs/node.c     | 11 ++++++++-
> > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index e7681c3..70f5d4b 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > >  			up_write(&fi->i_sem);
> > >  		}
> > >  	} else {
> > > -		/* if there is no written node page, write its inode page */
> > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > -			if (fsync_mark_done(sbi, ino))
> > > -				goto out;
> > > +sync_nodes:
> > > +		sync_node_pages(sbi, ino, &wbc);
> > > +
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 *  -> ok
> > 
> > Is it acceptable that we turn this to:
> > 
> > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> 
> Why do we have to do writing additonal one block unnecessarily?
> We got inode(x) for the recovery.

If my understanding were correct, your code has the same effect.

In set_node_addr:

+               if (!e->checkpointed && !e->fsync_done &&
+                               ni->ino != ni->nid && fsync_done)
+                       goto skip;
                e->fsync_done = fsync_done;
+       }
+skip:
        write_unlock(&nm_i->nat_tree_lock);
 
For inode(x) | CP | inode(x) | dnode(F), when writing the second
inode(x):

	if (!e->checkpointed ...) is false
	e->fsync_done = false

When writing dnode(F):

	if (!e->checkpointed ...) is true
	e->fsync_done will be kept as false

Best Regards,
Huang, Ying

> > 
> > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 * CP | dnode(F) | inode(x)
> > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > +		 */
> > > +		if (!fsync_mark_done(sbi, ino)) {
> > >  			mark_inode_dirty_sync(inode);
> > >  			ret = f2fs_write_inode(inode, NULL);
> > >  			if (ret)
> > >  				goto out;
> > > +			goto sync_nodes;
> > >  		}
> > > +
> > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > >  		if (ret)
> > >  			goto out;
> > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > index b32eb56..653aa71 100644
> > > --- a/fs/f2fs/node.c
> > > +++ b/fs/f2fs/node.c
> > > @@ -248,8 +248,17 @@ retry:
> > >  
> > >  	/* update fsync_mark if its inode nat entry is still alive */
> > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > -	if (e)
> > > +	if (e) {
> > > +		/*
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 */
> > > +		if (!e->checkpointed && !e->fsync_done &&
> > > +				ni->ino != ni->nid && fsync_done)
> > > +			goto skip;
> > >  		e->fsync_done = fsync_done;
> > > +	}
> > > +skip:
> > >  	write_unlock(&nm_i->nat_tree_lock);
> > >  }
> > 
> > I don't understand why we need so complex logic?  Why not just let
> > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > 
> > It appears that in f2fs_sync_file, what we need is just whether inode
> > page has fsync mark or not.
> > 
> > Best Regards,
> > Huang, Ying
> >  
> > > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > > index 6c5a74a..3736728 100644
> > > --- a/fs/f2fs/recovery.c
> > > +++ b/fs/f2fs/recovery.c
> > > @@ -14,6 +14,36 @@
> > >  #include "node.h"
> > >  #include "segment.h"
> > >  
> > > +/*
> > > + * Roll forward recovery scenarios.
> > > + *
> > > + * [Term] F: fsync_mark, D: dentry_mark
> > > + *
> > > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > > + * -> Update the latest inode(x).
> > > + *
> > > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > > + * -> No problem.
> > > + *
> > > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > > + * -> Recover to the latest dnode(F), and drop the last inode(x)
> > > + *
> > > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > > + * -> No problem.
> > > + *
> > > + * 5. CP | inode(x) | dnode(F)
> > > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > > + *
> > > + * 6. CP | inode(DF) | dnode(F)
> > > + * -> No problem.
> > > + *
> > > + * 7. CP | dnode(F) | inode(DF)
> > > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > > + *
> > > + * 8. CP | dnode(F) | inode(x)
> > > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > > + *    But it will fail due to no inode(DF).
> > > + */
> > >  static struct kmem_cache *fsync_entry_slab;
> > >  
> > >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > > @@ -110,27 +140,32 @@ out:
> > >  	return err;
> > >  }
> > >  
> > > -static int recover_inode(struct inode *inode, struct page *node_page)
> > > +static void __recover_inode(struct inode *inode, struct page *page)
> > >  {
> > > -	struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > > +	struct f2fs_inode *raw = F2FS_INODE(page);
> > > +
> > > +	inode->i_mode = le16_to_cpu(raw->i_mode);
> > > +	i_size_write(inode, le64_to_cpu(raw->i_size));
> > > +	inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > > +	inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > > +	inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > > +	inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > > +	inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > > +	inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > > +}
> > >  
> > > +static int recover_inode(struct inode *inode, struct page *node_page)
> > > +{
> > >  	if (!IS_INODE(node_page))
> > >  		return 0;
> > >  
> > > -	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > > -	i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > > -	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > > -	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > > -	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > > -	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > > -	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > > -	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > > +	__recover_inode(inode, node_page);
> > >  
> > >  	if (is_dent_dnode(node_page))
> > >  		return recover_dentry(node_page, inode);
> > >  
> > >  	f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > > -			ino_of_node(node_page), raw_inode->i_name);
> > > +			ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> > >  	return 0;
> > >  }
> > >  
> > > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
> > >  				break;
> > >  			}
> > >  
> > > +			/*
> > > +			 * CP | dnode(F) | inode(DF)
> > > +			 * For this case, we should not give up now.
> > > +			 */
> > >  			entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> > >  			if (IS_ERR(entry->inode)) {
> > >  				err = PTR_ERR(entry->inode);
> > >  				kmem_cache_free(fsync_entry_slab, entry);
> > > +				if (err == -ENOENT)
> > > +					goto next;
> > >  				break;
> > >  			}
> > >  			list_add_tail(&entry->list, head);
> > > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> > >  		entry = get_fsync_inode(head, ino_of_node(page));
> > >  		if (!entry)
> > >  			goto next;
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 * In this case, we can lose the latest inode(x).
> > > +		 * So, call __recover_inode for the inode update.
> > > +		 */
> > > +		if (IS_INODE(page))
> > > +			__recover_inode(entry->inode, page);
> > >  
> > >  		err = do_recover_data(sbi, entry->inode, page, blkaddr);
> > >  		if (err)



------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-14  7:48                         ` Jaegeuk Kim
@ 2014-09-15  5:14                           ` Huang Ying
  -1 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-15  5:14 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Sun, 2014-09-14 at 00:48 -0700, Jaegeuk Kim wrote:
> On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote:
> > On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> > > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > > 
> > > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > 
> > > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > > Hi Huang,
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > > before
> > > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > > recovery fail.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > > nid
> > > > > > > > > > of
> > > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > > reverse.
> > > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > > -         } else {
> > > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > > >
> > > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > > data blocks.
> > > > > > > > > > >
> > > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > > changes to
> > > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > > that
> > > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > > patch should work too.
> > > > > > > > > >
> > > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > > recovery
> > > > > > > > > > fail.
> > > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > > directly.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > > For exmaple,
> > > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > > >
> > > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > > the
> > > > > > > > > > inode
> > > > > > > > > > has not fsync_mark.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > > >
> > > > > > > > Ok, let me consider your scenarios.
> > > > > > > >
> > > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > > >
> > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > > >
> > > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > > >
> > > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > > >
> > > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > > >
> > > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > > scenario.
> > > > > > > >     Drop this dnode(F).
> > > > > > > >
> > > > > > > > Indeed, there were some missing scenarios.
> > > > > > > >
> > > > > > > > So, how about this patch?
> > > > > > > >
> > > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > > >
> > > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > > >
> > > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > > >
> > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > > -> Update the latest inode(x).
> > > > > > > >
> > > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > > -> No problem.
> > > > > > > >
> > > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > > >
> > > > > > > 
> > > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > > 
> > > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > > 
> > > > > > In f2fs_sync_file,
> > > > > > 	...
> > > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > > 			goto out;
> > > > > > 		mark_inode_dirty_sync(inode);
> > > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > > 		if (ret)
> > > > > > 			goto out;
> > > > > > 	}
> > > > > > 	...
> > > > > 
> > > > > Is the following situation possible?
> > > > > 
> > > > > f2fs_sync_file				<writeback>
> > > > >   sync_node_pages			  f2fs_write_node_pages
> > > > >     write dnode(F)			    sync_node_pages
> > > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > > 
> > > > > 
> > > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > > inode(x) is written by <writeback> path.  And because
> > > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > > called by f2fs_sync_file does not write inode(F).
> > > > 
> > > > I think Chao's comment would work.
> > > > How about this patch?
> > > > 
> > > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > 
> > > > We can summarize the roll forward recovery scenarios as follows.
> > > > 
> > > > [Term] F: fsync_mark, D: dentry_mark
> > > > 
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > -> Update the latest inode(x).
> > > > 
> > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > -> No problem.
> > > > 
> > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > > 
> > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > -> No problem.
> > > > 
> > > > 5. CP | inode(x) | dnode(F)
> > > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > > 
> > > > 6. CP | inode(DF) | dnode(F)
> > > > -> No problem.
> > > > 
> > > > 7. CP | dnode(F) | inode(DF)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > > 
> > > > 8. CP | dnode(F) | inode(x)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > >    But it will fail due to no inode(DF).
> > > > 
> > > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > > 
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > > >  fs/f2fs/node.c     | 11 ++++++++-
> > > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index e7681c3..70f5d4b 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > > >  			up_write(&fi->i_sem);
> > > >  		}
> > > >  	} else {
> > > > -		/* if there is no written node page, write its inode page */
> > > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > -			if (fsync_mark_done(sbi, ino))
> > > > -				goto out;
> > > > +sync_nodes:
> > > > +		sync_node_pages(sbi, ino, &wbc);
> > > > +
> > > > +		/*
> > > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > > +		 *  -> ok
> > > 
> > > Is it acceptable that we turn this to:
> > > 
> > > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> > > 
> > > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > > +		 * CP | inode(x) | dnode(F)
> > > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > > +		 * CP | dnode(F) | inode(x)
> > > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > > +		 */
> > > > +		if (!fsync_mark_done(sbi, ino)) {
> > > >  			mark_inode_dirty_sync(inode);
> > > >  			ret = f2fs_write_inode(inode, NULL);
> > > >  			if (ret)
> > > >  				goto out;
> > > > +			goto sync_nodes;
> > > >  		}
> > > > +
> > > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > > >  		if (ret)
> > > >  			goto out;
> > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > index b32eb56..653aa71 100644
> > > > --- a/fs/f2fs/node.c
> > > > +++ b/fs/f2fs/node.c
> > > > @@ -248,8 +248,17 @@ retry:
> > > >  
> > > >  	/* update fsync_mark if its inode nat entry is still alive */
> > > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > > -	if (e)
> > > > +	if (e) {
> > > > +		/*
> > > > +		 * CP | inode(x) | dnode(F)
> > > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > > +		 */
> > > > +		if (!e->checkpointed && !e->fsync_done &&
> > > > +				ni->ino != ni->nid && fsync_done)
> > > > +			goto skip;
> > > >  		e->fsync_done = fsync_done;
> > > > +	}
> > > > +skip:
> > > >  	write_unlock(&nm_i->nat_tree_lock);
> > > >  }
> > > 
> > > I don't understand why we need so complex logic?  Why not just let
> > > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > > 
> > > It appears that in f2fs_sync_file, what we need is just whether inode
> > > page has fsync mark or not.
> > 
> > Oh, I see.  The e->fsync_done indicates whether latest node of the inode
> > has fsync_mark.  That can be used to deal with:
> > 
> > CP | inode(DF) | dnode(x)
> > 
> > But I still think it is possible to make e->fsync_done simple via
> > introduce another simple e->node_fsync_done to indicate whether the node
> > itself has fsync_mark.  What do you think about the solution in the
> > below patch?  Just build tested.
> > 
> > IMHO, it is easier to be understand.  And in effect, it is almost same,
> > because it is hard to distinguish between
> 
> I think you are showing another complexity with the same functionality.
> 
> As I mentioned before, this itself is a little bit corner case, so we need
> at least such kind of combined conditions.
> So, it doesn't make enough sense to add one more variable per each nat entry.

If you care about size of nat entry, we can make all these bool into bit
field if necessary.  If you care about concept, IMHO, the added logic is
straightforward and easier to be understood.  And all such kind of
complex logic are in one place: f2fs_sync_file, instead of being
scattered to set_node_addr too.

And I found your solution will unnecessarily append another inode(DF) in
the following scenario if my understanding were correct.

CP | inode(DF) | dnode(x) | dnode(F)

All in all, I think the high possibility scenarios are as follow:

inode(x) | CP | dnode(F)
CP | inode(DF) | dnode(F)

All other scenarios have very low possibility.  So IMHO it is not
necessary to improve the code complexity to do too much optimizing for
them.  It should be OK just to detect them and append an inode(F) or
inode(DF) block.

Best Regards,
Huang, Ying

> Thanks,
> 
> > 
> > CP | inode(x) | dnode(F)
> > 
> > and
> > 
> > inode(x) | CP | inode(x) | dnode(F)
> > 
> > in your solution too.
> > 
> > Best Regards,
> > Huang, Ying
> > 
> > ----------------------------------------------------->
> > ---
> >  fs/f2fs/f2fs.h     |    1 +
> >  fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
> >  fs/f2fs/node.c     |   17 ++++++++++++++++-
> >  fs/f2fs/node.h     |    4 +++-
> >  fs/f2fs/recovery.c |    2 ++
> >  5 files changed, 44 insertions(+), 6 deletions(-)
> > 
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > -		/* if there is no written node page, write its inode page */
> > -		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
> > -			if (fsync_mark_done(sbi, inode->i_ino))
> > -				goto out;
> > +		for (;;) {
> > +			sync_node_pages(sbi, inode->i_ino, &wbc);
> > +			/*
> > +			 * inode(x) | CP | dnode(F)
> > +			 * -> ok
> > +			 * inode(x) | CP | inode(x) | dnode(x)
> > +			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
> > +			 * inode(x) | CP | inode(x) | dnode(F)
> > +			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
> > +			 * CP | inode(x) | dnode(x)
> > +			 * -> CP | inode(x) | dnode(x) | inode(DF)
> > +			 * CP | inode(x) | dnode(F)
> > +			 * -> CP | inode(x) | dnode(F) | inode(DF)
> > +			 * CP | dnode(F) | inode(x)
> > +			 * -> CP | dnode(F) | inode(x) | inode(DF)
> > +			 */
> > +			if ((is_checkpointed_node(sbi, inode->i_ino) &&
> > +			     fsync_mark_done(sbi, inode->i_ino)) ||
> > +			    (!is_checkpointed_node(sbi, inode->i_ino) &&
> > +			     node_fsync_mark_done(sbi, inode->i_ino) &&
> > +			     fsync_mark_done(sbi, inode->i_ino)))
> > +				break;
> >  			mark_inode_dirty_sync(inode);
> >  			ret = f2fs_write_inode(inode, NULL);
> >  			if (ret)
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
> >  	return is_cp;
> >  }
> >  
> > +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> > +{
> > +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> > +	struct nat_entry *e;
> > +	bool node_fsync_done = false;
> > +
> > +	read_lock(&nm_i->nat_tree_lock);
> > +	e = __lookup_nat_cache(nm_i, nid);
> > +	if (e)
> > +		node_fsync_done = e->node_fsync_done;
> > +	read_unlock(&nm_i->nat_tree_lock);
> > +	return node_fsync_done;
> > +}
> > +
> >  bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> >  {
> >  	struct f2fs_nm_info *nm_i = NM_I(sbi);
> > @@ -246,7 +260,8 @@ retry:
> >  	nat_set_blkaddr(e, new_blkaddr);
> >  	__set_nat_cache_dirty(nm_i, e);
> >  
> > -	/* update fsync_mark if its inode nat entry is still alive */
> > +	e->node_fsync_done = fsync_done;
> > +	/* update fsync_done if its inode nat entry is still alive */
> >  	e = __lookup_nat_cache(nm_i, ni->ino);
> >  	if (e)
> >  		e->fsync_done = fsync_done;
> > --- a/fs/f2fs/node.h
> > +++ b/fs/f2fs/node.h
> > @@ -42,7 +42,9 @@ struct node_info {
> >  struct nat_entry {
> >  	struct list_head list;	/* for clean or dirty nat list */
> >  	bool checkpointed;	/* whether it is checkpointed or not */
> > -	bool fsync_done;	/* whether the latest node has fsync mark */
> > +	bool node_fsync_done;	/* whether the node has fsync mark */
> > +	bool fsync_done;	/* whether the latest node of the
> > +				 * inode has fsync mark */
> >  	struct node_info ni;	/* in-memory node information */
> >  };
> >  
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
> >  			if (IS_ERR(entry->inode)) {
> >  				err = PTR_ERR(entry->inode);
> >  				kmem_cache_free(fsync_entry_slab, entry);
> > +				if (err == -ENOENT)
> > +					goto next;
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1220,6 +1220,7 @@ struct node_info;
> >  
> >  bool available_free_memory(struct f2fs_sb_info *, int);
> >  int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> > +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
> >  bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
> >  void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
> >  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
@ 2014-09-15  5:14                           ` Huang Ying
  0 siblings, 0 replies; 31+ messages in thread
From: Huang Ying @ 2014-09-15  5:14 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: huang ying, LKML, linux-f2fs-devel

On Sun, 2014-09-14 at 00:48 -0700, Jaegeuk Kim wrote:
> On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote:
> > On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> > > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > > 
> > > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > 
> > > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > > Hi Huang,
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > > before
> > > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > > recovery fail.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > > nid
> > > > > > > > > > of
> > > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > > reverse.
> > > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > > -         } else {
> > > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > > >
> > > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > > data blocks.
> > > > > > > > > > >
> > > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > > changes to
> > > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > > that
> > > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > > patch should work too.
> > > > > > > > > >
> > > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > > recovery
> > > > > > > > > > fail.
> > > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > > directly.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > > For exmaple,
> > > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > > >
> > > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > > the
> > > > > > > > > > inode
> > > > > > > > > > has not fsync_mark.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > > >
> > > > > > > > Ok, let me consider your scenarios.
> > > > > > > >
> > > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > > >
> > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > > >
> > > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > > >
> > > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > > >
> > > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > > >
> > > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > > scenario.
> > > > > > > >     Drop this dnode(F).
> > > > > > > >
> > > > > > > > Indeed, there were some missing scenarios.
> > > > > > > >
> > > > > > > > So, how about this patch?
> > > > > > > >
> > > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > > >
> > > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > > >
> > > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > > >
> > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > > -> Update the latest inode(x).
> > > > > > > >
> > > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > > -> No problem.
> > > > > > > >
> > > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > > >
> > > > > > > 
> > > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > > 
> > > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > > 
> > > > > > In f2fs_sync_file,
> > > > > > 	...
> > > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > > 			goto out;
> > > > > > 		mark_inode_dirty_sync(inode);
> > > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > > 		if (ret)
> > > > > > 			goto out;
> > > > > > 	}
> > > > > > 	...
> > > > > 
> > > > > Is the following situation possible?
> > > > > 
> > > > > f2fs_sync_file				<writeback>
> > > > >   sync_node_pages			  f2fs_write_node_pages
> > > > >     write dnode(F)			    sync_node_pages
> > > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > > 
> > > > > 
> > > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > > inode(x) is written by <writeback> path.  And because
> > > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > > called by f2fs_sync_file does not write inode(F).
> > > > 
> > > > I think Chao's comment would work.
> > > > How about this patch?
> > > > 
> > > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > 
> > > > We can summarize the roll forward recovery scenarios as follows.
> > > > 
> > > > [Term] F: fsync_mark, D: dentry_mark
> > > > 
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > -> Update the latest inode(x).
> > > > 
> > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > -> No problem.
> > > > 
> > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > > 
> > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > -> No problem.
> > > > 
> > > > 5. CP | inode(x) | dnode(F)
> > > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > > 
> > > > 6. CP | inode(DF) | dnode(F)
> > > > -> No problem.
> > > > 
> > > > 7. CP | dnode(F) | inode(DF)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > > 
> > > > 8. CP | dnode(F) | inode(x)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > >    But it will fail due to no inode(DF).
> > > > 
> > > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > > 
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > > >  fs/f2fs/node.c     | 11 ++++++++-
> > > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index e7681c3..70f5d4b 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > > >  			up_write(&fi->i_sem);
> > > >  		}
> > > >  	} else {
> > > > -		/* if there is no written node page, write its inode page */
> > > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > -			if (fsync_mark_done(sbi, ino))
> > > > -				goto out;
> > > > +sync_nodes:
> > > > +		sync_node_pages(sbi, ino, &wbc);
> > > > +
> > > > +		/*
> > > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > > +		 *  -> ok
> > > 
> > > Is it acceptable that we turn this to:
> > > 
> > > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> > > 
> > > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > > +		 * CP | inode(x) | dnode(F)
> > > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > > +		 * CP | dnode(F) | inode(x)
> > > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > > +		 */
> > > > +		if (!fsync_mark_done(sbi, ino)) {
> > > >  			mark_inode_dirty_sync(inode);
> > > >  			ret = f2fs_write_inode(inode, NULL);
> > > >  			if (ret)
> > > >  				goto out;
> > > > +			goto sync_nodes;
> > > >  		}
> > > > +
> > > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > > >  		if (ret)
> > > >  			goto out;
> > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > index b32eb56..653aa71 100644
> > > > --- a/fs/f2fs/node.c
> > > > +++ b/fs/f2fs/node.c
> > > > @@ -248,8 +248,17 @@ retry:
> > > >  
> > > >  	/* update fsync_mark if its inode nat entry is still alive */
> > > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > > -	if (e)
> > > > +	if (e) {
> > > > +		/*
> > > > +		 * CP | inode(x) | dnode(F)
> > > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > > +		 */
> > > > +		if (!e->checkpointed && !e->fsync_done &&
> > > > +				ni->ino != ni->nid && fsync_done)
> > > > +			goto skip;
> > > >  		e->fsync_done = fsync_done;
> > > > +	}
> > > > +skip:
> > > >  	write_unlock(&nm_i->nat_tree_lock);
> > > >  }
> > > 
> > > I don't understand why we need so complex logic?  Why not just let
> > > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > > 
> > > It appears that in f2fs_sync_file, what we need is just whether inode
> > > page has fsync mark or not.
> > 
> > Oh, I see.  The e->fsync_done indicates whether latest node of the inode
> > has fsync_mark.  That can be used to deal with:
> > 
> > CP | inode(DF) | dnode(x)
> > 
> > But I still think it is possible to make e->fsync_done simple via
> > introduce another simple e->node_fsync_done to indicate whether the node
> > itself has fsync_mark.  What do you think about the solution in the
> > below patch?  Just build tested.
> > 
> > IMHO, it is easier to be understand.  And in effect, it is almost same,
> > because it is hard to distinguish between
> 
> I think you are showing another complexity with the same functionality.
> 
> As I mentioned before, this itself is a little bit corner case, so we need
> at least such kind of combined conditions.
> So, it doesn't make enough sense to add one more variable per each nat entry.

If you care about size of nat entry, we can make all these bool into bit
field if necessary.  If you care about concept, IMHO, the added logic is
straightforward and easier to be understood.  And all such kind of
complex logic are in one place: f2fs_sync_file, instead of being
scattered to set_node_addr too.

And I found your solution will unnecessarily append another inode(DF) in
the following scenario if my understanding were correct.

CP | inode(DF) | dnode(x) | dnode(F)

All in all, I think the high possibility scenarios are as follow:

inode(x) | CP | dnode(F)
CP | inode(DF) | dnode(F)

All other scenarios have very low possibility.  So IMHO it is not
necessary to improve the code complexity to do too much optimizing for
them.  It should be OK just to detect them and append an inode(F) or
inode(DF) block.

Best Regards,
Huang, Ying

> Thanks,
> 
> > 
> > CP | inode(x) | dnode(F)
> > 
> > and
> > 
> > inode(x) | CP | inode(x) | dnode(F)
> > 
> > in your solution too.
> > 
> > Best Regards,
> > Huang, Ying
> > 
> > ----------------------------------------------------->
> > ---
> >  fs/f2fs/f2fs.h     |    1 +
> >  fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
> >  fs/f2fs/node.c     |   17 ++++++++++++++++-
> >  fs/f2fs/node.h     |    4 +++-
> >  fs/f2fs/recovery.c |    2 ++
> >  5 files changed, 44 insertions(+), 6 deletions(-)
> > 
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
> >  			up_write(&fi->i_sem);
> >  		}
> >  	} else {
> > -		/* if there is no written node page, write its inode page */
> > -		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
> > -			if (fsync_mark_done(sbi, inode->i_ino))
> > -				goto out;
> > +		for (;;) {
> > +			sync_node_pages(sbi, inode->i_ino, &wbc);
> > +			/*
> > +			 * inode(x) | CP | dnode(F)
> > +			 * -> ok
> > +			 * inode(x) | CP | inode(x) | dnode(x)
> > +			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
> > +			 * inode(x) | CP | inode(x) | dnode(F)
> > +			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
> > +			 * CP | inode(x) | dnode(x)
> > +			 * -> CP | inode(x) | dnode(x) | inode(DF)
> > +			 * CP | inode(x) | dnode(F)
> > +			 * -> CP | inode(x) | dnode(F) | inode(DF)
> > +			 * CP | dnode(F) | inode(x)
> > +			 * -> CP | dnode(F) | inode(x) | inode(DF)
> > +			 */
> > +			if ((is_checkpointed_node(sbi, inode->i_ino) &&
> > +			     fsync_mark_done(sbi, inode->i_ino)) ||
> > +			    (!is_checkpointed_node(sbi, inode->i_ino) &&
> > +			     node_fsync_mark_done(sbi, inode->i_ino) &&
> > +			     fsync_mark_done(sbi, inode->i_ino)))
> > +				break;
> >  			mark_inode_dirty_sync(inode);
> >  			ret = f2fs_write_inode(inode, NULL);
> >  			if (ret)
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
> >  	return is_cp;
> >  }
> >  
> > +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> > +{
> > +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> > +	struct nat_entry *e;
> > +	bool node_fsync_done = false;
> > +
> > +	read_lock(&nm_i->nat_tree_lock);
> > +	e = __lookup_nat_cache(nm_i, nid);
> > +	if (e)
> > +		node_fsync_done = e->node_fsync_done;
> > +	read_unlock(&nm_i->nat_tree_lock);
> > +	return node_fsync_done;
> > +}
> > +
> >  bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> >  {
> >  	struct f2fs_nm_info *nm_i = NM_I(sbi);
> > @@ -246,7 +260,8 @@ retry:
> >  	nat_set_blkaddr(e, new_blkaddr);
> >  	__set_nat_cache_dirty(nm_i, e);
> >  
> > -	/* update fsync_mark if its inode nat entry is still alive */
> > +	e->node_fsync_done = fsync_done;
> > +	/* update fsync_done if its inode nat entry is still alive */
> >  	e = __lookup_nat_cache(nm_i, ni->ino);
> >  	if (e)
> >  		e->fsync_done = fsync_done;
> > --- a/fs/f2fs/node.h
> > +++ b/fs/f2fs/node.h
> > @@ -42,7 +42,9 @@ struct node_info {
> >  struct nat_entry {
> >  	struct list_head list;	/* for clean or dirty nat list */
> >  	bool checkpointed;	/* whether it is checkpointed or not */
> > -	bool fsync_done;	/* whether the latest node has fsync mark */
> > +	bool node_fsync_done;	/* whether the node has fsync mark */
> > +	bool fsync_done;	/* whether the latest node of the
> > +				 * inode has fsync mark */
> >  	struct node_info ni;	/* in-memory node information */
> >  };
> >  
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
> >  			if (IS_ERR(entry->inode)) {
> >  				err = PTR_ERR(entry->inode);
> >  				kmem_cache_free(fsync_entry_slab, entry);
> > +				if (err == -ENOENT)
> > +					goto next;
> >  				break;
> >  			}
> >  			list_add_tail(&entry->list, head);
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1220,6 +1220,7 @@ struct node_info;
> >  
> >  bool available_free_memory(struct f2fs_sb_info *, int);
> >  int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> > +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
> >  bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
> >  void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
> >  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);



------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
  2014-09-15  5:14                           ` Huang Ying
  (?)
@ 2014-09-18  5:47                           ` Jaegeuk Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Jaegeuk Kim @ 2014-09-18  5:47 UTC (permalink / raw)
  To: Huang Ying; +Cc: huang ying, Changman Lee, linux-f2fs-devel, LKML

On Mon, Sep 15, 2014 at 01:14:09PM +0800, Huang Ying wrote:
> On Sun, 2014-09-14 at 00:48 -0700, Jaegeuk Kim wrote:
> > On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote:
> > > On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> > > > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > > > 
> > > > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > > 
> > > > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > > > Hi Huang,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > > > before
> > > > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > > > recovery fail.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > > > nid
> > > > > > > > > > > of
> > > > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > > > reverse.
> > > > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > > > -         } else {
> > > > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > > > >
> > > > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > > > data blocks.
> > > > > > > > > > > >
> > > > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > > > changes to
> > > > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > > > that
> > > > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > > > patch should work too.
> > > > > > > > > > >
> > > > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > > > recovery
> > > > > > > > > > > fail.
> > > > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > > > directly.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > > > For exmaple,
> > > > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > > > >
> > > > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > > > the
> > > > > > > > > > > inode
> > > > > > > > > > > has not fsync_mark.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > > > >
> > > > > > > > > Ok, let me consider your scenarios.
> > > > > > > > >
> > > > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > > > >
> > > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > > > >
> > > > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > > > >
> > > > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > > > >
> > > > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > > > >
> > > > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > > > scenario.
> > > > > > > > >     Drop this dnode(F).
> > > > > > > > >
> > > > > > > > > Indeed, there were some missing scenarios.
> > > > > > > > >
> > > > > > > > > So, how about this patch?
> > > > > > > > >
> > > > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > > > >
> > > > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > > > >
> > > > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > > > >
> > > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > > > -> Update the latest inode(x).
> > > > > > > > >
> > > > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > > > -> No problem.
> > > > > > > > >
> > > > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > > > >
> > > > > > > > 
> > > > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > > > 
> > > > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > > > 
> > > > > > > In f2fs_sync_file,
> > > > > > > 	...
> > > > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > > > 			goto out;
> > > > > > > 		mark_inode_dirty_sync(inode);
> > > > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > > > 		if (ret)
> > > > > > > 			goto out;
> > > > > > > 	}
> > > > > > > 	...
> > > > > > 
> > > > > > Is the following situation possible?
> > > > > > 
> > > > > > f2fs_sync_file				<writeback>
> > > > > >   sync_node_pages			  f2fs_write_node_pages
> > > > > >     write dnode(F)			    sync_node_pages
> > > > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > > > 
> > > > > > 
> > > > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > > > inode(x) is written by <writeback> path.  And because
> > > > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > > > called by f2fs_sync_file does not write inode(F).
> > > > > 
> > > > > I think Chao's comment would work.
> > > > > How about this patch?
> > > > > 
> > > > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > 
> > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > 
> > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > 
> > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > -> Update the latest inode(x).
> > > > > 
> > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > -> No problem.
> > > > > 
> > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > > > 
> > > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > > -> No problem.
> > > > > 
> > > > > 5. CP | inode(x) | dnode(F)
> > > > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > > > 
> > > > > 6. CP | inode(DF) | dnode(F)
> > > > > -> No problem.
> > > > > 
> > > > > 7. CP | dnode(F) | inode(DF)
> > > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > > > 
> > > > > 8. CP | dnode(F) | inode(x)
> > > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > > >    But it will fail due to no inode(DF).
> > > > > 
> > > > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > > > 
> > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > > ---
> > > > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > > > >  fs/f2fs/node.c     | 11 ++++++++-
> > > > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > > > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > > > 
> > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > index e7681c3..70f5d4b 100644
> > > > > --- a/fs/f2fs/file.c
> > > > > +++ b/fs/f2fs/file.c
> > > > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > > > >  			up_write(&fi->i_sem);
> > > > >  		}
> > > > >  	} else {
> > > > > -		/* if there is no written node page, write its inode page */
> > > > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > -			if (fsync_mark_done(sbi, ino))
> > > > > -				goto out;
> > > > > +sync_nodes:
> > > > > +		sync_node_pages(sbi, ino, &wbc);
> > > > > +
> > > > > +		/*
> > > > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > > > +		 *  -> ok
> > > > 
> > > > Is it acceptable that we turn this to:
> > > > 
> > > > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> > > > 
> > > > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > > > +		 * CP | inode(x) | dnode(F)
> > > > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > > > +		 * CP | dnode(F) | inode(x)
> > > > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > > > +		 */
> > > > > +		if (!fsync_mark_done(sbi, ino)) {
> > > > >  			mark_inode_dirty_sync(inode);
> > > > >  			ret = f2fs_write_inode(inode, NULL);
> > > > >  			if (ret)
> > > > >  				goto out;
> > > > > +			goto sync_nodes;
> > > > >  		}
> > > > > +
> > > > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > > > >  		if (ret)
> > > > >  			goto out;
> > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > > index b32eb56..653aa71 100644
> > > > > --- a/fs/f2fs/node.c
> > > > > +++ b/fs/f2fs/node.c
> > > > > @@ -248,8 +248,17 @@ retry:
> > > > >  
> > > > >  	/* update fsync_mark if its inode nat entry is still alive */
> > > > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > > > -	if (e)
> > > > > +	if (e) {
> > > > > +		/*
> > > > > +		 * CP | inode(x) | dnode(F)
> > > > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > > > +		 */
> > > > > +		if (!e->checkpointed && !e->fsync_done &&
> > > > > +				ni->ino != ni->nid && fsync_done)
> > > > > +			goto skip;
> > > > >  		e->fsync_done = fsync_done;
> > > > > +	}
> > > > > +skip:
> > > > >  	write_unlock(&nm_i->nat_tree_lock);
> > > > >  }
> > > > 
> > > > I don't understand why we need so complex logic?  Why not just let
> > > > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > > > 
> > > > It appears that in f2fs_sync_file, what we need is just whether inode
> > > > page has fsync mark or not.
> > > 
> > > Oh, I see.  The e->fsync_done indicates whether latest node of the inode
> > > has fsync_mark.  That can be used to deal with:
> > > 
> > > CP | inode(DF) | dnode(x)
> > > 
> > > But I still think it is possible to make e->fsync_done simple via
> > > introduce another simple e->node_fsync_done to indicate whether the node
> > > itself has fsync_mark.  What do you think about the solution in the
> > > below patch?  Just build tested.
> > > 
> > > IMHO, it is easier to be understand.  And in effect, it is almost same,
> > > because it is hard to distinguish between
> > 
> > I think you are showing another complexity with the same functionality.
> > 
> > As I mentioned before, this itself is a little bit corner case, so we need
> > at least such kind of combined conditions.
> > So, it doesn't make enough sense to add one more variable per each nat entry.
> 
> If you care about size of nat entry, we can make all these bool into bit
> field if necessary.  If you care about concept, IMHO, the added logic is
> straightforward and easier to be understood.  And all such kind of
> complex logic are in one place: f2fs_sync_file, instead of being
> scattered to set_node_addr too.
> 
> And I found your solution will unnecessarily append another inode(DF) in
> the following scenario if my understanding were correct.
> 
> CP | inode(DF) | dnode(x) | dnode(F)

Oh, nice catch!
Then, this becomes a different story where it is worth to change from the
sketch.

So, as your suggestion, it would be good to use simple bits for nat entry
states, and then to make a rule for redirtying its inode during the fsync path.
Afterwards, the roll-forward should be fixed finally.

Please find them in other patch-set that I'll send.

Thanks,

> 
> All in all, I think the high possibility scenarios are as follow:
> 
> inode(x) | CP | dnode(F)
> CP | inode(DF) | dnode(F)
> 
> All other scenarios have very low possibility.  So IMHO it is not
> necessary to improve the code complexity to do too much optimizing for
> them.  It should be OK just to detect them and append an inode(F) or
> inode(DF) block.
> 
> Best Regards,
> Huang, Ying
> 
> > Thanks,
> > 
> > > 
> > > CP | inode(x) | dnode(F)
> > > 
> > > and
> > > 
> > > inode(x) | CP | inode(x) | dnode(F)
> > > 
> > > in your solution too.
> > > 
> > > Best Regards,
> > > Huang, Ying
> > > 
> > > ----------------------------------------------------->
> > > ---
> > >  fs/f2fs/f2fs.h     |    1 +
> > >  fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
> > >  fs/f2fs/node.c     |   17 ++++++++++++++++-
> > >  fs/f2fs/node.h     |    4 +++-
> > >  fs/f2fs/recovery.c |    2 ++
> > >  5 files changed, 44 insertions(+), 6 deletions(-)
> > > 
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
> > >  			up_write(&fi->i_sem);
> > >  		}
> > >  	} else {
> > > -		/* if there is no written node page, write its inode page */
> > > -		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
> > > -			if (fsync_mark_done(sbi, inode->i_ino))
> > > -				goto out;
> > > +		for (;;) {
> > > +			sync_node_pages(sbi, inode->i_ino, &wbc);
> > > +			/*
> > > +			 * inode(x) | CP | dnode(F)
> > > +			 * -> ok
> > > +			 * inode(x) | CP | inode(x) | dnode(x)
> > > +			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
> > > +			 * inode(x) | CP | inode(x) | dnode(F)
> > > +			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
> > > +			 * CP | inode(x) | dnode(x)
> > > +			 * -> CP | inode(x) | dnode(x) | inode(DF)
> > > +			 * CP | inode(x) | dnode(F)
> > > +			 * -> CP | inode(x) | dnode(F) | inode(DF)
> > > +			 * CP | dnode(F) | inode(x)
> > > +			 * -> CP | dnode(F) | inode(x) | inode(DF)
> > > +			 */
> > > +			if ((is_checkpointed_node(sbi, inode->i_ino) &&
> > > +			     fsync_mark_done(sbi, inode->i_ino)) ||
> > > +			    (!is_checkpointed_node(sbi, inode->i_ino) &&
> > > +			     node_fsync_mark_done(sbi, inode->i_ino) &&
> > > +			     fsync_mark_done(sbi, inode->i_ino)))
> > > +				break;
> > >  			mark_inode_dirty_sync(inode);
> > >  			ret = f2fs_write_inode(inode, NULL);
> > >  			if (ret)
> > > --- a/fs/f2fs/node.c
> > > +++ b/fs/f2fs/node.c
> > > @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
> > >  	return is_cp;
> > >  }
> > >  
> > > +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> > > +{
> > > +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> > > +	struct nat_entry *e;
> > > +	bool node_fsync_done = false;
> > > +
> > > +	read_lock(&nm_i->nat_tree_lock);
> > > +	e = __lookup_nat_cache(nm_i, nid);
> > > +	if (e)
> > > +		node_fsync_done = e->node_fsync_done;
> > > +	read_unlock(&nm_i->nat_tree_lock);
> > > +	return node_fsync_done;
> > > +}
> > > +
> > >  bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> > >  {
> > >  	struct f2fs_nm_info *nm_i = NM_I(sbi);
> > > @@ -246,7 +260,8 @@ retry:
> > >  	nat_set_blkaddr(e, new_blkaddr);
> > >  	__set_nat_cache_dirty(nm_i, e);
> > >  
> > > -	/* update fsync_mark if its inode nat entry is still alive */
> > > +	e->node_fsync_done = fsync_done;
> > > +	/* update fsync_done if its inode nat entry is still alive */
> > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > >  	if (e)
> > >  		e->fsync_done = fsync_done;
> > > --- a/fs/f2fs/node.h
> > > +++ b/fs/f2fs/node.h
> > > @@ -42,7 +42,9 @@ struct node_info {
> > >  struct nat_entry {
> > >  	struct list_head list;	/* for clean or dirty nat list */
> > >  	bool checkpointed;	/* whether it is checkpointed or not */
> > > -	bool fsync_done;	/* whether the latest node has fsync mark */
> > > +	bool node_fsync_done;	/* whether the node has fsync mark */
> > > +	bool fsync_done;	/* whether the latest node of the
> > > +				 * inode has fsync mark */
> > >  	struct node_info ni;	/* in-memory node information */
> > >  };
> > >  
> > > --- a/fs/f2fs/recovery.c
> > > +++ b/fs/f2fs/recovery.c
> > > @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
> > >  			if (IS_ERR(entry->inode)) {
> > >  				err = PTR_ERR(entry->inode);
> > >  				kmem_cache_free(fsync_entry_slab, entry);
> > > +				if (err == -ENOENT)
> > > +					goto next;
> > >  				break;
> > >  			}
> > >  			list_add_tail(&entry->list, head);
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -1220,6 +1220,7 @@ struct node_info;
> > >  
> > >  bool available_free_memory(struct f2fs_sb_info *, int);
> > >  int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> > > +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
> > >  bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
> > >  void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
> > >  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-09-18  5:47 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-08 11:38 [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode Huang Ying
2014-09-08 11:38 ` Huang Ying
2014-09-09  5:23 ` Jaegeuk Kim
2014-09-09  5:23   ` Jaegeuk Kim
2014-09-09  5:39   ` Huang Ying
2014-09-09  7:09     ` Jaegeuk Kim
2014-09-09  7:09       ` Jaegeuk Kim
     [not found]       ` <CAC=cRTMA9AqmHjQqK3=5pAs8yqi25Rzmz8MaZ8=oTDaxHAXU+A@mail.gmail.com>
2014-09-10  7:21         ` Jaegeuk Kim
2014-09-10  7:21           ` Jaegeuk Kim
     [not found]           ` <CAC=cRTMGJfD_SZ=bE8pi5U6n5W1MF17emLC3VZDbpLSQtbNcKg@mail.gmail.com>
2014-09-11  5:37             ` Jaegeuk Kim
2014-09-11  5:37               ` Jaegeuk Kim
2014-09-11 10:47               ` [f2fs-dev] " Chao Yu
2014-09-11 10:47                 ` Chao Yu
2014-09-11 12:31                 ` [f2fs-dev] " Huang Ying
2014-09-11 12:31                   ` Huang Ying
2014-09-11 12:25               ` Huang Ying
2014-09-11 12:25                 ` Huang Ying
2014-09-12  5:13                 ` Jaegeuk Kim
2014-09-12  7:34                   ` Huang Ying
2014-09-12  7:34                     ` Huang Ying
2014-09-13 14:23                     ` Huang Ying
2014-09-13 14:23                       ` Huang Ying
2014-09-14  7:48                       ` Jaegeuk Kim
2014-09-14  7:48                         ` Jaegeuk Kim
2014-09-15  5:14                         ` Huang Ying
2014-09-15  5:14                           ` Huang Ying
2014-09-18  5:47                           ` Jaegeuk Kim
2014-09-14  7:38                     ` Jaegeuk Kim
2014-09-14  7:38                       ` Jaegeuk Kim
2014-09-15  1:32                       ` Huang Ying
2014-09-15  1:32                         ` Huang Ying

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.