From: tristan ye <tristan.ye@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 2/2] Ocfs2: Handle deletion of refcounted oprhan_inode correctly.
Date: Sat, 27 Feb 2010 18:34:02 +0800
Message-ID: <4B88F51A.8070701@oracle.com>
In-Reply-To: <20100226232807.GF23730@mail.oracle.com>



Joel Becker wrote:
> On Sun, Feb 21, 2010 at 04:29:37PM +0800, Tristan Ye wrote:
>   
>> The current ocfs2 semantics for reflinking a file first create a
>> new orphan inode in orphan_dir, then move it to the target dir
>> once the refcounting operation is done. These two steps keep the
>> logic straightforward and guarantee that a crash during reflink
>> can be recovered (the half-refcounted inode can be removed). But
>> they bring another issue: each step takes the orphan_dir lock
>> separately, so orphan_scan() may pick up the half-refcounted
>> inode in orphan_dir as a candidate to wipe later. It is not a
>> valid candidate, of course, and we should handle this correctly.
>>     
>
> 	Why is this necessary?  Don't we have the open lock on the
> reflink target?  That should keep an orphan scan from wiping a live
> refcount target.  Tao, do we not have the open lock?
>   

Yes, we hold the open lock for the whole reflink operation, but we do 
not hold the orphan_dir lock during that period. That gives orphan_scan 
a chance to add our half-refcounted target to its work queue, and the 
queued ocfs2_delete_inode() call is then deferred until the reflink 
operation is done (it is always invoked after reflink completes, since 
we hold the open lock on the target inode the whole time). In that case 
we are certain to fail in ocfs2_query_inode_wipe(), as bug 1215 
describes, because the reflink operation has already cleared 
OCFS2_ORPHANED_FL by then. That does not seem right to me; we should not 
treat this as an 'ML_ERROR'.
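
To make the ordering concrete, here is a minimal user-space sketch of the 
sequence described above. It is not kernel code: every sketch_* name is 
made up for illustration, the locks are reduced to booleans, and the 
return codes other than -EEXIST are my own simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in ocfs2. */
struct sketch_inode {
	bool orphaned_fl;     /* stands in for OCFS2_ORPHANED_FL on disk */
	bool in_orphan_dir;   /* inode has an entry in orphan_dir */
	bool open_lock_held;  /* reflink is holding the open lock */
	bool queued_by_scan;  /* orphan scan put it on its work queue */
};

/* Step 1 of reflink: create the target as an orphan, under the open lock. */
static void sketch_reflink_begin(struct sketch_inode *t)
{
	assert(t != NULL);
	t->orphaned_fl = true;
	t->in_orphan_dir = true;
	t->open_lock_held = true;
	t->queued_by_scan = false;
}

/* Orphan scan: needs only the orphan_dir lock, which reflink does not hold. */
static void sketch_orphan_scan(struct sketch_inode *t)
{
	assert(t != NULL);
	if (t->in_orphan_dir)
		t->queued_by_scan = true;
}

/* Step 2 of reflink: move to the target dir, clear the flag, drop the lock. */
static void sketch_reflink_finish(struct sketch_inode *t)
{
	assert(t != NULL);
	t->orphaned_fl = false;
	t->in_orphan_dir = false;
	t->open_lock_held = false;
}

/* Deferred delete as it behaved before the patch (simplified). */
static int sketch_deferred_delete_old(const struct sketch_inode *t)
{
	assert(t != NULL);
	if (!t->queued_by_scan)
		return 0;        /* nothing queued, nothing to do */
	if (t->open_lock_held)
		return -EBUSY;   /* has to wait until reflink is done */
	if (!t->orphaned_fl)
		return -EEXIST;  /* the ML_ERROR path from bug 1215 */
	return 1;                /* a real orphan: would be wiped */
}
```

Calling begin, scan, finish and then the deferred delete, in that order, 
lands on the -EEXIST branch, which is the case the patch wants to stop 
reporting as an error.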

>   
>> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
>> index 88459bd..61fb546 100644
>> --- a/fs/ocfs2/inode.c
>> +++ b/fs/ocfs2/inode.c
>> @@ -892,14 +892,30 @@ static int ocfs2_query_inode_wipe(struct inode *inode,
>>  	di = (struct ocfs2_dinode *) di_bh->b_data;
>>  	if (!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))) {
>>  		/* for lack of a better error? */
>> -		status = -EEXIST;
>> -		mlog(ML_ERROR,
>> -		     "Inode %llu (on-disk %llu) not orphaned! "
>> -		     "Disk flags  0x%x, inode flags 0x%x\n",
>> -		     (unsigned long long)oi->ip_blkno,
>> -		     (unsigned long long)le64_to_cpu(di->i_blkno),
>> -		     le32_to_cpu(di->i_flags), oi->ip_flags);
>> -		goto bail;
>>     
>
> 	This should not be inside the ORPHANED_FL check.  Every inode in
> the orphan_dir, whether a reflink target or an unlinked inode, should
> have ORPHANED_FL set.  I think this is why others were confused.
>   

Not exactly. After the reflink operation the flag is cleared, and 
ocfs2_delete_inode() is then invoked immediately; that is how we hit 
bug 1215.

> 	Tao, were you removing the ORPHANED_FL from reflink targets?
>
>   
>> +		if (!(di->i_dyn_features &
>> +		      cpu_to_le16(OCFS2_HAS_REFCOUNT_FL))) {
>> +			status = -EEXIST;
>> +			mlog(ML_ERROR,
>> +			     "Inode %llu (on-disk %llu) not orphaned! "
>> +			     "Disk flags  0x%x, inode flags 0x%x\n",
>> +			     (unsigned long long)oi->ip_blkno,
>> +			     (unsigned long long)le64_to_cpu(di->i_blkno),
>> +			     le32_to_cpu(di->i_flags), oi->ip_flags);
>> +			goto bail;
>> +		} else {
>> +			/*
>> +			 * It did happen to us, though it's a rare case:
>> +			 * orphan_scan() detects the half-refcounted inode
>> +			 * in orphan_dir, and delete_inode() attempts to
>> +			 * wipe it after reflink operation done later. now
>> +			 * we're not allowed to delete such a valid inode,
>> +			 * instead, just bail out.
>> +			 */
>> +			mlog(0, "Skipping delete of %llu because it's a "
>> +			     "reflinked inode\n",
>> +			     (unsigned long long)oi->ip_blkno);
>> +			goto bail;
>>     
>
> 	Second, this looks like it skips all reflink targets.  That's
> not OK.  Sometimes they do need to be deleted.
>   
Not exactly. It will not skip a reflink target that failed because of a 
machine crash or another fatal failure.

If the reflink operation aborts abnormally, OCFS2_ORPHANED_FL is still 
set on disk, so we can use it to tell a live half-refcounted inode from a 
failed reflink target.

So I believe this patch will not skip the real orphan inodes left behind 
by a failed reflink.



Regards,
Tristan.

> Joel
>
>   
