* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
@ 2011-08-26  2:50 Wengang Wang
  2011-08-31  1:55 ` Sunil Mushran
  2011-09-07 18:04 ` Joel Becker
  0 siblings, 2 replies; 7+ messages in thread
From: Wengang Wang @ 2011-08-26  2:50 UTC (permalink / raw)
  To: ocfs2-devel

There is a race between two (or more) nodes calling iput_final() on the same inode.
The time sequence is like the following. The result is that none of the nodes
does the real inode deletion work, and the unlinked inode is left in the orphan dir.

--------------------------------------

node A                                  node B

open_lock PR

                                        open_lock PR

.......

                                         .......

#in ocfs2_delete_inode()
inode_lock EX
#in ocfs2_query_inode_wipe
try open_lock EX -->can't grant(B has PR)
ignore the deletion
inode_unlock EX

                                        #in ocfs2_delete_inode() 
                                        inode_lock EX
                                        #in ocfs2_query_inode_wipe
                                        try open_lock EX -->can't grant(A has PR)
                                        ignore the deletion
                                        inode_unlock EX

#in ocfs2_clear_inode()
open_unlock EX
drop open_lock

                                         #in ocfs2_clear_inode()
                                         open_unlock EX

--------------------------------------

The fix is to force a dlm_unlock on the open_lock while still holding the
inode_lock. See the comment embedded in the patch.
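
For context, the check both nodes trip over above is in
ocfs2_query_inode_wipe(); a paraphrased sketch (not the verbatim source):

	/* try to upgrade our own open_lock to EX without blocking; -EAGAIN
	 * means another node still holds PR, i.e. the inode looks open
	 * elsewhere, so skip the wipe */
	status = ocfs2_try_open_lock(inode, 1);
	if (status == -EAGAIN) {
		status = 0;	/* not an error, just don't wipe */
		goto bail;	/* the unlinked inode stays in the orphan dir */
	}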

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlmglue.c |    8 ++++++--
 fs/ocfs2/inode.c   |   11 +++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 7642d7c..f331310 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1752,12 +1752,16 @@ void ocfs2_open_unlock(struct inode *inode)
 	if (ocfs2_mount_local(osb))
 		goto out;
 
-	if(lockres->l_ro_holders)
+	if (lockres->l_ro_holders) {
 		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
 				     DLM_LOCK_PR);
-	if(lockres->l_ex_holders)
+		lockres->l_ro_holders = 0;
+	}
+	if (lockres->l_ex_holders) {
 		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
 				     DLM_LOCK_EX);
+		lockres->l_ex_holders = 0;
+	}
 
 out:
 	return;
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index b4c8bb6..390a6fc 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1052,6 +1052,17 @@ static void ocfs2_delete_inode(struct inode *inode)
 	OCFS2_I(inode)->ip_flags |= OCFS2_INODE_DELETED;
 
 bail_unlock_inode:
+	/*
+	 * Since we are no longer responsible for deleting the on-disk inode
+	 * past this point, we must release the open_lock (dlm unlock)
+	 * immediately, while still holding inode_lock. Otherwise, another
+	 * node's attempt to take open_lock EX can fail if it arrives before
+	 * we release our PR, leaving every node convinced that some other
+	 * node still has the inode open, so none of them deletes the inode.
+	 */
+	ocfs2_open_unlock(inode);
+	ocfs2_simple_drop_lockres(OCFS2_SB(inode->i_sb),
+				  &OCFS2_I(inode)->ip_open_lockres);
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 
-- 
1.7.5.2


* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
  2011-08-26  2:50 [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately Wengang Wang
@ 2011-08-31  1:55 ` Sunil Mushran
  2011-08-31  2:51   ` Wengang Wang
  2011-09-07 18:04 ` Joel Becker
  1 sibling, 1 reply; 7+ messages in thread
From: Sunil Mushran @ 2011-08-31  1:55 UTC (permalink / raw)
  To: ocfs2-devel

Comments inlined.

BTW, how common is this race in your testing? If you can
answer that, I would also like to know how you arrived at it.

On 08/25/2011 07:50 PM, Wengang Wang wrote:
> There is a race between two (or more) nodes calling iput_final() on the same inode.
> The time sequence is like the following. The result is that none of the nodes
> does the real inode deletion work, and the unlinked inode is left in the orphan dir.
>
> --------------------------------------
>
> node A                                  node B
>
> open_lock PR
>
>                                          open_lock PR
>
> .......
>
>                                           .......
>
> #in ocfs2_delete_inode()
> inode_lock EX
> #in ocfs2_query_inode_wipe
> try open_lock EX -->can't grant(B has PR)
> ignore the deletion
> inode_unlock EX
>
>                                          #in ocfs2_delete_inode()
>                                          inode_lock EX
>                                          #in ocfs2_query_inode_wipe
>                                          try open_lock EX -->can't grant(A has PR)
>                                          ignore the deletion
>                                          inode_unlock EX
>
> #in ocfs2_clear_inode()
> open_unlock EX
> drop open_lock
>
>                                           #in ocfs2_clear_inode()
>                                           open_unlock EX
>
> --------------------------------------
>
> The fix is to force a dlm_unlock on the open_lock while still holding the
> inode_lock. See the comment embedded in the patch.
>
> Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>

While I am still wrapping my head around this, I see no harm in releasing
the open_lock early. After all, the inode is in MAYBE_ORPHANED state.

> ---
>   fs/ocfs2/dlmglue.c |    8 ++++++--
>   fs/ocfs2/inode.c   |   11 +++++++++++
>   2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 7642d7c..f331310 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1752,12 +1752,16 @@ void ocfs2_open_unlock(struct inode *inode)
>   	if (ocfs2_mount_local(osb))
>   		goto out;
>
> -	if(lockres->l_ro_holders)
> +	if (lockres->l_ro_holders) {
>   		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
>   				     DLM_LOCK_PR);
> -	if(lockres->l_ex_holders)
> +		lockres->l_ro_holders = 0;
> +	}
> +	if (lockres->l_ex_holders) {
>   		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
>   				     DLM_LOCK_EX);
> +		lockres->l_ex_holders = 0;
> +	}


This bit looks incorrect. We cannot force these counts to zero.
We have to let dec_holders() do that in cluster_unlock().
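
(A sketch of what the hunk would then reduce to, keeping only the style
cleanup and assuming, per the above, that ocfs2_cluster_unlock() drops the
counts itself via dec_holders():)

	if (lockres->l_ro_holders)
		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
				     DLM_LOCK_PR);	/* decrements l_ro_holders */
	if (lockres->l_ex_holders)
		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
				     DLM_LOCK_EX);	/* decrements l_ex_holders */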


>   out:
>   	return;
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index b4c8bb6..390a6fc 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1052,6 +1052,17 @@ static void ocfs2_delete_inode(struct inode *inode)
>   	OCFS2_I(inode)->ip_flags |= OCFS2_INODE_DELETED;
>
>   bail_unlock_inode:
> +	/*
> +	 * Since we are no longer responsible for deleting the on-disk inode
> +	 * past this point, we must release the open_lock (dlm unlock)
> +	 * immediately, while still holding inode_lock. Otherwise, another
> +	 * node's attempt to take open_lock EX can fail if it arrives before
> +	 * we release our PR, leaving every node convinced that some other
> +	 * node still has the inode open, so none of them deletes the inode.
> +	 */
> +	ocfs2_open_unlock(inode);
> +	ocfs2_simple_drop_lockres(OCFS2_SB(inode->i_sb),
> +				&OCFS2_I(inode)->ip_open_lockres);
>   	ocfs2_inode_unlock(inode, 1);
>   	brelse(di_bh);
>

We have to make corresponding changes in ocfs2_drop_inode_locks()
and ocfs2_clear_inode().


* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
  2011-08-31  1:55 ` Sunil Mushran
@ 2011-08-31  2:51   ` Wengang Wang
  2011-08-31 17:36     ` Sunil Mushran
  0 siblings, 1 reply; 7+ messages in thread
From: Wengang Wang @ 2011-08-31  2:51 UTC (permalink / raw)
  To: ocfs2-devel

Hi Sunil,

thanks for your review!

On 11-08-30 18:55, Sunil Mushran wrote:
> Comments inlined.
> 
> BTW, how common is this race in your testing? If you can
> answer that, I would also like to know how you arrived at it.
The test case is simple:
in a three-node cluster,
1) node A copies a kernel tree to the ocfs2 volume
2) nodes B and C keep running "ls -R" on the tree while it is being copied
3) after the copy finishes, remove the whole tree with "rm -rf xxx" while
   nodes B and C are still running "ls -R"
4) stop the "ls -R" on nodes B/C once the "rm" on node A has finished
5) umount all three nodes
There are entries left in the orphan dirs (possibly for all slots).
Actually, copying the whole tree is time consuming; I can hit the problem
after copying only part of the kernel tree.

I confirmed by adding debug logging that the cause is "remotely opened":
the logs showed that all three nodes think "some other node(s) still have
the inode open", so they skip the dinode deletion.

> On 08/25/2011 07:50 PM, Wengang Wang wrote:
> >There is a race between two (or more) nodes calling iput_final() on the same inode.
> >The time sequence is like the following. The result is that none of the nodes
> >does the real inode deletion work, and the unlinked inode is left in the orphan dir.
> >
> >--------------------------------------
> >
> >node A                                  node B
> >
> >open_lock PR
> >
> >                                         open_lock PR
> >
> >.......
> >
> >                                          .......
> >
> >#in ocfs2_delete_inode()
> >inode_lock EX
> >#in ocfs2_query_inode_wipe
> >try open_lock EX -->can't grant(B has PR)
> >ignore the deletion
> >inode_unlock EX
> >
> >                                         #in ocfs2_delete_inode()
> >                                         inode_lock EX
> >                                         #in ocfs2_query_inode_wipe
> >                                         try open_lock EX -->can't grant(A has PR)
> >                                         ignore the deletion
> >                                         inode_unlock EX
> >
> >#in ocfs2_clear_inode()
> >open_unlock EX
> >drop open_lock
> >
> >                                          #in ocfs2_clear_inode()
> >                                          open_unlock EX
> >
> >--------------------------------------
> >
> >The fix is to force a dlm_unlock on the open_lock while still holding the
> >inode_lock. See the comment embedded in the patch.
> >
> >Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
> 
> While I am still wrapping my head around this, I see no harm in releasing
> the open_lock early. After all, the inode is in MAYBE_ORPHANED state.
> 
> >---
> >  fs/ocfs2/dlmglue.c |    8 ++++++--
> >  fs/ocfs2/inode.c   |   11 +++++++++++
> >  2 files changed, 17 insertions(+), 2 deletions(-)
> >
> >diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> >index 7642d7c..f331310 100644
> >--- a/fs/ocfs2/dlmglue.c
> >+++ b/fs/ocfs2/dlmglue.c
> >@@ -1752,12 +1752,16 @@ void ocfs2_open_unlock(struct inode *inode)
> >  	if (ocfs2_mount_local(osb))
> >  		goto out;
> >
> >-	if(lockres->l_ro_holders)
> >+	if (lockres->l_ro_holders) {
> >  		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
> >  				     DLM_LOCK_PR);
> >-	if(lockres->l_ex_holders)
> >+		lockres->l_ro_holders = 0;
> >+	}
> >+	if (lockres->l_ex_holders) {
> >  		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
> >  				     DLM_LOCK_EX);
> >+		lockres->l_ex_holders = 0;
> >+	}
> 
> 
> This bit looks incorrect. We cannot force these counts to zero.
> We have to let dec_holders() do that in cluster_unlock().

OK, good point. It seems we don't need to force them to zero since
cluster_unlock() takes care of it.

> 
> 
> >  out:
> >  	return;
> >diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> >index b4c8bb6..390a6fc 100644
> >--- a/fs/ocfs2/inode.c
> >+++ b/fs/ocfs2/inode.c
> >@@ -1052,6 +1052,17 @@ static void ocfs2_delete_inode(struct inode *inode)
> >  	OCFS2_I(inode)->ip_flags |= OCFS2_INODE_DELETED;
> >
> >  bail_unlock_inode:
> >+	/*
> >+	 * Since we are no longer responsible for deleting the on-disk inode
> >+	 * past this point, we must release the open_lock (dlm unlock)
> >+	 * immediately, while still holding inode_lock. Otherwise, another
> >+	 * node's attempt to take open_lock EX can fail if it arrives before
> >+	 * we release our PR, leaving every node convinced that some other
> >+	 * node still has the inode open, so none of them deletes the inode.
> >+	 */
> >+	ocfs2_open_unlock(inode);
> >+	ocfs2_simple_drop_lockres(OCFS2_SB(inode->i_sb),
> >+				&OCFS2_I(inode)->ip_open_lockres);
> >  	ocfs2_inode_unlock(inode, 1);
> >  	brelse(di_bh);
> >
> 
> We have to make corresponding changes in ocfs2_drop_inode_locks()
> and ocfs2_clear_inode().
Yes, I have also looked into those two functions.
I found there is no bad effect in calling ocfs2_open_unlock() and
ocfs2_simple_drop_lockres() on the open lock twice.

When ocfs2_inode_unlock() is called in ocfs2_clear_inode() again, the holders
should already be zero (from the first call), so the dlm unlock is not confused.
When ocfs2_mark_lockres_freeing() is called for the second time, the open lock
should already carry the OCFS2_LOCK_FREEING flag; I don't see a problem here.
When ocfs2_drop_lock() is called for the second time, the OCFS2_LOCK_ATTACHED
flag should already be cleared (by the first call), so no problem either.
Maybe I missed something?
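
(A rough sketch of why the second calls degrade to no-ops, paraphrased from
the flag checks described above, not the verbatim source:)

	/* second ocfs2_mark_lockres_freeing(): only re-sets a flag */
	lockres_or_flags(lockres, OCFS2_LOCK_FREEING);

	/* second ocfs2_drop_lock(): bails out early */
	if (!(lockres->l_flags & OCFS2_LOCK_ATTACHED))
		goto out;	/* first call already cleared it; nothing to drop */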

Simply removing ocfs2_open_unlock()/ocfs2_mark_lockres_freeing() from
ocfs2_clear_inode() and ocfs2_drop_lock() from ocfs2_drop_inode_locks()
respectively can introduce a problem:

For the (inode->i_nlink &&
     !(OCFS2_I(inode)->ip_flags & OCFS2_INODE_MAYBE_ORPHANED))
case, we still need to call ocfs2_open_unlock() and
ocfs2_mark_lockres_freeing() against the open lock in ocfs2_clear_inode()
and call ocfs2_drop_lock() in ocfs2_drop_inode_locks().

Or do you mean we should add a flag to indicate whether we should do those
or not?

thanks,
wengang.


* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
  2011-08-31  2:51   ` Wengang Wang
@ 2011-08-31 17:36     ` Sunil Mushran
  2011-09-01  1:00       ` Wengang Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Sunil Mushran @ 2011-08-31 17:36 UTC (permalink / raw)
  To: ocfs2-devel

On 08/30/2011 07:51 PM, Wengang Wang wrote:
> The test case is simple:
> in a three-node cluster,
> 1) node A copies a kernel tree to the ocfs2 volume
> 2) nodes B and C keep running "ls -R" on the tree while it is being copied
> 3) after the copy finishes, remove the whole tree with "rm -rf xxx" while
>    nodes B and C are still running "ls -R"
> 4) stop the "ls -R" on nodes B/C once the "rm" on node A has finished
> 5) umount all three nodes
> There are entries left in the orphan dirs (possibly for all slots).
> Actually, copying the whole tree is time consuming; I can hit the problem
> after copying only part of the kernel tree.
>
> I confirmed by adding debug logging that the cause is "remotely opened":
> the logs showed that all three nodes think "some other node(s) still have
> the inode open", so they skip the dinode deletion.


Fair enough.

What needs investigating is whether this approach will work in 1.4/1.6
too. evict_inode() was added later.


> Yes, I have also looked into those two functions.
> I found there is no bad effect in calling ocfs2_open_unlock() and
> ocfs2_simple_drop_lockres() on the open lock twice.
>
> When ocfs2_inode_unlock() is called in ocfs2_clear_inode() again, the holders
> should already be zero (from the first call), so the dlm unlock is not confused.
> When ocfs2_mark_lockres_freeing() is called for the second time, the open lock
> should already carry the OCFS2_LOCK_FREEING flag; I don't see a problem here.
> When ocfs2_drop_lock() is called for the second time, the OCFS2_LOCK_ATTACHED
> flag should already be cleared (by the first call), so no problem either.
> Maybe I missed something?
>
> Simply removing ocfs2_open_unlock()/ocfs2_mark_lockres_freeing() from
> ocfs2_clear_inode() and ocfs2_drop_lock() from ocfs2_drop_inode_locks()
> respectively can introduce a problem:
>
> For the (inode->i_nlink &&
>      !(OCFS2_I(inode)->ip_flags & OCFS2_INODE_MAYBE_ORPHANED))
> case, we still need to call ocfs2_open_unlock() and
> ocfs2_mark_lockres_freeing() against the open lock in ocfs2_clear_inode()
> and call ocfs2_drop_lock() in ocfs2_drop_inode_locks().

Good point. No need to add another flag. We may need to add
some comments to explain this.


* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
  2011-08-31 17:36     ` Sunil Mushran
@ 2011-09-01  1:00       ` Wengang Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Wengang Wang @ 2011-09-01  1:00 UTC (permalink / raw)
  To: ocfs2-devel

On 11-08-31 10:36, Sunil Mushran wrote:
> On 08/30/2011 07:51 PM, Wengang Wang wrote:
> >The test case is simple:
> >in a three-node cluster,
> >1) node A copies a kernel tree to the ocfs2 volume
> >2) nodes B and C keep running "ls -R" on the tree while it is being copied
> >3) after the copy finishes, remove the whole tree with "rm -rf xxx" while
> >   nodes B and C are still running "ls -R"
> >4) stop the "ls -R" on nodes B/C once the "rm" on node A has finished
> >5) umount all three nodes
> >There are entries left in the orphan dirs (possibly for all slots).
> >Actually, copying the whole tree is time consuming; I can hit the problem
> >after copying only part of the kernel tree.
> >
> >I confirmed by adding debug logging that the cause is "remotely opened":
> >the logs showed that all three nodes think "some other node(s) still have
> >the inode open", so they skip the dinode deletion.
> 
> 
> Fair enough.
> 
> What needs investigating is whether this approach will work in 1.4/1.6
> too. evict_inode() was added later.

The relevant code in 1.4 and 1.6 is the same.
The difference between mainline and 1.4/1.6 is this:

In the 1.4/1.6 code, clear_inode(), which calls ocfs2_clear_inode(), is
called directly at the end of ocfs2_delete_inode().
In the mainline code, the clear_inode() call is gone; ocfs2_clear_inode()
is called in the ocfs2_evict_inode() path just after ocfs2_delete_inode().

There is no essential difference, so I think it works for 1.4/1.6 too.

Additionally, I confirmed by testing that it works on 1.6 as well.
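
For reference, the mainline dispatch looks roughly like this (a paraphrased
sketch, not the verbatim source):

	/* .evict_inode picks the delete path for unlinked or maybe-orphaned
	 * inodes; the in-memory inode gets cleared either way */
	void ocfs2_evict_inode(struct inode *inode)
	{
		if (!inode->i_nlink ||
		    (OCFS2_I(inode)->ip_flags & OCFS2_INODE_MAYBE_ORPHANED))
			ocfs2_delete_inode(inode);	/* ends by clearing the inode */
		else
			ocfs2_clear_inode(inode);
	}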

> 
> 
> >Yes, I have also looked into those two functions.
> >I found there is no bad effect in calling ocfs2_open_unlock() and
> >ocfs2_simple_drop_lockres() on the open lock twice.
> >
> >When ocfs2_inode_unlock() is called in ocfs2_clear_inode() again, the holders
> >should already be zero (from the first call), so the dlm unlock is not confused.
> >When ocfs2_mark_lockres_freeing() is called for the second time, the open lock
> >should already carry the OCFS2_LOCK_FREEING flag; I don't see a problem here.
> >When ocfs2_drop_lock() is called for the second time, the OCFS2_LOCK_ATTACHED
> >flag should already be cleared (by the first call), so no problem either.
> >Maybe I missed something?
> >
> >Simply removing ocfs2_open_unlock()/ocfs2_mark_lockres_freeing() from
> >ocfs2_clear_inode() and ocfs2_drop_lock() from ocfs2_drop_inode_locks()
> >respectively can introduce a problem:
> >
> >For the (inode->i_nlink &&
> >     !(OCFS2_I(inode)->ip_flags & OCFS2_INODE_MAYBE_ORPHANED))
> >case, we still need to call ocfs2_open_unlock() and
> >ocfs2_mark_lockres_freeing() against the open lock in ocfs2_clear_inode()
> >and call ocfs2_drop_lock() in ocfs2_drop_inode_locks().
> 
> Good point. No need to add another flag. We may need to add
> some comments to explain this.

OK, I will do that in the next drop.

thanks,
wengang.


* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
  2011-08-26  2:50 [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately Wengang Wang
  2011-08-31  1:55 ` Sunil Mushran
@ 2011-09-07 18:04 ` Joel Becker
  2011-09-08  2:14   ` Wengang Wang
  1 sibling, 1 reply; 7+ messages in thread
From: Joel Becker @ 2011-09-07 18:04 UTC (permalink / raw)
  To: ocfs2-devel

On Fri, Aug 26, 2011 at 10:50:27AM +0800, Wengang Wang wrote:
> There is a race between two (or more) nodes calling iput_final() on the same inode.
> The time sequence is like the following. The result is that none of the nodes
> does the real inode deletion work, and the unlinked inode is left in the orphan dir.
> 
> --------------------------------------
> 
> node A                                  node B
> 
> open_lock PR
> 
>                                         open_lock PR
> 

	Who is taking the open lock here?  Or are you presuming a
long-held open lock (e.g., back when you untarred stuff)?

> .......
> 
>                                          .......
> 
> #in ocfs2_delete_inode()
> inode_lock EX
> #in ocfs2_query_inode_wipe
> try open_lock EX -->can't grant(B has PR)
> ignore the deletion
> inode_unlock EX
> 
>                                         #in ocfs2_delete_inode() 
>                                         inode_lock EX
>                                         #in ocfs2_query_inode_wipe
>                                         try open_lock EX -->can't grant(A has PR)
>                                         ignore the deletion
>                                         inode_unlock EX
> 
> #in ocfs2_clear_inode()
> open_unlock EX
> drop open_lock
> 
>                                          #in ocfs2_clear_inode()
>                                          open_unlock EX
> 
> --------------------------------------
> 
> The fix is to force a dlm_unlock on the open_lock while still holding the
> inode_lock. See the comment embedded in the patch.

	Why wouldn't the orphan scan catch this?

> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index b4c8bb6..390a6fc 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1052,6 +1052,17 @@ static void ocfs2_delete_inode(struct inode *inode)
>  	OCFS2_I(inode)->ip_flags |= OCFS2_INODE_DELETED;
>  
>  bail_unlock_inode:
> +	/*
> +	 * Since we are no longer responsible for deleting the on-disk inode
> +	 * past this point, we must release the open_lock (dlm unlock)
> +	 * immediately, while still holding inode_lock. Otherwise, another
> +	 * node's attempt to take open_lock EX can fail if it arrives before
> +	 * we release our PR, leaving every node convinced that some other
> +	 * node still has the inode open, so none of them deletes the inode.
> +	 */
> +	ocfs2_open_unlock(inode);
> +	ocfs2_simple_drop_lockres(OCFS2_SB(inode->i_sb),
> +				  &OCFS2_I(inode)->ip_open_lockres);

	How do you know that you can ocfs2_simple_drop_lockres()?  Can't
another code path have a reference on the inode?

Joel

-- 

"The nice thing about egotists is that they don't talk about other
 people."
         - Lucille S. Harper

			http://www.jlbec.org/
			jlbec at evilplan.org


* [Ocfs2-devel] [PATCH] ocfs2: unlock open_lock immediately
  2011-09-07 18:04 ` Joel Becker
@ 2011-09-08  2:14   ` Wengang Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Wengang Wang @ 2011-09-08  2:14 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joel,

On 11-09-07 11:04, Joel Becker wrote:
> On Fri, Aug 26, 2011 at 10:50:27AM +0800, Wengang Wang wrote:
> > There is a race between two (or more) nodes calling iput_final() on the same inode.
> > The time sequence is like the following. The result is that none of the nodes
> > does the real inode deletion work, and the unlinked inode is left in the orphan dir.
> > 
> > --------------------------------------
> > 
> > node A                                  node B
> > 
> > open_lock PR
> > 
> >                                         open_lock PR
> > 
> 
> 	Who is taking the open lock here?  Or are you presuming a
> long-held open lock (e.g., back when you untarred stuff)?

Both A and B. Every node holds a PR lock on the open_lock
(ocfs2_inode_info.ip_open_lockres) for the whole life of an in-memory ocfs2
inode. The inode in question can be purged from memory (dropping the
open_lock) due to memory pressure. But even so, the inode is reloaded into
memory when the unlink comes, and a PR is taken on the open_lock again.
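
(Roughly, a paraphrased sketch of that lifetime, assuming the PR is taken
when the inode is loaded, e.g. in ocfs2_read_locked_inode():)

	/* inode load:  ocfs2_open_lock(inode);
	 *     takes DLM_LOCK_PR on ip_open_lockres and holds it for the
	 *     whole in-memory life of the inode
	 * inode purge: ocfs2_open_unlock(inode);  (from ocfs2_clear_inode())
	 *     the PR is only dropped when the inode finally leaves memory
	 */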

> 
> > .......
> > 
> >                                          .......
> > 
> > #in ocfs2_delete_inode()
> > inode_lock EX
> > #in ocfs2_query_inode_wipe
> > try open_lock EX -->can't grant(B has PR)
> > ignore the deletion
> > inode_unlock EX
> > 
> >                                         #in ocfs2_delete_inode() 
> >                                         inode_lock EX
> >                                         #in ocfs2_query_inode_wipe
> >                                         try open_lock EX -->can't grant(A has PR)
> >                                         ignore the deletion
> >                                         inode_unlock EX
> > 
> > #in ocfs2_clear_inode()
> > open_unlock EX
> > drop open_lock
> > 
> >                                          #in ocfs2_clear_inode()
> >                                          open_unlock EX
> > 
> > --------------------------------------
> > 
> > The fix is to force a dlm_unlock on the open_lock while still holding the
> > inode_lock. See the comment embedded in the patch.
> 
> 	Why wouldn't the orphan scan catch this?

The orphan scan does catch this. But I think pushing the work off to the
orphan scan is not the correct behaviour.
If you are curious how I found the problem: during the test, I
1) disabled the orphan scan
2) added, to ocfs2_inode_info, a delete_lock which has the same functionality
   as the dentry_lock (on the dentry) does.
Together, the two should ensure deleted files are never left in the orphan dirs.

> 
> > diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> > index b4c8bb6..390a6fc 100644
> > --- a/fs/ocfs2/inode.c
> > +++ b/fs/ocfs2/inode.c
> > @@ -1052,6 +1052,17 @@ static void ocfs2_delete_inode(struct inode *inode)
> >  	OCFS2_I(inode)->ip_flags |= OCFS2_INODE_DELETED;
> >  
> >  bail_unlock_inode:
> > +	/*
> > +	 * Since we are no longer responsible for deleting the on-disk inode
> > +	 * past this point, we must release the open_lock (dlm unlock)
> > +	 * immediately, while still holding inode_lock. Otherwise, another
> > +	 * node's attempt to take open_lock EX can fail if it arrives before
> > +	 * we release our PR, leaving every node convinced that some other
> > +	 * node still has the inode open, so none of them deletes the inode.
> > +	 */
> > +	ocfs2_open_unlock(inode);
> > +	ocfs2_simple_drop_lockres(OCFS2_SB(inode->i_sb),
> > +				  &OCFS2_I(inode)->ip_open_lockres);
> 
> 	How do you know that you can ocfs2_simple_drop_lockres()?  Can't
> another code path have a reference on the inode?

We are in the iput_final() path: by the time we reach ocfs2_delete_inode(),
the inode already has the I_FREEING flag set, which prevents anyone from
taking another reference on the inode in question. This patch just moves the
actual work of ocfs2_simple_drop_lockres() to a slightly earlier place,
still within a safe scope.
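
For reference, this is what the VFS itself enforces: once I_FREEING is set,
igrab() refuses to hand out new references (roughly, from fs/inode.c of that
era):

	struct inode *igrab(struct inode *inode)
	{
		spin_lock(&inode->i_lock);
		if (!(inode->i_state & (I_FREEING | I_WILL_FREE))) {
			__iget(inode);			/* safe: take a reference */
			spin_unlock(&inode->i_lock);
		} else {
			spin_unlock(&inode->i_lock);
			inode = NULL;			/* being torn down: refuse */
		}
		return inode;
	}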

thanks,
wengang.

