All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] Revert iput deferring code in ocfs2_drop_dentry_lock
Date: Thu, 16 Jan 2014 20:34:44 +0100	[thread overview]
Message-ID: <20140116193444.GB10628@quack.suse.cz> (raw)
In-Reply-To: <52D830D3.6010605@oracle.com>

On Thu 16-01-14 11:19:47, Srinivas Eeda wrote:
> Hi Jan,
> thanks a lot for explaining the problem. Please see my comment below.
> 
> On 01/16/2014 06:02 AM, Jan Kara wrote:
> >On Thu 16-01-14 07:35:58, Goldwyn Rodrigues wrote:
> >>On 01/15/2014 08:47 PM, Jan Kara wrote:
> >>>On Wed 15-01-14 17:17:55, Goldwyn Rodrigues wrote:
> >>>>On 01/15/2014 09:53 AM, Jan Kara wrote:
> >>>>>On Wed 15-01-14 09:32:39, Goldwyn Rodrigues wrote:
> >>>>>>On 01/15/2014 07:33 AM, Jan Kara wrote:
> >>>>>>>On Wed 15-01-14 06:51:51, Goldwyn Rodrigues wrote:
> >>>>>>>>The following patches are reverted in this patch because these
> >>>>>>>>patches caused regression in the unlink() calls.
> >>>>>>>>
> >>>>>>>>ea455f8ab68338ba69f5d3362b342c115bea8e13 - ocfs2: Push out dropping
> >>>>>>>>of dentry lock to ocfs2_wq
> >>>>>>>>f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a - ocfs2: Fix deadlock on umount
> >>>>>>>>5fd131893793567c361ae64cbeb28a2a753bbe35 - ocfs2: Don't oops in
> >>>>>>>>ocfs2_kill_sb on a failed mount
> >>>>>>>>
> >>>>>>>>The regression is caused because these patches delay the iput() in case
> >>>>>>>>of dentry unlocks. This also delays the unlocking of the open lockres.
> >>>>>>>>The open lockresource is required to test if the inode can be wiped from
> >>>>>>>>disk on not. When the deleting node does not get the open lock, it marks
> >>>>>>>>it as orphan (even though it is not in use by another node/process)
> >>>>>>>>and causes a journal checkpoint. This delays operations following the
> >>>>>>>>inode eviction. This also moves the inode to the orphaned inode
> >>>>>>>>which further causes more I/O and a lot of unneccessary orphans.
> >>>>>>>>
> >>>>>>>>As for the fix, I tried reproducing the problem by using quotas and enabling
> >>>>>>>>lockdep, but the issue could not be reproduced.
> >>>>>>>   So I was thinking about this for a while trying to remember... What it
> >>>>>>>really boils down to is whether it is OK to call ocfs2_delete_inode() from
> >>>>>>>DLM downconvert thread. Bear in mind that ocfs2_delete_inode() takes all
> >>>>>>>kinds of interesting cluster locks meaning that the downconvert thread can
> >>>>>>>block waiting to acquire some cluster lock. That seems like asking for
> >>>>>>>trouble to me (i.e., what if the node holding the lock we need from
> >>>>>>>downconvert thread needs a lock from us first?) but maybe it is OK these
> >>>>>>>days.
> >>>>>>>
> >>>>>>The only lock it tries to take is the "inode lock" resource, which
> >>>>>>seems to be fine to take.
> >>>>>   Why is it obviously fine? Some other node might be holding the inode lock
> >>>>>and still write to the inode. If that needs a cluster lock for block
> >>>>>allocation and that allocation cluster lock is currently hold by our node,
> >>>>>we are screwed. Aren't we?
> >>>>>
> >>>>No. If another thread is using the allocation cluster lock implies
> >>>>we already have the inode lock. Cluster locks are also taken in a
> >>>>strict order in order to avoid ABBA locking.
> >>>   I agree with what you wrote above but it's not exactly what I'm talking
> >>>about. What seems to be possible is:
> >>>NODE 1					NODE2
> >>>holds dentry lock for 'foo'
> >>>holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
> >>>					dquot_initialize(bar)
> >>>					  ocfs2_dquot_acquire()
> >>>					    ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> >>>					    ...
> >>>downconvert thread (triggered from another
> >>>node or a different process from NODE2)
> >>>   ocfs2_dentry_post_unlock()
> >>>     ...
> >>>     iput(foo)
> >>>       ocfs2_evict_inode(foo)
> >>>         ocfs2_clear_inode(foo)
> >>>           dquot_drop(inode)
> >>>	    ...
> >>>             ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> >>>              - blocks
> >>>					    finds we need more space in
> >>>					    quota file
> >>>					    ...
> >>>					    ocfs2_extend_no_holes()
> >>>					      ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
> >>>						- deadlocks waiting for
> >>>						  downconvert thread
> >>>
> >>Thanks. This explains the situation much better. Whats stopping
> >>NODE1 from releasing the GLOBAL_BITMAP_SYSTEM_INODE? (to break from
> >>hold and wait condition)
> >   The fact that to release the lock, it has to be downconverted. And there
> >is only one downconvert thread (ocfs2dc) and that is busy waiting for
> >USER_QUOTA_SYSTEM_INODE lock...
> Yes that indeed seems to be a problem, thanks a lot for explaining
> :). I do not know much about quota's code but wondering if we can
> fix this problem by enforcing the lock ordering ? Can NODE2 check
> first if it needs space before getting lock on
> USER_QUOTA_SYSTEM_INODE. If it does then it should acquire
> GLOBAL_BITMAP_SYSTEM_INODE before acquiring lock on
> USER_QUOTA_SYSTEM_INODE ? (Basically we enforce
> GLOBAL_BITMAP_SYSTEM_INODE cannot be taken while node has
> USER_QUOTA_SYSTEM_INODE lock)
  Well, that is impractical to say the least - we would have to rewrite the
allocation path to handle specifically quota code which would get the
GLOBAL_BITMAP_SYSTEM_INODE lock at a different place than any other code.
I think it is better to postpone dropping quota references - that leaves
the pain localized into the code handling inode eviction.

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

      reply	other threads:[~2014-01-16 19:34 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-15 12:51 [Ocfs2-devel] [PATCH] Revert iput deferring code in ocfs2_drop_dentry_lock Goldwyn Rodrigues
2014-01-15 13:33 ` Jan Kara
2014-01-15 15:32   ` Goldwyn Rodrigues
2014-01-15 15:53     ` Jan Kara
2014-01-15 23:17       ` Goldwyn Rodrigues
2014-01-16  2:47         ` Jan Kara
2014-01-16 13:35           ` Goldwyn Rodrigues
2014-01-16 14:02             ` Jan Kara
2014-01-16 15:49               ` Goldwyn Rodrigues
2014-01-16 17:13                 ` Jan Kara
2014-01-16 17:47                   ` Goldwyn Rodrigues
2014-01-16 19:24                     ` Jan Kara
2014-01-16 19:19               ` Srinivas Eeda
2014-01-16 19:34                 ` Jan Kara [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140116193444.GB10628@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.