* [Ocfs2-devel] Question about incorrect free bits setting
@ 2015-03-27  2:27 Joseph Qi
  2015-03-27 16:54 ` Goldwyn Rodrigues
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2015-03-27  2:27 UTC (permalink / raw)
  To: ocfs2-devel

Hi Goldwyn,
I found a mail you posted discussing the incorrect free bits setting.
https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008458.html

In this topic, Sunil said it was because of the patch added to delay
dropping of the dentry locks (commit ea455f8ab683) and suggested to fix
the quota issue in a different way.
Then you reverted the patches based on Honza's new way to fix the quota
issue.
https://oss.oracle.com/pipermail/ocfs2-devel/2014-February/009662.html

I have investigated these patches and still do not understand how it can
happen.
Could you please tell me more about the case where bits are cleared
twice?


* [Ocfs2-devel] Question about incorrect free bits setting
  2015-03-27  2:27 [Ocfs2-devel] Question about incorrect free bits setting Joseph Qi
@ 2015-03-27 16:54 ` Goldwyn Rodrigues
  2015-03-27 16:57   ` Goldwyn Rodrigues
  0 siblings, 1 reply; 6+ messages in thread
From: Goldwyn Rodrigues @ 2015-03-27 16:54 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

On 03/26/2015 09:27 PM, Joseph Qi wrote:
> Hi Goldwyn,
> I found a mail you posted discussing the incorrect free bits setting.
> https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008458.html
>
> In this topic, Sunil said it was because of the patch added to delay
> dropping of the dentry locks (commit ea455f8ab683) and suggested to fix
> the quota issue in a different way.
> Then you reverted the patches based on Honza's new way to fix the quota
> issue.
> https://oss.oracle.com/pipermail/ocfs2-devel/2014-February/009662.html
>
> I have investigated these patches and still do not understand how it can
> happen.
> Could you please tell me more about the case where bits are cleared
> twice?

I am not sure how the quota patches were related. It was a long time ago.

However, what we fixed in Honza's patches is the way unlink is
performed. The problem was that we were getting very bad performance
because of too much journal activity. We realized that this was because
inodes were shown as busy, and hence moved to the orphan directory, when
they were not busy. It all came down to the open lock still being held
because dropping it was delayed/offloaded to another thread.
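
To make the sequence above concrete, here is a minimal user-space sketch.
It is a toy model, not ocfs2 code: struct toy_inode, its fields, and every
toy_* function are hypothetical names made up for illustration. It only
shows how a lock drop that has been queued for a worker, but not yet
applied, can make a closed file still look busy at unlink time.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy inode: all fields are hypothetical, not ocfs2 structures. */
struct toy_inode {
    int  lock_holders;   /* current holders of the toy "open lock" */
    int  pending_drops;  /* drops queued for a worker, not yet applied */
    bool orphaned;       /* moved to the toy orphan directory */
};

/* A user closes the file, but the lock drop is handed to a worker. */
static void toy_close_deferred(struct toy_inode *ino)
{
    assert(ino->pending_drops < ino->lock_holders);
    ino->pending_drops++;
}

/* The worker finally applies every queued drop. */
static void toy_run_worker(struct toy_inode *ino)
{
    ino->lock_holders -= ino->pending_drops;
    ino->pending_drops = 0;
}

/*
 * Unlink: if the lock still looks held, treat the inode as busy and
 * orphan it (the expensive path); otherwise delete it directly.
 * Returns true when the inode could be deleted directly.
 */
static bool toy_unlink(struct toy_inode *ino)
{
    if (ino->lock_holders > 0) {
        ino->orphaned = true;
        return false;
    }
    return true;
}

/* Returns true if unlink took the direct path for a closed file. */
static bool toy_unlink_after_close(bool worker_ran_first)
{
    struct toy_inode ino = { .lock_holders = 1 };

    toy_close_deferred(&ino);
    if (worker_ran_first)
        toy_run_worker(&ino);
    return toy_unlink(&ino);
}
```

In this toy, toy_unlink_after_close(true) takes the direct path, while
toy_unlink_after_close(false) orphans the inode even though nobody is
using the file any more.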

I am not sure, but my guess is that this delay may be messing up the
accounting between the node that owns the lock and the one
deleting the file (which also requests the lock). I have not seen this
issue for a long time now, so I cannot be certain. Perhaps Sunil can
provide more input.

-- 
Goldwyn


* [Ocfs2-devel] Question about incorrect free bits setting
  2015-03-27 16:54 ` Goldwyn Rodrigues
@ 2015-03-27 16:57   ` Goldwyn Rodrigues
  2015-04-01  8:16     ` Joseph Qi
  0 siblings, 1 reply; 6+ messages in thread
From: Goldwyn Rodrigues @ 2015-03-27 16:57 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

On 03/26/2015 09:27 PM, Joseph Qi wrote:
> Hi Goldwyn,
> I found a mail you posted discussing the incorrect free bits setting.
> https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008458.html
>
> In this topic, Sunil said it was because of the patch added to delay
> dropping of the dentry locks (commit ea455f8ab683) and suggested to fix
> the quota issue in a different way.
> Then you reverted the patches based on Honza's new way to fix the quota
> issue.
> https://oss.oracle.com/pipermail/ocfs2-devel/2014-February/009662.html
>
> I have investigated these patches and still do not understand how it can
> happen.
> Could you please tell me more about the case where bits are cleared
> twice?

I am not sure how the quota patches were related. It was a long time ago.

However, what we fixed in Honza's patches is the way unlink is
performed. The problem was that we were getting very bad performance
because of too much journal activity. We realized that this was because
inodes were shown as busy, and hence moved to the orphan directory, when
they were not busy. It all came down to the open lock still being held
because dropping it was delayed/offloaded to another thread.

I am not sure, but my guess is that this delay may be messing up the
accounting between the node that owns the lock and the one
deleting the file (which also requests the lock). I have not seen this
issue for a long time now, so I cannot be certain. Perhaps Sunil can
provide more input.


-- 
Goldwyn


* [Ocfs2-devel] Question about incorrect free bits setting
  2015-03-27 16:57   ` Goldwyn Rodrigues
@ 2015-04-01  8:16     ` Joseph Qi
  0 siblings, 0 replies; 6+ messages in thread
From: Joseph Qi @ 2015-04-01  8:16 UTC (permalink / raw)
  To: ocfs2-devel

Hi Goldwyn,
Thanks very much for the quick reply.

Hi Sunil,
Could you help provide more input?

Thanks,
Joseph

On 2015/3/28 0:57, Goldwyn Rodrigues wrote:
> Hi Joseph,
> 
> On 03/26/2015 09:27 PM, Joseph Qi wrote:
>> Hi Goldwyn,
>> I found a mail you posted discussing the incorrect free bits setting.
>> https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008458.html
>>
>> In this topic, Sunil said it was because of the patch added to delay
>> dropping of the dentry locks (commit ea455f8ab683) and suggested to fix
>> the quota issue in a different way.
>> Then you reverted the patches based on Honza's new way to fix the quota
>> issue.
>> https://oss.oracle.com/pipermail/ocfs2-devel/2014-February/009662.html
>>
>> I have investigated these patches and still do not understand how it can
>> happen.
>> Could you please tell me more about the case where bits are cleared
>> twice?
> 
> I am not sure how the quota patches were related. It was a long time ago.
> 
> However, what we fixed in Honza's patches is the way unlink is
> performed. The problem was that we were getting very bad performance
> because of too much journal activity. We realized that this was because
> inodes were shown as busy, and hence moved to the orphan directory, when
> they were not busy. It all came down to the open lock still being held
> because dropping it was delayed/offloaded to another thread.
> 
> I am not sure, but my guess is that this delay may be messing up the
> accounting between the node that owns the lock and the one
> deleting the file (which also requests the lock). I have not seen this
> issue for a long time now, so I cannot be certain. Perhaps Sunil can
> provide more input.
> 
> 


* [Ocfs2-devel] Question about incorrect free bits setting
  2012-01-18 18:00 Goldwyn Rodrigues
@ 2012-01-18 18:21 ` Sunil Mushran
  0 siblings, 0 replies; 6+ messages in thread
From: Sunil Mushran @ 2012-01-18 18:21 UTC (permalink / raw)
  To: ocfs2-devel

We've seen this too. The problem happens because of the patch added to delay
dropping of the dentry locks (first patch below). The other two are related.
It was added to avoid a deadlock in quotas but adds problems of its own.
Srini has studied this issue and may be able to expand on this. The quick
and dirty solution is to back out these patches and ask users to disable
quotas for now. The longer term solution is to fix the quotas issue in a different
way... or redo deletes completely.

commit ea455f8ab68338ba69f5d3362b342c115bea8e13
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jan 12 23:20:31 2009 +0100

     ocfs2: Push out dropping of dentry lock to ocfs2_wq

     Dropping of last reference to dentry lock is a complicated operation involving
     dropping of reference to inode. This can get complicated and quota code in
     particular needs to obtain some quota locks which leads to potential deadlock.
     Thus we defer dropping of inode reference to ocfs2_wq.

     Signed-off-by: Jan Kara <jack@suse.cz>
     Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit 5fd131893793567c361ae64cbeb28a2a753bbe35
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jul 30 17:01:53 2009 +0200

     ocfs2: Don't oops in ocfs2_kill_sb on a failed mount

     If we fail to mount the filesystem, we have to be careful not to dereference
     uninitialized structures in ocfs2_kill_sb.

     Signed-off-by: Jan Kara <jack@suse.cz>
     Signed-off-by: Joel Becker <joel.becker@oracle.com>

commit f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jul 20 12:12:36 2009 +0200

     ocfs2: Fix deadlock on umount

     In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock
     put process into ocfs2_wq. This causes problems during umount because ocfs2_wq
     can drop references to inodes while they are being invalidated by
     invalidate_inodes() causing all sorts of nasty things (invalidate_inodes()
     ending in an infinite loop, "Busy inodes after umount" messages etc.).

     We fix the problem by stopping ocfs2_wq from doing any further releasing of
     inode references on the superblock being unmounted, wait until it finishes
     the current round of releasing and finally cleaning up all the references in
     dentry_lock_list from ocfs2_put_super().

     The issue was tracked down by Tao Ma <tao.ma@oracle.com>.

     Signed-off-by: Jan Kara <jack@suse.cz>
     Signed-off-by: Joel Becker <joel.becker@oracle.com>



On 01/18/2012 10:00 AM, Goldwyn Rodrigues wrote:
> We have a customer who was running into a read-only filesystem because
> of incorrect free bits set/calculation. We have provided the fix from
> here, which avoids the read-only problem
> http://oss.oracle.com/pipermail/ocfs2-devel/2011-November/008431.html
>
> Though the filesystem does not turn read-only, we still get messages like:
>
> [ 5017.452846] (ocfs2_wq,8480,0):ocfs2_block_group_clear_bits:2113
> ERROR: Trying to clear 1 bits at offset 7658 in group descriptor #
> 7644672 (device cciss/c0d0p3), needed to clear 0 bits
>
> We are investigating how the bits get free in the first place because
> another allocation could claim the bits marked as free.
>
> The question is:
>
> Why does ocfs2_release_clusters have ocfs2_clear_bit as the undo
> function, whereas ocfs2_free_clusters has ocfs2_set_bit as the undo
> function? Should it be NULL for ocfs2_release_clusters?
>


* [Ocfs2-devel] Question about incorrect free bits setting
@ 2012-01-18 18:00 Goldwyn Rodrigues
  2012-01-18 18:21 ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Goldwyn Rodrigues @ 2012-01-18 18:00 UTC (permalink / raw)
  To: ocfs2-devel

We have a customer who was running into a read-only filesystem because
of incorrect free bits set/calculation. We have provided the fix from
here, which avoids the read-only problem
http://oss.oracle.com/pipermail/ocfs2-devel/2011-November/008431.html

Though the filesystem does not turn read-only, we still get messages like:

[ 5017.452846] (ocfs2_wq,8480,0):ocfs2_block_group_clear_bits:2113
ERROR: Trying to clear 1 bits at offset 7658 in group descriptor #
7644672 (device cciss/c0d0p3), needed to clear 0 bits

We are investigating how the bits get free in the first place because
another allocation could claim the bits marked as free.
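
To illustrate the kind of check that produces a message like the one
above, here is a minimal user-space sketch. It is not the ocfs2
implementation: struct toy_group, TOY_GROUP_BITS, and the toy_* functions
are hypothetical names, and the reading of the last number in the message
(how many of the requested bits were still set) is an assumption of this
toy only. It shows a bitmap free routine that refuses to clear bits which
are already clear, so that a second free of the same bit is reported
instead of silently corrupting the free count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_GROUP_BITS 64u  /* hypothetical group size, not an ocfs2 value */

/* Toy allocation group: bit set = allocated, bit clear = free. */
struct toy_group {
    uint64_t bitmap;
    unsigned free_count;
};

/* Mark one free bit as allocated. */
static void toy_set_bit(struct toy_group *g, unsigned bit)
{
    assert(bit < TOY_GROUP_BITS);
    assert(!(g->bitmap & (UINT64_C(1) << bit)));
    g->bitmap |= UINT64_C(1) << bit;
    g->free_count--;
}

/*
 * Free `count` bits starting at `start`. Returns 0 on success, or -1
 * (leaving the group untouched) if any bit in the range is already free.
 */
static int toy_clear_bits(struct toy_group *g, unsigned start, unsigned count)
{
    unsigned i, set = 0;

    assert(start < TOY_GROUP_BITS && count <= TOY_GROUP_BITS - start);

    /* Count how many of the requested bits are really allocated. */
    for (i = start; i < start + count; i++)
        if (g->bitmap & (UINT64_C(1) << i))
            set++;

    if (set != count) {
        fprintf(stderr,
                "Trying to clear %u bits at offset %u, needed to clear %u bits\n",
                count, start, set);
        return -1;
    }

    for (i = start; i < start + count; i++)
        g->bitmap &= ~(UINT64_C(1) << i);
    g->free_count += count;
    return 0;
}

/* Allocate one bit, then free it twice; returns the second free's result. */
static int toy_double_free_demo(void)
{
    struct toy_group g = { .bitmap = 0, .free_count = TOY_GROUP_BITS };

    toy_set_bit(&g, 10);
    if (toy_clear_bits(&g, 10, 1) != 0)
        return 1;                      /* first free should succeed */
    return toy_clear_bits(&g, 10, 1);  /* second free is rejected */
}
```

In this toy, the second free finds the bit already clear, so it is
rejected and the free count stays consistent. If the check were missing,
free_count would exceed the number of bits that are actually free.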

The question is:

Why does ocfs2_release_clusters have ocfs2_clear_bit as the undo
function, whereas ocfs2_free_clusters has ocfs2_set_bit as the undo
function? Should it be NULL for ocfs2_release_clusters?

-- 
Goldwyn

