All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] mballoc: trivial code cleanup
@ 2011-02-06  4:53 Coly Li
  2011-02-06  8:43 ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Coly Li @ 2011-02-06  4:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: Alex Tomas, Theodore Tso

This patch set contains several trivial code cleanup to mballoc code.
A basic testing is done with Linux 2.6.38-rc3+ (commit 44f2c5c8).

Signed-off-by: Coly Li <bosong.ly@taobao.com>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: Theodore Tso <tytso@google.com>
---
  fs/ext4/mballoc.c |   15 ++++++++++-----
  fs/ext4/mballoc.h |    2 +-
  2 files changed, 11 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] mballoc: trivial code cleanup
  2011-02-06  4:53 [PATCH 0/5] mballoc: trivial code cleanup Coly Li
@ 2011-02-06  8:43 ` Amir Goldstein
  2011-02-07 17:41   ` Ted Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2011-02-06  8:43 UTC (permalink / raw)
  To: i; +Cc: linux-ext4, Alex Tomas, Theodore Tso

Hi Coly,

Forgive me for hijacking your thread.
I cannot comment on the cleanup, as I am new to mballoc code,
but I was wondering if you could offer me a piece of advise.

In the current implementation (prototype) of ext4 snapshots,
I have an outstanding circular lockdep warning to fix:
inode->i_data_sem => grp->alloc_sem => snapshot->i_data_sem/1 => grp->alloc_sem

Snapshot block allocation can be nested inside another inode block allocation,
because COW of bitmap block is triggered on get_write_access() inside
ext4_mb_mark_diskspace_used().
i_data_sem nested lockdep is handled by calling
down_read/write_nested() in ext4_map_blocks():
https://github.com/amir73il/ext4-snapshots/blob/ext4-snapshots/fs/ext4/inode.c#L1540

In ext3/next3, handling nested i_truncate_mutex lockdep was enough,
but in ext4, I need to take care of alloc_sem and lg_mutex as well.
I "handled" lg_mutex by never setting EXT4_MB_HINT_DATA on snapshot files.
When looking at alloc_sem, I realized that it is only needed to avoid
race with adjacent group buddy initialization.
Snapshots feature, however, requires that block_size == page_size, so
it seems that alloc_sem
can be avoided altogether.

My questions are:

1. Am I missing something in my analysis?
2. Wouldn't it make sense to bypass alloc_sem at all times if
block_size == page_size
(or block_size*2 >= page_size), regardless of the snapshots feature?
After all, a group buddy is always initialized before calling
allocation routines.

Thanks for your time,
Amir.


On Sun, Feb 6, 2011 at 6:53 AM, Coly Li <i@coly.li> wrote:
> This patch set contains several trivial code cleanup to mballoc code.
> A basic testing is done with Linux 2.6.38-rc3+ (commit 44f2c5c8).
>
> Signed-off-by: Coly Li <bosong.ly@taobao.com>
> Cc: Alex Tomas <alex@clusterfs.com>
> Cc: Theodore Tso <tytso@google.com>
> ---
>  fs/ext4/mballoc.c |   15 ++++++++++-----
>  fs/ext4/mballoc.h |    2 +-
>  2 files changed, 11 insertions(+), 6 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] mballoc: trivial code cleanup
  2011-02-06  8:43 ` Amir Goldstein
@ 2011-02-07 17:41   ` Ted Ts'o
  2011-02-07 20:59     ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Ted Ts'o @ 2011-02-07 17:41 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: i, linux-ext4, Alex Tomas, Theodore Tso

On Sun, Feb 06, 2011 at 10:43:58AM +0200, Amir Goldstein wrote:
> When looking at alloc_sem, I realized that it is only needed to avoid
> race with adjacent group buddy initialization.

Actually, alloc_sem is used to protect all of the block group specific
data structures; the buddy bitmap counters, adjusting the buddy bitmap
itself, the largest free order in a block group, etc.  So even in the
case where block_size == page_size, alloc_sem is still needed!

     	   	      	 	    	      	 - Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] mballoc: trivial code cleanup
  2011-02-07 17:41   ` Ted Ts'o
@ 2011-02-07 20:59     ` Amir Goldstein
  2011-02-07 22:24       ` Ted Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2011-02-07 20:59 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: i, Ext4 Developers List

On Mon, Feb 7, 2011 at 7:41 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Sun, Feb 06, 2011 at 10:43:58AM +0200, Amir Goldstein wrote:
>> When looking at alloc_sem, I realized that it is only needed to avoid
>> race with adjacent group buddy initialization.
>
> Actually, alloc_sem is used to protect all of the block group specific
> data structures; the buddy bitmap counters, adjusting the buddy bitmap
> itself, the largest free order in a block group, etc.  So even in the
> case where block_size == page_size, alloc_sem is still needed!
>

This was my assumption by the name of the lock itself, but when I searched where
down_write(alloc_sem) is called, I saw that it is called by 3 functions:
mb_init_group - for lazy init of buddy cache, once per mount per blockgroup,
before any alloc routines can access that blockgroup.
add_groupblocks - for online resize, even before lazy init of buddy cache.
init_inode_table - for lazy init of inode table, once per mkfs, per blockgroup.

it seems to me like down_write(alloc_sem) may be taken in mb_init_group()
*only* in case EXT4_MB_GRP_NEED_INIT(grp) and that all down_read(alloc_sem)
calls in mballoc.c serve no purpose when block_size == page_size.
I concluded that from the comment in mb_load_buddy():

/* Take the read lock on the group alloc
 * sem. This would make sure a parallel
 * ext4_mb_init_group happening on other
 * groups mapped by the page is blocked
 * till we are done with allocation
 */

it says alloc_sem protects against lazy init of adjacent groups
and says nothing about protecting block group specific data structures...

what am I missing???

what am I missing here???
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] mballoc: trivial code cleanup
  2011-02-07 20:59     ` Amir Goldstein
@ 2011-02-07 22:24       ` Ted Ts'o
  2011-02-08  8:54         ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Ted Ts'o @ 2011-02-07 22:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: i, Ext4 Developers List

On Mon, Feb 07, 2011 at 10:59:54PM +0200, Amir Goldstein wrote:
> it says alloc_sem protects against lazy init of adjacent groups
> and says nothing about protecting block group specific data structures...
> 
> what am I missing???

You're missing ext4_mb_load_buddy(), which takes grp->alloc_sem, and
which is released by ext4_mb_unload_buddy().  No, it's not the most
obvious code in the world...

					- Ted



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] mballoc: trivial code cleanup
  2011-02-07 22:24       ` Ted Ts'o
@ 2011-02-08  8:54         ` Amir Goldstein
  2011-02-09  9:09           ` Amir Goldstein
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2011-02-08  8:54 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: i, Ext4 Developers List

On Tue, Feb 8, 2011 at 12:24 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Feb 07, 2011 at 10:59:54PM +0200, Amir Goldstein wrote:
>> it says alloc_sem protects against lazy init of adjacent groups
>> and says nothing about protecting block group specific data structures...
>>
>> what am I missing???
>
> You're missing ext4_mb_load_buddy(), which takes grp->alloc_sem, and
> which is released by ext4_mb_unload_buddy().  No, it's not the most
> obvious code in the world...
>

OK Ted, you leave me no choice... I need to paste the code of mb_load_buddy():

1157        e4b->alloc_semp = &grp->alloc_sem;
1158
1159        /* Take the read lock on the group alloc
1160         * sem. This would make sure a parallel
1161         * ext4_mb_init_group happening on other
1162         * groups mapped by the page is blocked
1163         * till we are done with allocation
1164         */
1165repeat_load_buddy:
1166        down_read(e4b->alloc_semp);
1167
1168        if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
1169                /* we need to check for group need init flag
1170                 * with alloc_semp held so that we can be sure
1171                 * that new blocks didn't get added to the group
1172                 * when we are loading the buddy cache
1173                 */
1174                up_read(e4b->alloc_semp);
1175                /*
1176                 * we need full data about the group
1177                 * to make a good selection
1178                 */
1179                ret = ext4_mb_init_group(sb, group);
1180                if (ret)
1181                        return ret;
1182                goto repeat_load_buddy;
1183        }
1184

ext4_mb_load_buddy() *only* takes down_read(grp->alloc_sem),
except for the first time after mount, in which ext4_mb_init_group() takes
down_write(grp->alloc_sem), releases it, and then repeat_load_buddy label
will re-take down_read(grp->alloc_sem).

Essentially, this means that after time Ti(group), all users take only read
access to grp->alloc_sem, which is kind of futile...

Your statement that alloc_sem is needed certainly makes sense, but I just don't
see it in the code.
As un-obvious as the code may be, you cannot protect data structures
without anyone taking write access to the semaphore on allocation routines.
Also, I believe that buddy data structures are modified in
ext4_mb_generate_buddy()
under the protection of ext4_lock_group().

So at the risk of having to buy you a beer on LFS I will repeat my
nagging question:
What am I missing???

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/5] mballoc: trivial code cleanup
  2011-02-08  8:54         ` Amir Goldstein
@ 2011-02-09  9:09           ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2011-02-09  9:09 UTC (permalink / raw)
  To: Ted Ts'o, Aneesh Kumar K.V; +Cc: i, Ext4 Developers List

On Tue, Feb 8, 2011 at 10:54 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Feb 8, 2011 at 12:24 AM, Ted Ts'o <tytso@mit.edu> wrote:
>> On Mon, Feb 07, 2011 at 10:59:54PM +0200, Amir Goldstein wrote:
>>> it says alloc_sem protects against lazy init of adjacent groups
>>> and says nothing about protecting block group specific data structures...
>>>
>>> what am I missing???
>>
>> You're missing ext4_mb_load_buddy(), which takes grp->alloc_sem, and
>> which is released by ext4_mb_unload_buddy().  No, it's not the most
>> obvious code in the world...
>>
>
> OK Ted, you leave me no choice... I need to paste the code of mb_load_buddy():
>
> 1157        e4b->alloc_semp = &grp->alloc_sem;
> 1158
> 1159        /* Take the read lock on the group alloc
> 1160         * sem. This would make sure a parallel
> 1161         * ext4_mb_init_group happening on other
> 1162         * groups mapped by the page is blocked
> 1163         * till we are done with allocation
> 1164         */
> 1165repeat_load_buddy:
> 1166        down_read(e4b->alloc_semp);
> 1167
> 1168        if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> 1169                /* we need to check for group need init flag
> 1170                 * with alloc_semp held so that we can be sure
> 1171                 * that new blocks didn't get added to the group
> 1172                 * when we are loading the buddy cache
> 1173                 */
> 1174                up_read(e4b->alloc_semp);
> 1175                /*
> 1176                 * we need full data about the group
> 1177                 * to make a good selection
> 1178                 */
> 1179                ret = ext4_mb_init_group(sb, group);
> 1180                if (ret)
> 1181                        return ret;
> 1182                goto repeat_load_buddy;
> 1183        }
> 1184
>
> ext4_mb_load_buddy() *only* takes down_read(grp->alloc_sem),
> except for the first time after mount, in which ext4_mb_init_group() takes
> down_write(grp->alloc_sem), releases it, and then repeat_load_buddy label
> will re-take down_read(grp->alloc_sem).
>
> Essentially, this means that after time Ti(group), all users take only read
> access to grp->alloc_sem, which is kind of futile...
>
> Your statement that alloc_sem is needed certainly makes sense, but I just don't
> see it in the code.
> As un-obvious as the code may be, you cannot protect data structures
> without anyone taking write access to the semaphore on allocation routines.
> Also, I believe that buddy data structures are modified in
> ext4_mb_generate_buddy()
> under the protection of ext4_lock_group().
>
> So at the risk of having to buy you a beer on LFS I will repeat my
> nagging question:
> What am I missing???
>

I found out what I was missing (it's in the comment in line 1169 above).
I wrongly assumed the EXT4_GROUP_INFO_NEED_INIT_BIT is set only
once in a lifetime of an ext4_group_info, but I was wrong. It may also be set
when adding blocks to an existing group from ext4_group_extend().

Still, I think that the use cases in which down_read(alloc_sem) is needed are
very unlikely() and can be covered with the following check:

    if (blocks_per_page > 2 || group == sbi->s_groups_count - 1)
        /* Synchronize init of adjacent group and adding of blocks to
last group */
        e4b->alloc_semp = &grp->alloc_sem;
    else
        e4b->alloc_semp = NULL;

I will post a patch for review.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-02-09  9:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-06  4:53 [PATCH 0/5] mballoc: trivial code cleanup Coly Li
2011-02-06  8:43 ` Amir Goldstein
2011-02-07 17:41   ` Ted Ts'o
2011-02-07 20:59     ` Amir Goldstein
2011-02-07 22:24       ` Ted Ts'o
2011-02-08  8:54         ` Amir Goldstein
2011-02-09  9:09           ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.