* [patch] fs: fix superblock iteration race @ 2010-06-11 14:50 Nick Piggin 2010-06-11 16:06 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Nick Piggin @ 2010-06-11 14:50 UTC (permalink / raw) To: Al Viro, Linus Torvalds, linux-fsdevel Not sure if this is really the _cleanest_ way to fix it. But open coding the list walking is a bit annoying too. And I couldn't see any real way to make the list macro safe. Better ideas? Thanks, Nick -- list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. 6754af6 introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> --- fs/dcache.c | 2 ++ fs/super.c | 6 ++++++ 2 files changed, 8 insertions(+) Index: linux-2.6/fs/dcache.c =================================================================== --- linux-2.6.orig/fs/dcache.c 2010-06-12 00:00:10.000000000 +1000 +++ linux-2.6/fs/dcache.c 2010-06-12 00:38:21.000000000 +1000 @@ -590,6 +590,8 @@ static void prune_dcache(int count) up_read(&sb->s_umount); } spin_lock(&sb_lock); + /* old n may have been deleted */ + n = list_entry(sb->s_list.next, struct super_block, s_list); count -= pruned; __put_super(sb); /* more work left to do? */ Index: linux-2.6/fs/super.c =================================================================== --- linux-2.6.orig/fs/super.c 2010-06-11 23:55:40.000000000 +1000 +++ linux-2.6/fs/super.c 2010-06-12 00:38:40.000000000 +1000 @@ -374,6 +374,8 @@ void sync_supers(void) up_read(&sb->s_umount); spin_lock(&sb_lock); + /* old n may have been deleted */ + n = list_entry(sb->s_list.next, struct super_block, s_list); __put_super(sb); } } @@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup up_read(&sb->s_umount); spin_lock(&sb_lock); + /* old n may have been deleted */ + n = list_entry(sb->s_list.next, struct super_block, s_list); __put_super(sb); } spin_unlock(&sb_lock); @@ -585,6 +589,8 @@ static void do_emergency_remount(struct } up_write(&sb->s_umount); spin_lock(&sb_lock); + /* old n may have been deleted */ + n = list_entry(sb->s_list.next, struct super_block, s_list); __put_super(sb); } spin_unlock(&sb_lock); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-11 14:50 [patch] fs: fix superblock iteration race Nick Piggin @ 2010-06-11 16:06 ` Linus Torvalds 2010-06-12 3:37 ` Nick Piggin 2010-06-12 3:57 ` Nick Piggin 0 siblings, 2 replies; 8+ messages in thread From: Linus Torvalds @ 2010-06-11 16:06 UTC (permalink / raw) To: Nick Piggin; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 7:50 AM, Nick Piggin <npiggin@suse.de> wrote: > Not sure if this is really the _cleanest_ way to fix it. But open coding > the list walking is a bit annoying too. And I couldn't see any real way to > make the list macro safe. Better ideas? I really think we should open-code the list walking instead. You basically already are doing that, and in a very non-obvious way too (ie you are mixing the non-open-coded list walker with also explicitly playing with the internal variable for that magic walker. So I would get rid of the 'list_for_each_entry_safe' entirely, and replace it with something like struct list_head *list; spin_lock(&sb_lock); list = super_blocks->next; while (list != &super_blocks) { struct super_block *sb = list_entry(next, struct super_block, s_list); list = list->next; if (list_empty(&sb->s_instances)) continue; if (!sb->s_nr_dentry_unused) continue; sb->s_count++; spin_unlock(&sb_lock); .... whatever ... spin_lock(&sb_lock); /* We dropped the lock, need to re-load the next list entry */ list = sb->s_list.next; __put_super(sb); } which isn't that much more complicated, now is it? Sure, it's open-coded, but at least it doesn't play games. And being open-coded, it's a lot more honest about the issue. Maybe even add a comment saying "we can't use the list_for_each[_safe]() macro, because we don't hold the lock and we're not the only ones that may delete things" explaining _why_ it's open-coded. I dunno. Maybe Al disagrees. I just don't like using the "simple helpers" and then changing subtly how they work by knowing their internals. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-11 16:06 ` Linus Torvalds @ 2010-06-12 3:37 ` Nick Piggin 2010-06-12 3:57 ` Nick Piggin 1 sibling, 0 replies; 8+ messages in thread From: Nick Piggin @ 2010-06-12 3:37 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 09:06:01AM -0700, Linus Torvalds wrote: > On Fri, Jun 11, 2010 at 7:50 AM, Nick Piggin <npiggin@suse.de> wrote: > > Not sure if this is really the _cleanest_ way to fix it. But open coding > > the list walking is a bit annoying too. And I couldn't see any real way to > > make the list macro safe. Better ideas? > > I really think we should open-code the list walking instead. You > basically already are doing that, and in a very non-obvious way too > (ie you are mixing the non-open-coded list walker with also explicitly > playing with the internal variable for that magic walker. > > So I would get rid of the 'list_for_each_entry_safe' entirely, and > replace it with something like > > struct list_head *list; > > spin_lock(&sb_lock); > list = super_blocks->next; > while (list != &super_blocks) { > struct super_block *sb = list_entry(next, struct super_block, s_list); > list = list->next; > > if (list_empty(&sb->s_instances)) > continue; > > if (!sb->s_nr_dentry_unused) > continue; > > sb->s_count++; > spin_unlock(&sb_lock); > > .... whatever ... > > spin_lock(&sb_lock); > /* We dropped the lock, need to re-load the next list entry */ > list = sb->s_list.next; > __put_super(sb); > } Yeah I do agree really. I guess the bug came about in the first place because it's easy to overlook where the memory accesses happen. > which isn't that much more complicated, now is it? Sure, it's > open-coded, but at least it doesn't play games. And being open-coded, > it's a lot more honest about the issue. Maybe even add a comment > saying "we can't use the list_for_each[_safe]() macro, because we > don't hold the lock and we're not the only ones that may delete > things" explaining _why_ it's open-coded. > > I dunno. Maybe Al disagrees. I just don't like using the "simple > helpers" and then changing subtly how they work by knowing their > internals. I'll respin the patch and we'll see. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-11 16:06 ` Linus Torvalds 2010-06-12 3:37 ` Nick Piggin @ 2010-06-12 3:57 ` Nick Piggin 2010-06-12 4:15 ` Linus Torvalds 1 sibling, 1 reply; 8+ messages in thread From: Nick Piggin @ 2010-06-12 3:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 09:06:01AM -0700, Linus Torvalds wrote: > On Fri, Jun 11, 2010 at 7:50 AM, Nick Piggin <npiggin@suse.de> wrote: > > Not sure if this is really the _cleanest_ way to fix it. But open coding > > the list walking is a bit annoying too. And I couldn't see any real way to > > make the list macro safe. Better ideas? > > I really think we should open-code the list walking instead. You > basically already are doing that, and in a very non-obvious way too > (ie you are mixing the non-open-coded list walker with also explicitly > playing with the internal variable for that magic walker. > > So I would get rid of the 'list_for_each_entry_safe' entirely, and > replace it with something like > > struct list_head *list; > > spin_lock(&sb_lock); > list = super_blocks->next; > while (list != &super_blocks) { > struct super_block *sb = list_entry(next, struct super_block, s_list); > list = list->next; > > if (list_empty(&sb->s_instances)) > continue; > > if (!sb->s_nr_dentry_unused) > continue; > > sb->s_count++; > spin_unlock(&sb_lock); > > .... whatever ... > > spin_lock(&sb_lock); > /* We dropped the lock, need to re-load the next list entry */ > list = sb->s_list.next; > __put_super(sb); > } > > which isn't that much more complicated, now is it? Sure, it's > open-coded, but at least it doesn't play games. And being open-coded, > it's a lot more honest about the issue. Maybe even add a comment > saying "we can't use the list_for_each[_safe]() macro, because we > don't hold the lock and we're not the only ones that may delete > things" explaining _why_ it's open-coded. > > I dunno. Maybe Al disagrees. I just don't like using the "simple > helpers" and then changing subtly how they work by knowing their > internals. Something like this -- list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. 6754af6 introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry while we drop and retake the lock because the iteration subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> --- fs/dcache.c | 14 ++++++++++++-- fs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 57 insertions(+), 8 deletions(-) Index: linux-2.6/fs/dcache.c =================================================================== --- linux-2.6.orig/fs/dcache.c +++ linux-2.6/fs/dcache.c @@ -536,7 +536,7 @@ restart: */ static void prune_dcache(int count) { - struct super_block *sb, *n; + struct list_head *list; int w_count; int unused = dentry_stat.nr_unused; int prune_ratio; @@ -549,8 +549,16 @@ static void prune_dcache(int count) prune_ratio = 1; else prune_ratio = unused / count; + + /* see iterate_supers for super_blocks iteration comments */ spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { + list = super_blocks.next; + while (list != &super_blocks) { + struct super_block *sb; + + sb = list_entry(list, struct super_block, s_list); + list = list->next; + if (list_empty(&sb->s_instances)) continue; if (sb->s_nr_dentry_unused == 0) @@ -590,6 +598,8 @@ static void prune_dcache(int count) up_read(&sb->s_umount); } spin_lock(&sb_lock); + /* sb_lock dropped, must reload next */ + list = sb->s_list.next; count -= pruned; __put_super(sb); /* more work left to do? */ Index: linux-2.6/fs/super.c =================================================================== --- linux-2.6.orig/fs/super.c +++ linux-2.6/fs/super.c @@ -358,10 +358,17 @@ EXPORT_SYMBOL(drop_super); */ void sync_supers(void) { - struct super_block *sb, *n; + struct list_head *list; + /* see iterate_supers for super_blocks iteration comments */ spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { + list = super_blocks.next; + while (list != &super_blocks) { + struct super_block *sb; + + sb = list_entry(list, struct super_block, s_list); + list = list->next; + if (list_empty(&sb->s_instances)) continue; if (sb->s_op->write_super && sb->s_dirt) { @@ -374,6 +381,8 @@ void sync_supers(void) up_read(&sb->s_umount); spin_lock(&sb_lock); + /* sb_lock dropped, must reload next */ + list = sb->s_list.next; __put_super(sb); } } @@ -390,10 +399,25 @@ void sync_supers(void) */ void iterate_supers(void (*f)(struct super_block *, void *), void *arg) { - struct super_block *sb, *n; + struct list_head *list; + /* + * Walk the list of super_blocks: + * Cannot use list_for_each_entry because __put_super may delete + * sb from the list. + * Cannot use list_for_each_entry_safe because it loads both the + * current and next list entries before the loop body. When dropping + * the lock we have only pinned the current entry in the list, next + * may be deleted. + */ spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { + list = super_blocks.next; + while (list != &super_blocks) { + struct super_block *sb; + + sb = list_entry(list, struct super_block, s_list); + list = list->next; + if (list_empty(&sb->s_instances)) continue; sb->s_count++; @@ -405,6 +429,12 @@ void iterate_supers(void (*f)(struct sup up_read(&sb->s_umount); spin_lock(&sb_lock); + /* + * sb_lock dropped, we must reload next entry. We can reload it + * from sb because we have that element pinned in the list with + * s_count. + */ + list = sb->s_list.next; __put_super(sb); } spin_unlock(&sb_lock); @@ -568,10 +598,17 @@ int do_remount_sb(struct super_block *sb static void do_emergency_remount(struct work_struct *work) { - struct super_block *sb, *n; + struct list_head *list; + /* see iterate_supers for super_blocks iteration comments */ spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { + list = super_blocks.next; + while (list != &super_blocks) { + struct super_block *sb; + + sb = list_entry(list, struct super_block, s_list); + list = list->next; + if (list_empty(&sb->s_instances)) continue; sb->s_count++; @@ -585,6 +622,8 @@ static void do_emergency_remount(struct } up_write(&sb->s_umount); spin_lock(&sb_lock); + /* sb_lock dropped, must reload next */ + list = sb->s_list.next; __put_super(sb); } spin_unlock(&sb_lock); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-12 3:57 ` Nick Piggin @ 2010-06-12 4:15 ` Linus Torvalds 2010-06-12 4:38 ` Nick Piggin 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2010-06-12 4:15 UTC (permalink / raw) To: Nick Piggin; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 8:57 PM, Nick Piggin <npiggin@suse.de> wrote: > > Something like this > > fs/dcache.c | 14 ++++++++++++-- > fs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------ > 2 files changed, 57 insertions(+), 8 deletions(-) Ok, I have to admit that I didn't expect it to blow up quite that badly - I thought it would add a line or two, not 50. So wow I wonder if we should use your old approach instead, just make it an 'official' thing. IOW, maybe we can use "list_for_each_entry_safe()" after all, but simply introduce a "reset_next_entry()" helper or something and make that be part of the "calling convention" for those things. But I can live with the open-coded version too. It just is a bit more code than I thought it would be. Anybody? I don't really have very strong opinions. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-12 4:15 ` Linus Torvalds @ 2010-06-12 4:38 ` Nick Piggin 2010-06-12 4:46 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Nick Piggin @ 2010-06-12 4:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 09:15:54PM -0700, Linus Torvalds wrote: > On Fri, Jun 11, 2010 at 8:57 PM, Nick Piggin <npiggin@suse.de> wrote: > > > > Something like this > > > > fs/dcache.c | 14 ++++++++++++-- > > fs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------ > > 2 files changed, 57 insertions(+), 8 deletions(-) > > Ok, I have to admit that I didn't expect it to blow up quite that > badly - I thought it would add a line or two, not 50. In fairness I added a lot more comments and a bit of whitespace, accounting for about half of those lines added. > So wow I wonder if we should use your old approach instead, just make > it an 'official' thing. IOW, maybe we can use > "list_for_each_entry_safe()" after all, but simply introduce a > "reset_next_entry()" helper or something and make that be part of the > "calling convention" for those things. > > But I can live with the open-coded version too. It just is a bit more > code than I thought it would be. > > Anybody? I don't really have very strong opinions. I wouldn't mind a list macro to reset. I'd prefer the name match better with the iterator macro though. list_safe_reset_next()? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-12 4:38 ` Nick Piggin @ 2010-06-12 4:46 ` Linus Torvalds 2010-06-14 15:07 ` Nick Piggin 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2010-06-12 4:46 UTC (permalink / raw) To: Nick Piggin; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 9:38 PM, Nick Piggin <npiggin@suse.de> wrote: > > In fairness I added a lot more comments and a bit of whitespace, > accounting for about half of those lines added. Yeah, ok. As mentioned, I can certainly go either way. > I wouldn't mind a list macro to reset. I'd prefer the name > match better with the iterator macro though. > list_safe_reset_next()? Yeah, it needs to be at least "list_" something. So "list_safe_reset_next()" sounds as good as anything else. I wouldn't expect it to be very common. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race 2010-06-12 4:46 ` Linus Torvalds @ 2010-06-14 15:07 ` Nick Piggin 0 siblings, 0 replies; 8+ messages in thread From: Nick Piggin @ 2010-06-14 15:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel On Fri, Jun 11, 2010 at 09:46:49PM -0700, Linus Torvalds wrote: > On Fri, Jun 11, 2010 at 9:38 PM, Nick Piggin <npiggin@suse.de> wrote: > > > > In fairness I added a lot more comments and a bit of whitespace, > > accounting for about half of those lines added. > > Yeah, ok. As mentioned, I can certainly go either way. > > > I wouldn't mind a list macro to reset. I'd prefer the name > > match better with the iterator macro though. > > list_safe_reset_next()? > > Yeah, it needs to be at least "list_" something. So > "list_safe_reset_next()" sounds as good as anything else. I wouldn't > expect it to be very common. I could go either way as well. This is what the list macro looks like. I almost dislike it because it is a rather special thing to be doing. OTOH it gives a pattern to grep for abuses. Al, ping? -- list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. 6754af6 introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> --- fs/dcache.c | 2 ++ fs/super.c | 6 ++++++ 2 files changed, 8 insertions(+) Index: linux-2.6/fs/dcache.c =================================================================== --- linux-2.6.orig/fs/dcache.c +++ linux-2.6/fs/dcache.c @@ -590,6 +590,8 @@ static void prune_dcache(int count) up_read(&sb->s_umount); } spin_lock(&sb_lock); + /* lock was dropped, must reset next */ + list_safe_reset_next(sb, n, s_list); count -= pruned; __put_super(sb); /* more work left to do? */ Index: linux-2.6/fs/super.c =================================================================== --- linux-2.6.orig/fs/super.c +++ linux-2.6/fs/super.c @@ -374,6 +374,8 @@ void sync_supers(void) up_read(&sb->s_umount); spin_lock(&sb_lock); + /* lock was dropped, must reset next */ + list_safe_reset_next(sb, n, s_list); __put_super(sb); } } @@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup up_read(&sb->s_umount); spin_lock(&sb_lock); + /* lock was dropped, must reset next */ + list_safe_reset_next(sb, n, s_list); __put_super(sb); } spin_unlock(&sb_lock); @@ -585,6 +589,8 @@ static void do_emergency_remount(struct } up_write(&sb->s_umount); spin_lock(&sb_lock); + /* lock was dropped, must reset next */ + list_safe_reset_next(sb, n, s_list); __put_super(sb); } spin_unlock(&sb_lock); Index: linux-2.6/include/linux/list.h =================================================================== --- linux-2.6.orig/include/linux/list.h +++ linux-2.6/include/linux/list.h @@ -544,6 +544,21 @@ static inline void list_splice_tail_init &pos->member != (head); \ pos = n, n = list_entry(n->member.prev, typeof(*n), member)) +/** + * list_safe_reset_next - reset a stale list_for_each_entry_safe loop + * @pos: the loop cursor used in the list_for_each_entry_safe loop + * @n: temporary storage used in list_for_each_entry_safe + * @member: the name of the list_struct within the struct. + * + * list_for_each_entry_safe is not safe to use in general if the list may be + * modified concurrently (eg. the lock is dropped in the loop body). An + * exception to this is if the cursor element (pos) is pinned in the list, + * and list_safe_reset_next is called after re-taking the lock and before + * completing the current iteration of the loop body. + */ +#define list_safe_reset_next(pos, n, member) \ + n = list_entry(pos->member.next, typeof(*pos), member) + /* * Double linked lists with a single pointer list head. * Mostly useful for hash tables where the two pointer list head is ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-06-14 15:07 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-06-11 14:50 [patch] fs: fix superblock iteration race Nick Piggin 2010-06-11 16:06 ` Linus Torvalds 2010-06-12 3:37 ` Nick Piggin 2010-06-12 3:57 ` Nick Piggin 2010-06-12 4:15 ` Linus Torvalds 2010-06-12 4:38 ` Nick Piggin 2010-06-12 4:46 ` Linus Torvalds 2010-06-14 15:07 ` Nick Piggin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.