* [patch] fs: fix superblock iteration race
@ 2010-06-11 14:50 Nick Piggin
2010-06-11 16:06 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2010-06-11 14:50 UTC (permalink / raw)
To: Al Viro, Linus Torvalds, linux-fsdevel
Not sure if this is really the _cleanest_ way to fix it. But open coding
the list walking is a bit annoying too. And I couldn't see any real way to
make the list macro safe. Better ideas?
Thanks,
Nick
--
list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af6 introduced a race in sb walking.
list_for_each_entry can use the trick of pinning the current entry in
the list before we drop and retake the lock because it subsequently
follows cur->next. However list_for_each_entry_safe saves n=cur->next
for following before entering the loop body, so when the lock is
dropped, n may be deleted.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/dcache.c | 2 ++
fs/super.c | 6 ++++++
2 files changed, 8 insertions(+)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c 2010-06-12 00:00:10.000000000 +1000
+++ linux-2.6/fs/dcache.c 2010-06-12 00:38:21.000000000 +1000
@@ -590,6 +590,8 @@ static void prune_dcache(int count)
up_read(&sb->s_umount);
}
spin_lock(&sb_lock);
+ /* old n may have been deleted */
+ n = list_entry(sb->s_list.next, struct super_block, s_list);
count -= pruned;
__put_super(sb);
/* more work left to do? */
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c 2010-06-11 23:55:40.000000000 +1000
+++ linux-2.6/fs/super.c 2010-06-12 00:38:40.000000000 +1000
@@ -374,6 +374,8 @@ void sync_supers(void)
up_read(&sb->s_umount);
spin_lock(&sb_lock);
+ /* old n may have been deleted */
+ n = list_entry(sb->s_list.next, struct super_block, s_list);
__put_super(sb);
}
}
@@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
up_read(&sb->s_umount);
spin_lock(&sb_lock);
+ /* old n may have been deleted */
+ n = list_entry(sb->s_list.next, struct super_block, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
@@ -585,6 +589,8 @@ static void do_emergency_remount(struct
}
up_write(&sb->s_umount);
spin_lock(&sb_lock);
+ /* old n may have been deleted */
+ n = list_entry(sb->s_list.next, struct super_block, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-11 14:50 [patch] fs: fix superblock iteration race Nick Piggin
@ 2010-06-11 16:06 ` Linus Torvalds
2010-06-12 3:37 ` Nick Piggin
2010-06-12 3:57 ` Nick Piggin
0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2010-06-11 16:06 UTC (permalink / raw)
To: Nick Piggin; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 7:50 AM, Nick Piggin <npiggin@suse.de> wrote:
> Not sure if this is really the _cleanest_ way to fix it. But open coding
> the list walking is a bit annoying too. And I couldn't see any real way to
> make the list macro safe. Better ideas?
I really think we should open-code the list walking instead. You
basically already are doing that, and in a very non-obvious way too
(ie you are mixing the non-open-coded list walker with also explicitly
playing with the internal variable for that magic walker.
So I would get rid of the 'list_for_each_entry_safe' entirely, and
replace it with something like
struct list_head *list;
spin_lock(&sb_lock);
list = super_blocks->next;
while (list != &super_blocks) {
struct super_block *sb = list_entry(next, struct super_block, s_list);
list = list->next;
if (list_empty(&sb->s_instances))
continue;
if (!sb->s_nr_dentry_unused)
continue;
sb->s_count++;
spin_unlock(&sb_lock);
.... whatever ...
spin_lock(&sb_lock);
/* We dropped the lock, need to re-load the next list entry */
list = sb->s_list.next;
__put_super(sb);
}
which isn't that much more complicated, now is it? Sure, it's
open-coded, but at least it doesn't play games. And being open-coded,
it's a lot more honest about the issue. Maybe even add a comment
saying "we can't use the list_for_each[_safe]() macro, because we
don't hold the lock and we're not the only ones that may delete
things" explaining _why_ it's open-coded.
I dunno. Maybe Al disagrees. I just don't like using the "simple
helpers" and then changing subtly how they work by knowing their
internals.
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-11 16:06 ` Linus Torvalds
@ 2010-06-12 3:37 ` Nick Piggin
2010-06-12 3:57 ` Nick Piggin
1 sibling, 0 replies; 8+ messages in thread
From: Nick Piggin @ 2010-06-12 3:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 09:06:01AM -0700, Linus Torvalds wrote:
> On Fri, Jun 11, 2010 at 7:50 AM, Nick Piggin <npiggin@suse.de> wrote:
> > Not sure if this is really the _cleanest_ way to fix it. But open coding
> > the list walking is a bit annoying too. And I couldn't see any real way to
> > make the list macro safe. Better ideas?
>
> I really think we should open-code the list walking instead. You
> basically already are doing that, and in a very non-obvious way too
> (ie you are mixing the non-open-coded list walker with also explicitly
> playing with the internal variable for that magic walker.
>
> So I would get rid of the 'list_for_each_entry_safe' entirely, and
> replace it with something like
>
> struct list_head *list;
>
> spin_lock(&sb_lock);
> list = super_blocks->next;
> while (list != &super_blocks) {
> struct super_block *sb = list_entry(next, struct super_block, s_list);
> list = list->next;
>
> if (list_empty(&sb->s_instances))
> continue;
>
> if (!sb->s_nr_dentry_unused)
> continue;
>
> sb->s_count++;
> spin_unlock(&sb_lock);
>
> .... whatever ...
>
> spin_lock(&sb_lock);
> /* We dropped the lock, need to re-load the next list entry */
> list = sb->s_list.next;
> __put_super(sb);
> }
Yeah I do agree really. I guess the bug came about in the first place
because it's easy to overlook where the memory accesses happen.
> which isn't that much more complicated, now is it? Sure, it's
> open-coded, but at least it doesn't play games. And being open-coded,
> it's a lot more honest about the issue. Maybe even add a comment
> saying "we can't use the list_for_each[_safe]() macro, because we
> don't hold the lock and we're not the only ones that may delete
> things" explaining _why_ it's open-coded.
>
> I dunno. Maybe Al disagrees. I just don't like using the "simple
> helpers" and then changing subtly how they work by knowing their
> internals.
I'll respin the patch and we'll see.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-11 16:06 ` Linus Torvalds
2010-06-12 3:37 ` Nick Piggin
@ 2010-06-12 3:57 ` Nick Piggin
2010-06-12 4:15 ` Linus Torvalds
1 sibling, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2010-06-12 3:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 09:06:01AM -0700, Linus Torvalds wrote:
> On Fri, Jun 11, 2010 at 7:50 AM, Nick Piggin <npiggin@suse.de> wrote:
> > Not sure if this is really the _cleanest_ way to fix it. But open coding
> > the list walking is a bit annoying too. And I couldn't see any real way to
> > make the list macro safe. Better ideas?
>
> I really think we should open-code the list walking instead. You
> basically already are doing that, and in a very non-obvious way too
> (ie you are mixing the non-open-coded list walker with also explicitly
> playing with the internal variable for that magic walker.
>
> So I would get rid of the 'list_for_each_entry_safe' entirely, and
> replace it with something like
>
> struct list_head *list;
>
> spin_lock(&sb_lock);
> list = super_blocks->next;
> while (list != &super_blocks) {
> struct super_block *sb = list_entry(next, struct super_block, s_list);
> list = list->next;
>
> if (list_empty(&sb->s_instances))
> continue;
>
> if (!sb->s_nr_dentry_unused)
> continue;
>
> sb->s_count++;
> spin_unlock(&sb_lock);
>
> .... whatever ...
>
> spin_lock(&sb_lock);
> /* We dropped the lock, need to re-load the next list entry */
> list = sb->s_list.next;
> __put_super(sb);
> }
>
> which isn't that much more complicated, now is it? Sure, it's
> open-coded, but at least it doesn't play games. And being open-coded,
> it's a lot more honest about the issue. Maybe even add a comment
> saying "we can't use the list_for_each[_safe]() macro, because we
> don't hold the lock and we're not the only ones that may delete
> things" explaining _why_ it's open-coded.
>
> I dunno. Maybe Al disagrees. I just don't like using the "simple
> helpers" and then changing subtly how they work by knowing their
> internals.
Something like this
--
list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af6 introduced a race in sb walking.
list_for_each_entry can use the trick of pinning the current entry while we
drop and retake the lock because the iteration subsequently follows cur->next.
However list_for_each_entry_safe saves n=cur->next before entering the loop
body, so when the lock is dropped, n may be deleted.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/dcache.c | 14 ++++++++++++--
fs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 57 insertions(+), 8 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -536,7 +536,7 @@ restart:
*/
static void prune_dcache(int count)
{
- struct super_block *sb, *n;
+ struct list_head *list;
int w_count;
int unused = dentry_stat.nr_unused;
int prune_ratio;
@@ -549,8 +549,16 @@ static void prune_dcache(int count)
prune_ratio = 1;
else
prune_ratio = unused / count;
+
+ /* see iterate_supers for super_blocks iteration comments */
spin_lock(&sb_lock);
- list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
+ list = super_blocks.next;
+ while (list != &super_blocks) {
+ struct super_block *sb;
+
+ sb = list_entry(list, struct super_block, s_list);
+ list = list->next;
+
if (list_empty(&sb->s_instances))
continue;
if (sb->s_nr_dentry_unused == 0)
@@ -590,6 +598,8 @@ static void prune_dcache(int count)
up_read(&sb->s_umount);
}
spin_lock(&sb_lock);
+ /* sb_lock dropped, must reload next */
+ list = sb->s_list.next;
count -= pruned;
__put_super(sb);
/* more work left to do? */
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -358,10 +358,17 @@ EXPORT_SYMBOL(drop_super);
*/
void sync_supers(void)
{
- struct super_block *sb, *n;
+ struct list_head *list;
+ /* see iterate_supers for super_blocks iteration comments */
spin_lock(&sb_lock);
- list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
+ list = super_blocks.next;
+ while (list != &super_blocks) {
+ struct super_block *sb;
+
+ sb = list_entry(list, struct super_block, s_list);
+ list = list->next;
+
if (list_empty(&sb->s_instances))
continue;
if (sb->s_op->write_super && sb->s_dirt) {
@@ -374,6 +381,8 @@ void sync_supers(void)
up_read(&sb->s_umount);
spin_lock(&sb_lock);
+ /* sb_lock dropped, must reload next */
+ list = sb->s_list.next;
__put_super(sb);
}
}
@@ -390,10 +399,25 @@ void sync_supers(void)
*/
void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
{
- struct super_block *sb, *n;
+ struct list_head *list;
+ /*
+ * Walk the list of super_blocks:
+ * Cannot use list_for_each_entry because __put_super may delete
+ * sb from the list.
+ * Cannot use list_for_each_entry_safe because it loads both the
+ * current and next list entries before the loop body. When dropping
+ * the lock we have only pinned the current entry in the list, next
+ * may be deleted.
+ */
spin_lock(&sb_lock);
- list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
+ list = super_blocks.next;
+ while (list != &super_blocks) {
+ struct super_block *sb;
+
+ sb = list_entry(list, struct super_block, s_list);
+ list = list->next;
+
if (list_empty(&sb->s_instances))
continue;
sb->s_count++;
@@ -405,6 +429,12 @@ void iterate_supers(void (*f)(struct sup
up_read(&sb->s_umount);
spin_lock(&sb_lock);
+ /*
+ * sb_lock dropped, we must reload next entry. We can reload it
+ * from sb because we have that element pinned in the list with
+ * s_count.
+ */
+ list = sb->s_list.next;
__put_super(sb);
}
spin_unlock(&sb_lock);
@@ -568,10 +598,17 @@ int do_remount_sb(struct super_block *sb
static void do_emergency_remount(struct work_struct *work)
{
- struct super_block *sb, *n;
+ struct list_head *list;
+ /* see iterate_supers for super_blocks iteration comments */
spin_lock(&sb_lock);
- list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
+ list = super_blocks.next;
+ while (list != &super_blocks) {
+ struct super_block *sb;
+
+ sb = list_entry(list, struct super_block, s_list);
+ list = list->next;
+
if (list_empty(&sb->s_instances))
continue;
sb->s_count++;
@@ -585,6 +622,8 @@ static void do_emergency_remount(struct
}
up_write(&sb->s_umount);
spin_lock(&sb_lock);
+ /* sb_lock dropped, must reload next */
+ list = sb->s_list.next;
__put_super(sb);
}
spin_unlock(&sb_lock);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-12 3:57 ` Nick Piggin
@ 2010-06-12 4:15 ` Linus Torvalds
2010-06-12 4:38 ` Nick Piggin
0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2010-06-12 4:15 UTC (permalink / raw)
To: Nick Piggin; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 8:57 PM, Nick Piggin <npiggin@suse.de> wrote:
>
> Something like this
>
> fs/dcache.c | 14 ++++++++++++--
> fs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 57 insertions(+), 8 deletions(-)
Ok, I have to admit that I didn't expect it to blow up quite that
badly - I thought it would add a line or two, not 50.
So wow I wonder if we should use your old approach instead, just make
it an 'official' thing. IOW, maybe we can use
"list_for_each_entry_safe()" after all, but simply introduce a
"reset_next_entry()" helper or something and make that be part of the
"calling convention" for those things.
But I can live with the open-coded version too. It just is a bit more
code than I thought it would be.
Anybody? I don't really have very strong opinions.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-12 4:15 ` Linus Torvalds
@ 2010-06-12 4:38 ` Nick Piggin
2010-06-12 4:46 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2010-06-12 4:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 09:15:54PM -0700, Linus Torvalds wrote:
> On Fri, Jun 11, 2010 at 8:57 PM, Nick Piggin <npiggin@suse.de> wrote:
> >
> > Something like this
> >
> > fs/dcache.c | 14 ++++++++++++--
> > fs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
> > 2 files changed, 57 insertions(+), 8 deletions(-)
>
> Ok, I have to admit that I didn't expect it to blow up quite that
> badly - I thought it would add a line or two, not 50.
In fairness I added a lot more comments and a bit of whitespace,
accounting for about half of those lines added.
> So wow I wonder if we should use your old approach instead, just make
> it an 'official' thing. IOW, maybe we can use
> "list_for_each_entry_safe()" after all, but simply introduce a
> "reset_next_entry()" helper or something and make that be part of the
> "calling convention" for those things.
>
> But I can live with the open-coded version too. It just is a bit more
> code than I thought it would be.
>
> Anybody? I don't really have very strong opinions.
I wouldn't mind a list macro to reset. I'd prefer the name
match better with the iterator macro though.
list_safe_reset_next()?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-12 4:38 ` Nick Piggin
@ 2010-06-12 4:46 ` Linus Torvalds
2010-06-14 15:07 ` Nick Piggin
0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2010-06-12 4:46 UTC (permalink / raw)
To: Nick Piggin; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 9:38 PM, Nick Piggin <npiggin@suse.de> wrote:
>
> In fairness I added a lot more comments and a bit of whitespace,
> accounting for about half of those lines added.
Yeah, ok. As mentioned, I can certainly go either way.
> I wouldn't mind a list macro to reset. I'd prefer the name
> match better with the iterator macro though.
> list_safe_reset_next()?
Yeah, it needs to be at least "list_" something. So
"list_safe_reset_next()" sounds as good as anything else. I wouldn't
expect it to be very common.
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] fs: fix superblock iteration race
2010-06-12 4:46 ` Linus Torvalds
@ 2010-06-14 15:07 ` Nick Piggin
0 siblings, 0 replies; 8+ messages in thread
From: Nick Piggin @ 2010-06-14 15:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel
On Fri, Jun 11, 2010 at 09:46:49PM -0700, Linus Torvalds wrote:
> On Fri, Jun 11, 2010 at 9:38 PM, Nick Piggin <npiggin@suse.de> wrote:
> >
> > In fairness I added a lot more comments and a bit of whitespace,
> > accounting for about half of those lines added.
>
> Yeah, ok. As mentioned, I can certainly go either way.
>
> > I wouldn't mind a list macro to reset. I'd prefer the name
> > match better with the iterator macro though.
> > list_safe_reset_next()?
>
> Yeah, it needs to be at least "list_" something. So
> "list_safe_reset_next()" sounds as good as anything else. I wouldn't
> expect it to be very common.
I could go either way as well. This is what the list macro looks
like. I almost dislike it because it is a rather special thing to
be doing. OTOH it gives a pattern to grep for abuses.
Al, ping?
--
list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af6 introduced a race in sb walking.
list_for_each_entry can use the trick of pinning the current entry in
the list before we drop and retake the lock because it subsequently
follows cur->next. However list_for_each_entry_safe saves n=cur->next
for following before entering the loop body, so when the lock is
dropped, n may be deleted.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/dcache.c | 2 ++
fs/super.c | 6 ++++++
2 files changed, 8 insertions(+)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -590,6 +590,8 @@ static void prune_dcache(int count)
up_read(&sb->s_umount);
}
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
count -= pruned;
__put_super(sb);
/* more work left to do? */
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -374,6 +374,8 @@ void sync_supers(void)
up_read(&sb->s_umount);
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
}
@@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
up_read(&sb->s_umount);
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
@@ -585,6 +589,8 @@ static void do_emergency_remount(struct
}
up_write(&sb->s_umount);
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
Index: linux-2.6/include/linux/list.h
===================================================================
--- linux-2.6.orig/include/linux/list.h
+++ linux-2.6/include/linux/list.h
@@ -544,6 +544,21 @@ static inline void list_splice_tail_init
&pos->member != (head); \
pos = n, n = list_entry(n->member.prev, typeof(*n), member))
+/**
+ * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
+ * @pos: the loop cursor used in the list_for_each_entry_safe loop
+ * @n: temporary storage used in list_for_each_entry_safe
+ * @member: the name of the list_struct within the struct.
+ *
+ * list_for_each_entry_safe is not safe to use in general if the list may be
+ * modified concurrently (eg. the lock is dropped in the loop body). An
+ * exception to this is if the cursor element (pos) is pinned in the list,
+ * and list_safe_reset_next is called after re-taking the lock and before
+ * completing the current iteration of the loop body.
+ */
+#define list_safe_reset_next(pos, n, member) \
+ n = list_entry(pos->member.next, typeof(*pos), member)
+
/*
* Double linked lists with a single pointer list head.
* Mostly useful for hash tables where the two pointer list head is
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-06-14 15:07 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-11 14:50 [patch] fs: fix superblock iteration race Nick Piggin
2010-06-11 16:06 ` Linus Torvalds
2010-06-12 3:37 ` Nick Piggin
2010-06-12 3:57 ` Nick Piggin
2010-06-12 4:15 ` Linus Torvalds
2010-06-12 4:38 ` Nick Piggin
2010-06-12 4:46 ` Linus Torvalds
2010-06-14 15:07 ` Nick Piggin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.