* [PATCH RESEND] Fix a dead loop in async_synchronize_full()
@ 2012-07-09 7:04 Li Zhong
2012-07-11 22:42 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Li Zhong @ 2012-07-09 7:04 UTC (permalink / raw)
To: LKML; +Cc: arjan, Paul E. McKenney, Christian Kujau, Cong Wang
This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.
void async_synchronize_full(void)
{
do {
async_synchronize_cookie(next_cookie);
} while (!list_empty(&async_running) || !
list_empty(&async_pending));
}
async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize.
However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop.
It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished.
Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
---
kernel/async.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/async.c b/kernel/async.c
index bd0c168..32d8dc9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,6 +86,13 @@ static async_cookie_t __lowest_in_progress(struct
list_head *running)
{
struct async_entry *entry;
+ if (!running) { /* just check the entry count */
+ if (atomic_read(&entry_count))
+ return 0; /* smaller than any cookie */
+ else
+ return next_cookie;
+ }
+
if (!list_empty(running)) {
entry = list_first_entry(running,
struct async_entry, list);
@@ -236,9 +243,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
*/
void async_synchronize_full(void)
{
- do {
- async_synchronize_cookie(next_cookie);
- } while (!list_empty(&async_running) || !list_empty(&async_pending));
+ async_synchronize_cookie_domain(next_cookie, NULL);
}
EXPORT_SYMBOL_GPL(async_synchronize_full);
@@ -258,7 +263,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
/**
* async_synchronize_cookie_domain - synchronize asynchronous function
calls within a certain domain with cookie checkpointing
* @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
*
* This function waits until all asynchronous function calls for the
* synchronization domain specified by the running list @list submitted
--
1.7.9.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-09 7:04 [PATCH RESEND] Fix a dead loop in async_synchronize_full() Li Zhong
@ 2012-07-11 22:42 ` Andrew Morton
2012-07-11 22:50 ` Dan Williams
2012-07-12 2:49 ` Li Zhong
0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2012-07-11 22:42 UTC (permalink / raw)
To: Li Zhong
Cc: LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang, Dan Williams
On Mon, 09 Jul 2012 15:04:25 +0800
Li Zhong <zhong@linux.vnet.ibm.com> wrote:
> This patch tries to fix a dead loop in async_synchronize_full(), which
> could be seen when preemption is disabled on a single cpu machine.
>
> void async_synchronize_full(void)
> {
> do {
> async_synchronize_cookie(next_cookie);
> } while (!list_empty(&async_running) || !
> list_empty(&async_pending));
> }
>
> async_synchronize_cookie() calls async_synchronize_cookie_domain() with
> &async_running as the default domain to synchronize.
>
> However, there might be some works in the async_pending list from other
> domains. On a single cpu system, without preemption, there is no chance
> for the other works to finish, so async_synchronize_full() enters a dead
> loop.
>
> It seems async_synchronize_full() wants to synchronize all entries in
> all running lists(domains), so maybe we could just check the entry_count
> to know whether all works are finished.
>
> Currently, async_synchronize_cookie_domain() expects a non-NULL running
> list ( if NULL, there would be NULL pointer dereference ), so maybe a
> NULL pointer could be used as an indication for the functions to
> synchronize all works in all domains.
The patch is fairly wordwrapped - please fix up your email client.
More seriously, it does not apply to linux-next due to some fairly
significant changes which have been sitting in Dan's tree since May.
What's going on?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-11 22:42 ` Andrew Morton
@ 2012-07-11 22:50 ` Dan Williams
2012-07-12 9:56 ` Li Zhong
2012-07-12 2:49 ` Li Zhong
1 sibling, 1 reply; 10+ messages in thread
From: Dan Williams @ 2012-07-11 22:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Li Zhong, LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang
On Wed, Jul 11, 2012 at 3:42 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> The patch is fairly wordwrapped - please fix up your email client.
>
> More seriously, it does not apply to linux-next due to some fairly
> significant changes which have been sitting in Dan's tree since May.
> What's going on?
>
Those changes missed the 3.5 merge window, but now that they have
Arjan's ack they should head upstream via James for 3.6. Right now
they are on his pending [1] branch.
As far as the comment:
> It seems async_synchronize_full() wants to synchronize all entries in
> all running lists(domains), so maybe we could just check the entry_count
> to know whether all works are finished.
...at first glance this is what the new async patches achieve.
async_synchronize_full should now sync work across all domains, but if
you can reproduce this bug it would be nice to confirm that the
pending changes fix it.
--
Dan
[1]: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=shortlog;h=refs/heads/pending
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-11 22:42 ` Andrew Morton
2012-07-11 22:50 ` Dan Williams
@ 2012-07-12 2:49 ` Li Zhong
1 sibling, 0 replies; 10+ messages in thread
From: Li Zhong @ 2012-07-12 2:49 UTC (permalink / raw)
To: Andrew Morton
Cc: LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang, Dan Williams
On Wed, 2012-07-11 at 15:42 -0700, Andrew Morton wrote:
> On Mon, 09 Jul 2012 15:04:25 +0800
> Li Zhong <zhong@linux.vnet.ibm.com> wrote:
>
> > This patch tries to fix a dead loop in async_synchronize_full(), which
> > could be seen when preemption is disabled on a single cpu machine.
> >
> > void async_synchronize_full(void)
> > {
> > do {
> > async_synchronize_cookie(next_cookie);
> > } while (!list_empty(&async_running) || !
> > list_empty(&async_pending));
> > }
> >
> > async_synchronize_cookie() calls async_synchronize_cookie_domain() with
> > &async_running as the default domain to synchronize.
> >
> > However, there might be some works in the async_pending list from other
> > domains. On a single cpu system, without preemption, there is no chance
> > for the other works to finish, so async_synchronize_full() enters a dead
> > loop.
> >
> > It seems async_synchronize_full() wants to synchronize all entries in
> > all running lists(domains), so maybe we could just check the entry_count
> > to know whether all works are finished.
> >
> > Currently, async_synchronize_cookie_domain() expects a non-NULL running
> > list ( if NULL, there would be NULL pointer dereference ), so maybe a
> > NULL pointer could be used as an indication for the functions to
> > synchronize all works in all domains.
>
> The patch is fairly wordwrapped - please fix up your email client.
Ah, sorry for that, I will check it.
>
> More seriously, it does not apply to linux-next due to some fairly
> significant changes which have been sitting in Dan's tree since May.
> What's going on?
>
Just went through Dan's patches, it seems that they also had
async_synchronize_full() to sync all domains. I will test/check those
patches, and drop this one if the result is good.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-11 22:50 ` Dan Williams
@ 2012-07-12 9:56 ` Li Zhong
2012-07-12 16:41 ` Dan Williams
2012-07-16 18:32 ` Christian Kujau
0 siblings, 2 replies; 10+ messages in thread
From: Li Zhong @ 2012-07-12 9:56 UTC (permalink / raw)
To: Dan Williams
Cc: Andrew Morton, LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang
On Wed, 2012-07-11 at 15:50 -0700, Dan Williams wrote:
> On Wed, Jul 11, 2012 at 3:42 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > The patch is fairly wordwrapped - please fix up your email client.
> >
> > More seriously, it does not apply to linux-next due to some fairly
> > significant changes which have been sitting in Dan's tree since May.
> > What's going on?
> >
>
> Those changes missed the 3.5 merge window, but now that they have
> Arjan's ack they should head upstream via James for 3.6. Right now
> they are on his pending [1] branch.
>
> As far as the comment:
>
> > It seems async_synchronize_full() wants to synchronize all entries in
> > all running lists(domains), so maybe we could just check the entry_count
> > to know whether all works are finished.
>
> ...at first glance this is what the new async patches achieve.
> async_synchronize_full should now sync work across all domains, but if
> you can reproduce this bug it would be nice to confirm that the
> pending changes fix it.
>
I have tested your pending patches, they fix the problem here.
But with ASYNC_DOMAIN_EXCLUSIVE added for the domains defined on the
stack, I think we lack a function that could wait for all the works in
all domains (however, maybe actually we don't need such an interface).
Also, I think it's not good to exclude them from
async_synchronize_full() just because they are defined on the stack.
> --
> Dan
>
> [1]: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=shortlog;h=refs/heads/pending
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-12 9:56 ` Li Zhong
@ 2012-07-12 16:41 ` Dan Williams
2012-07-16 18:32 ` Christian Kujau
1 sibling, 0 replies; 10+ messages in thread
From: Dan Williams @ 2012-07-12 16:41 UTC (permalink / raw)
To: Li Zhong
Cc: Andrew Morton, LKML, arjan, Paul E. McKenney, Christian Kujau,
Cong Wang, JBottomley
[ adding James ]
On Thu, Jul 12, 2012 at 2:56 AM, Li Zhong <zhong@linux.vnet.ibm.com> wrote:
> I have tested your pending patches, they fix the problem here.
Thanks!
James, if you get the chance please add:
Tested-by: Li Zhong <zhong@linux.vnet.ibm.com>
...to the pending set, or I can just resend. Let me know.
> But with ASYNC_DOMAIN_EXCLUSIVE added for the domains defined on the
> stack, I think we lack a function that could wait for all the works in
> all domains (however, maybe actually we don't need such an interface).
>
> Also, I think it's not good to exclude them from
> async_synchronize_full() just because they are defined on the stack.
ASYNC_DOMAIN can be used to allow on-stack domains to be flushed via
async_synchronize_full(). However, if you know ahead of time that
your work items do not need to be anonymously flushed and you know the
lifetime of your domain ASYNC_DOMAIN_EXCLUSIVE +
async_synchronize_full_domain() are there to prevent unnecessary
entanglements.
If for some reason you want to have temporary on stack domains be
globally visible I included an async_unregister_domain() routine to
make the api complete. It has a comment that using
ASYNC_DOMAIN_EXCLUSIVE for such domains is preferred.
--
Dan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-12 9:56 ` Li Zhong
2012-07-12 16:41 ` Dan Williams
@ 2012-07-16 18:32 ` Christian Kujau
2012-07-16 23:01 ` Dan Williams
1 sibling, 1 reply; 10+ messages in thread
From: Christian Kujau @ 2012-07-16 18:32 UTC (permalink / raw)
To: Li Zhong
Cc: Dan Williams, Andrew Morton, LKML, arjan, Paul E. McKenney, Cong Wang
On Thu, 12 Jul 2012 at 17:56, Li Zhong wrote:
> On Wed, 2012-07-11 at 15:50 -0700, Dan Williams wrote:
> > On Wed, Jul 11, 2012 at 3:42 PM, Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > > The patch is fairly wordwrapped - please fix up your email client.
> > >
> > > More seriously, it does not apply to linux-next due to some fairly
> > > significant changes which have been sitting in Dan's tree since May.
> > > What's going on?
> > >
> >
> > Those changes missed the 3.5 merge window, but now that they have
> > Arjan's ack they should head upstream via James for 3.6. Right now
> > they are on his pending [1] branch.
> >
> > As far as the comment:
> >
> > > It seems async_synchronize_full() wants to synchronize all entries in
> > > all running lists(domains), so maybe we could just check the entry_count
> > > to know whether all works are finished.
> >
> > ...at first glance this is what the new async patches achieve.
> > async_synchronize_full should now sync work across all domains, but if
> > you can reproduce this bug it would be nice to confirm that the
> > pending changes fix it.
> >
>
> I have tested your pending patches, they fix the problem here.
>
> But with ASYNC_DOMAIN_EXCLUSIVE added for the domains defined on the
> stack, I think we lack a function that could wait for all the works in
> all domains (however, maybe actually we don't need such an interface).
>
> Also, I think it's not good to exclude them from
> async_synchronize_full() just because they are defined on the stack.
Is this still scheduled to go into 3.5? I'm asking because -rc7 has been
released and does not contain this fix. W/o this fix, my powerpc system
won't boot[0] :-\
Thanks,
Christian.
[0] http://nerdbynature.de/bits/3.5.0-rc5/soft_lockup/
--
BOFH excuse #28:
CPU radiator broken
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-16 18:32 ` Christian Kujau
@ 2012-07-16 23:01 ` Dan Williams
2012-07-17 1:57 ` Paul E. McKenney
0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2012-07-16 23:01 UTC (permalink / raw)
To: Christian Kujau
Cc: Li Zhong, Andrew Morton, LKML, arjan, Paul E. McKenney,
Cong Wang, JBottomley
On Mon, Jul 16, 2012 at 11:32 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> Is this still scheduled to go into 3.5? I'm asking because -rc7 has been
> released and does not contain this fix. W/o this fix, my powerpc system
> won't boot[0] :-\
I don't expect James is going to push my async changes for 3.5. So
maybe we should go with Zhong's smaller fix for 3.5 and then replace
with the ASYNC_DOMAIN patches in 3.6?
--
Dan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-16 23:01 ` Dan Williams
@ 2012-07-17 1:57 ` Paul E. McKenney
2012-07-17 2:42 ` Li Zhong
0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2012-07-17 1:57 UTC (permalink / raw)
To: Dan Williams
Cc: Christian Kujau, Li Zhong, Andrew Morton, LKML, arjan, Cong Wang,
JBottomley
On Mon, Jul 16, 2012 at 04:01:33PM -0700, Dan Williams wrote:
> On Mon, Jul 16, 2012 at 11:32 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> > Is this still scheduled to go into 3.5? I'm asking because -rc7 has been
> > released and does not contain this fix. W/o this fix, my powerpc system
> > won't boot[0] :-\
>
> I don't expect James is going to push my async changes for 3.5. So
> maybe we should go with Zhong's smaller fix for 3.5 and then replace
> with the ASYNC_DOMAIN patches in 3.6?
Makes sense to me! I have been using Zhong's fix for my testing on
PowerPC for some weeks now with no trouble.
Thanx, Paul
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH RESEND] Fix a dead loop in async_synchronize_full()
2012-07-17 1:57 ` Paul E. McKenney
@ 2012-07-17 2:42 ` Li Zhong
0 siblings, 0 replies; 10+ messages in thread
From: Li Zhong @ 2012-07-17 2:42 UTC (permalink / raw)
To: paulmck
Cc: Dan Williams, Christian Kujau, Andrew Morton, LKML, arjan,
Cong Wang, JBottomley, zhong
resend it again with the email client fixed... in case it is needed
This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.
void async_synchronize_full(void)
{
do {
async_synchronize_cookie(next_cookie);
} while (!list_empty(&async_running) || !
list_empty(&async_pending));
}
async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize.
However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop.
It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished.
Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
---
kernel/async.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/async.c b/kernel/async.c
index bd0c168..32d8dc9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,6 +86,13 @@ static async_cookie_t __lowest_in_progress(struct list_head *running)
{
struct async_entry *entry;
+ if (!running) { /* just check the entry count */
+ if (atomic_read(&entry_count))
+ return 0; /* smaller than any cookie */
+ else
+ return next_cookie;
+ }
+
if (!list_empty(running)) {
entry = list_first_entry(running,
struct async_entry, list);
@@ -236,9 +243,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
*/
void async_synchronize_full(void)
{
- do {
- async_synchronize_cookie(next_cookie);
- } while (!list_empty(&async_running) || !list_empty(&async_pending));
+ async_synchronize_cookie_domain(next_cookie, NULL);
}
EXPORT_SYMBOL_GPL(async_synchronize_full);
@@ -258,7 +263,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
/**
* async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
* @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
*
* This function waits until all asynchronous function calls for the
* synchronization domain specified by the running list @list submitted
--
1.7.9.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-07-17 2:42 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-09 7:04 [PATCH RESEND] Fix a dead loop in async_synchronize_full() Li Zhong
2012-07-11 22:42 ` Andrew Morton
2012-07-11 22:50 ` Dan Williams
2012-07-12 9:56 ` Li Zhong
2012-07-12 16:41 ` Dan Williams
2012-07-16 18:32 ` Christian Kujau
2012-07-16 23:01 ` Dan Williams
2012-07-17 1:57 ` Paul E. McKenney
2012-07-17 2:42 ` Li Zhong
2012-07-12 2:49 ` Li Zhong
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.