All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND] Fix a dead loop in async_synchronize_full()
@ 2012-07-09  7:04 Li Zhong
  2012-07-11 22:42 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Li Zhong @ 2012-07-09  7:04 UTC (permalink / raw)
  To: LKML; +Cc: arjan, Paul E. McKenney, Christian Kujau, Cong Wang

This patch tries to fix a dead loop in  async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine. 

void async_synchronize_full(void)
{
        do {
                async_synchronize_cookie(next_cookie);
        } while (!list_empty(&async_running) || !
list_empty(&async_pending));
}

async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize. 

However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop. 

It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished. 

Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains. 

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
---
 kernel/async.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index bd0c168..32d8dc9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,6 +86,13 @@ static async_cookie_t  __lowest_in_progress(struct
list_head *running)
 {
 	struct async_entry *entry;
 
+	if (!running) { /* just check the entry count */
+		if (atomic_read(&entry_count))
+			return 0; /* smaller than any cookie */
+		else
+			return next_cookie;
+	}
+
 	if (!list_empty(running)) {
 		entry = list_first_entry(running,
 			struct async_entry, list);
@@ -236,9 +243,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
  */
 void async_synchronize_full(void)
 {
-	do {
-		async_synchronize_cookie(next_cookie);
-	} while (!list_empty(&async_running) || !list_empty(&async_pending));
+	async_synchronize_cookie_domain(next_cookie, NULL);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full);
 
@@ -258,7 +263,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
 /**
  * async_synchronize_cookie_domain - synchronize asynchronous function
calls within a certain domain with cookie checkpointing
  * @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
  *
  * This function waits until all asynchronous function calls for the
  * synchronization domain specified by the running list @list submitted
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-09  7:04 [PATCH RESEND] Fix a dead loop in async_synchronize_full() Li Zhong
@ 2012-07-11 22:42 ` Andrew Morton
  2012-07-11 22:50   ` Dan Williams
  2012-07-12  2:49   ` Li Zhong
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2012-07-11 22:42 UTC (permalink / raw)
  To: Li Zhong
  Cc: LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang, Dan Williams

On Mon, 09 Jul 2012 15:04:25 +0800
Li Zhong <zhong@linux.vnet.ibm.com> wrote:

> This patch tries to fix a dead loop in  async_synchronize_full(), which
> could be seen when preemption is disabled on a single cpu machine. 
> 
> void async_synchronize_full(void)
> {
>         do {
>                 async_synchronize_cookie(next_cookie);
>         } while (!list_empty(&async_running) || !
> list_empty(&async_pending));
> }
> 
> async_synchronize_cookie() calls async_synchronize_cookie_domain() with
> &async_running as the default domain to synchronize. 
> 
> However, there might be some works in the async_pending list from other
> domains. On a single cpu system, without preemption, there is no chance
> for the other works to finish, so async_synchronize_full() enters a dead
> loop. 
> 
> It seems async_synchronize_full() wants to synchronize all entries in
> all running lists(domains), so maybe we could just check the entry_count
> to know whether all works are finished. 
> 
> Currently, async_synchronize_cookie_domain() expects a non-NULL running
> list ( if NULL, there would be NULL pointer dereference ), so maybe a
> NULL pointer could be used as an indication for the functions to
> synchronize all works in all domains. 

The patch is fairly wordwrapped - please fix up your email client.

More seriously, it does not apply to linux-next due to some fairly
significant changes which have been sitting in Dan's tree since May. 
What's going on?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-11 22:42 ` Andrew Morton
@ 2012-07-11 22:50   ` Dan Williams
  2012-07-12  9:56     ` Li Zhong
  2012-07-12  2:49   ` Li Zhong
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Williams @ 2012-07-11 22:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Li Zhong, LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang

On Wed, Jul 11, 2012 at 3:42 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> The patch is fairly wordwrapped - please fix up your email client.
>
> More seriously, it does not apply to linux-next due to some fairly
> significant changes which have been sitting in Dan's tree since May.
> What's going on?
>

Those changes missed the 3.5 merge window, but now that they have
Arjan's ack they should head upstream via James for 3.6.  Right now
they are on his pending [1] branch.

As far as the comment:

> It seems async_synchronize_full() wants to synchronize all entries in
> all running lists(domains), so maybe we could just check the entry_count
> to know whether all works are finished.

...at first glance this is what the new async patches achieve.
async_synchronize_full should now sync work across all domains, but if
you can reproduce this bug it would be nice to confirm that the
pending changes fix it.

--
Dan

[1]: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=shortlog;h=refs/heads/pending

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-11 22:42 ` Andrew Morton
  2012-07-11 22:50   ` Dan Williams
@ 2012-07-12  2:49   ` Li Zhong
  1 sibling, 0 replies; 10+ messages in thread
From: Li Zhong @ 2012-07-12  2:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang, Dan Williams

On Wed, 2012-07-11 at 15:42 -0700, Andrew Morton wrote:
> On Mon, 09 Jul 2012 15:04:25 +0800
> Li Zhong <zhong@linux.vnet.ibm.com> wrote:
> 
> > This patch tries to fix a dead loop in  async_synchronize_full(), which
> > could be seen when preemption is disabled on a single cpu machine. 
> > 
> > void async_synchronize_full(void)
> > {
> >         do {
> >                 async_synchronize_cookie(next_cookie);
> >         } while (!list_empty(&async_running) || !
> > list_empty(&async_pending));
> > }
> > 
> > async_synchronize_cookie() calls async_synchronize_cookie_domain() with
> > &async_running as the default domain to synchronize. 
> > 
> > However, there might be some works in the async_pending list from other
> > domains. On a single cpu system, without preemption, there is no chance
> > for the other works to finish, so async_synchronize_full() enters a dead
> > loop. 
> > 
> > It seems async_synchronize_full() wants to synchronize all entries in
> > all running lists(domains), so maybe we could just check the entry_count
> > to know whether all works are finished. 
> > 
> > Currently, async_synchronize_cookie_domain() expects a non-NULL running
> > list ( if NULL, there would be NULL pointer dereference ), so maybe a
> > NULL pointer could be used as an indication for the functions to
> > synchronize all works in all domains. 
> 
> The patch is fairly wordwrapped - please fix up your email client.

Ah, sorry for that, I will check it. 

> 
> More seriously, it does not apply to linux-next due to some fairly
> significant changes which have been sitting in Dan's tree since May. 
> What's going on?
> 

Just went through Dan's patches, it seems that they also had
async_synchronize_full() to sync all domains. I will test/check those
patches, and drop this one if the result is good. 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-11 22:50   ` Dan Williams
@ 2012-07-12  9:56     ` Li Zhong
  2012-07-12 16:41       ` Dan Williams
  2012-07-16 18:32       ` Christian Kujau
  0 siblings, 2 replies; 10+ messages in thread
From: Li Zhong @ 2012-07-12  9:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, LKML, arjan, Paul E. McKenney, Christian Kujau, Cong Wang

On Wed, 2012-07-11 at 15:50 -0700, Dan Williams wrote:
> On Wed, Jul 11, 2012 at 3:42 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > The patch is fairly wordwrapped - please fix up your email client.
> >
> > More seriously, it does not apply to linux-next due to some fairly
> > significant changes which have been sitting in Dan's tree since May.
> > What's going on?
> >
> 
> Those changes missed the 3.5 merge window, but now that they have
> Arjan's ack they should head upstream via James for 3.6.  Right now
> they are on his pending [1] branch.
> 
> As far as the comment:
> 
> > It seems async_synchronize_full() wants to synchronize all entries in
> > all running lists(domains), so maybe we could just check the entry_count
> > to know whether all works are finished.
> 
> ...at first glance this is what the new async patches achieve.
> async_synchronize_full should now sync work across all domains, but if
> you can reproduce this bug it would be nice to confirm that the
> pending changes fix it.
> 

I have tested your pending patches, they fix the problem here.

But with ASYNC_DOMAIN_EXCLUSIVE added for the domains defined on the
stack, I think we lack a function that could wait for all the works in
all domains (however, maybe actually we don't need such an interface).

Also, I think it's not good to exclude them from
async_synchronize_full() just because they are defined on the stack. 

> --
> Dan
> 
> [1]: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=shortlog;h=refs/heads/pending
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-12  9:56     ` Li Zhong
@ 2012-07-12 16:41       ` Dan Williams
  2012-07-16 18:32       ` Christian Kujau
  1 sibling, 0 replies; 10+ messages in thread
From: Dan Williams @ 2012-07-12 16:41 UTC (permalink / raw)
  To: Li Zhong
  Cc: Andrew Morton, LKML, arjan, Paul E. McKenney, Christian Kujau,
	Cong Wang, JBottomley

[ adding James ]

On Thu, Jul 12, 2012 at 2:56 AM, Li Zhong <zhong@linux.vnet.ibm.com> wrote:
> I have tested your pending patches, they fix the problem here.

Thanks!

James, if you get the chance please add:

Tested-by: Li Zhong <zhong@linux.vnet.ibm.com>

...to the pending set, or I can just resend.  Let me know.

> But with ASYNC_DOMAIN_EXCLUSIVE added for the domains defined on the
> stack, I think we lack a function that could wait for all the works in
> all domains (however, maybe actually we don't need such an interface).
>
> Also, I think it's not good to exclude them from
> async_synchronize_full() just because they are defined on the stack.

ASYNC_DOMAIN can be used to allow on-stack domains to be flushed via
async_synchronize_full().  However, if you know ahead of time that
your work items do not need to be anonymously flushed and you know the
lifetime of your domain ASYNC_DOMAIN_EXCLUSIVE +
async_synchronize_full_domain() are there to prevent unnecessary
entanglements.

If for some reason you want to have temporary on stack domains be
globally visible I included an async_unregister_domain() routine to
make the api complete.  It has a comment that using
ASYNC_DOMAIN_EXCLUSIVE for such domains is preferred.

--
Dan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-12  9:56     ` Li Zhong
  2012-07-12 16:41       ` Dan Williams
@ 2012-07-16 18:32       ` Christian Kujau
  2012-07-16 23:01         ` Dan Williams
  1 sibling, 1 reply; 10+ messages in thread
From: Christian Kujau @ 2012-07-16 18:32 UTC (permalink / raw)
  To: Li Zhong
  Cc: Dan Williams, Andrew Morton, LKML, arjan, Paul E. McKenney, Cong Wang

On Thu, 12 Jul 2012 at 17:56, Li Zhong wrote:
> On Wed, 2012-07-11 at 15:50 -0700, Dan Williams wrote:
> > On Wed, Jul 11, 2012 at 3:42 PM, Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > > The patch is fairly wordwrapped - please fix up your email client.
> > >
> > > More seriously, it does not apply to linux-next due to some fairly
> > > significant changes which have been sitting in Dan's tree since May.
> > > What's going on?
> > >
> > 
> > Those changes missed the 3.5 merge window, but now that they have
> > Arjan's ack they should head upstream via James for 3.6.  Right now
> > they are on his pending [1] branch.
> > 
> > As far as the comment:
> > 
> > > It seems async_synchronize_full() wants to synchronize all entries in
> > > all running lists(domains), so maybe we could just check the entry_count
> > > to know whether all works are finished.
> > 
> > ...at first glance this is what the new async patches achieve.
> > async_synchronize_full should now sync work across all domains, but if
> > you can reproduce this bug it would be nice to confirm that the
> > pending changes fix it.
> > 
> 
> I have tested your pending patches, they fix the problem here.
> 
> But with ASYNC_DOMAIN_EXCLUSIVE added for the domains defined on the
> stack, I think we lack a function that could wait for all the works in
> all domains (however, maybe actually we don't need such an interface).
> 
> Also, I think it's not good to exclude them from
> async_synchronize_full() just because they are defined on the stack. 

Is this still scheduled to go into 3.5? I'm asking because -rc7 has been 
released and does not contain this fix. W/o this fix, my powerpc system 
won't boot[0] :-\

Thanks,
Christian.

[0] http://nerdbynature.de/bits/3.5.0-rc5/soft_lockup/
-- 
BOFH excuse #28:

CPU radiator broken

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-16 18:32       ` Christian Kujau
@ 2012-07-16 23:01         ` Dan Williams
  2012-07-17  1:57           ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2012-07-16 23:01 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Li Zhong, Andrew Morton, LKML, arjan, Paul E. McKenney,
	Cong Wang, JBottomley

On Mon, Jul 16, 2012 at 11:32 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> Is this still scheduled to go into 3.5? I'm asking because -rc7 has been
> released and does not contain this fix. W/o this fix, my powerpc system
> won't boot[0] :-\

I don't expect James is going to push my async changes for 3.5.  So
maybe we should go with Zhong's smaller fix for 3.5 and then replace
with the ASYNC_DOMAIN patches in 3.6?

--
Dan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-16 23:01         ` Dan Williams
@ 2012-07-17  1:57           ` Paul E. McKenney
  2012-07-17  2:42             ` Li Zhong
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2012-07-17  1:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christian Kujau, Li Zhong, Andrew Morton, LKML, arjan, Cong Wang,
	JBottomley

On Mon, Jul 16, 2012 at 04:01:33PM -0700, Dan Williams wrote:
> On Mon, Jul 16, 2012 at 11:32 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> > Is this still scheduled to go into 3.5? I'm asking because -rc7 has been
> > released and does not contain this fix. W/o this fix, my powerpc system
> > won't boot[0] :-\
> 
> I don't expect James is going to push my async changes for 3.5.  So
> maybe we should go with Zhong's smaller fix for 3.5 and then replace
> with the ASYNC_DOMAIN patches in 3.6?

Makes sense to me!  I have been using Zhong's fix for my testing on
PowerPC for some weeks now with no trouble.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RESEND] Fix a dead loop in async_synchronize_full()
  2012-07-17  1:57           ` Paul E. McKenney
@ 2012-07-17  2:42             ` Li Zhong
  0 siblings, 0 replies; 10+ messages in thread
From: Li Zhong @ 2012-07-17  2:42 UTC (permalink / raw)
  To: paulmck
  Cc: Dan Williams, Christian Kujau, Andrew Morton, LKML, arjan,
	Cong Wang, JBottomley, zhong

resend it again with the email client fixed... in case it is needed

This patch tries to fix a dead loop in  async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine. 

void async_synchronize_full(void)
{
        do {
                async_synchronize_cookie(next_cookie);
        } while (!list_empty(&async_running) || !
list_empty(&async_pending));
}

async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize. 

However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop. 

It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished. 

Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains. 

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
---
 kernel/async.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index bd0c168..32d8dc9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,6 +86,13 @@ static async_cookie_t  __lowest_in_progress(struct list_head *running)
 {
 	struct async_entry *entry;
 
+	if (!running) { /* just check the entry count */
+		if (atomic_read(&entry_count))
+			return 0; /* smaller than any cookie */
+		else
+			return next_cookie;
+	}
+
 	if (!list_empty(running)) {
 		entry = list_first_entry(running,
 			struct async_entry, list);
@@ -236,9 +243,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
  */
 void async_synchronize_full(void)
 {
-	do {
-		async_synchronize_cookie(next_cookie);
-	} while (!list_empty(&async_running) || !list_empty(&async_pending));
+	async_synchronize_cookie_domain(next_cookie, NULL);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full);
 
@@ -258,7 +263,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
 /**
  * async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
  * @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
  *
  * This function waits until all asynchronous function calls for the
  * synchronization domain specified by the running list @list submitted
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-07-17  2:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-09  7:04 [PATCH RESEND] Fix a dead loop in async_synchronize_full() Li Zhong
2012-07-11 22:42 ` Andrew Morton
2012-07-11 22:50   ` Dan Williams
2012-07-12  9:56     ` Li Zhong
2012-07-12 16:41       ` Dan Williams
2012-07-16 18:32       ` Christian Kujau
2012-07-16 23:01         ` Dan Williams
2012-07-17  1:57           ` Paul E. McKenney
2012-07-17  2:42             ` Li Zhong
2012-07-12  2:49   ` Li Zhong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.