All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs_repair: fix prefetch queue waiting
@ 2014-04-08  3:27 Eric Sandeen
  2014-04-08 12:58 ` Brian Foster
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2014-04-08  3:27 UTC (permalink / raw)
  To: xfs-oss

This fixes a regression caused by:

97b1fcf xfs_repair: fix array overrun in do_inode_prefetch

The thread creation loop has 2 ways to exit; either via
the loop counter based on thread_count, or the break statement
if we've started enough workers to cover all AGs.

Whether or not the loop counter "i" reflects the number of
threads started depends on whether or not we exited via the
break.

The above commit prevented us from indexing off the end
of the queues[] array if we actually advanced "i" all the
way to thread_count, but in the case where we break, "i"
is one *less* than the nr of threads started, so we don't
wait for completion of all threads, and all hell breaks
loose in phase 5.

Just stop with the cleverness of re-using the loop counter -
instead, explicitly count threads that we start, and then use
that counter to wait for each worker to complete.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

I have one fs which demonstrates the problem, and have verified
the regression & tested the fix against that.

I'll run this over xfstests overnight, but it seems obvious
from here (OTOH the other fix seemed obvious too) :(

diff --git a/repair/prefetch.c b/repair/prefetch.c
index e47a48e..4c32395 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -944,6 +944,7 @@ do_inode_prefetch(
 	int			i;
 	struct work_queue	queue;
 	struct work_queue	*queues;
+	int			queues_started = 0;
 
 	/*
 	 * If the previous phases of repair have not overflowed the buffer
@@ -987,6 +988,7 @@ do_inode_prefetch(
 
 		create_work_queue(&queues[i], mp, 1);
 		queue_work(&queues[i], prefetch_ag_range_work, 0, wargs);
+		queues_started++;
 
 		if (wargs->end_ag >= mp->m_sb.sb_agcount)
 			break;
@@ -995,7 +997,7 @@ do_inode_prefetch(
 	/*
 	 * wait for workers to complete
 	 */
-	while (i--)
+	for (i = 0; i < queues_started; i++)
 		destroy_work_queue(&queues[i]);
 	free(queues);
 }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs_repair: fix prefetch queue waiting
  2014-04-08  3:27 [PATCH] xfs_repair: fix prefetch queue waiting Eric Sandeen
@ 2014-04-08 12:58 ` Brian Foster
  2014-04-08 13:36   ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2014-04-08 12:58 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

On Mon, Apr 07, 2014 at 10:27:28PM -0500, Eric Sandeen wrote:
> This fixes a regression caused by:
> 
> 97b1fcf xfs_repair: fix array overrun in do_inode_prefetch
> 
> The thread creation loop has 2 ways to exit; either via
> the loop counter based on thread_count, or the break statement
> if we've started enough workers to cover all AGs.
> 
> Whether or not the loop counter "i" reflects the number of
> threads started depends on whether or not we exited via the
> break.
> 
> The above commit prevented us from indexing off the end
> of the queues[] array if we actually advanced "i" all the
> way to thread_count, but in the case where we break, "i"
> is one *less* than the nr of threads started, so we don't
> wait for completion of all threads, and all hell breaks
> loose in phase 5.
> 
> Just stop with the cleverness of re-using the loop counter -
> instead, explicitly count threads that we start, and then use
> that counter to wait for each worker to complete.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> I have one fs which demonstrates the problem, and have verified
> the regression & tested the fix against that.
> 
> I'll run this over xfstests overnight, but it seems obvious
> from here (OTOH the other fix seemed obvious too) :(
> 
> diff --git a/repair/prefetch.c b/repair/prefetch.c
> index e47a48e..4c32395 100644
> --- a/repair/prefetch.c
> +++ b/repair/prefetch.c
> @@ -944,6 +944,7 @@ do_inode_prefetch(
>  	int			i;
>  	struct work_queue	queue;
>  	struct work_queue	*queues;
> +	int			queues_started = 0;
>  
>  	/*
>  	 * If the previous phases of repair have not overflowed the buffer
> @@ -987,6 +988,7 @@ do_inode_prefetch(
>  
>  		create_work_queue(&queues[i], mp, 1);
>  		queue_work(&queues[i], prefetch_ag_range_work, 0, wargs);
> +		queues_started++;
>  
>  		if (wargs->end_ag >= mp->m_sb.sb_agcount)
>  			break;
> @@ -995,7 +997,7 @@ do_inode_prefetch(
>  	/*
>  	 * wait for workers to complete
>  	 */
> -	while (i--)
> +	for (i = 0; i < queues_started; i++)
>  		destroy_work_queue(&queues[i]);

Fix looks good, but any reason to reverse the order of the destroy loop?

Brian

>  	free(queues);
>  }
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs_repair: fix prefetch queue waiting
  2014-04-08 12:58 ` Brian Foster
@ 2014-04-08 13:36   ` Eric Sandeen
  2014-04-08 13:52     ` Brian Foster
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2014-04-08 13:36 UTC (permalink / raw)
  To: Brian Foster, Eric Sandeen; +Cc: xfs-oss

On 4/8/14, 7:58 AM, Brian Foster wrote:
> On Mon, Apr 07, 2014 at 10:27:28PM -0500, Eric Sandeen wrote:
>> This fixes a regression caused by:
>>
>> 97b1fcf xfs_repair: fix array overrun in do_inode_prefetch
>>
>> The thread creation loop has 2 ways to exit; either via
>> the loop counter based on thread_count, or the break statement
>> if we've started enough workers to cover all AGs.
>>
>> Whether or not the loop counter "i" reflects the number of
>> threads started depends on whether or not we exited via the
>> break.
>>
>> The above commit prevented us from indexing off the end
>> of the queues[] array if we actually advanced "i" all the
>> way to thread_count, but in the case where we break, "i"
>> is one *less* than the nr of threads started, so we don't
>> wait for completion of all threads, and all hell breaks
>> loose in phase 5.
>>
>> Just stop with the cleverness of re-using the loop counter -
>> instead, explicitly count threads that we start, and then use
>> that counter to wait for each worker to complete.
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> I have one fs which demonstrates the problem, and have verified
>> the regression & tested the fix against that.
>>
>> I'll run this over xfstests overnight, but it seems obvious
>> from here (OTOH the other fix seemed obvious too) :(
>>
>> diff --git a/repair/prefetch.c b/repair/prefetch.c
>> index e47a48e..4c32395 100644
>> --- a/repair/prefetch.c
>> +++ b/repair/prefetch.c
>> @@ -944,6 +944,7 @@ do_inode_prefetch(
>>  	int			i;
>>  	struct work_queue	queue;
>>  	struct work_queue	*queues;
>> +	int			queues_started = 0;
>>  
>>  	/*
>>  	 * If the previous phases of repair have not overflowed the buffer
>> @@ -987,6 +988,7 @@ do_inode_prefetch(
>>  
>>  		create_work_queue(&queues[i], mp, 1);
>>  		queue_work(&queues[i], prefetch_ag_range_work, 0, wargs);
>> +		queues_started++;
>>  
>>  		if (wargs->end_ag >= mp->m_sb.sb_agcount)
>>  			break;
>> @@ -995,7 +997,7 @@ do_inode_prefetch(
>>  	/*
>>  	 * wait for workers to complete
>>  	 */
>> -	while (i--)
>> +	for (i = 0; i < queues_started; i++)
>>  		destroy_work_queue(&queues[i]);
> 
> Fix looks good, but any reason to reverse the order of the destroy loop?

simplicity? :)

I don't think it matters operationally...

-Eric
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs_repair: fix prefetch queue waiting
  2014-04-08 13:36   ` Eric Sandeen
@ 2014-04-08 13:52     ` Brian Foster
  0 siblings, 0 replies; 4+ messages in thread
From: Brian Foster @ 2014-04-08 13:52 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Eric Sandeen, xfs-oss

On Tue, Apr 08, 2014 at 08:36:19AM -0500, Eric Sandeen wrote:
> On 4/8/14, 7:58 AM, Brian Foster wrote:
> > On Mon, Apr 07, 2014 at 10:27:28PM -0500, Eric Sandeen wrote:
> >> This fixes a regression caused by:
> >>
> >> 97b1fcf xfs_repair: fix array overrun in do_inode_prefetch
> >>
> >> The thread creation loop has 2 ways to exit; either via
> >> the loop counter based on thread_count, or the break statement
> >> if we've started enough workers to cover all AGs.
> >>
> >> Whether or not the loop counter "i" reflects the number of
> >> threads started depends on whether or not we exited via the
> >> break.
> >>
> >> The above commit prevented us from indexing off the end
> >> of the queues[] array if we actually advanced "i" all the
> >> way to thread_count, but in the case where we break, "i"
> >> is one *less* than the nr of threads started, so we don't
> >> wait for completion of all threads, and all hell breaks
> >> loose in phase 5.
> >>
> >> Just stop with the cleverness of re-using the loop counter -
> >> instead, explicitly count threads that we start, and then use
> >> that counter to wait for each worker to complete.
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >> ---
> >>
> >> I have one fs which demonstrates the problem, and have verified
> >> the regression & tested the fix against that.
> >>
> >> I'll run this over xfstests overnight, but it seems obvious
> >> from here (OTOH the other fix seemed obvious too) :(
> >>
> >> diff --git a/repair/prefetch.c b/repair/prefetch.c
> >> index e47a48e..4c32395 100644
> >> --- a/repair/prefetch.c
> >> +++ b/repair/prefetch.c
> >> @@ -944,6 +944,7 @@ do_inode_prefetch(
> >>  	int			i;
> >>  	struct work_queue	queue;
> >>  	struct work_queue	*queues;
> >> +	int			queues_started = 0;
> >>  
> >>  	/*
> >>  	 * If the previous phases of repair have not overflowed the buffer
> >> @@ -987,6 +988,7 @@ do_inode_prefetch(
> >>  
> >>  		create_work_queue(&queues[i], mp, 1);
> >>  		queue_work(&queues[i], prefetch_ag_range_work, 0, wargs);
> >> +		queues_started++;
> >>  
> >>  		if (wargs->end_ag >= mp->m_sb.sb_agcount)
> >>  			break;
> >> @@ -995,7 +997,7 @@ do_inode_prefetch(
> >>  	/*
> >>  	 * wait for workers to complete
> >>  	 */
> >> -	while (i--)
> >> +	for (i = 0; i < queues_started; i++)
> >>  		destroy_work_queue(&queues[i]);
> > 
> > Fix looks good, but any reason to reverse the order of the destroy loop?
> 
> simplicity? :)
> 

Fine by me. :)

Reviewed-by: Brian Foster <bfoster@redhat.com>

> I don't think it matters operationally...
> 
> -Eric
>  
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-04-08 13:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-08  3:27 [PATCH] xfs_repair: fix prefetch queue waiting Eric Sandeen
2014-04-08 12:58 ` Brian Foster
2014-04-08 13:36   ` Eric Sandeen
2014-04-08 13:52     ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.