linux-xfs.vger.kernel.org archive mirror
* [PATCH] xfs_repair: kick processing thread if ra_count is at limit
From: Eric Sandeen @ 2018-10-24 23:11 UTC
  To: linux-xfs

Zorro hit an xfs_repair hang on a 500T filesystem where
all the prefetch threads were sleeping and nothing progressed.

The problem is that if every buffer we tried to read ahead in
phase6 was already up to date, pf_start_io_workers has no effect;
there is no io to do, and the sem_wait in pf_queuing_worker waits
forever.

Kick the processing thread to avoid this situation.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201173
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

My brains started leaking out debugging this, but it works,
and it seems harmless. :D  Happy to have review from anyone who groks
the prefetch thread management better than I do...

diff --git a/repair/prefetch.c b/repair/prefetch.c
index 9571b24..1de0e2f 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -768,8 +768,12 @@ pf_queuing_worker(
 			 * might get stuck on a buffer that has been locked
 			 * and added to the I/O queue but is waiting for
 			 * the thread to be woken.
+			 * Start processing as well, in case everything so
+			 * far was already prefetched and the queue is empty.
 			 */
+			
 			pf_start_io_workers(args);
+			pf_start_processing(args);
 			sem_wait(&args->ra_count);
 		}
 


* Re: [PATCH] xfs_repair: kick processing thread if ra_count is at limit
From: Dave Chinner @ 2018-10-24 23:43 UTC
  To: Eric Sandeen; +Cc: linux-xfs

On Wed, Oct 24, 2018 at 06:11:46PM -0500, Eric Sandeen wrote:
> Zorro hit an xfs_repair hang on a 500T filesystem where
> all the prefetch threads were sleeping and nothing progressed.
> 
> The problem is that if every buffer we tried to read ahead in
> phase6 was already up to date, pf_start_io_workers has no effect;
> there is no io to do, and the sem_wait in pf_queuing_worker waits
> forever.
> 
> Kick the processing thread to avoid this situation.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201173
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> My brains started leaking out debugging this, but it works,
> and it seems harmless. :D  Happy to have review from anyone who groks
> the prefetch thread management better than I do...
> 
> diff --git a/repair/prefetch.c b/repair/prefetch.c
> index 9571b24..1de0e2f 100644
> --- a/repair/prefetch.c
> +++ b/repair/prefetch.c
> @@ -768,8 +768,12 @@ pf_queuing_worker(
>  			 * might get stuck on a buffer that has been locked
>  			 * and added to the I/O queue but is waiting for
>  			 * the thread to be woken.
> +			 * Start processing as well, in case everything so
> +			 * far was already prefetched and the queue is empty.
>  			 */
> +			
>  			pf_start_io_workers(args);
> +			pf_start_processing(args);
>  			sem_wait(&args->ra_count);
>  		}

Looks reasonable. We've had other bugs like this in the prefetch
code, so I'm not surprised there are still some lurking.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
