* [PATCH] xfs_repair: don't unlock prefetch tree to read discontig buffers
@ 2014-05-07 23:29 Eric Sandeen
  2014-05-08  1:42 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2014-05-07 23:29 UTC (permalink / raw)
  To: xfs-oss

The way discontiguous buffers are currently handled in
prefetch is by unlocking the prefetch tree and reading
them one at a time in pf_read_discontig(), inside the
normal loop of searching for buffers to read in a more
optimized fashion.

But by unlocking the tree, we allow other threads to come
in and find buffers which we've already stashed locally
on our bplist[].  If 2 threads think they own the same
set of buffers, they may both try to delete them from
the prefetch btree, and the second one to arrive will not
find it, resulting in:

	fatal error -- prefetch corruption

Fix this by maintaining 2 lists; the original bplist,
and a new one containing only discontiguous buffers.

The original list can be seek-optimized as before,
and the discontiguous list can be read one by one
before we do the seek-optimized reads, after all of the
tree manipulation has been completed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/repair/prefetch.c b/repair/prefetch.c
index 65fedf5..2a8008f 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -444,28 +444,7 @@ pf_read_inode_dirs(
 }
 
 /*
- * Discontiguous buffers require multiple IOs to fill, so we can't use any
- * linearising, hole filling algorithms on them to avoid seeks. Just remove them
- * for the prefetch queue and read them straight into the cache and release
- * them.
- */
-static void
-pf_read_discontig(
-	struct prefetch_args	*args,
-	struct xfs_buf		*bp)
-{
-	if (!btree_delete(args->io_queue, XFS_DADDR_TO_FSB(mp, bp->b_bn)))
-		do_error(_("prefetch corruption\n"));
-
-	pthread_mutex_unlock(&args->lock);
-	libxfs_readbufr_map(mp->m_ddev_targp, bp, 0);
-	bp->b_flags |= LIBXFS_B_UNCHECKED;
-	libxfs_putbuf(bp);
-	pthread_mutex_lock(&args->lock);
-}
-
-/*
- * pf_batch_read must be called with the lock locked.
+ * pf_batch_read must be called with the args->lock mutex locked.
  */
 static void
 pf_batch_read(
@@ -474,7 +453,8 @@ pf_batch_read(
 	void			*buf)
 {
 	xfs_buf_t		*bplist[MAX_BUFS];
-	unsigned int		num;
+	xfs_buf_t		*bplist_disc[MAX_BUFS];
+	unsigned int		num, num_disc;
 	off64_t			first_off, last_off, next_off;
 	int			len, size;
 	int			i;
@@ -484,7 +464,7 @@ pf_batch_read(
 	char			*pbuf;
 
 	for (;;) {
-		num = 0;
+		num = num_disc = 0;
 		if (which == PF_SECONDARY) {
 			bplist[0] = btree_find(args->io_queue, 0, &fsbno);
 			max_fsbno = MIN(fsbno + pf_max_fsbs,
@@ -494,18 +474,22 @@ pf_batch_read(
 						args->last_bno_read, &fsbno);
 			max_fsbno = fsbno + pf_max_fsbs;
 		}
+
 		while (bplist[num] && num < MAX_BUFS && fsbno < max_fsbno) {
 			/*
-			 * Handle discontiguous buffers outside the seek
-			 * optimised IO loop below.
+			 * Discontiguous buffers require multiple IOs to fill,
+			 * so we can't use any linearising, hole filling
+			 * algorithms on them to avoid seeks. Just move them
+			 * to their own list and read them individually later.
 			 */
 			if ((bplist[num]->b_flags & LIBXFS_B_DISCONTIG)) {
-				pf_read_discontig(args, bplist[num]);
-				bplist[num] = NULL;
+				bplist_disc[num_disc] = bplist[num];
+				num_disc++;
 			} else if (which != PF_META_ONLY ||
 				   !B_IS_INODE(XFS_BUF_PRIORITY(bplist[num])))
 				num++;
-			if (num == MAX_BUFS)
+
+			if (num == MAX_BUFS || num_disc == MAX_BUFS)
 				break;
 			bplist[num] = btree_lookup_next(args->io_queue, &fsbno);
 		}
@@ -541,12 +525,19 @@ pf_batch_read(
 			num = i;
 		}
 
+		/* Take everything we found out of the tree */
 		for (i = 0; i < num; i++) {
 			if (btree_delete(args->io_queue, XFS_DADDR_TO_FSB(mp,
 					XFS_BUF_ADDR(bplist[i]))) == NULL)
 				do_error(_("prefetch corruption\n"));
 		}
 
+		for (i = 0; i < num_disc; i++) {
+			if (btree_delete(args->io_queue, XFS_DADDR_TO_FSB(mp,
+					XFS_BUF_ADDR(bplist_disc[i]))) == NULL)
+				do_error(_("prefetch corruption\n"));
+		}
+
 		if (which == PF_PRIMARY) {
 			for (inode_bufs = 0, i = 0; i < num; i++) {
 				if (B_IS_INODE(XFS_BUF_PRIORITY(bplist[i])))
@@ -566,6 +557,12 @@ pf_batch_read(
 #endif
 		pthread_mutex_unlock(&args->lock);
 
+		/* Read discontig buffers individually, if any */
+		for (i = 0; i < num_disc; i++) {
+			libxfs_readbufr_map(mp->m_ddev_targp, bplist_disc[i], 0);
+			bplist_disc[i]->b_flags |= LIBXFS_B_UNCHECKED;
+			libxfs_putbuf(bplist_disc[i]);
+		}
 		/*
 		 * now read the data and put into the xfs_but_t's
 		 */


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs_repair: don't unlock prefetch tree to read discontig buffers
  2014-05-07 23:29 [PATCH] xfs_repair: don't unlock prefetch tree to read discontig buffers Eric Sandeen
@ 2014-05-08  1:42 ` Dave Chinner
  2014-05-08  1:58   ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2014-05-08  1:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

On Wed, May 07, 2014 at 06:29:15PM -0500, Eric Sandeen wrote:
> The way discontiguous buffers are currently handled in
> prefetch is by unlocking the prefetch tree and reading
> them one at a time in pf_read_discontig(), inside the
> normal loop of searching for buffers to read in a more
> optimized fashion.
> 
> But by unlocking the tree, we allow other threads to come
> in and find buffers which we've already stashed locally
> on our bplist[].  If 2 threads think they own the same
> set of buffers, they may both try to delete them from
> the prefetch btree, and the second one to arrive will not
> find it, resulting in:
> 
> 	fatal error -- prefetch corruption
> 
> Fix this by maintaining 2 lists; the original bplist,
> and a new one containing only discontiguous buffers.
> 
> The original list can be seek-optimized as before,
> and the discontiguous list can be read one by one
> before we do the seek-optimized reads, after all of the
> tree manipulation has been completed.

Nice job finding the problem, Eric! It looks like your patch solves
the problem, but after considering this approach for a while I think
it's overkill. ;)

What the loop is trying to do is linearise all the IO and turn lots
of small IO into a single large IO, so if we grab all the discontig
buffers in the range, then do IO on them, then do the large IO, we
are effectively seeking all over that range, including backwards.
This is exactly the sort of problem the prefetch loop is trying to
avoid.

So what I think is best is that we simply abort the pulling of new
buffers off the list when we hit a discontiguous buffer. Leave the
discontig buffer as the last on the list, and process the list as
per normal. Remove all the remaining buffers from the btree, then
drop the lock and do the pread64 call.

Then, check the last buffer on the bplist - if it's the discontig
buffer (i.e. wasn't dropped during list processing), then issue the
discontig buffer IO. It should at least start as either sequential IO
or with a small forwards seek, so it should be as close to seek
optimised as we can get for such buffers. Then it can be removed
from the bplist, num decremented, the lock picked back up and the
large buffer read in via pread64() can be sliced and diced
appropriately...

i.e. much less code, no need for a separate list, and the seeks
should be minimised as much as possible....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs_repair: don't unlock prefetch tree to read discontig buffers
  2014-05-08  1:42 ` Dave Chinner
@ 2014-05-08  1:58   ` Eric Sandeen
  2014-05-08  5:42     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2014-05-08  1:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs-oss

On 5/7/14, 8:42 PM, Dave Chinner wrote:
> On Wed, May 07, 2014 at 06:29:15PM -0500, Eric Sandeen wrote:
>> The way discontiguous buffers are currently handled in
>> prefetch is by unlocking the prefetch tree and reading
>> them one at a time in pf_read_discontig(), inside the
>> normal loop of searching for buffers to read in a more
>> optimized fashion.
>>
>> But by unlocking the tree, we allow other threads to come
>> in and find buffers which we've already stashed locally
>> on our bplist[].  If 2 threads think they own the same
>> set of buffers, they may both try to delete them from
>> the prefetch btree, and the second one to arrive will not
>> find it, resulting in:
>>
>> 	fatal error -- prefetch corruption
>>
>> Fix this by maintaining 2 lists; the original bplist,
>> and a new one containing only discontiguous buffers.
>>
>> The original list can be seek-optimized as before,
>> and the discontiguous list can be read one by one
>> before we do the seek-optimized reads, after all of the
>> tree manipulation has been completed.
> 
> Nice job finding the problem, Eric! It looks like your patch solves
> the problem, but after considering this approach for a while I think
> it's overkill. ;)

Well, that's how it goes.  :)

> What the loop is trying to do is linearise all the IO and turn lots
> of small IO into a single large IO, so if we grab all the discontig
> buffers in the range, then do IO on them, then do the large IO, we
> are effectively seeking all over that range, including backwards.
> This is exactly the sort of problem the prefetch loop is trying to
> avoid.

mmmhm... OTOH, discontig buffers are ... fairly rare?  And they do
have to be read sometime.

> So what I think is best is that we simply abort the pulling of new
> buffers off the list when we hit a discontiguous buffer. Leave the
> discontig buffer as the last on the list, and process the list as
> per normal. Remove all the remaining buffers from the btree, then
> drop the lock and do the pread64 call.

I kind of half considered something like that, but the optimizing
trims back num based on a few criteria, some involving the last
buffer in the bplist.  So it's going to require a bit more awareness
I think.

> Then, check the last buffer on the bplist - if it's the discontig
> buffer (i.e. wasn't dropped during list processing), then issue the
> discontig buffer IO. It should at least start as either sequential IO
> or with a small forwards seek, so it should be as close to seek
> optimised as we can get for such buffers. Then it can be removed
> from the bplist, num decremented, the lock picked back up and the
> large buffer read in via pread64() can be sliced and diced
> appropriately...
> 
> i.e. much less code, no need for a separate list, and the seeks
> should be minimised as much as possible....

I'll give something like that a shot.  Yeah, it felt a bit brute-force-y.

-eric

> Cheers,
> 
> Dave.
> 


* Re: [PATCH] xfs_repair: don't unlock prefetch tree to read discontig buffers
  2014-05-08  1:58   ` Eric Sandeen
@ 2014-05-08  5:42     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2014-05-08  5:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

On Wed, May 07, 2014 at 08:58:38PM -0500, Eric Sandeen wrote:
> On 5/7/14, 8:42 PM, Dave Chinner wrote:
> > On Wed, May 07, 2014 at 06:29:15PM -0500, Eric Sandeen wrote:
> >> The way discontiguous buffers are currently handled in
> >> prefetch is by unlocking the prefetch tree and reading
> >> them one at a time in pf_read_discontig(), inside the
> >> normal loop of searching for buffers to read in a more
> >> optimized fashion.
> >>
> >> But by unlocking the tree, we allow other threads to come
> >> in and find buffers which we've already stashed locally
> >> on our bplist[].  If 2 threads think they own the same
> >> set of buffers, they may both try to delete them from
> >> the prefetch btree, and the second one to arrive will not
> >> find it, resulting in:
> >>
> >> 	fatal error -- prefetch corruption
> >>
> >> Fix this by maintaining 2 lists; the original bplist,
> >> and a new one containing only discontiguous buffers.
> >>
> >> The original list can be seek-optimized as before,
> >> and the discontiguous list can be read one by one
> >> before we do the seek-optimized reads, after all of the
> >> tree manipulation has been completed.
> > 
> > Nice job finding the problem, Eric! It looks like your patch solves
> > the problem, but after considering this approach for a while I think
> > it's overkill. ;)
> 
> Well, that's how it goes.  :)
> 
> > What the loop is trying to do is linearise all the IO and turn lots
> > of small IO into a single large IO, so if we grab all the discontig
> > buffers in the range, then do IO on them, then do the large IO, we
> > are effectively seeking all over that range, including backwards.
> > This is exactly the sort of problem the prefetch loop is trying to
> > avoid.
> 
> mmmhm... OTOH, discontig buffers are ... fairly rare?  And they do
> have to be read sometime.
> 
> > So what I think is best is that we simply abort the pulling of new
> > buffers off the list when we hit a discontiguous buffer. Leave the
> > discontig buffer as the last on the list, and process the list as
> > per normal. Remove all the remaining buffers from the btree, then
> > drop the lock and do the pread64 call.
> 
> I kind of half considered something like that, but the optimizing
> trims back num based on a few criteria, some involving the last
> buffer in the bplist.  So it's going to require a bit more awareness
> I think.

You can ignore it, though, because if we trim the discontig buffer
out, we don't read it in that loop, and another prefetch thread
will deal with it. If we don't trim it out, it makes no difference to
the large read...

i.e. you can completely ignore the discontig buffer until after the
pread64...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

