All of lore.kernel.org
 help / color / mirror / Atom feed
* error propagation problem on xfs over dm stripe
@ 2017-02-01  3:12 Eric Sandeen
  2017-02-01  7:42   ` Christoph Hellwig
  2017-02-01 10:28   ` Junichi Nomura
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Sandeen @ 2017-02-01  3:12 UTC (permalink / raw)
  To: linux-xfs, dm-devel; +Cc: Mike Snitzer

xfstest generic/108 creates a stripe then fails one leg to
see if io errors on only a subset of an io get reported
to userspace:

# Test partial block device failure. Calls like fsync() should report failure
# on partial I/O failure, e.g. a single failed disk in a raid 0 stripe.
#
# Test motivated by an XFS bug, and this commit fixed the issue
# xfs: return errors from partial I/O failures to files


This started failing with a recent xfs commit, but the change wasn't
expected to change anything related to this behavior at all.

My best guess was that allocation and IO patterns shifted over
the stripe, and I think that turned out to be right.

I tracked this down to being unique to dm; an md stripe of
the same geometry doesn't have this issue.  Root cause seems
to be that dm's dec_pending() overwrites a bio's bi_error regardless
of current state, and in some cases will overwrite an -EIO with
a zero.  This seems to fix it for me:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5..3555ba8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -808,7 +808,9 @@ static void dec_pending(struct dm_io *io, int error)
 		} else {
 			/* done with normal IO or empty flush */
 			trace_block_bio_complete(md->queue, bio, io_error);
-			bio->bi_error = io_error;
+			/* don't overwrite or clear existing errors */
+			if (!bio->bi_error)
+				bio->bi_error = io_error;
 			bio_endio(bio);
 		}
 	}

but Mike was a little uneasy, not knowing for sure how we got here to
overwrite this bio's error (hopefully I'm representing his concerns
fairly and correctly).

One other clue is that I think this is a chained bio, something xfs
does use.

I'll submit the above as a proper dm patch if it seems sane, but figured
I'd throw this out on the lists as a bit of a heads up / question / RFC
before I do that.

-Eric

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-02-01 14:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-01  3:12 error propagation problem on xfs over dm stripe Eric Sandeen
2017-02-01  7:42 ` [dm-devel] " Christoph Hellwig
2017-02-01  7:42   ` Christoph Hellwig
2017-02-01 14:33   ` Mike Snitzer
2017-02-01 14:33     ` Mike Snitzer
2017-02-01 14:49     ` Christoph Hellwig
2017-02-01 14:49       ` Christoph Hellwig
2017-02-01 10:28 ` [dm-devel] " Junichi Nomura
2017-02-01 10:28   ` Junichi Nomura

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.