All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
@ 2017-09-20 23:50 Liu Bo
  2017-09-20 23:50 ` [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed Liu Bo
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Liu Bo @ 2017-09-20 23:50 UTC (permalink / raw)
  To: linux-btrfs

The kernel oops happens at

kernel BUG at fs/btrfs/extent_io.c:2104!
...
RIP: clean_io_failure+0x263/0x2a0 [btrfs]

It's showing that read-repair code is using an improper mirror index.
This is due to the fact that compression read's endio hasn't recorded
the failed mirror index in %cb->orig_bio.

With this, btrfs's read-repair can work properly on reading compressed
data.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reported-by: Paul Jones <paul@pauljones.id.au>
---
 fs/btrfs/compression.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d2ef9ac..9a4ccdf 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -107,6 +107,7 @@ static void end_compressed_bio_read(struct bio *bio)
 	struct inode *inode;
 	struct page *page;
 	unsigned long index;
+	unsigned int mirror = btrfs_io_bio(bio)->mirror_num;
 	int ret;
 
 	if (bio->bi_status)
@@ -118,6 +119,14 @@ static void end_compressed_bio_read(struct bio *bio)
 	if (!refcount_dec_and_test(&cb->pending_bios))
 		goto out;
 
+	/*
+	 * Record the correct mirror_num in cb->orig_bio so that
+	 * read-repair can work properly.
+	 */
+	ASSERT(btrfs_io_bio(cb->orig_bio));
+	btrfs_io_bio(cb->orig_bio)->mirror_num = mirror;
+	cb->mirror_num = mirror;
+
 	inode = cb->inode;
 	ret = check_compressed_csum(BTRFS_I(inode), cb,
 				    (u64)bio->bi_iter.bi_sector << 9);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed
  2017-09-20 23:50 [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data Liu Bo
@ 2017-09-20 23:50 ` Liu Bo
  2017-09-24 13:48   ` David Sterba
  2017-09-24 13:04 ` [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data David Sterba
  2017-09-24 13:45 ` David Sterba
  2 siblings, 1 reply; 7+ messages in thread
From: Liu Bo @ 2017-09-20 23:50 UTC (permalink / raw)
  To: linux-btrfs

Currently even if the underlying disk reports failure on IO,
compressed read endio still gets to verify checksum and reports it as
a checksum error.

In fact, if some IO have failed during reading a compressed data
extent , there's no way the checksum could match, therefore, we can
skip that in order to return error quickly to the upper layer.

Please note that we need to do this after recording the failed mirror
index so that read-repair in the upper layer's endio can work
properly.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/compression.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 9a4ccdf..d8d4678 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -108,7 +108,7 @@ static void end_compressed_bio_read(struct bio *bio)
 	struct page *page;
 	unsigned long index;
 	unsigned int mirror = btrfs_io_bio(bio)->mirror_num;
-	int ret;
+	int ret = 0;
 
 	if (bio->bi_status)
 		cb->errors = 1;
@@ -127,6 +127,13 @@ static void end_compressed_bio_read(struct bio *bio)
 	btrfs_io_bio(cb->orig_bio)->mirror_num = mirror;
 	cb->mirror_num = mirror;
 
+	/*
+	 * Some IO in this cb have failed, just skip checksum as there
+	 * is no way it could be correct.
+	 */
+	if (cb->errors == 1)
+		goto csum_failed;
+
 	inode = cb->inode;
 	ret = check_compressed_csum(BTRFS_I(inode), cb,
 				    (u64)bio->bi_iter.bi_sector << 9);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
  2017-09-20 23:50 [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data Liu Bo
  2017-09-20 23:50 ` [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed Liu Bo
@ 2017-09-24 13:04 ` David Sterba
  2017-09-24 13:45 ` David Sterba
  2 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2017-09-24 13:04 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Wed, Sep 20, 2017 at 05:50:18PM -0600, Liu Bo wrote:
> The kernel oops happens at
> 
> kernel BUG at fs/btrfs/extent_io.c:2104!
> ...
> RIP: clean_io_failure+0x263/0x2a0 [btrfs]
> 
> It's showing that read-repair code is using an improper mirror index.
> This is due to the fact that compression read's endio hasn't recorded
> the failed mirror index in %cb->orig_bio.

This sounds like the corner case of raid1 repair that was reported i the
past and actually is the only thing because of which we are not so
comfortable to mark raid1 'completely ok' in the Status page.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
  2017-09-20 23:50 [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data Liu Bo
  2017-09-20 23:50 ` [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed Liu Bo
  2017-09-24 13:04 ` [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data David Sterba
@ 2017-09-24 13:45 ` David Sterba
  2017-09-26  8:41   ` Paul Jones
  2 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2017-09-24 13:45 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Wed, Sep 20, 2017 at 05:50:18PM -0600, Liu Bo wrote:
> The kernel oops happens at
> 
> kernel BUG at fs/btrfs/extent_io.c:2104!
> ...
> RIP: clean_io_failure+0x263/0x2a0 [btrfs]
> 
> It's showing that read-repair code is using an improper mirror index.
> This is due to the fact that compression read's endio hasn't recorded
> the failed mirror index in %cb->orig_bio.
> 
> With this, btrfs's read-repair can work properly on reading compressed
> data.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> Reported-by: Paul Jones <paul@pauljones.id.au>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed
  2017-09-20 23:50 ` [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed Liu Bo
@ 2017-09-24 13:48   ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2017-09-24 13:48 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Wed, Sep 20, 2017 at 05:50:19PM -0600, Liu Bo wrote:
> Currently even if the underlying disk reports failure on IO,
> compressed read endio still gets to verify checksum and reports it as
> a checksum error.
> 
> In fact, if some IO have failed during reading a compressed data
> extent , there's no way the checksum could match, therefore, we can
> skip that in order to return error quickly to the upper layer.
> 
> Please note that we need to do this after recording the failed mirror
> index so that read-repair in the upper layer's endio can work
> properly.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
  2017-09-24 13:45 ` David Sterba
@ 2017-09-26  8:41   ` Paul Jones
  2017-09-26 11:33     ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Jones @ 2017-09-26  8:41 UTC (permalink / raw)
  To: dsterba, Liu Bo; +Cc: linux-btrfs

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of David Sterba
> Sent: Sunday, 24 September 2017 11:46 PM
> To: Liu Bo <bo.li.liu@oracle.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed
> data
> 
> On Wed, Sep 20, 2017 at 05:50:18PM -0600, Liu Bo wrote:
> > The kernel oops happens at
> >
> > kernel BUG at fs/btrfs/extent_io.c:2104!
> > ...
> > RIP: clean_io_failure+0x263/0x2a0 [btrfs]
> >
> > It's showing that read-repair code is using an improper mirror index.
> > This is due to the fact that compression read's endio hasn't recorded
> > the failed mirror index in %cb->orig_bio.
> >
> > With this, btrfs's read-repair can work properly on reading compressed
> > data.
> >
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > Reported-by: Paul Jones <paul@pauljones.id.au>
> 
> Reviewed-by: David Sterba <dsterba@suse.com>

Tested-by: <paul@pauljones.id.au>
For both patches.

I caused the same thing to happen again, this time by unplugging the wrong hard drive. Applied the patches and problem (BUG_ON) is gone.
Should this also go to stable? Seems like a rather glaring problem to me.

Paul.





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data
  2017-09-26  8:41   ` Paul Jones
@ 2017-09-26 11:33     ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2017-09-26 11:33 UTC (permalink / raw)
  To: Paul Jones; +Cc: dsterba, Liu Bo, linux-btrfs

On Tue, Sep 26, 2017 at 08:41:27AM +0000, Paul Jones wrote:
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> > owner@vger.kernel.org] On Behalf Of David Sterba
> > Sent: Sunday, 24 September 2017 11:46 PM
> > To: Liu Bo <bo.li.liu@oracle.com>
> > Cc: linux-btrfs@vger.kernel.org
> > Subject: Re: [PATCH 1/2] Btrfs: fix kernel oops while reading compressed
> > data
> > 
> > On Wed, Sep 20, 2017 at 05:50:18PM -0600, Liu Bo wrote:
> > > The kernel oops happens at
> > >
> > > kernel BUG at fs/btrfs/extent_io.c:2104!
> > > ...
> > > RIP: clean_io_failure+0x263/0x2a0 [btrfs]
> > >
> > > It's showing that read-repair code is using an improper mirror index.
> > > This is due to the fact that compression read's endio hasn't recorded
> > > the failed mirror index in %cb->orig_bio.
> > >
> > > With this, btrfs's read-repair can work properly on reading compressed
> > > data.
> > >
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > Reported-by: Paul Jones <paul@pauljones.id.au>
> > 
> > Reviewed-by: David Sterba <dsterba@suse.com>
> 
> Tested-by: <paul@pauljones.id.au>
> For both patches.

Thanks for testing.

> I caused the same thing to happen again, this time by unplugging the
> wrong hard drive. Applied the patches and problem (BUG_ON) is gone.
> Should this also go to stable? Seems like a rather glaring problem to me.

Yes it should and will be forwarded there once it's merged to Linus'
tree.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-09-26 11:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20 23:50 [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data Liu Bo
2017-09-20 23:50 ` [PATCH 2/2] Btrfs: skip checksum when reading compressed data if some IO have failed Liu Bo
2017-09-24 13:48   ` David Sterba
2017-09-24 13:04 ` [PATCH 1/2] Btrfs: fix kernel oops while reading compressed data David Sterba
2017-09-24 13:45 ` David Sterba
2017-09-26  8:41   ` Paul Jones
2017-09-26 11:33     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.