* ext4 filesystem corruption with 4.10-rc2 on ppc64le
@ 2017-01-04  5:18 ` Anton Blanchard
  0 siblings, 0 replies; 13+ messages in thread
From: Anton Blanchard @ 2017-01-04  5:18 UTC (permalink / raw)
  To: jack, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Stephen Rothwell, axboe
  Cc: linuxppc-dev, linux-kernel, linux-ext4, linux-fsdevel

Hi,

I'm consistently seeing ext4 filesystem corruption using a mainline
kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
cloud image, boot it in KVM and run:

sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot

And it never makes it back up, dying with rather severe filesystem
corruption.

I've narrowed it down to:

64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")

Backing these patches out fixes the issue.

Anton

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04  5:18 ` Anton Blanchard
@ 2017-01-04  6:02   ` Chandan Rajendra
  -1 siblings, 0 replies; 13+ messages in thread
From: Chandan Rajendra @ 2017-01-04  6:02 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: jack, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Stephen Rothwell, axboe, linuxppc-dev, linux-kernel, linux-ext4,
	linux-fsdevel

On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
> Hi,
> 
> I'm consistently seeing ext4 filesystem corruption using a mainline
> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> cloud image, boot it in KVM and run:
> 
> sudo apt-get update
> sudo apt-get dist-upgrade
> sudo reboot
> 
> And it never makes it back up, dying with rather severe filesystem
> corruption.

Hi,

The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
bug.

> 
> I've narrowed it down to:
> 
> 64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
> e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
> ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")
> 
> Backing these patches out fixes the issue.
> 
> Anton
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
chandan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04  5:18 ` Anton Blanchard
  (?)
  (?)
@ 2017-01-04  7:34 ` luigi burdo
  -1 siblings, 0 replies; 13+ messages in thread
From: luigi burdo @ 2017-01-04  7:34 UTC (permalink / raw)
  To: Anton Blanchard, jack, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Stephen Rothwell, axboe
  Cc: linux-fsdevel, linux-ext4, linuxppc-dev, linux-kernel

Hi,

it is also present on big-endian ppc, not only on LE.

I found it on Ubuntu MATE 16.10 PPC with kernel 4.9-rc6 (PPC64) on P5020/P5040.


Thanks

Luigi


________________________________
From: Linuxppc-dev <linuxppc-dev-bounces+intermediadc=hotmail.com@lists.ozlabs.org> on behalf of Anton Blanchard <anton@samba.org>
Sent: Wednesday, 4 January 2017 06:18
To: jack@suse.cz; Michael Ellerman; Benjamin Herrenschmidt; Paul Mackerras; Stephen Rothwell; axboe@fb.com
Cc: linux-fsdevel@vger.kernel.org; linux-ext4@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org
Subject: ext4 filesystem corruption with 4.10-rc2 on ppc64le

Hi,

I'm consistently seeing ext4 filesystem corruption using a mainline
kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
cloud image, boot it in KVM and run:

sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot

And it never makes it back up, dying with rather severe filesystem
corruption.

I've narrowed it down to:

64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")

Backing these patches out fixes the issue.

Anton

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04  5:18 ` Anton Blanchard
                   ` (2 preceding siblings ...)
  (?)
@ 2017-01-04 15:09 ` Jens Axboe
  -1 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2017-01-04 15:09 UTC (permalink / raw)
  To: Anton Blanchard, jack, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Stephen Rothwell
  Cc: linuxppc-dev, linux-kernel, linux-ext4, linux-fsdevel

On 01/03/2017 10:18 PM, Anton Blanchard wrote:
> Hi,
> 
> I'm consistently seeing ext4 filesystem corruption using a mainline
> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> cloud image, boot it in KVM and run:
> 
> sudo apt-get update
> sudo apt-get dist-upgrade
> sudo reboot
> 
> And it never makes it back up, dying with rather severe filesystem
> corruption.
> 
> I've narrowed it down to:
> 
> 64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
> e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
> ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")
> 
> Backing these patches out fixes the issue.

Fix is going out today, I see Chandan already pointed you at it. For the
other reporter, it's not an LE vs BE thing, it's a fs blocksize < page
size problem.

-- 
Jens Axboe
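
The condition described above (filesystem block size smaller than the page
size) can be checked from userspace. The following is a minimal sketch, not
something posted in the thread: the function names are hypothetical, and it
assumes that `os.statvfs(path).f_bsize` reports the filesystem block size,
which is the usual behaviour for ext4 on Linux.

```python
import os


def block_smaller_than_page(block_size: int, page_size: int) -> bool:
    """Return True when more than one filesystem block fits in a page."""
    return block_size < page_size


def path_has_sub_page_blocks(path: str = "/") -> bool:
    """Hypothetical helper: compare the block size of the filesystem
    holding `path` against the running kernel's page size."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # page size of the running kernel
    block_size = os.statvfs(path).f_bsize    # assumed to be the fs block size
    return block_smaller_than_page(block_size, page_size)
```

For instance, a 4k-block filesystem on a kernel with 64k pages gives
block_smaller_than_page(4096, 65536) == True, while the same filesystem on a
4k-page kernel gives False.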

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04  6:02   ` Chandan Rajendra
  (?)
@ 2017-01-04 15:28   ` Theodore Ts'o
  2017-01-04 16:23     ` Jens Axboe
                       ` (2 more replies)
  -1 siblings, 3 replies; 13+ messages in thread
From: Theodore Ts'o @ 2017-01-04 15:28 UTC (permalink / raw)
  To: Chandan Rajendra
  Cc: Anton Blanchard, jack, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Stephen Rothwell, axboe, linuxppc-dev,
	linux-kernel, linux-ext4, linux-fsdevel, Jens Axboe, torvalds

On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
> On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
> > I'm consistently seeing ext4 filesystem corruption using a mainline
> > kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> > cloud image, boot it in KVM and run:
> > 
> > sudo apt-get update
> > sudo apt-get dist-upgrade
> > sudo reboot
> > 
> > And it never makes it back up, dying with rather severe filesystem
> > corruption.
> 
> The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
> bug.

It looks like this patch is already queued up on the "for-linus"
branch on the linux-block.git tree.

Chandan, thanks for pointing this out!  I had missed your e-mail from
Christmas day, and it was on my todo list to figure out why I was
seeing lots of 1k block regressions on gce-xfstests post-merge window
that weren't showing up on the ext4.git tree before I sent my pull
request to Linus.

Jens, could you expedite a pull request to Linus?  This is affecting
ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
regression.  

Anton or Chandan, could you do me a favor and verify whether or not
64k block sizes are working for you on ppc64le on ext4 by running
xfstests?  Light duty testing works for me but when I stress ext4 with
pagesize==blocksize on ppc64le via xfstests, it blows up.  I suspect
(but am not sure) it's due to (non-upstream) device driver issues, and
a verification that you can run xfstests on your ppc64le systems using
standard upstream device drivers would be very helpful, since I don't
have easy console access on the machines I have access to at $WORK.  :-(

And of course, if there are still blocksize==pagesize issues on ext4
on ppc64le, it would be good to know that too.

Many thanks!!
						- Ted

P.S.  And for those people who are doing storage work, let me put in a
plug for "gce-xfstests full".  It's cheap and finds lots of problems
before I and others have to.  And if the $1.50 USD is the problem, let
me know and I'll try to work something out.  :-) :-)
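
To illustrate why the same regression shows up with 1k blocks on x86 and
with larger blocks on ppc64le, here is a rough sketch that lists which ext4
block sizes are smaller than a given page size. The names are hypothetical,
and the 64k page size used for ppc64le is an assumption about a common
kernel configuration, not something stated in the thread.

```python
# Power-of-two block sizes from 1k to 64k (assumed range for illustration).
EXT4_BLOCK_SIZES = [1024, 2048, 4096, 8192, 16384, 32768, 65536]


def sub_page_block_sizes(page_size: int) -> list:
    """Return the block sizes that are strictly smaller than page_size,
    i.e. the configurations where blocksize < pagesize."""
    return [b for b in EXT4_BLOCK_SIZES if b < page_size]


x86_cases = sub_page_block_sizes(4096)     # 4k pages
ppc_cases = sub_page_block_sizes(65536)    # assumed 64k pages
```

With 4k pages only the 1k and 2k block sizes qualify; with 64k pages every
block size below 64k does, including the common 4k default.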

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04 15:28   ` Theodore Ts'o
@ 2017-01-04 16:23     ` Jens Axboe
  2017-01-04 18:09         ` Linus Torvalds
  2017-01-05 10:44       ` Anton Blanchard
  2017-01-09  4:10     ` Chandan Rajendra
  2 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2017-01-04 16:23 UTC (permalink / raw)
  To: Theodore Ts'o, Chandan Rajendra, Anton Blanchard, jack,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Stephen Rothwell, linuxppc-dev, linux-kernel, linux-ext4,
	linux-fsdevel, Jens Axboe, torvalds

On 01/04/2017 08:28 AM, Theodore Ts'o wrote:
> On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
>> On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
>>> I'm consistently seeing ext4 filesystem corruption using a mainline
>>> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
>>> cloud image, boot it in KVM and run:
>>>
>>> sudo apt-get update
>>> sudo apt-get dist-upgrade
>>> sudo reboot
>>>
>>> And it never makes it back up, dying with rather severe filesystem
>>> corruption.
>>
>> The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
>> bug.
> 
> It looks like this patch is already queued up on the "for-linus"
> branch on the linux-block.git tree.
> 
> Chandra, thanks for pointing this out!  I had missed your e-mail from
> Christmas day, and it was on my todo list to figure out why I was
> seeing lots of 1k block regressions on gce-xfstests post-merge window
> that wasn't showing up on the ext4.git tree before I sent my pull
> request to Linus.
> 
> Jens, could you expedite a pull request to Linus?  This is affecting
> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
> regression.  

Yes, it'll go out this morning.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04 16:23     ` Jens Axboe
@ 2017-01-04 18:09         ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2017-01-04 18:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Theodore Ts'o, Chandan Rajendra, Anton Blanchard, Jan Kara,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Stephen Rothwell, ppc-dev, Linux Kernel Mailing List, linux-ext4,
	linux-fsdevel, Jens Axboe

On Wed, Jan 4, 2017 at 8:23 AM, Jens Axboe <axboe@fb.com> wrote:
> On 01/04/2017 08:28 AM, Theodore Ts'o wrote:
>>
>> Jens, could you expedite a pull request to Linus?  This is affecting
>> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
>> regression.
>
> Yes, it'll go out this morning.

It's merged and out there in my tree now.

                 Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04 15:28   ` Theodore Ts'o
@ 2017-01-05 10:44       ` Anton Blanchard
  2017-01-05 10:44       ` Anton Blanchard
  2017-01-09  4:10     ` Chandan Rajendra
  2 siblings, 0 replies; 13+ messages in thread
From: Anton Blanchard @ 2017-01-05 10:44 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Chandan Rajendra, jack, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Stephen Rothwell, axboe, linuxppc-dev,
	linux-kernel, linux-ext4, linux-fsdevel, Jens Axboe, torvalds

Hi Ted,

> Anton or Chandan, could you do me a favor and verify whether or not
> 64k block sizes are working for you on ppcle on ext4 by running
> xfstests?  Light duty testing works for me but when I stress ext4 with
> pagesize==blocksize on ppcle64 via xfstests, it blows up.  I suspect
> (but am not sure) it's due to (non-upstream) device driver issues, and
> a verification that you can run xfstests on your ppcle64 systems using
> standard upstream device drivers would be very helpful, since I don't
> have easy console access on the machines I have access to at
> $WORK.  :-(

I fired off an xfstests run, and it looks good. There are 3 failures,
but they seem to be setup issues on my part. I also double checked
those same three failed on 4.8.

Chandan has been running the test suite regularly, and plans to do a
run against mainline too.

Anton
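
The comparison described above (the same three tests failing on 4.8 and on
the kernel under test) amounts to a set difference between two failure
lists. A minimal sketch follows; the function name and the test names are
hypothetical placeholders, not the failures actually seen in this run.

```python
def new_failures(baseline: set, candidate: set) -> set:
    """Return tests that fail on the candidate kernel but not on the
    baseline kernel, i.e. likely regressions rather than setup issues."""
    return candidate - baseline


# Placeholder test names, for illustration only.
failed_on_baseline = {"test-a", "test-b", "test-c"}
failed_on_candidate = {"test-a", "test-b", "test-c"}

regressions = new_failures(failed_on_baseline, failed_on_candidate)
```

An empty result means every failure was already present on the baseline,
which matches the conclusion drawn in the message above.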

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le
  2017-01-04 15:28   ` Theodore Ts'o
  2017-01-04 16:23     ` Jens Axboe
  2017-01-05 10:44       ` Anton Blanchard
@ 2017-01-09  4:10     ` Chandan Rajendra
  2 siblings, 0 replies; 13+ messages in thread
From: Chandan Rajendra @ 2017-01-09  4:10 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Anton Blanchard, jack, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Stephen Rothwell, axboe, linuxppc-dev,
	linux-kernel, linux-ext4, linux-fsdevel, Jens Axboe, torvalds

On Wednesday, January 04, 2017 10:28:37 AM Theodore Ts'o wrote:
> On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
> > On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
> > > I'm consistently seeing ext4 filesystem corruption using a mainline
> > > kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> > > cloud image, boot it in KVM and run:
> > > 
> > > sudo apt-get update
> > > sudo apt-get dist-upgrade
> > > sudo reboot
> > > 
> > > And it never makes it back up, dying with rather severe filesystem
> > > corruption.
> > 
> > The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
> > bug.
> 
> It looks like this patch is already queued up on the "for-linus"
> branch on the linux-block.git tree.
> 
> Chandra, thanks for pointing this out!  I had missed your e-mail from
> Christmas day, and it was on my todo list to figure out why I was
> seeing lots of 1k block regressions on gce-xfstests post-merge window
> that wasn't showing up on the ext4.git tree before I sent my pull
> request to Linus.
> 
> Jens, could you expedite a pull request to Linus?  This is affecting
> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
> regression.  
> 
> Anton or Chandan, could you do me a favor and verify whether or not
> 64k block sizes are working for you on ppcle on ext4 by running
> xfstests?  Light duty testing works for me but when I stress ext4 with
> pagesize==blocksize on ppcle64 via xfstests, it blows up.  I suspect
> (but am not sure) it's due to (non-upstream) device driver issues, and
> a verification that you can run xfstests on your ppcle64 systems using
> standard upstream device drivers would be very helpful, since I don't
> have easy console access on the machines I have access to at $WORK.  :-(

Hi Ted,

I found one regression w.r.t 64k blocksize. I posted a patch
(http://marc.info/?l=linux-block&m=148388687722745&w=2) to fix the issue. 

-- 
chandan

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-01-09  4:10 UTC | newest]

Thread overview: 13+ messages
2017-01-04  5:18 ext4 filesystem corruption with 4.10-rc2 on ppc64le Anton Blanchard
2017-01-04  5:18 ` Anton Blanchard
2017-01-04  6:02 ` Chandan Rajendra
2017-01-04  6:02   ` Chandan Rajendra
2017-01-04 15:28   ` Theodore Ts'o
2017-01-04 16:23     ` Jens Axboe
2017-01-04 18:09       ` Linus Torvalds
2017-01-04 18:09         ` Linus Torvalds
2017-01-05 10:44     ` Anton Blanchard
2017-01-05 10:44       ` Anton Blanchard
2017-01-09  4:10     ` Chandan Rajendra
2017-01-04  7:34 ` luigi burdo
2017-01-04 15:09 ` Jens Axboe
