linux-kernel.vger.kernel.org archive mirror
* ext3 file system livelock and file system corruption, 4.9.166 stable kernel
From: Jari Ruusu @ 2019-04-02 10:08 UTC
  To: Greg Kroah-Hartman; +Cc: zhangyi (F), Theodore Ts'o, Jan Kara, linux-kernel

To trigger this ext4 file system bug, you need a sparse file with
the right sparse pattern on an old-school ext3 file system. I tried
simpler ways to trigger it, but those attempts did not trigger the
bug. I have provided a compressed sparse file that reliably
triggers the bug. The compressed file is 1667256 bytes; the
uncompressed sparse file is 7369850880 bytes. The following
commands demonstrate the problem.

  wget http://www.elisanet.fi/jariruusu/123/sparse-demo.data.xz
  xz -d sparse-demo.data.xz
  mkfs -t ext3 -b 4096 -e remount-ro -O "^dir_index" /dev/sdc1
  mount -t ext3 /dev/sdc1 /mnt
  cp -v --sparse=always sparse-demo.data /mnt/aa
  cp -v --sparse=always sparse-demo.data /mnt/bb
  umount /mnt
  mount -t ext3 /dev/sdc1 /mnt
  cp -v --sparse=always /mnt/bb /mnt/aa

That last cp command reliably triggers the bug: the kernel
livelocks, and after a reset you have file system corruption to
deal with. Deeply unfunny.

The bug is caused by
"ext4: brelse all indirect buffer in ext4_ind_remove_space()"
upstream commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217, from
<yi.zhang@huawei.com>, who provided a follow-up patch
"ext4: cleanup bh release code in ext4_ind_remove_space()"
upstream commit 5e86bdda41534e17621d5a071b294943cae4376e. The
problem with that follow-up patch is that it is almost criminally
mislabeled. It should have said "fixes ext3 livelock and file
system corrupting bug" or something like that, so that Greg KH &
Co would have understood that it must be backported to stable
kernels too. Now the bug appears to be in all/most stable kernels
already.

Below is the buggy patch that causes the problem. Look at those
new while loops. Nothing in either loop body changes partial or
partial2, so once a while condition is true, it is ALWAYS true,
and the kernel livelocks.

> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1385,10 +1385,14 @@ end_range:
>  					   partial->p + 1,
>  					   partial2->p,
>  					   (chain+n-1) - partial);
> -			BUFFER_TRACE(partial->bh, "call brelse");
> -			brelse(partial->bh);
> -			BUFFER_TRACE(partial2->bh, "call brelse");
> -			brelse(partial2->bh);
> +			while (partial > chain) {
> +				BUFFER_TRACE(partial->bh, "call brelse");
> +				brelse(partial->bh);
> +			}
> +			while (partial2 > chain2) {
> +				BUFFER_TRACE(partial2->bh, "call brelse");
> +				brelse(partial2->bh);
> +			}
>  			return 0;
>  		}
>

Greg & Co,
Please revert that above patch from stable kernels or backport the
follow-up patch that fixes the problem.

-- 
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189

* Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
From: Greg Kroah-Hartman @ 2019-04-02 10:35 UTC
  To: Jari Ruusu; +Cc: zhangyi (F), Theodore Ts'o, Jan Kara, linux-kernel

On Tue, Apr 02, 2019 at 01:08:45PM +0300, Jari Ruusu wrote:
> Greg & Co,
> Please revert that above patch from stable kernels or backport the
> follow-up patch that fixes the problem.

So you need 5e86bdda4153 ("ext4: cleanup bh release code in
ext4_ind_remove_space()") applied to all of the stable and LTS kernels
at the moment (as that patch only showed up in 5.1-rc1)?

If so, I need an ack from the ext4 developers/maintainer to do so.

thanks,

greg k-h

* Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
From: zhangyi (F) @ 2019-04-02 13:06 UTC
  To: Jari Ruusu, Greg Kroah-Hartman; +Cc: Theodore Ts'o, Jan Kara, linux-kernel

Hi Jari,

Sorry for introducing this livelock bug. Patch 674a2b272 ("ext4:
brelse all indirect buffer in ext4_ind_remove_space()") was meant to
fix a buffer leak. The follow-up patch 5e86bdda415 ("ext4: cleanup
bh release code in ext4_ind_remove_space()") was originally intended
only as cleanup; it was separated from the first patch [*] in the v2
iteration. When splitting them, I forgot to decrement the partial and
partial2 pointers in the first patch. Sorry again. Fortunately, the
second patch fixes the livelock, so upstream is fine.

Hi Greg, backporting the second cleanup patch will fix the bug, or I
can post an individual fix patch if you prefer.

Thanks,
Yi.

[*] https://www.spinics.net/lists/linux-ext4/msg64668.html

* Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
From: Jan Kara @ 2019-04-02 15:10 UTC
  To: Greg Kroah-Hartman
  Cc: Jari Ruusu, zhangyi (F), Theodore Ts'o, Jan Kara, linux-kernel

On Tue 02-04-19 12:35:07, Greg Kroah-Hartman wrote:
> So you need 5e86bdda4153 ("ext4: cleanup bh release code in
> ext4_ind_remove_space()") applied to all of the stable and LTS kernels
> at the moment (as that patch only showed up in 5.1-rc1)?
> 
> If so, I need an ack from the ext4 developers/maintainer to do so.

Ack from me, and sorry for missing this brown paper bag bug during
review...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
From: Theodore Ts'o @ 2019-04-02 16:15 UTC
  To: Greg Kroah-Hartman; +Cc: Jari Ruusu, zhangyi (F), Jan Kara, linux-kernel

On Tue, Apr 02, 2019 at 12:35:07PM +0200, Greg Kroah-Hartman wrote:
> So you need 5e86bdda4153 ("ext4: cleanup bh release code in
> ext4_ind_remove_space()") applied to all of the stable and LTS kernels
> at the moment (as that patch only showed up in 5.1-rc1)?
> 
> If so, I need an ack from the ext4 developers/maintainer to do so.

Acked-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

* Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
From: Greg Kroah-Hartman @ 2019-04-03 13:59 UTC
  To: Theodore Ts'o, Jari Ruusu, zhangyi (F), Jan Kara, linux-kernel

On Tue, Apr 02, 2019 at 12:15:58PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 02, 2019 at 12:35:07PM +0200, Greg Kroah-Hartman wrote:
> > So you need 5e86bdda4153 ("ext4: cleanup bh release code in
> > ext4_ind_remove_space()") applied to all of the stable and LTS kernels
> > at the moment (as that patch only showed up in 5.1-rc1)?
> > 
> > If so, I need an ack from the ext4 developers/maintainer to do so.
> 
> Acked-by: Theodore Ts'o <tytso@mit.edu>

Thanks for all of the responses here, patch is now queued up.

greg k-h
