* regression: 4.13 cannot follow symlinks on some ext3 fs
@ 2017-11-23 20:33 Andi Kleen
  2017-11-23 22:23 ` Theodore Ts'o
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2017-11-23 20:33 UTC (permalink / raw)
  To: tahsin, adilger, tytso; +Cc: linux-kernel, linux-fsdevel


Hi,

I have an older qemu VM image that I sometimes use for testing. It
stopped booting with 4.13-4.14 because it couldn't run init.  
It uses ext3 for the root file system.

I instrumented the code and found that it failed to follow the 
/lib/ld-linux.so.2 -> ld-2.3.6.so symlink for init's ELF interpreter. 

I bisected it down to

commit 407cd7fb83c0ebabb490190e673d8c71ee7df97e (refs/bisect/bad)
Author: Tahsin Erdogan <tahsin@google.com>
Date:   Tue Jul 4 00:11:21 2017 -0400

    ext4: change fast symlink test to not rely on i_blocks

When I revert this commit on 4.14, my VM runs fine again.

Dump of the inode in debugfs: 

debugfs:  Inode: 1767   Type: symlink    Mode:  0777   Flags: 0x0
Generation: 0
User:     0   Group:     0   Size: 11
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x45ad7ba0 -- Wed Jan 17 01:28:00 2007
atime: 0x5a164be5 -- Thu Nov 23 04:17:41 2017
mtime: 0x45ad7ba0 -- Wed Jan 17 01:28:00 2007
BLOCKS:
(0):11006
TOTAL: 1

-Andi
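
The inode dump above can be read against the two "fast symlink" tests with a small model. This is only a sketch on plain numbers, not the kernel source: the function names are hypothetical, the 60-byte limit is the figure quoted later in this thread, and the strict `<` comparison is an assumption.

```shell
#!/bin/sh
# Sketch only: models the two "fast symlink" tests on plain numbers.
# 60 bytes is the size of the in-inode area that holds a fast symlink target.
I_BLOCK_BYTES=60

# Old test: a symlink is "fast" if it owns no data blocks.
# $1 = block count in 512-byte units
# $2 = units used by an external xattr block (0 if none)
old_is_fast() {
    [ $(( $1 - $2 )) -eq 0 ]
}

# New test: a symlink is "fast" if its target is non-empty and short enough
# to fit in the inode. It does not look at the block count at all.
# $1 = symlink size in bytes
new_is_fast() {
    [ "$1" -gt 0 ] && [ "$1" -lt "$I_BLOCK_BYTES" ]
}
```

For the inode shown (Size: 11, Blockcount: 8), `old_is_fast 8 0` fails, so the target is read from block 11006, while `new_is_fast 11` succeeds, so the target would be looked for inside the inode, where this image does not store it.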

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-23 20:33 regression: 4.13 cannot follow symlinks on some ext3 fs Andi Kleen
@ 2017-11-23 22:23 ` Theodore Ts'o
  2017-11-23 23:31   ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2017-11-23 22:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: tahsin, adilger, linux-kernel, linux-fsdevel

On Thu, Nov 23, 2017 at 12:33:30PM -0800, Andi Kleen wrote:
> 
> I have an older qemu VM image that I sometimes use for testing. It
> stopped booting with 4.13-4.14 because it couldn't run init.  
> It uses ext3 for the root file system.

Hmm, do you know roughly when (what kernel version) this image was
created?  We had done quite a lot of research and the belief was
kernels never would create a "slow" symlink which was less than 60
bytes.

Or was this image something that was created manually (e.g., using debugfs)?

						- Ted

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-23 22:23 ` Theodore Ts'o
@ 2017-11-23 23:31   ` Andi Kleen
  2017-11-24  0:41     ` Andreas Dilger
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2017-11-23 23:31 UTC (permalink / raw)
  To: Theodore Ts'o, Andi Kleen, tahsin, adilger, linux-kernel,
	linux-fsdevel

On Thu, Nov 23, 2017 at 05:23:17PM -0500, Theodore Ts'o wrote:
> On Thu, Nov 23, 2017 at 12:33:30PM -0800, Andi Kleen wrote:
> > 
> > I have an older qemu VM image that I sometimes use for testing. It
> > stopped booting with 4.13-4.14 because it couldn't run init.  
> > It uses ext3 for the root file system.
> 
> Hmm, do you know roughly when (what kernel version) this image was
> created?  We had done quite a lot of research and the belief was
> kernels never would create a "slow" symlink which was less than 60
> bytes.

The date of the inode is from 2007, the original kernel was 2.6.17
with a 32bit kernel.

> Or was this image something that was created manually (e.g., using debugfs)?

No, it was installed.

-Andi

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-23 23:31   ` Andi Kleen
@ 2017-11-24  0:41     ` Andreas Dilger
  2017-11-24  2:04       ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Dilger @ 2017-11-24  0:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Theodore Ts'o, tahsin, linux-kernel, linux-fsdevel

On Nov 23, 2017, at 4:31 PM, Andi Kleen <andi@firstfloor.org> wrote:
> 
> On Thu, Nov 23, 2017 at 05:23:17PM -0500, Theodore Ts'o wrote:
>> On Thu, Nov 23, 2017 at 12:33:30PM -0800, Andi Kleen wrote:
>>> 
>>> I have an older qemu VM image that I sometimes use for testing. It
>>> stopped booting with 4.13-4.14 because it couldn't run init.
>>> It uses ext3 for the root file system.
>> 
>> Hmm, do you know roughly when (what kernel version) this image was
>> created?  We had done quite a lot of research and the belief was
>> kernels never would create a "slow" symlink which was less than 60
>> bytes.
> 
> The date of the inode is from 2007, the original kernel was 2.6.17
> with a 32bit kernel.
> 
>> Or was this image something that was created manually (e.g., using debugfs)?
> 
> No, it was installed.

As a workaround, you could delete and recreate the symlink with the new
kernel to create a proper fast symlink.  It would be useful to scan the
image to see if there are other similar symlinks present:

    find /myth/tmp -type l -size -60 -ls | awk '$2 != 0 { print }'

This is probably something that e2fsck should check for and fix.

Cheers, Andreas

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24  0:41     ` Andreas Dilger
@ 2017-11-24  2:04       ` Andi Kleen
  2017-11-24  6:12         ` Andreas Dilger
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2017-11-24  2:04 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andi Kleen, Theodore Ts'o, tahsin, linux-kernel, linux-fsdevel

> As a workaround, you could delete and recreate the symlink with the new

I reverted the patch for now. Everything seems to work.

> kernel to create a proper fast symlink.  It would be useful to scan the
> image to see if there are other similar symlinks present:
> 
>     find /myth/tmp -type l -size -60 -ls | awk '$2 != 0 { print }'

Doesn't find anything. Your recipe must be wrong.
> 
> This is probably something that e2fsck should check for and fix.

Nah the kernel should just support it like it always did.

-Andi

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24  2:04       ` Andi Kleen
@ 2017-11-24  6:12         ` Andreas Dilger
  2017-11-24 16:51           ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Dilger @ 2017-11-24  6:12 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Theodore Ts'o, Tahsin Erdogan, Linux Kernel Mailing List,
	linux-fsdevel, linux-ext4

On Nov 23, 2017, at 7:04 PM, Andi Kleen <andi@firstfloor.org> wrote:
> 
>> As a workaround, you could delete and recreate the symlink with the new
> 
> I reverted the patch for now. Everything seems to work.
> 
>> kernel to create a proper fast symlink.  It would be useful to scan
>> the image to see if there are other similar symlinks present:
>> 
>>    find /myth/tmp -type l -size -60 -ls | awk '$2 != 0 { print }'
> 
> Doesn't find anything. Your recipe must be wrong.

I see that I should have used "-60c" to properly limit the listing to
short symlinks, but this doesn't appear to be the core problem.  It
looks like there is a bug in find (at least in version 4.4.2, which I'm
testing with) in that it doesn't print the block count properly.

According to find(1) the "-ls" argument should list the file the same
as "ls -dils" format (blocks is $2), but as shown below "find -ls"
prints "0" for blocks when it should be "4" (for a long symlink using
"+60c" in my example, I couldn't find any short+external symlinks on a
couple of 8 year old root filesystems):

$ find /etc/alternatives/rmid -type l -size +60c -ls
327877 0 lrwxrwxrwx 1 root root 73 Jan  4  2017 /etc/alternatives/rmid -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/jre/bin/rmid

$ ls -dils /etc/alternatives/rmid
327877 4 lrwxrwxrwx 1 root root 73 Jan  4  2017 /etc/alternatives/rmid -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/jre/bin/rmid*


Try the following command instead:

find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'


>> This is probably something that e2fsck should check for and fix.
> 
> Nah the kernel should just support it like it always did.

The reason we changed this code in the first place was because the
old check would repeatedly break when some new reason for storing
blocks on a symlink appeared.  It broke when xattrs were allowed
on symlinks for SELinux.  It broke when bigalloc blocks were added.
It broke when inline_data was added, and it would have broken (and
been really hard to fix efficiently) when large xattrs were added.

We checked old kernels, and old e2fsprogs, and didn't see any cases
where fast (<= 60 chars) symlinks were created using external blocks.
It seems that _something_ did create them, and it would be good to
figure that out so we can determine if it is a widespread problem.

I think e2fsck can fix this quite easily, and there really isn't
an easy way to revert to the old method if the large xattr feature
is enabled.  If you are willing to run a new kernel, you should also
be willing to run a new e2fsck.

We could probably add a fallback to the old mechanism (and print
a one-time warning to upgrade to a newer e2fsck) if an external fast
symlink is found and the large xattr  feature is not enabled, which
would give more time to fix this (hopefully rare in the wild) case.

Cheers, Andreas

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24  6:12         ` Andreas Dilger
@ 2017-11-24 16:51           ` Andi Kleen
  2017-11-24 22:03             ` Andreas Dilger
  2017-11-25  3:54             ` Theodore Ts'o
  0 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2017-11-24 16:51 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

> We checked old kernels, and old e2fsprogs, and didn't see any cases
> where fast (<= 60 chars) symlinks were created using external blocks.
> It seems that _something_ did create them, and it would be good to
> figure that out so we can determine if it is a widespread problem

I assume it was the original kernel. 

> 
> I think e2fsck can fix this quite easily, and there really isn't
> an easy way to revert to the old method if the large xattr feature
> is enabled.  If you are willing to run a new kernel, you should also
> be willing to run a new e2fsck.

It's obviously not enabled on ext3.

> 
> We could probably add a fallback to the old mechanism (and print
> a one-time warning to upgrade to a newer e2fsck) if an external fast
> symlink is found and the large xattr  feature is not enabled, which
> would give more time to fix this (hopefully rare in the wild) case.

If the old kernel created it, then likely all the
/lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF 
executables. I suspect in these old file systems it's not particularly rare.

So I don't think you can just break them all.

I think it's ok to only handle it when the large xattrs are disabled.

Requiring new e2fsck on old systems is a bad idea.

-Andi

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24 16:51           ` Andi Kleen
@ 2017-11-24 22:03             ` Andreas Dilger
  2017-11-24 22:28               ` James Bottomley
                                 ` (3 more replies)
  2017-11-25  3:54             ` Theodore Ts'o
  1 sibling, 4 replies; 21+ messages in thread
From: Andreas Dilger @ 2017-11-24 22:03 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Theodore Ts'o, Tahsin Erdogan, Linux Kernel Mailing List,
	linux-fsdevel, linux-ext4

On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> 
>> We checked old kernels, and old e2fsprogs, and didn't see any cases
>> where fast (<= 60 chars) symlinks were created using external blocks.
>> It seems that _something_ did create them, and it would be good to
>> figure that out so we can determine if it is a widespread problem
> 
> I assume it was the original kernel.
> 
>> 
>> I think e2fsck can fix this quite easily, and there really isn't
>> an easy way to revert to the old method if the large xattr feature
>> is enabled.  If you are willing to run a new kernel, you should also
>> be willing to run a new e2fsck.
> 
> It's obviously not enabled on ext3.
> 
>> We could probably add a fallback to the old mechanism (and print
>> a one-time warning to upgrade to a newer e2fsck) if an external fast
>> symlink is found and the large xattr  feature is not enabled, which
>> would give more time to fix this (hopefully rare in the wild) case.
> 
> If the old kernel created it, then likely all the
> /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> executables. I suspect in these old file systems it's not particularly rare.

Sure, but not many people are going to be running a 4.14 kernel with
a 2007 system.  Could you please run the updated find command to see
whether this is an isolated case, or if it is a common case:

find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'

It would also be useful if anyone else reading this that has an old
system (2005-2011 install date) ran the same to see if any such
symlinks are found.  To see when the root filesystem was created, run:

dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created

> So I don't think you can just break them all.

Sure.  As previously mentioned, it shouldn't have broken *any* systems
based on our prior investigation, I'm just trying to see how bad the
problem really is.  Like I said, a workaround (without need to patch
the kernel, and that is compatible with old and new kernels) is:

find / -type l -size -60c -exec stat -c '%b %n' {} + | awk '$1 != 0 { sub(/^[0-9]+ /, ""); print }' |
    while IFS= read -r L; do ln -sfnv "$(readlink "$L")" "$L"; done

This just recreates any problematic symlinks in place, which should make
it a proper fast symlink.
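
The scan-and-recreate idea can also be written as small functions, which makes each step easier to check on its own. This is a minimal sketch, assuming GNU find, GNU stat and readlink; the function names are hypothetical, paths containing newlines are not handled, and it presumably has to run under a kernel that still resolves these links (for example with the commit reverted), since readlink must return the real target.

```shell
#!/bin/sh
# Sketch: find short symlinks that still occupy data blocks and recreate them.
# Assumes GNU find, GNU stat and readlink. Paths with newlines are not handled.

# Succeeds if a symlink is short (under 60 bytes) but has blocks allocated.
# $1 = size in bytes, $2 = allocated blocks as reported by stat %b
is_short_external() {
    [ "$1" -lt 60 ] && [ "$2" -ne 0 ]
}

# Recreate one symlink in place with the same target.
# -n keeps ln from descending into a link that points at a directory.
relink() {
    target=$(readlink -- "$1") || return 1
    ln -sfn -- "$target" "$1"
}

# Walk one filesystem (-xdev) below $1 and recreate every matching symlink.
fix_tree() {
    find "$1" -xdev -type l -size -60c -print |
    while IFS= read -r link; do
        # stat does not follow symlinks unless -L is given.
        size_blocks=$(stat -c '%s %b' -- "$link") || continue
        size=${size_blocks% *}
        blocks=${size_blocks#* }
        if is_short_external "$size" "$blocks"; then
            relink "$link" && printf 'recreated %s\n' "$link"
        fi
    done
}
```

Calling `fix_tree /` as root would walk the root filesystem only; trying it first on a copy of the image seems prudent.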

> I think it's ok to only handle it when the large xattrs are disabled.
> 
> Requiring new e2fsck on old systems is a bad idea.

Any worse an idea than running a new kernel on an old system?
Newer e2fsck fixes a lot of bugs that are present in older
e2fsck as well...

Cheers, Andreas

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24 22:03             ` Andreas Dilger
@ 2017-11-24 22:28               ` James Bottomley
  2017-11-25  1:42               ` Andi Kleen
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: James Bottomley @ 2017-11-24 22:28 UTC (permalink / raw)
  To: Andreas Dilger, Andi Kleen
  Cc: Theodore Ts'o, Tahsin Erdogan, Linux Kernel Mailing List,
	linux-fsdevel, linux-ext4

On Fri, 2017-11-24 at 15:03 -0700, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > 
> > 
> > > 
> > > We checked old kernels, and old e2fsprogs, and didn't see any
> > > cases
> > > where fast (<= 60 chars) symlinks were created using external
> > > blocks.
> > > It seems that _something_ did create them, and it would be good
> > > to
> > > figure that out so we can determine if it is a widespread problem
> > 
> > I assume it was the original kernel.
> > 
> > > 
> > > 
> > > I think e2fsck can fix this quite easily, and there really isn't
> > > an easy way to revert to the old method if the large xattr
> > > feature
> > > is enabled.  If you are willing to run a new kernel, you should
> > > also
> > > be willing to run a new e2fsck.
> > 
> > It's obviously not enabled on ext3.
> > 
> > > 
> > > We could probably add a fallback to the old mechanism (and print
> > > a one-time warning to upgrade to a newer e2fsck) if an external
> > > fast symlink is found and the large xattr  feature is not
> > > enabled, which would give more time to fix this (hopefully rare
> > > in the wild) case.
> > 
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not
> > particularly rare.
> 
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system. 

I really disagree on this ... most of us who are doing kernel testing
will be running with older systems.  It's true, some of us do install
from scratch and then test, but most of us upgrade (which doesn't
necessarily modify the symlinks).  On your creation test, this is my
cloud system:

bedivere:~# dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created
Filesystem created:       Tue Mar 24 20:21:35 2009

Your find command turns up nothing untoward.

My older system is the home entertainment system, but that has an xfs
root dating back to 2005.

I bet I have a laptop even older (currently travelling, so can't
check).

James


* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24 22:03             ` Andreas Dilger
  2017-11-24 22:28               ` James Bottomley
@ 2017-11-25  1:42               ` Andi Kleen
  2017-11-25 22:32               ` Dave Chinner
  2017-12-04 16:35               ` Jan Kara
  3 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2017-11-25  1:42 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.  

It's not just root, but any disk. People could well have 10 year old
disks.

> Could you please run the updated find command to see
> whether this is an isolated case, or if it is a common case:
> 
> find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'

Pretty much all symlinks on / hit it. / has 1278 symlinks total, and 
1218 match the line above.

-Andi

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24 16:51           ` Andi Kleen
  2017-11-24 22:03             ` Andreas Dilger
@ 2017-11-25  3:54             ` Theodore Ts'o
  1 sibling, 0 replies; 21+ messages in thread
From: Theodore Ts'o @ 2017-11-25  3:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andreas Dilger, Tahsin Erdogan, Linux Kernel Mailing List,
	linux-fsdevel, linux-ext4

On Fri, Nov 24, 2017 at 08:51:02AM -0800, Andi Kleen wrote:
> > I think e2fsck can fix this quite easily, and there really isn't
> > an easy way to revert to the old method if the large xattr feature
> > is enabled.  If you are willing to run a new kernel, you should also
> > be willing to run a new e2fsck.
> 
> It's obviously not enabled on ext3.

Yes, I think it's clear we need to enable a backwards compatibility
support for ext3 file systems, or even all ext4 file systems that
don't have the large xattr feature.

We could have e2fsck offer to fix it, so long as it is being run
manually (e.g., not in preen mode), since it does have the benefit of
releasing unnecessarily allocated 4k blocks for symlinks which are <
60 bytes.

						- Ted
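
The backwards-compatibility fallback discussed here could be modelled roughly as follows. This is a sketch on plain numbers, not the kernel patch: the function name, the argument order, and the strict `<` comparison are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the proposed fallback, modelled on plain numbers (not kernel code).
# $1 = symlink size in bytes
# $2 = block count in 512-byte units
# $3 = units used by an external xattr block (0 if none)
# $4 = 1 if the large xattr feature is enabled, else 0
is_fast_symlink() {
    if [ "$4" -eq 0 ]; then
        # Legacy rule: fast means no data blocks beyond the xattr block.
        [ $(( $2 - $3 )) -eq 0 ]
    else
        # Size-based rule: non-empty target that fits in the 60-byte area.
        [ "$1" -gt 0 ] && [ "$1" -lt 60 ]
    fi
}
```

With the feature disabled, the 11-byte symlink with 8 allocated units from the original report is classified as slow again, so its target would be read from the external block.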

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24 22:03             ` Andreas Dilger
  2017-11-24 22:28               ` James Bottomley
  2017-11-25  1:42               ` Andi Kleen
@ 2017-11-25 22:32               ` Dave Chinner
  2017-11-25 22:45                 ` Reindl Harald
  2017-11-26 15:40                 ` Theodore Ts'o
  2017-12-04 16:35               ` Jan Kara
  3 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2017-11-25 22:32 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > 
> >> We checked old kernels, and old e2fsprogs, and didn't see any cases
> >> where fast (<= 60 chars) symlinks were created using external blocks.
> >> It seems that _something_ did create them, and it would be good to
> >> figure that out so we can determine if it is a widespread problem
> > 
> > I assume it was the original kernel.
> > 
> >> 
> >> I think e2fsck can fix this quite easily, and there really isn't
> >> an easy way to revert to the old method if the large xattr feature
> >> is enabled.  If you are willing to run a new kernel, you should also
> >> be willing to run a new e2fsck.
> > 
> > It's obviously not enabled on ext3.
> > 
> >> We could probably add a fallback to the old mechanism (and print
> >> a one-time warning to upgrade to a newer e2fsck) if an external fast
> >> symlink is found and the large xattr  feature is not enabled, which
> >> would give more time to fix this (hopefully rare in the wild) case.
> > 
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
> 
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system. 

I have multiple test VMs with root ext3 filesystems that date back
that far. Looks like the original install the root fs image was
derived from came from around 2006:

$ ls -lt /etc |tail -1
-rw-r--r--  1 root root       9 Aug  8  2006 host.conf
$ ls -lt /usr/bin |tail -2
-rwxr-xr-x 1 root   root         2038 Jun 18  2006 defoma-hints
-rwxr-xr-x 1 root   root         1761 Jun 18  2006 dh_installdefoma
$ uname -a
Linux test4 4.14.0-dgc #211 SMP PREEMPT Thu Nov 23 16:49:31 AEDT 2017 x86_64 GNU/Linux
$

These VMs are in use 24x7, and have been since they were created way
back when. When something in ext3 breaks, I tend to notice it and
report it.

They don't have any whacky symlinks around, but the modern ext4 code
does try to eat these filesystems every so often. Extended operation
at ENOSPC will eventually corrupt the rootfs and crash the kernel,
and then I play the "e2fsck doesn't detect corruption, kernel does"
game to get them fixed up and working again....

> > Requiring new e2fsck on old systems is a bad idea.
> 
> Any worse an idea than running a new kernel on an old system?
> Newer e2fsck fixes a lot of bugs that are present in older
> e2fsck as well...

I'm running with everything up to date (debian unstable) on these
VMs, they are just an old filesystem because some distros have had
reliable rolling updates for the entire life of these VMs. :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-25 22:32               ` Dave Chinner
@ 2017-11-25 22:45                 ` Reindl Harald
  2017-11-25 22:57                   ` Dave Chinner
  2017-11-26 15:40                 ` Theodore Ts'o
  1 sibling, 1 reply; 21+ messages in thread
From: Reindl Harald @ 2017-11-25 22:45 UTC (permalink / raw)
  To: Dave Chinner, Andreas Dilger
  Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4


Am 25.11.2017 um 23:32 schrieb Dave Chinner:
> On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
>> Any worse an idea than running a new kernel on an old system?
>> Newer e2fsck fixes a lot of bugs that are present in older
>> e2fsck as well...
> 
> I'm running with everything up to date (debian unstable) on these
> VMs, they are just an old filesystem because some distros have had
> reliable rolling updates for the entire life of these VMs. :P

but why not update the FS to ext4?

our whole infrastructure was installed with Fedora 9 on ext3 (currently 
running F26, yum/dnf dist-upgrades) but any FS including the rootfs was 
converted to ext4 in 2010

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-25 22:45                 ` Reindl Harald
@ 2017-11-25 22:57                   ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2017-11-25 22:57 UTC (permalink / raw)
  To: Reindl Harald
  Cc: Andreas Dilger, Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Sat, Nov 25, 2017 at 11:45:07PM +0100, Reindl Harald wrote:
> 
> Am 25.11.2017 um 23:32 schrieb Dave Chinner:
> >On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> >>Any worse an idea than running a new kernel on an old system?
> >>Newer e2fsck fixes a lot of bugs that are present in older
> >>e2fsck as well...
> >
> >I'm running with everything up to date (debian unstable) on these
> >VMs, they are just an old filesystem because some distros have had
> >reliable rolling updates for the entire life of these VMs. :P
> 
> but why not update the FS to ext4?

Unlike ext3, ext4 is not a filesystem that takes kindly to being
abused by an environment that involves machines being crashed,
oopsed and forcibly rebooted without warning tens of times a day.
Every ext4 root filesystem I've tried on these VMs has lasted less
than two weeks before being unrecoverably corrupted and needing to
be rebuilt from scratch.

Last time I tried, a couple of years ago, the ext4 filesystems lasted
less than a day before corrupting themselves in such a way that they
couldn't be mounted, yet e2fsck didn't detect anything wrong and so they
couldn't be repaired. ext4 is just not robust enough for my use case.

And, FWIW, I don't use XFS for these root filesystems because the
reason I'm doing this to machines is that I'm trashing throwaway XFS
filesystems with broken XFS code on other devices on the VM...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-25 22:32               ` Dave Chinner
  2017-11-25 22:45                 ` Reindl Harald
@ 2017-11-26 15:40                 ` Theodore Ts'o
  2017-11-26 21:14                   ` Dave Chinner
  1 sibling, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2017-11-26 15:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, Andi Kleen, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Sun, Nov 26, 2017 at 09:32:02AM +1100, Dave Chinner wrote:
> 
> They don't have any whacky symlinks around, but the modern ext4 code
> does try to eat these filesystems every so often. Extended operation
> at ENOSPC will eventually corrupt the rootfs and crash the kernel,
> and then I play the "e2fsck doesn't detect corruption, kernel does"
> game to get them fixed up and working again....

If you have stack dumps or file system images in which e2fsck doesn't
detect any problems but the kernel does, please do feel free to send
reports to the ext4 mailing list.

> I'm running with everything up to date (debian unstable) on these
> VMs, they are just an old filesystem because some distros have had
> reliable rolling updates for the entire life of these VMs. :P

Or if you can make the VM's available and tell me how you are
using/exercising them, I can try to see if I can repro the problem.

I am wondering how you are running into ENOSPC on the root file
systems; I take it this is much more than running xfstests?  Are you
running some benchmarks that are logging into the root, and that's
triggering the ENOSPC condition?

Thanks,

						- Ted

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-26 15:40                 ` Theodore Ts'o
@ 2017-11-26 21:14                   ` Dave Chinner
  2017-11-26 21:35                     ` Reindl Harald
  2017-11-27 17:11                     ` Theodore Ts'o
  0 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2017-11-26 21:14 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Andi Kleen, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Sun, Nov 26, 2017 at 10:40:26AM -0500, Theodore Ts'o wrote:
> On Sun, Nov 26, 2017 at 09:32:02AM +1100, Dave Chinner wrote:
> > 
> > They don't have any whacky symlinks around, but the modern ext4 code
> > does try to eat these filesystems every so often. Extended operation
> > at ENOSPC will eventually corrupt the rootfs and crash the kernel,
> > and then I play the "e2fsck doesn't detect corruption, kernel does"
> > game to get them fixed up and working again....
> 
> If you have stack dumps or file system images which e2fsck doesn't
> detect any problems but the kernels do, please do feel free to send
> reports to the ext4 mailing list.

Of course. I've done that every time I've come across these sorts of
problems.

> > I'm running with everything up to date (debian unstable) on these
> > VMs, they are just an old filesystem because some distros have had
> > reliable rolling updates for the entire life of these VMs. :P
> 
> Or if you can make the VM's available and tell me how you are
> using/exercising them, I can try to see if I can repro the problem.

No, I can't make them available. As for how I use them, they are
my test/devel VMs, so they are getting multiple kernels thrown at
them every day, and I'll just kill the VM via the qemu console (they
*never* get shut down cleanly) when I need to install a new kernel.
Often they won't shut down anyway, because I've
oopsed/deadlocked/etc something on a different filesystem...

> I am wondering how you are running into ENOSPC on the root file
> systems; I take it this is much more than running xfstests?

No, it isn't.  Just have a scratch filesystem failure during
xfstests such that mount fails during a "fill to enospc" test and it
will fill the root filesystem rather than the test/scratch device.
Or run a buggy test that dumps everything in $here. Or fill /tmp
without noticing it.  Then let fstests continue to run trying to
write state and logs for the next 500 tests...

> Are you
> running some benchmarks that are logging into the root, and that's
> triggering the ENOSPC condition?

No, I'm not doing anything like that on these machines. It's
straight forward "something filled the root fs unexpectedly" type of
error which I don't notice immediately...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-26 21:14                   ` Dave Chinner
@ 2017-11-26 21:35                     ` Reindl Harald
  2017-11-26 22:43                       ` Dave Chinner
  2017-11-27 17:11                     ` Theodore Ts'o
  1 sibling, 1 reply; 21+ messages in thread
From: Reindl Harald @ 2017-11-26 21:35 UTC (permalink / raw)
  To: linux-ext4; +Cc: Dave Chinner



Am 26.11.2017 um 22:14 schrieb Dave Chinner:
> On Sun, Nov 26, 2017 at 10:40:26AM -0500, Theodore Ts'o wrote:
>> Are you
>> running some benchmarks that are logging into the root, and that's
>> triggering the ENOSPC condition?
> 
> No, I'm not doing anything like that on these machines. It's
> straight forward "something filled the root fs unexpectedly" type of
> error which I don't notice immediately...

have you ever considered to just buy larger disks or introduce quota 
because "Unlike ext3, ext4 is not a filesystem that takes kindly to 
being abused by an environment that involves machines being crashed, 
oopsed and forcibly rebooted without warning tens of times a day" caused 
by a full root fs is anything but a reasonable workload and i doubt any 
filesystem is intended to run under such abused environment

it would even be enough to not run that workload as root because that's 
what the reserved diskspace is for


* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-26 21:35                     ` Reindl Harald
@ 2017-11-26 22:43                       ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2017-11-26 22:43 UTC (permalink / raw)
  To: Reindl Harald; +Cc: linux-ext4

On Sun, Nov 26, 2017 at 10:35:08PM +0100, Reindl Harald wrote:
> 
> 
> Am 26.11.2017 um 22:14 schrieb Dave Chinner:
> >On Sun, Nov 26, 2017 at 10:40:26AM -0500, Theodore Ts'o wrote:
> >>Are you
> >>running some benchmarks that are logging into the root, and that's
> >>triggering the ENOSPC condition?
> >
> >No, I'm not doing anything like that on these machines. It's
> >straight forward "something filled the root fs unexpectedly" type of
> >error which I don't notice immediately...
> 
> have you ever considered to just buy larger disks or introduce quota
> because "Unlike ext3, ext4 is not a filesystem that takes kindly to
> being abused by an environment that involves machines being crashed,
> oopsed and forcibly rebooted without warning tens of times a day"

That's my normal production workload, been doing it successfully
on ext3 root filesystems for more than 10 years. This causes very
few problems for ext3 filesystems, but it causes real problems for
ext4 filesystems.

> caused by a full root fs

I don't think you've quite understood: a) I don't run my root
filesystems at ENOSPC (it's a rare event), and b) crashing kernels
and abusing filesystems is the one-line summary of my job
description.

> is anything but a reasonable workload and i
> doubt any filesystem is intended to run under such abused
> environment

Intended or not, we have to make filesystems robust in such
environments because that's the sort of abuse we see in production
environments. Users *expect* filesystems to handle crashes, ENOSPC,
etc conditions without losing data or corrupting themselves. 

If filesystem developers aren't abusing their filesystems and
attempting to break them into little pieces and put them back
together again, then they aren't doing their jobs properly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-26 21:14                   ` Dave Chinner
  2017-11-26 21:35                     ` Reindl Harald
@ 2017-11-27 17:11                     ` Theodore Ts'o
  2017-11-28  0:42                       ` Dave Chinner
  1 sibling, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2017-11-27 17:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, Andi Kleen, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Mon, Nov 27, 2017 at 08:14:27AM +1100, Dave Chinner wrote:
> Of course. I've done that every time I've come across these sorts of
> problems.

The most recent report I was able to find was against 4.7-rc6, in July
2016.  Have you been able to reproduce it more recently than that?

Cheers,

						- Ted


* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-27 17:11                     ` Theodore Ts'o
@ 2017-11-28  0:42                       ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2017-11-28  0:42 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Andi Kleen, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Mon, Nov 27, 2017 at 12:11:26PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 27, 2017 at 08:14:27AM +1100, Dave Chinner wrote:
> > Of course. I've done that every time I've come across these sorts of
> > problems.
> 
> The most recent report I was able to find was against 4.7-rc6, in July
> 2016.  Have you been able to reproduce it more recently than that?

I hit it once a couple of months ago, but I was busy with much
higher priority stuff at the time (sorting out a CVE-worthy bug fix)
so it slipped off my radar pretty rapidly after I recovered the test
system and kept doing what I needed to do...

So, yeah, the problems are still there, I just don't run my root
filesystems out of space very often. Like I said - maybe once or
twice a year is the typical frequency this happens.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
  2017-11-24 22:03             ` Andreas Dilger
                                 ` (2 preceding siblings ...)
  2017-11-25 22:32               ` Dave Chinner
@ 2017-12-04 16:35               ` Jan Kara
  3 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2017-12-04 16:35 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
	Linux Kernel Mailing List, linux-fsdevel, linux-ext4

On Fri 24-11-17 15:03:37, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > 
> >> We checked old kernels, and old e2fsprogs, and didn't see any cases
> >> where fast (<= 60 chars) symlinks were created using external blocks.
> >> It seems that _something_ did create them, and it would be good to
> >> figure that out so we can determine if it is a widespread problem
> > 
> > I assume it was the original kernel.
> > 
> >> 
> >> I think e2fsck can fix this quite easily, and there really isn't
> >> an easy way to revert to the old method if the large xattr feature
> >> is enabled.  If you are willing to run a new kernel, you should also
> >> be willing to run a new e2fsck.
> > 
> > It's obviously not enabled on ext3.
> > 
> >> We could probably add a fallback to the old mechanism (and print
> >> a one-time warning to upgrade to a newer e2fsck) if an external fast
> >> symlink is found and the large xattr  feature is not enabled, which
> >> would give more time to fix this (hopefully rare in the wild) case.
> > 
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
> 
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.  Could you please run the updated find command to see
> whether this is an isolated case, or if it is a common case:
> 
> find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'
> 
> It would also be useful if anyone else reading this that has an old
> system (2005-2011 install date) ran the same to see if any such
> symlinks are found.  To see when the root filesystem was created, run:
> 
> dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created

I have one fs image around from:

Filesystem created:       Tue Nov 15 04:43:22 2005

and it indeed does have these problematic symlinks as well:

(none):~# l /usr/share/terminfo/x/xterm-r5
lrwxrwxrwx 1 root root 24 May 19  2006 /usr/share/terminfo/x/xterm-r5 -> /lib/terminfo/x/xterm-r5
(none):~# stat /usr/share/terminfo/x/xterm-r5
  File: `/usr/share/terminfo/x/xterm-r5' -> `/lib/terminfo/x/xterm-r5'
  Size: 24        	Blocks: 8          IO Block: 4096   symbolic link
Device: 6200h/25088d	Inode: 98027       Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-12-04 16:27:29.000000000 +0000
Modify: 2006-05-19 21:12:53.000000000 +0000
Change: 2006-05-19 21:12:53.000000000 +0000
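
The stat output above shows the pattern discussed in this thread: a 24-byte
target, short enough to sit inside the inode, yet 8 sectors are allocated.
As a rough sketch only (not kernel or e2fsprogs code), the two ways of
deciding "is this a fast symlink?" can be compared in Python. The function
names are made up for illustration, the 60-byte inline area follows the
"<= 60 chars" remark quoted earlier, and the exact form of both checks is an
assumption based on the commit title, not a copy of the kernel source.

```python
import os
import stat

# Assumption: the inode holds 15 four-byte block slots (60 bytes), which is
# where a "fast" symlink keeps its target string.
INLINE_AREA_BYTES = 15 * 4


def looks_fast_by_blocks(blocks, xattr_blocks=0):
    """Block-count style check: fast if no data blocks are allocated."""
    return blocks - xattr_blocks == 0


def looks_fast_by_size(size):
    """Size style check: fast if the target would fit in the inline area."""
    return 0 < size < INLINE_AREA_BYTES


def is_mismatched(size, blocks, xattr_blocks=0):
    """True when the target is short but data blocks are allocated anyway."""
    return looks_fast_by_size(size) and not looks_fast_by_blocks(
        blocks, xattr_blocks
    )


def scan(root):
    """Yield (path, size, blocks) for symlinks under root where checks disagree.

    xattr blocks are not accounted for here, so a symlink that owns an
    external xattr block would show up as a false positive.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect the link, not its target
            except OSError:
                continue
            if stat.S_ISLNK(st.st_mode) and is_mismatched(
                st.st_size, st.st_blocks
            ):
                yield path, st.st_size, st.st_blocks
```

With the numbers from the stat output above, is_mismatched(24, 8) is true, and
the same holds for the size 11 / blockcount 8 inode from the original report.
Running list(scan("/")) as root on an old image should list roughly what the
find pipeline quoted above prints.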

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


end of thread, other threads:[~2017-12-04 16:35 UTC | newest]

Thread overview: 21+ messages
2017-11-23 20:33 regression: 4.13 cannot follow symlinks on some ext3 fs Andi Kleen
2017-11-23 22:23 ` Theodore Ts'o
2017-11-23 23:31   ` Andi Kleen
2017-11-24  0:41     ` Andreas Dilger
2017-11-24  2:04       ` Andi Kleen
2017-11-24  6:12         ` Andreas Dilger
2017-11-24 16:51           ` Andi Kleen
2017-11-24 22:03             ` Andreas Dilger
2017-11-24 22:28               ` James Bottomley
2017-11-25  1:42               ` Andi Kleen
2017-11-25 22:32               ` Dave Chinner
2017-11-25 22:45                 ` Reindl Harald
2017-11-25 22:57                   ` Dave Chinner
2017-11-26 15:40                 ` Theodore Ts'o
2017-11-26 21:14                   ` Dave Chinner
2017-11-26 21:35                     ` Reindl Harald
2017-11-26 22:43                       ` Dave Chinner
2017-11-27 17:11                     ` Theodore Ts'o
2017-11-28  0:42                       ` Dave Chinner
2017-12-04 16:35               ` Jan Kara
2017-11-25  3:54             ` Theodore Ts'o
