* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-24 22:03 ` Andreas Dilger
@ 2017-11-24 22:28 ` James Bottomley
2017-11-25 1:42 ` Andi Kleen
` (2 subsequent siblings)
3 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2017-11-24 22:28 UTC (permalink / raw)
To: Andreas Dilger, Andi Kleen
Cc: Theodore Ts'o, Tahsin Erdogan, Linux Kernel Mailing List,
linux-fsdevel, linux-ext4
On Fri, 2017-11-24 at 15:03 -0700, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> >
> > >
> > > We checked old kernels, and old e2fsprogs, and didn't see any
> > > cases
> > > where fast (<= 60 chars) symlinks were created using external
> > > blocks.
> > > It seems that _something_ did create them, and it would be good
> > > to
> > > figure that out so we can determine if it is a widespread problem
> >
> > I assume it was the original kernel.
> >
> > >
> > >
> > > I think e2fsck can fix this quite easily, and there really isn't
> > > an easy way to revert to the old method if the large xattr
> > > feature
> > > is enabled. If you are willing to run a new kernel, you should
> > > also
> > > be willing to run a new e2fsck.
> >
> > It's obviously not enabled on ext3.
> >
> > >
> > > We could probably add a fallback to the old mechanism (and print
> > > a one-time warning to upgrade to a newer e2fsck) if an external
> > > fast symlink is found and the large xattr feature is not
> > > enabled, which would give more time to fix this (hopefully rare
> > > in the wild) case.
> >
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not
> > particularly rare.
>
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.
I really disagree on this ... most of us who are doing kernel testing
will be running with older systems. It's true, some of us do install
from scratch and then test, but most of us upgrade (which doesn't
necessarily modify the symlinks). On your creation test, this is my
cloud system:
bedivere:~# dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created
Filesystem created: Tue Mar 24 20:21:35 2009
Your find command turns up nothing untoward.
My older system is the home entertainment system, but that has an xfs
root dating back to 2005.
I bet I have a laptop even older (currently travelling, so can't
check).
James
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-24 22:03 ` Andreas Dilger
2017-11-24 22:28 ` James Bottomley
@ 2017-11-25 1:42 ` Andi Kleen
2017-11-25 22:32 ` Dave Chinner
2017-12-04 16:35 ` Jan Kara
3 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2017-11-25 1:42 UTC (permalink / raw)
To: Andreas Dilger
Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.
It's not just root, but any disk. People could well have 10 year old
disks.
> Could you please run the updated find command to see
> whether this is an isolated case, or if it is a common case:
>
> find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'
Pretty much all symlinks on / hit it. / has 1278 symlinks total, and
1218 match the line above.
-Andi
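
For readers who want the same check without the find/xargs/ls/awk pipeline quoted above, here is a minimal Python sketch. It assumes that "size under 60 bytes and a non-zero block count" is the whole test, exactly as the pipeline expresses it; the names `FAST_SYMLINK_MAX`, `is_suspect_symlink` and `find_suspect_symlinks` are made up for illustration and do not come from e2fsprogs or the kernel.

```python
import os
import stat

# Threshold taken from the "-size -60c" test in the find command above.
FAST_SYMLINK_MAX = 60  # bytes


def is_suspect_symlink(st):
    """Return True for a short symlink that still has blocks allocated.

    `st` is anything with st_mode, st_size and st_blocks attributes,
    e.g. the result of os.lstat().
    """
    return (
        stat.S_ISLNK(st.st_mode)
        and st.st_size < FAST_SYMLINK_MAX
        and st.st_blocks != 0
    )


def find_suspect_symlinks(root):
    """Yield (path, size, blocks) for every suspect symlink under root."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks to directories are reported in dirnames, others in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: examine the link, not its target
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if is_suspect_symlink(st):
                yield path, st.st_size, st.st_blocks


if __name__ == "__main__":
    for path, size, blocks in find_suspect_symlinks("/"):
        print(f"{blocks:4d} blocks  {size:3d} bytes  {path}")
```

Note that `st_blocks` is counted in 512-byte units while `ls -s` usually prints 1K units, so the numbers differ from the pipeline's output, but the zero/non-zero distinction is the same.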
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-24 22:03 ` Andreas Dilger
2017-11-24 22:28 ` James Bottomley
2017-11-25 1:42 ` Andi Kleen
@ 2017-11-25 22:32 ` Dave Chinner
2017-11-25 22:45 ` Reindl Harald
2017-11-26 15:40 ` Theodore Ts'o
2017-12-04 16:35 ` Jan Kara
3 siblings, 2 replies; 19+ messages in thread
From: Dave Chinner @ 2017-11-25 22:32 UTC (permalink / raw)
To: Andreas Dilger
Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> >> We checked old kernels, and old e2fsprogs, and didn't see any cases
> >> where fast (<= 60 chars) symlinks were created using external blocks.
> >> It seems that _something_ did create them, and it would be good to
> >> figure that out so we can determine if it is a widespread problem
> >
> > I assume it was the original kernel.
> >
> >>
> >> I think e2fsck can fix this quite easily, and there really isn't
> >> an easy way to revert to the old method if the large xattr feature
> >> is enabled. If you are willing to run a new kernel, you should also
> >> be willing to run a new e2fsck.
> >
> > It's obviously not enabled on ext3.
> >
> >> We could probably add a fallback to the old mechanism (and print
> >> a one-time warning to upgrade to a newer e2fsck) if an external fast
> >> symlink is found and the large xattr feature is not enabled, which
> >> would give more time to fix this (hopefully rare in the wild) case.
> >
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
>
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.
I have multiple test VMs with root ext3 filesystems that date back
that far. Looks like the original install the root fs image was
derived from came from around 2006:
$ ls -lt /etc |tail -1
-rw-r--r-- 1 root root 9 Aug 8 2006 host.conf
$ ls -lt /usr/bin |tail -2
-rwxr-xr-x 1 root root 2038 Jun 18 2006 defoma-hints
-rwxr-xr-x 1 root root 1761 Jun 18 2006 dh_installdefoma
$ uname -a
Linux test4 4.14.0-dgc #211 SMP PREEMPT Thu Nov 23 16:49:31 AEDT 2017 x86_64 GNU/Linux
$
These VMs are in use 24x7, and have been since they were created way
back when. When something in ext3 breaks, I tend to notice it and
report it.
They don't have any whacky symlinks around, but the modern ext4 code
does try to eat these filesystems every so often. Extended operation
at ENOSPC will eventually corrupt the rootfs and crash the kernel,
and then I play the "e2fsck doesn't detect corruption, kernel does"
game to get them fixed up and working again....
> > Requiring new e2fsck on old systems is a bad idea.
>
> Any worse an idea than running a new kernel on an old system?
> Newer e2fsck fixes a lot of bugs that are present in older
> e2fsck as well...
I'm running with everything up to date (debian unstable) on these
VMs, they are just an old filesystem because some distros have had
reliable rolling updates for the entire life of these VMs. :P
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-25 22:32 ` Dave Chinner
@ 2017-11-25 22:45 ` Reindl Harald
2017-11-25 22:57 ` Dave Chinner
2017-11-26 15:40 ` Theodore Ts'o
1 sibling, 1 reply; 19+ messages in thread
From: Reindl Harald @ 2017-11-25 22:45 UTC (permalink / raw)
To: Dave Chinner, Andreas Dilger
Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On 25.11.2017 at 23:32, Dave Chinner wrote:
> On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
>> Any worse an idea than running a new kernel on an old system?
>> Newer e2fsck fixes a lot of bugs that are present in older
>> e2fsck as well...
>
> I'm running with everything up to date (debian unstable) on these
> VMs, they are just an old filesystem because some distros have had
> reliable rolling updates for the entire life of these VMs. :P
but why not update the FS to ext4?
our whole infrastructure was installed with Fedora 9 on ext3 (currently
running F26, yum/dnf dist-upgrades) but any FS including the rootfs was
converted to ext4 in 2010
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-25 22:45 ` Reindl Harald
@ 2017-11-25 22:57 ` Dave Chinner
0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2017-11-25 22:57 UTC (permalink / raw)
To: Reindl Harald
Cc: Andreas Dilger, Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Sat, Nov 25, 2017 at 11:45:07PM +0100, Reindl Harald wrote:
>
> On 25.11.2017 at 23:32, Dave Chinner wrote:
> >On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> >>Any worse an idea than running a new kernel on an old system?
> >>Newer e2fsck fixes a lot of bugs that are present in older
> >>e2fsck as well...
> >
> >I'm running with everything up to date (debian unstable) on these
> >VMs, they are just an old filesystem because some distros have had
> >reliable rolling updates for the entire life of these VMs. :P
>
> but why not update the FS to ext4?
Unlike ext3, ext4 is not a filesystem that takes kindly to being
abused by an environment that involves machines being crashed,
oopsed and forcibly rebooted without warning tens of times a day.
Every ext4 root filesystem I've tried on these VMs has lasted less
than two weeks before being unrecoverably corrupted and needing to
be rebuilt from scratch.
Last time I tried, a couple of years ago, the ext4 filesystems lasted
less than a day before corrupting themselves in a way that meant they
couldn't be mounted, yet e2fsck didn't detect anything wrong and so they
couldn't be repaired. ext4 is just not robust enough for my use case.
And, FWIW, I don't use XFS for these root filesystems because the
reason I'm doing this to machines is that I'm trashing throwaway XFS
filesystems with broken XFS code on other devices on the VM...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-25 22:32 ` Dave Chinner
2017-11-25 22:45 ` Reindl Harald
@ 2017-11-26 15:40 ` Theodore Ts'o
2017-11-26 21:14 ` Dave Chinner
1 sibling, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2017-11-26 15:40 UTC (permalink / raw)
To: Dave Chinner
Cc: Andreas Dilger, Andi Kleen, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Sun, Nov 26, 2017 at 09:32:02AM +1100, Dave Chinner wrote:
>
> They don't have any whacky symlinks around, but the modern ext4 code
> does try to eat these filesystems every so often. Extended operation
> at ENOSPC will eventually corrupt the rootfs and crash the kernel,
> and then I play the "e2fsck doesn't detect corruption, kernel does"
> game to get them fixed up and working again....
If you have stack dumps or file system images in which e2fsck doesn't
detect any problems but the kernel does, please do feel free to send
reports to the ext4 mailing list.
> I'm running with everything up to date (debian unstable) on these
> VMs, they are just an old filesystem because some distros have had
> reliable rolling updates for the entire life of these VMs. :P
Or if you can make the VM's available and tell me how you are
using/exercising them, I can try to see if I can repro the problem.
I am wondering how you are running into ENOSPC on the root file
systems; I take it this is much more than running xfstests? Are you
running some benchmarks that are logging into the root, and that's
triggering the ENOSPC condition?
Thanks,
- Ted
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-26 15:40 ` Theodore Ts'o
@ 2017-11-26 21:14 ` Dave Chinner
2017-11-27 17:11 ` Theodore Ts'o
0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2017-11-26 21:14 UTC (permalink / raw)
To: Theodore Ts'o, Andreas Dilger, Andi Kleen, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Sun, Nov 26, 2017 at 10:40:26AM -0500, Theodore Ts'o wrote:
> On Sun, Nov 26, 2017 at 09:32:02AM +1100, Dave Chinner wrote:
> >
> > They don't have any whacky symlinks around, but the modern ext4 code
> > does try to eat these filesystems every so often. Extended operation
> > at ENOSPC will eventually corrupt the rootfs and crash the kernel,
> > and then I play the "e2fsck doesn't detect corruption, kernel does"
> > game to get them fixed up and working again....
>
> If you have stack dumps or file system images in which e2fsck doesn't
> detect any problems but the kernel does, please do feel free to send
> reports to the ext4 mailing list.
Of course. I've done that every time I've come across these sorts of
problems.
> > I'm running with everything up to date (debian unstable) on these
> > VMs, they are just an old filesystem because some distros have had
> > reliable rolling updates for the entire life of these VMs. :P
>
> Or if you can make the VM's available and tell me how you are
> using/exercising them, I can try to see if I can repro the problem.
No, I can't make them available. As for how I use them, they are
my test/devel VMs, so they are getting multiple kernels thrown at
them every day, and I'll just kill the VM via the qemu console (they
*never* get shut down cleanly) when I need to install a new kernel.
Often they won't shut down anyway, because I've
oopsed/deadlocked/etc something on a different filesystem...
> I am wondering how you are running into ENOSPC on the root file
> systems; I take it this is much more than running xfstests?
No, it isn't. Just have a scratch filesystem failure during
xfstests such that mount fails during a "fill to enospc" test and it
will fill the root filesystem rather than the test/scratch device.
Or run a buggy test that dumps everything in $here. Or fill /tmp
without noticing it. Then let fstests continue to run trying to
write state and logs for the next 500 tests...
> Are you
> running some benchmarks that are logging into the root, and that's
> triggering the ENOSPC condition?
No, I'm not doing anything like that on these machines. It's
straightforward "something filled the root fs unexpectedly" type of
error which I don't notice immediately...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-26 21:14 ` Dave Chinner
@ 2017-11-27 17:11 ` Theodore Ts'o
2017-11-28 0:42 ` Dave Chinner
0 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2017-11-27 17:11 UTC (permalink / raw)
To: Dave Chinner
Cc: Andreas Dilger, Andi Kleen, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Mon, Nov 27, 2017 at 08:14:27AM +1100, Dave Chinner wrote:
> Of course. I've done that every time I've come across these sorts of
> problems.
The most recent report I was able to find was against 4.7-rc6, in July
2016. Have you been able to reproduce it more recently than that?
Cheers,
- Ted
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-27 17:11 ` Theodore Ts'o
@ 2017-11-28 0:42 ` Dave Chinner
0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2017-11-28 0:42 UTC (permalink / raw)
To: Theodore Ts'o, Andreas Dilger, Andi Kleen, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Mon, Nov 27, 2017 at 12:11:26PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 27, 2017 at 08:14:27AM +1100, Dave Chinner wrote:
> > Of course. I've done that every time I've come across these sorts of
> > problems.
>
> The most recent report I was able to find was against 4.7-rc6, in July
> 2016. Have you been able to reproduce it more recently than that?
I hit it once a couple of months ago, but I was busy with much
higher priority stuff at the time (sorting out a CVE-worthy bug fix)
so it slipped off my radar pretty rapidly after I recovered the test
system and kept doing what I needed to do...
So, yeah, the problems are still there, I just don't run my root
filesystems out of space very often. Like I said - maybe once or
twice a year is the typical frequency this happens.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: regression: 4.13 cannot follow symlinks on some ext3 fs
2017-11-24 22:03 ` Andreas Dilger
` (2 preceding siblings ...)
2017-11-25 22:32 ` Dave Chinner
@ 2017-12-04 16:35 ` Jan Kara
3 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2017-12-04 16:35 UTC (permalink / raw)
To: Andreas Dilger
Cc: Andi Kleen, Theodore Ts'o, Tahsin Erdogan,
Linux Kernel Mailing List, linux-fsdevel, linux-ext4
On Fri 24-11-17 15:03:37, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> >> We checked old kernels, and old e2fsprogs, and didn't see any cases
> >> where fast (<= 60 chars) symlinks were created using external blocks.
> >> It seems that _something_ did create them, and it would be good to
> >> figure that out so we can determine if it is a widespread problem
> >
> > I assume it was the original kernel.
> >
> >>
> >> I think e2fsck can fix this quite easily, and there really isn't
> >> an easy way to revert to the old method if the large xattr feature
> >> is enabled. If you are willing to run a new kernel, you should also
> >> be willing to run a new e2fsck.
> >
> > It's obviously not enabled on ext3.
> >
> >> We could probably add a fallback to the old mechanism (and print
> >> a one-time warning to upgrade to a newer e2fsck) if an external fast
> >> symlink is found and the large xattr feature is not enabled, which
> >> would give more time to fix this (hopefully rare in the wild) case.
> >
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
>
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system. Could you please run the updated find command to see
> whether this is an isolated case, or if it is a common case:
>
> find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'
>
> It would also be useful if anyone else reading this that has an old
> system (2005-2011 install date) ran the same to see if any such
> symlinks are found. To see when the root filesystem was created, run:
>
> dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created
I have one fs image around from:
Filesystem created: Tue Nov 15 04:43:22 2005
and it indeed does have these problematic symlinks as well:
(none):~# l /usr/share/terminfo/x/xterm-r5
lrwxrwxrwx 1 root root 24 May 19 2006 /usr/share/terminfo/x/xterm-r5 -> /lib/terminfo/x/xterm-r5
(none):~# stat /usr/share/terminfo/x/xterm-r5
File: `/usr/share/terminfo/x/xterm-r5' -> `/lib/terminfo/x/xterm-r5'
Size: 24 Blocks: 8 IO Block: 4096 symbolic link
Device: 6200h/25088d Inode: 98027 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-12-04 16:27:29.000000000 +0000
Modify: 2006-05-19 21:12:53.000000000 +0000
Change: 2006-05-19 21:12:53.000000000 +0000
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
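
The survey request quoted above asks for the filesystem creation date and whether the install falls in the 2005-2011 range. A small Python sketch of that classification, assuming the `dumpe2fs -h ... | grep created` output has the form shown in this thread and that the tool runs in a C/English locale (so the day and month abbreviations parse); `parse_created` and `in_survey_window` are hypothetical helper names, not part of e2fsprogs:

```python
from datetime import datetime


def parse_created(line):
    """Parse a 'Filesystem created: <ctime-style date>' line from dumpe2fs -h."""
    label, _, value = line.partition(":")  # split on the first colon only
    if label.strip() != "Filesystem created":
        raise ValueError(f"not a 'Filesystem created' line: {line!r}")
    # Assumes English day/month abbreviations, as in the output quoted above.
    return datetime.strptime(value.strip(), "%a %b %d %H:%M:%S %Y")


def in_survey_window(created, first=2005, last=2011):
    """True if the creation year is in the 2005-2011 range asked about."""
    return first <= created.year <= last


if __name__ == "__main__":
    line = "Filesystem created:       Tue Nov 15 04:43:22 2005"
    created = parse_created(line)
    print(created.isoformat(), in_survey_window(created))
```

The stat output above fits the pattern the find command looks for: the link is 24 bytes long, well under the 60-byte limit, yet it reports 8 blocks allocated.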