* Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
@ 2018-11-08 17:59 Elana Hashman
2018-11-08 18:13 ` Reindl Harald
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Elana Hashman @ 2018-11-08 17:59 UTC (permalink / raw)
To: 'tytso@mit.edu'; +Cc: 'linux-ext4@vger.kernel.org'
Hi Ted,
We've run into a mysterious "phantom" full filesystem issue on our Kubernetes fleet. We initially encountered this issue on kernel 4.1.35, but are still experiencing the problem after upgrading to 4.14.67. Essentially, `df` reports our root filesystems as full and they behave as though they are full, but the "used" space cannot be accounted for. Rebooting the system, remounting the root filesystem read-only and then remounting as read-write, or booting into single-user mode all free up the "used" space. The disk slowly fills up over time, suggesting that there might be some kind of leak; we previously saw this affecting hosts with ~200 days of uptime on the 4.1 kernel, but are now seeing it affect a 4.14 host with only ~70 days of uptime.
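A minimal sketch of the kind of check that flags this condition, by comparing `df`'s "used" figure against what `du` can account for (the 10 GiB threshold is an arbitrary example value, not anything we actually deploy):

```shell
#!/bin/sh
# Compare df's "used" space on / against what du can account for.
# A large gap suggests the phantom-usage condition described above.
# The 10 GiB (10485760 KiB) threshold is an arbitrary example value.
df_used_kb=$(df -kP / | awk 'NR==2 {print $3}')
du_used_kb=$(du -kxs / 2>/dev/null | awk '{print $1}')
phantom_kb=$((df_used_kb - du_used_kb))
echo "df used: ${df_used_kb} KiB, du found: ${du_used_kb} KiB, gap: ${phantom_kb} KiB"
if [ "$phantom_kb" -gt 10485760 ]; then
    echo "WARNING: phantom space usage on /"
fi
```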
Here is some data from an example host, running the 4.14.67 kernel. The root disk is ext4.
$ uname -a
Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
$ grep ' / ' /proc/mounts
/dev/disk/by-uuid/<some-uuid> / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
`df` reports 0 bytes free:
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/<some-uuid> 50G 48G 0 100% /
Deleted, open files account for almost no disk capacity:
$ sudo lsof -a +L1 /
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
java 5313 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
java 5313 user 11u REG 8,3 55185 0 2494654 /tmp/classpath.1668Gp (deleted)
system_ar 5333 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
java 5421 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
java 5421 user 11u REG 8,3 149313 0 2494486 /tmp/java.fzTwWp (deleted)
java 5421 tsdist 12u REG 8,3 55185 0 2500513 /tmp/classpath.7AmxHO (deleted)
`du` can only account for 16GB of file usage:
$ sudo du -hxs /
16G /
But what is most puzzling is the numbers reported by e2freefrag, which don't add up:
$ sudo e2freefrag /dev/disk/by-uuid/<some-uuid>
Device: /dev/disk/by-uuid/<some-uuid>
Blocksize: 4096 bytes
Total blocks: 13107200
Free blocks: 7778076 (59.3%)
Min. free extent: 4 KB
Max. free extent: 8876 KB
Avg. free extent: 224 KB
Num. free extent: 6098
HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range : Free extents Free Blocks Percent
4K... 8K- : 1205 1205 0.02%
8K... 16K- : 980 2265 0.03%
16K... 32K- : 653 3419 0.04%
32K... 64K- : 1337 15374 0.20%
64K... 128K- : 631 14151 0.18%
128K... 256K- : 224 10205 0.13%
256K... 512K- : 261 23818 0.31%
512K... 1024K- : 303 56801 0.73%
1M... 2M- : 387 135907 1.75%
2M... 4M- : 103 64740 0.83%
4M... 8M- : 12 15005 0.19%
8M... 16M- : 2 4267 0.05%
This looks like a bug to me; the histogram in the manpage example has percentages that add up to 100% but this doesn't even add up to 5%.
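For what it's worth, the arithmetic can be checked by hand by summing the "Free Blocks" column against the reported free-block count (values copied verbatim from the output above):

```shell
# Sum the "Free Blocks" column of the e2freefrag histogram above and
# compare it against the reported free-block count (7778076).
free_blocks=7778076
hist_total=$(printf '%s\n' 1205 2265 3419 15374 14151 10205 23818 56801 \
    135907 64740 15005 4267 | awk '{s += $1} END {print s}')
echo "histogram accounts for ${hist_total} of ${free_blocks} free blocks"
awk -v t="$hist_total" -v f="$free_blocks" \
    'BEGIN {printf "%.2f%% of free blocks\n", 100 * t / f}'
```

The histogram only accounts for 347157 of the 7778076 free blocks, i.e. about 4.46%.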
After a reboot, `df` reflects real utilization:
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/<some-uuid> 50G 16G 31G 34% /
We are using overlay2fs for Docker, as well as rbd mounts; I'm not sure how they might interact.
Thanks for your help,
--
Elana Hashman
ehashman@twosigma.com
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2018-11-08 17:59 Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Elana Hashman
@ 2018-11-08 18:13 ` Reindl Harald
2018-11-08 18:20 ` Elana Hashman
2018-11-08 18:47 ` Darrick J. Wong
2019-01-24 1:54 ` Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Liu Bo
2 siblings, 1 reply; 21+ messages in thread
From: Reindl Harald @ 2018-11-08 18:13 UTC (permalink / raw)
To: Elana Hashman, 'tytso@mit.edu'
Cc: 'linux-ext4@vger.kernel.org'
Am 08.11.18 um 18:59 schrieb Elana Hashman:
> After a reboot, `df` reflects real utilization:
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid> 50G 16G 31G 34% /
>
> We are using overlay2fs for Docker, as well as rbd mounts; I'm not sure how they might interact.
And what does "df -hi" say?
Maybe you are simply running out of inodes.
* RE: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2018-11-08 18:13 ` Reindl Harald
@ 2018-11-08 18:20 ` Elana Hashman
0 siblings, 0 replies; 21+ messages in thread
From: Elana Hashman @ 2018-11-08 18:20 UTC (permalink / raw)
To: 'Reindl Harald', 'tytso@mit.edu'
Cc: 'linux-ext4@vger.kernel.org'
-----Original Message-----
From: Reindl Harald <h.reindl@thelounge.net>
Sent: Thursday, November 8, 2018 1:14 PM
> And what does "df -hi" say?
> Maybe you are simply running out of inodes.
We are nowhere near the inode limit:
$ df -hi /
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/disk/by-uuid/<some-uuid> 3.2M 191K 3.0M 6% /
I can't confirm 100% since we rebooted this host and I don't have a candidate demonstrating the problem right now that I can check, but it's something we've checked previously.
- e
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2018-11-08 17:59 Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Elana Hashman
2018-11-08 18:13 ` Reindl Harald
@ 2018-11-08 18:47 ` Darrick J. Wong
2018-12-05 16:26 ` Elana Hashman
2019-01-24 1:54 ` Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Liu Bo
2 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2018-11-08 18:47 UTC (permalink / raw)
To: Elana Hashman
Cc: 'tytso@mit.edu', 'linux-ext4@vger.kernel.org'
On Thu, Nov 08, 2018 at 05:59:18PM +0000, Elana Hashman wrote:
> Hi Ted,
>
> We've run into a mysterious "phantom" full filesystem issue on our Kubernetes fleet. We initially encountered this issue on kernel 4.1.35, but are still experiencing the problem after upgrading to 4.14.67. Essentially, `df` reports our root filesystems as full and they behave as though they are full, but the "used" space cannot be accounted for. Rebooting the system, remounting the root filesystem read-only and then remounting as read-write, or booting into single-user mode all free up the "used" space. The disk slowly fills up over time, suggesting that there might be some kind of leak; we previously saw this affecting hosts with ~200 days of uptime on the 4.1 kernel, but are now seeing it affect a 4.14 host with only ~70 days of uptime.
>
> Here is some data from an example host, running the 4.14.67 kernel. The root disk is ext4.
>
> $ uname -a
> Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
> $ grep ' / ' /proc/mounts
> /dev/disk/by-uuid/<some-uuid> / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
>
> `df` reports 0 bytes free:
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid> 50G 48G 0 100% /
This is very odd. I wonder, how many of those overlayfses are still
mounted on the system at this point? Over in xfs land we've discovered
that overlayfs subtly changes the lifetime behavior of incore inodes,
maybe that's what's going on here? (Pure speculation on my part...)
> Deleted, open files account for almost no disk capacity:
>
> $ sudo lsof -a +L1 /
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
> java 5313 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
> java 5313 user 11u REG 8,3 55185 0 2494654 /tmp/classpath.1668Gp (deleted)
> system_ar 5333 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
> java 5421 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
> java 5421 user 11u REG 8,3 149313 0 2494486 /tmp/java.fzTwWp (deleted)
> java 5421 tsdist 12u REG 8,3 55185 0 2500513 /tmp/classpath.7AmxHO (deleted)
>
> `du` can only account for 16GB of file usage:
>
> $ sudo du -hxs /
> 16G /
>
> But what is most puzzling is the numbers reported by e2freefrag, which don't add up:
>
> $ sudo e2freefrag /dev/disk/by-uuid/<some-uuid>
> Device: /dev/disk/by-uuid/<some-uuid>
> Blocksize: 4096 bytes
> Total blocks: 13107200
> Free blocks: 7778076 (59.3%)
>
> Min. free extent: 4 KB
> Max. free extent: 8876 KB
> Avg. free extent: 224 KB
> Num. free extent: 6098
>
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range : Free extents Free Blocks Percent
> 4K... 8K- : 1205 1205 0.02%
> 8K... 16K- : 980 2265 0.03%
> 16K... 32K- : 653 3419 0.04%
> 32K... 64K- : 1337 15374 0.20%
> 64K... 128K- : 631 14151 0.18%
> 128K... 256K- : 224 10205 0.13%
> 256K... 512K- : 261 23818 0.31%
> 512K... 1024K- : 303 56801 0.73%
> 1M... 2M- : 387 135907 1.75%
> 2M... 4M- : 103 64740 0.83%
> 4M... 8M- : 12 15005 0.19%
> 8M... 16M- : 2 4267 0.05%
Clearly a bug in e2freefrag; the percentages are supposed to sum to 100%.
Patches soon.
--D
>
> This looks like a bug to me; the histogram in the manpage example has percentages that add up to 100% but this doesn't even add up to 5%.
>
> After a reboot, `df` reflects real utilization:
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid> 50G 16G 31G 34% /
>
> We are using overlay2fs for Docker, as well as rbd mounts; I'm not sure how they might interact.
>
> Thanks for your help,
>
> --
> Elana Hashman
> ehashman@twosigma.com
* RE: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2018-11-08 18:47 ` Darrick J. Wong
@ 2018-12-05 16:26 ` Elana Hashman
2019-01-23 19:59 ` Thomas Walker
0 siblings, 1 reply; 21+ messages in thread
From: Elana Hashman @ 2018-12-05 16:26 UTC (permalink / raw)
To: Darrick J. Wong
Cc: 'tytso@mit.edu', 'linux-ext4@vger.kernel.org',
Thomas Walker
Okay, let's take a look at another affected host. I have not drained it, just cordoned it, so it's still in Kubernetes service and has running, active pods.
$ uname -a
Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/<some-uuid> 50G 46G 1.6G 97% /
$ df -hi /
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/disk/by-uuid/<some-uuid> 3.2M 203K 3.0M 7% /
$ sudo du -hxs /
21G /
$ sudo lsof -a +L1 /
lsof: WARNING: can't stat() fuse file system /srv/kubelet/pods/<some-path>
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
java 6392 user 11u REG 8,3 55185 0 1441946 /tmp/classpath.ln0XhI (deleted)
java 6481 user 11u REG 8,3 149313 0 1441945 /tmp/java.AwFIiw (deleted)
java 6481 user 12u REG 8,3 55185 0 1441951 /tmp/classpath.qrQhkS (deleted)
There are many overlay mounts currently active:
$ mount | grep overlay | wc -l
40
Also some fuse mounts (as mentioned in the lsof warning on this particular machine):
$ mount | grep fuse | wc -l
21
And just to double-check, all these modules are running in-tree:
$ modinfo ext4 | grep intree
intree: Y
$ modinfo overlay | grep intree
intree: Y
$ modinfo fuse | grep intree
intree: Y
- e
-----Original Message-----
From: Darrick J. Wong <darrick.wong@oracle.com>
Sent: Thursday, November 8, 2018 1:47 PM
Subject: Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
This is very odd. I wonder, how many of those overlayfses are still mounted on the system at this point? Over in xfs land we've discovered that overlayfs subtly changes the lifetime behavior of incore inodes, maybe that's what's going on here? (Pure speculation on my part...)
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2018-12-05 16:26 ` Elana Hashman
@ 2019-01-23 19:59 ` Thomas Walker
2019-06-26 15:17 ` Thomas Walker
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Walker @ 2019-01-23 19:59 UTC (permalink / raw)
To: Elana Hashman
Cc: Darrick J. Wong, 'tytso@mit.edu',
'linux-ext4@vger.kernel.org'
Unfortunately this still continues to be a persistent problem for us. On another example system:
# uname -a
Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/<uuid> 50G 46G 1.1G 98% /
# df -hi /
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/disk/by-uuid/<uuid> 3.2M 306K 2.9M 10% /
# du -hsx /
14G /
And confirmed not to be due to sparse files or deleted but still open files.
The most interesting thing that I've been able to find so far is this:
# mount -o remount,ro /
mount: / is busy
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/<uuid> 50G 14G 33G 30% /
Something about attempting (and failing) to remount read-only frees up all of the phantom space usage.
Curious whether that sparks ideas in anyone's mind?
I've tried all manner of other things without success: unmounting all of the overlays, killing off virtually all of userspace (dropping to single user), dropping page/inode/dentry caches. Nothing else (short of a reboot) seems to give us the space back.
On Wed, Dec 05, 2018 at 11:26:19AM -0500, Elana Hashman wrote:
> Okay, let's take a look at another affected host. I have not drained it, just
> cordoned it, so it's still in Kubernetes service and has running, active pods.
>
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2018-11-08 17:59 Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Elana Hashman
2018-11-08 18:13 ` Reindl Harald
2018-11-08 18:47 ` Darrick J. Wong
@ 2019-01-24 1:54 ` Liu Bo
2019-01-24 14:40 ` Elana Hashman
2 siblings, 1 reply; 21+ messages in thread
From: Liu Bo @ 2019-01-24 1:54 UTC (permalink / raw)
To: Elana Hashman; +Cc: tytso, linux-ext4
On Thu, Nov 8, 2018 at 10:11 AM Elana Hashman
<Elana.Hashman@twosigma.com> wrote:
>
> Hi Ted,
>
> We've run into a mysterious "phantom" full filesystem issue on our Kubernetes fleet. We initially encountered this issue on kernel 4.1.35, but are still experiencing the problem after upgrading to 4.14.67. Essentially, `df` reports our root filesystems as full and they behave as though they are full, but the "used" space cannot be accounted for. Rebooting the system, remounting the root filesystem read-only and then remounting as read-write, or booting into single-user mode all free up the "used" space. The disk slowly fills up over time, suggesting that there might be some kind of leak; we previously saw this affecting hosts with ~200 days of uptime on the 4.1 kernel, but are now seeing it affect a 4.14 host with only ~70 days of uptime.
>
I wonder if this ext4 filesystem has bigalloc enabled (it can be checked with dumpe2fs -h $disk).
Bigalloc is known to leak space, and that was only fixed recently.
thanks,
liubo
> Here is some data from an example host, running the 4.14.67 kernel. The root disk is ext4.
>
> $ uname -a
> Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
> $ grep ' / ' /proc/mounts
> /dev/disk/by-uuid/<some-uuid> / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
>
> `df` reports 0 bytes free:
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid> 50G 48G 0 100% /
>
> Deleted, open files account for almost no disk capacity:
>
> $ sudo lsof -a +L1 /
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
> java 5313 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
> java 5313 user 11u REG 8,3 55185 0 2494654 /tmp/classpath.1668Gp (deleted)
> system_ar 5333 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
> java 5421 user 3r REG 8,3 6806312 0 1315847 /var/lib/sss/mc/passwd (deleted)
> java 5421 user 11u REG 8,3 149313 0 2494486 /tmp/java.fzTwWp (deleted)
> java 5421 tsdist 12u REG 8,3 55185 0 2500513 /tmp/classpath.7AmxHO (deleted)
>
> `du` can only account for 16GB of file usage:
>
> $ sudo du -hxs /
> 16G /
>
> But what is most puzzling is the numbers reported by e2freefrag, which don't add up:
>
> $ sudo e2freefrag /dev/disk/by-uuid/<some-uuid>
> Device: /dev/disk/by-uuid/<some-uuid>
> Blocksize: 4096 bytes
> Total blocks: 13107200
> Free blocks: 7778076 (59.3%)
>
> Min. free extent: 4 KB
> Max. free extent: 8876 KB
> Avg. free extent: 224 KB
> Num. free extent: 6098
>
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range : Free extents Free Blocks Percent
> 4K... 8K- : 1205 1205 0.02%
> 8K... 16K- : 980 2265 0.03%
> 16K... 32K- : 653 3419 0.04%
> 32K... 64K- : 1337 15374 0.20%
> 64K... 128K- : 631 14151 0.18%
> 128K... 256K- : 224 10205 0.13%
> 256K... 512K- : 261 23818 0.31%
> 512K... 1024K- : 303 56801 0.73%
> 1M... 2M- : 387 135907 1.75%
> 2M... 4M- : 103 64740 0.83%
> 4M... 8M- : 12 15005 0.19%
> 8M... 16M- : 2 4267 0.05%
>
> This looks like a bug to me; the histogram in the manpage example has percentages that add up to 100% but this doesn't even add up to 5%.
>
> After a reboot, `df` reflects real utilization:
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid> 50G 16G 31G 34% /
>
> We are using overlay2fs for Docker, as well as rbd mounts; I'm not sure how they might interact.
>
> Thanks for your help,
>
> --
> Elana Hashman
> ehashman@twosigma.com
* RE: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-01-24 1:54 ` Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Liu Bo
@ 2019-01-24 14:40 ` Elana Hashman
0 siblings, 0 replies; 21+ messages in thread
From: Elana Hashman @ 2019-01-24 14:40 UTC (permalink / raw)
To: Liu Bo; +Cc: tytso, linux-ext4, Thomas Walker
From: Liu Bo <obuil.liubo@gmail.com>
> I wonder if this ext4 enabled bigalloc (can be checked by dumpe2fs -h $disk).
> So bigalloc is known to cause leak space, and it's been just fixed recently.
Doesn't look that way:
$ uname -a
Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
$ sudo du -hxs /
20G /
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/<some-uuid> 50G 46G 1.1G 98% /
$ sudo dumpe2fs -h /dev/disk/by-uuid/<some-uuid>
dumpe2fs 1.42.5 (29-Jul-2012)
Filesystem volume name: ROOT
Last mounted on: /
Filesystem UUID: <some-uuid>
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 3276800
Block count: 13107200
Reserved block count: 655360
Free blocks: 9260751
Free inodes: 3078348
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1020
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Tue Jan 16 18:48:13 2018
Last mount time: Mon Oct 1 20:43:59 2018
Last write time: Mon Oct 1 20:43:58 2018
Mount count: 7
Maximum mount count: -1
Last checked: Tue Jan 16 18:48:13 2018
Check interval: 0 (<none>)
Lifetime writes: 7159 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 2231477
Default directory hash: half_md4
Directory Hash Seed: 8a6f6d03-37e7-409c-b8c9-d738e0b0a94a
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x02a09116
Journal start: 24688
--
Elana Hashman
ehashman@twosigma.com
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-01-23 19:59 ` Thomas Walker
@ 2019-06-26 15:17 ` Thomas Walker
2019-07-11 9:23 ` Jan Kara
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Walker @ 2019-06-26 15:17 UTC (permalink / raw)
To: 'linux-ext4@vger.kernel.org'
Cc: Darrick J. Wong, 'tytso@mit.edu', Geoffrey Thomas
Sorry to revive a rather old thread, but Elana mentioned that there might have been a related fix recently? Possibly something to do with truncate?
A quick scan of the last month or so turned up https://www.spinics.net/lists/linux-ext4/msg65772.html but none of these seemed obviously applicable to me.
We do still experience this phantom space usage quite frequently (although the remount workaround below has lowered the priority).
On Wed, Jan 23, 2019 at 02:59:22PM -0500, Thomas Walker wrote:
> Unfortunately this still continues to be a persistent problem for us. On another example system:
>
> # uname -a
> Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
>
> # df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<uuid> 50G 46G 1.1G 98% /
>
> # df -hi /
> Filesystem Inodes IUsed IFree IUse% Mounted on
> /dev/disk/by-uuid/<uuid> 3.2M 306K 2.9M 10% /
>
> # du -hsx /
> 14G /
>
> And confirmed not to be due to sparse files or deleted but still open files.
>
> The most interesting thing that I've been able to find so far is this:
>
> # mount -o remount,ro /
> mount: / is busy
> # df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/disk/by-uuid/<uuid> 50G 14G 33G 30% /
>
> Something about attempting (and failing) to remount read-only frees up all of the phantom space usage.
> Curious whether that sparks ideas in anyone's mind?
>
> I've tried all manner of other things without success: unmounting all of the overlays, killing off virtually all of userspace (dropping to single user), dropping page/inode/dentry caches. Nothing else (short of a reboot) seems to give us the space back.
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-06-26 15:17 ` Thomas Walker
@ 2019-07-11 9:23 ` Jan Kara
2019-07-11 14:40 ` Geoffrey Thomas
0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2019-07-11 9:23 UTC (permalink / raw)
To: Thomas Walker
Cc: 'linux-ext4@vger.kernel.org',
Darrick J. Wong, 'tytso@mit.edu',
Geoffrey Thomas
On Wed 26-06-19 11:17:54, Thomas Walker wrote:
> Sorry to revive a rather old thread, but Elana mentioned that there might
> have been a related fix recently? Possibly something to do with
> truncate? A quick scan of the last month or so turned up
> https://www.spinics.net/lists/linux-ext4/msg65772.html but none of these
> seemed obviously applicable to me. We do still experience this phantom
> space usage quite frequently (although the remount workaround below has
> lowered the priority).
I don't recall any fix for this. But seeing that remount "fixes" the issue
for you, can you try whether one of the following has a similar effect?
1) Try "sync"
2) Try "fsfreeze -f / && fsfreeze -u /"
3) Try "echo 3 >/proc/sys/vm/drop_caches"
Also what is the contents of
/sys/fs/ext4/<problematic-device>/delayed_allocation_blocks
when the issue happens?
Honza
>
> On Wed, Jan 23, 2019 at 02:59:22PM -0500, Thomas Walker wrote:
> > Unfortunately this still continues to be a persistent problem for us. On another example system:
> >
> > # uname -a
> > Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
> >
> > # df -h /
> > Filesystem Size Used Avail Use% Mounted on
> > /dev/disk/by-uuid/<uuid> 50G 46G 1.1G 98% /
> >
> > # df -hi /
> > Filesystem Inodes IUsed IFree IUse% Mounted on
> > /dev/disk/by-uuid/<uuid> 3.2M 306K 2.9M 10% /
> >
> > # du -hsx /
> > 14G /
> >
> > And confirmed not to be due to sparse files or deleted but still open files.
> >
> > The most interesting thing that I've been able to find so far is this:
> >
> > # mount -o remount,ro /
> > mount: / is busy
> > # df -h /
> > Filesystem Size Used Avail Use% Mounted on
> > /dev/disk/by-uuid/<uuid> 50G 14G 33G 30% /
> >
> > Something about attempting (and failing) to remount read-only frees up all of the phantom space usage.
> > Curious whether that sparks ideas in anyone's mind?
> >
> > I've tried all manner of other things without success: unmounting all of the overlays, killing off virtually all of userspace (dropping to single user), dropping page/inode/dentry caches. Nothing else (short of a reboot) seems to give us the space back.
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* RE: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-11 9:23 ` Jan Kara
@ 2019-07-11 14:40 ` Geoffrey Thomas
2019-07-11 15:23 ` Jan Kara
2019-07-11 17:10 ` Theodore Ts'o
0 siblings, 2 replies; 21+ messages in thread
From: Geoffrey Thomas @ 2019-07-11 14:40 UTC (permalink / raw)
To: 'Jan Kara', Thomas Walker
Cc: 'linux-ext4@vger.kernel.org',
Darrick J. Wong, 'tytso@mit.edu'
On Thursday, July 11, 2019 5:23 AM, Jan Kara <jack@suse.cz> wrote:
> On Wed 26-06-19 11:17:54, Thomas Walker wrote:
> > Sorry to revive a rather old thread, but Elana mentioned that there might
> > have been a related fix recently? Possibly something to do with
> > truncate? A quick scan of the last month or so turned up
> > https://www.spinics.net/lists/linux-ext4/msg65772.html but none of these
> > seemed obviously applicable to me. We do still experience this phantom
> > space usage quite frequently (although the remount workaround below has
> > lowered the priority).
>
> I don't recall any fix for this. But seeing that remount "fixes" the issue
> for you can you try whether one of the following has a similar effect?
>
> 1) Try "sync"
> 2) Try "fsfreeze -f / && fsfreeze -u /"
> 3) Try "echo 3 >/proc/sys/vm/drop_caches"
>
> Also what is the contents of
> /sys/fs/ext4/<problematic-device>/delayed_allocation_blocks
> when the issue happens?
We just had one of these today, and no luck from any of those. delayed_allocation_blocks is 1:
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 50G 47G 455M 100% /
# ls -l /dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299
lrwxrwxrwx 1 root root 10 Mar 8 16:03 /dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 -> ../../sda3
# cat /sys/fs/ext4/sda3/delayed_allocation_blocks
1
# sync; sync; sync
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 50G 47G 455M 100% /
# fsfreeze -f /; fsfreeze -u /
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 50G 47G 455M 100% /
# echo 3 > /proc/sys/vm/drop_caches
[blocks for about 10 seconds]
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 50G 47G 454M 100% /
# umount /
[blocks for about 2 seconds]
umount: /: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 50G 16G 32G 33% /
# uname -r
4.14.67-ts1
--
Geoffrey Thomas
geofft@twosigma.com
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-11 14:40 ` Geoffrey Thomas
@ 2019-07-11 15:23 ` Jan Kara
2019-07-11 17:10 ` Theodore Ts'o
1 sibling, 0 replies; 21+ messages in thread
From: Jan Kara @ 2019-07-11 15:23 UTC (permalink / raw)
To: Geoffrey Thomas
Cc: 'Jan Kara',
Thomas Walker, 'linux-ext4@vger.kernel.org',
Darrick J. Wong, 'tytso@mit.edu'
On Thu 11-07-19 14:40:43, Geoffrey Thomas wrote:
> On Thursday, July 11, 2019 5:23 AM, Jan Kara <jack@suse.cz> wrote:
> > On Wed 26-06-19 11:17:54, Thomas Walker wrote:
> > > Sorry to revive a rather old thread, but Elana mentioned that there might
> > > have been a related fix recently? Possibly something to do with
> > > truncate? A quick scan of the last month or so turned up
> > > https://www.spinics.net/lists/linux-ext4/msg65772.html but none of these
> > > seemed obviously applicable to me. We do still experience this phantom
> > > space usage quite frequently (although the remount workaround below has
> > > lowered the priority).
> >
> > I don't recall any fix for this. But seeing that remount "fixes" the issue
> > for you can you try whether one of the following has a similar effect?
> >
> > 1) Try "sync"
> > 2) Try "fsfreeze -f / && fsfreeze -u /"
> > 3) Try "echo 3 >/proc/sys/vm/drop_caches"
> >
> > Also what is the contents of
> > /sys/fs/ext4/<problematic-device>/delayed_allocation_blocks
> > when the issue happens?
>
> We just had one of these today, and no luck from any of those.
> delayed_allocation_blocks is 1:
...
This is very strange, because a failed read-only remount (with EBUSY) doesn't
really do more than what "sync; echo 3 >/proc/sys/vm/drop_caches" does. I
suspect there's really some userspace taking up space and cleaning up on
umount. Anyway, once this happens again, can you do:
fsfreeze -f /
e2image -r /dev/disk/by-uuid/523c8243-5a25-40eb-8627-f3bbf98ec299 - | \
xz >some_storage.xz
fsfreeze -u /
some_storage.xz can be on a USB stick or similar. The command dumps the ext4
metadata to the file. Then please provide some_storage.xz for download
somewhere. Thanks! BTW I'll be on vacation for the next two weeks, so it
will take a while to get to this...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-11 14:40 ` Geoffrey Thomas
2019-07-11 15:23 ` Jan Kara
@ 2019-07-11 17:10 ` Theodore Ts'o
2019-07-12 19:19 ` Thomas Walker
1 sibling, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2019-07-11 17:10 UTC (permalink / raw)
To: Geoffrey Thomas
Cc: 'Jan Kara',
Thomas Walker, 'linux-ext4@vger.kernel.org',
Darrick J. Wong
Can you try using "df -i" when the file system looks full, and then
reboot, and look at the results of "df -i" afterwards?
Also interesting would be to grab a metadata-only snapshot of the file
system when it is in its mysteriously full state, writing that
snapshot on some other file system *other* than on /dev/sda3:
e2image -r /dev/sda3 /mnt/sda3.e2i
Then run e2fsck on it:
e2fsck -fy /mnt/sda3.e2i
What I'm curious about is how many "orphaned inodes" are reported, and
how much space they are taking up. That will look like this:
% gunzip < /usr/src/e2fsprogs/tests/f_orphan/image.gz > /tmp/foo.img
% e2fsck -fy /tmp/foo.img
e2fsck 1.45.2 (27-May-2019)
Clearing orphaned inode 15 (uid=0, gid=0, mode=040755, size=1024)
Clearing orphaned inode 17 (uid=0, gid=0, mode=0100644, size=0)
Clearing orphaned inode 16 (uid=0, gid=0, mode=040755, size=1024)
Clearing orphaned inode 14 (uid=0, gid=0, mode=0100644, size=69)
Clearing orphaned inode 13 (uid=0, gid=0, mode=040755, size=1024)
...
It's been theorized that the bug is in overlayfs, where it's holding inodes
open so the space isn't released. IIRC someone had reported a similar
problem with overlayfs on top of xfs. (BTW, are you using overlayfs or
aufs with your Docker setup?)
- Ted
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-11 17:10 ` Theodore Ts'o
@ 2019-07-12 19:19 ` Thomas Walker
2019-07-12 20:28 ` Theodore Ts'o
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Walker @ 2019-07-12 19:19 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Geoffrey Thomas, 'Jan Kara',
'linux-ext4@vger.kernel.org',
Darrick J. Wong
On Thu, Jul 11, 2019 at 01:10:46PM -0400, Theodore Ts'o wrote:
> Can you try using "df -i" when the file system looks full, and then
> reboot, and look at the results of "df -i" afterwards?
inode usage doesn't change appreciably between the state with "lost" space, after the remount workaround, or after a reboot.
> Also interesting would be to grab a metadata-only snapshot of the file
> system when it is in its mysteriously full state, writing that
> snapshot on some other file system *other* than on /dev/sda3:
>
> e2image -r /dev/sda3 /mnt/sda3.e2i
>
> Then run e2fsck on it:
>
> e2fsck -fy /mnt/sda3.e2i
>
> What I'm curious about is how many "orphaned inodes" are reported, and
> how much space they are taking up. That will look like this:
<..>
Clearing orphaned inode 2360177 (uid=0, gid=0, mode=0100644, size=1035390)
Clearing orphaned inode 2360181 (uid=0, gid=0, mode=0100644, size=185522)
Clearing orphaned inode 2360176 (uid=0, gid=0, mode=0100644, size=1924512)
Clearing orphaned inode 2360180 (uid=0, gid=0, mode=0100644, size=3621978)
Clearing orphaned inode 1048838 (uid=0, gid=4, mode=0100640, size=39006841856)
release_inode_blocks: Corrupt extent header while calling ext2fs_block_iterate for inode 1048838
<..>
Of particular note, the size of ino 1048838 matches the amount of space that we "lost".
A few months ago I was poking at this with kprobes to try to understand what was happening with the attempt to remount read-only and noticed that it triggered hundreds of:
<...>-78273 [000] .... 5186888.917840: ext4_remove_blocks: dev 8,3 ino 2889535 extent [0(11384832), 2048] from 0 to 2047 partial_cluster 0
<...>-78273 [000] .... 5186888.917841: <stack trace>
=> ext4_ext_truncate
=> ext4_truncate
=> ext4_evict_inode
=> evict
=> iput
=> dentry_unlink_inode
=> __dentry_kill
=> dput.part.23
=> dput
=> SyS_rename
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
All with the same inode numbers, whose sizes added up to the same amount of space that we had "lost" and got back. But the inodes didn't map to any file or open file handle.
The inode numbers match here too, and if I total up all of the ext4_remove_blocks lines, the number of blocks matches both what fsck reports and what we "lost".
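That totaling can be done mechanically: each ext4_remove_blocks line frees (last - first + 1) blocks, so the sum times the block size should equal the "lost" space. A small sketch, assuming 4 KiB filesystem blocks and the field layout shown above (the sample line is adapted from the trace):

```shell
# Sum the space freed across ext4_remove_blocks trace lines.
# Assumes 4 KiB blocks and "from X to Y" fields as in the trace above.
sum_removed_blocks() {
  awk '/ext4_remove_blocks/ {
         for (i = 1; i <= NF; i++) {
           if ($i == "from") first = $(i+1)
           if ($i == "to")   last  = $(i+1)
         }
         total += (last - first + 1)
       }
       END { printf "%d blocks, %d bytes\n", total, total * 4096 }'
}

# Hypothetical single-extent sample; a real run would pipe the whole trace in.
printf '%s\n' \
  'ext4_remove_blocks: dev 8,3 ino 2889535 extent [0(11384832), 2048] from 0 to 2047 partial_cluster 0' \
  | sum_removed_blocks
# prints: 2048 blocks, 8388608 bytes
```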
>
> ...
>
> It's been theorized the bug is in overlayfs, where it's holding inodes
> open so the space isn't released. IIRC someone had reported a
> similar problem with overlayfs on top of xfs. (BTW, are you using
> overlayfs or aufs with your Docker setup?)
>
Yes, we are using overlayfs and had also heard similar reports. But the ext4 fs here is the root filesystem,
while all of the overlays live on a separate XFS partition. The only interplay between our root fs and
the docker containers is the occasional read-only bind mount from the root fs into a container.
Unfortunately, we've not been able to reproduce this outside of our production plant running real workloads with real data. I did capture a metadata dump (with dirents scrambled) as Jan asked, but I suspect it will take some work to get it past our security team. I can certainly do that if we think there is anything valuable in it, though.
Thanks,
Tom.
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-12 19:19 ` Thomas Walker
@ 2019-07-12 20:28 ` Theodore Ts'o
2019-07-12 21:47 ` Geoffrey Thomas
0 siblings, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2019-07-12 20:28 UTC (permalink / raw)
To: Thomas Walker
Cc: Geoffrey Thomas, 'Jan Kara',
'linux-ext4@vger.kernel.org',
Darrick J. Wong
On Fri, Jul 12, 2019 at 03:19:03PM -0400, Thomas Walker wrote:
> Clearing orphaned inode 1048838 (uid=0, gid=4, mode=0100640, size=39006841856)
> Of particular note, ino 1048838 matches the size of the space that we "lost".
Hmmm... what's gid 4? Is that a hint of where the inode might have come from?
Can you try the e2image ... e2fsck experiment again, but add a "cp
--sparse" of the .e2i file between the e2image and e2fsck steps? Then,
once you can identify the inode that holds the huge amount of orphaned
space, try grabbing its first couple of blocks and running "file" on
the first part of the file, which might help you identify where the
file came from. Is it an ISO file? etc.
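(The point of the "cp --sparse" step: a raw e2image has the apparent size of the whole device but is mostly holes, and a sparse-aware copy keeps the holes instead of allocating them. A quick illustration of apparent vs. allocated size for a sparse file; paths here are arbitrary:)

```shell
# Create a 1 GiB sparse file: large apparent size, ~no blocks allocated.
truncate -s 1G /tmp/sparse.img
cp --sparse=always /tmp/sparse.img /tmp/copy.img   # holes preserved
du -k --apparent-size /tmp/copy.img   # apparent size: 1048576 KiB
du -k /tmp/copy.img                   # allocated: close to 0 KiB
rm -f /tmp/sparse.img /tmp/copy.img
```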
The goal is to come up with a repeatable way of forcing the failure,
so we can understand the root cause of the "lost space". The fact
that it's an orphaned inode means that something was hanging onto the
inode. The question is what part of the kernel was keeping the ref
count elevated.
Thanks,
- Ted
* RE: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-12 20:28 ` Theodore Ts'o
@ 2019-07-12 21:47 ` Geoffrey Thomas
2019-07-25 21:22 ` Geoffrey Thomas
0 siblings, 1 reply; 21+ messages in thread
From: Geoffrey Thomas @ 2019-07-12 21:47 UTC (permalink / raw)
To: 'Theodore Ts'o', Thomas Walker
Cc: 'Jan Kara', 'linux-ext4@vger.kernel.org',
Darrick J. Wong
On Friday, July 12, 2019 4:28 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Jul 12, 2019 at 03:19:03PM -0400, Thomas Walker wrote:
> > Clearing orphaned inode 1048838 (uid=0, gid=4, mode=0100640,
> size=39006841856)
>
> > Of particular note, ino 1048838 matches the size of the space that we
> "lost".
>
> Hmmm... what's gid 4? Is that a hint of where the inode might have come
> from?
Good call: gid 4 is `adm`. And now that we have an inode number we can see the file's contents; it's from /var/log/account.
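(For reference, the numeric-to-name lookup is just a group-database query; `adm` is gid 4 on Debian-derived systems:)

```shell
# Resolve a numeric gid to its group name via the NSS group database.
getent group 4
```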
I bet that this is acct(2) holding onto a reference in some weird way (possibly involving logrotate?), which also explains why we couldn't find a userspace process holding onto the inode. We'll investigate a bit....
Thanks,
--
Geoffrey Thomas
geofft@twosigma.com
* RE: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-12 21:47 ` Geoffrey Thomas
@ 2019-07-25 21:22 ` Geoffrey Thomas
2019-07-29 10:09 ` Jan Kara
0 siblings, 1 reply; 21+ messages in thread
From: Geoffrey Thomas @ 2019-07-25 21:22 UTC (permalink / raw)
To: 'Theodore Ts'o', Thomas Walker
Cc: 'Jan Kara', 'linux-ext4@vger.kernel.org',
'Darrick J. Wong'
On Friday, July 12, 2019 5:47 PM, Geoffrey Thomas <Geoffrey.Thomas@twosigma.com> wrote:
> On Friday, July 12, 2019 4:28 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> > Hmmm... what's gid 4? Is that a hint of where the inode might have come
> > from?
>
> Good call, gid 4 is `adm`. And now that we have an inode number we can see
> the file's contents, it's from /var/log/account.
>
> I bet that this is acct(2) holding onto a reference in some weird way
> (possibly involving logrotate?), which also explains why we couldn't find
> a userspace process holding onto the inode. We'll investigate a bit....
To close this out - yes, this was process accounting. Debian has a nightly cronjob which rotates the pacct logs, runs `invoke-rc.d acct restart` to reopen the file, and compresses the old log. Due to a stray policy-rc.d file from an old provisioning script, however, the restart was being skipped, and so we were unlinking and compressing the pacct file while the kernel still had it open. So it was the classic problem of an open file handle to a large deleted file, except that the open file handle was being held by the kernel.
`accton off` solved our immediate problems and freed the space. I'm not totally sure why a failed umount had that effect, too, but I suppose it turned off process accounting.
It's a little frustrating to me that the file opened by acct(2) doesn't show up to userspace (lsof doesn't seem to find it) - it'd be nice if it could show up in /proc/$some_kernel_thread/fd or somewhere, if possible.
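For anyone hitting the same thing: the safe rotation ordering, sketched as a hypothetical logrotate stanza (Debian actually drives this from a cron.daily script, as above; the path and service name are assumptions from our setup):

```
/var/log/account/pacct {
    daily
    rotate 30
    compress
    delaycompress            # don't compress until the kernel has let go
    postrotate
        # Reopen the accounting file. If this step is skipped (as our
        # stray policy-rc.d caused), the kernel keeps the unlinked file
        # open and the space is never freed.
        invoke-rc.d acct restart > /dev/null
    endscript
}
```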
Thanks for the help - the e2image + fsck trick is great!
--
Geoffrey Thomas
geofft@twosigma.com
* Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
2019-07-25 21:22 ` Geoffrey Thomas
@ 2019-07-29 10:09 ` Jan Kara
2019-07-29 11:18 ` ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug? Dmitrij Gusev
0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2019-07-29 10:09 UTC (permalink / raw)
To: Geoffrey Thomas
Cc: 'Theodore Ts'o', Thomas Walker, 'Jan Kara',
'linux-ext4@vger.kernel.org', 'Darrick J. Wong'
On Thu 25-07-19 21:22:28, Geoffrey Thomas wrote:
> On Friday, July 12, 2019 5:47 PM, Geoffrey Thomas <Geoffrey.Thomas@twosigma.com> wrote:
> > On Friday, July 12, 2019 4:28 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> > > Hmmm... what's gid 4? Is that a hint of where the inode might have come
> > > from?
> >
> > Good call, gid 4 is `adm`. And now that we have an inode number we can see
> > the file's contents, it's from /var/log/account.
> >
> > I bet that this is acct(2) holding onto a reference in some weird way
> > (possibly involving logrotate?), which also explains why we couldn't find
> > a userspace process holding onto the inode. We'll investigate a bit....
>
> To close this out - yes, this was process accounting. Debian has a nightly cronjob which rotates the pacct logs, runs `invoke-rc.d acct restart` to reopen the file, and compresses the old log. Due to a stray policy-rc.d file from an old provisioning script, however, the restart was being skipped, and so we were unlinking and compressing the pacct file while the kernel still had it open. So it was the classic problem of an open file handle to a large deleted file, except that the open file handle was being held by the kernel.
>
> `accton off` solved our immediate problems and freed the space. I'm not totally sure why a failed umount had that effect, too, but I suppose it turned off process accounting.
>
> It's a little frustrating to me that the file opened by acct(2) doesn't show up to userspace (lsof doesn't seem to find it) - it'd be nice if it could show up in /proc/$some_kernel_thread/fd or somewhere, if possible.
>
> Thanks for the help - the e2image + fsck trick is great!
Glad to hear you were able to solve the problem in the end :)
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug?
2019-07-29 10:09 ` Jan Kara
@ 2019-07-29 11:18 ` Dmitrij Gusev
2019-07-29 12:55 ` Theodore Y. Ts'o
0 siblings, 1 reply; 21+ messages in thread
From: Dmitrij Gusev @ 2019-07-29 11:18 UTC (permalink / raw)
To: 'linux-ext4@vger.kernel.org'; +Cc: 'Theodore Ts'o'
Hello.
An ext4 file system is constantly writing to the block device with no
activity from the applications - is it a bug?
The write rate is about 64 kB (almost always exactly 64 kB) every 1-2
seconds; I discovered it after a RAID sync finished. Please check the
activity log sample below.
I could not find out which process was writing, so I rebooted into
rescue mode and mounted the FS with no other activity guaranteed - the
same write pattern started again, about 64 kB every 1-2 seconds.
The file system was created with the latest e2fsprogs, here are the details:
Linux nexus 4.19.62 #2 SMP Sun Jul 28 17:18:10 CDT 2019 x86_64 AMD
Athlon(tm) Dual Core Processor 5050e AuthenticAMD GNU/Linux
# tune2fs -l /dev/<XXXXXXvg>/home
tune2fs 1.43.1 (08-Jun-2016) // the file system was created on another
system with up-to-date tools
Filesystem volume name: home
Last mounted on: /home
Filesystem UUID: XXXXXXXXXXXXXXXXXX
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery extent 64bit flex_bg sparse_super large_file
huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 21644032
Block count: 2770432000
Reserved block count: 0
Free blocks: 2095699032
Free inodes: 21205431
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 726
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 256
Inode blocks per group: 16
RAID stride: 128
RAID stripe width: 384
Flex block group size: 16
Filesystem created: Sat Jul 27 16:41:21 2019
Last mount time: Mon Jul 29 13:47:33 2019
Last write time: Mon Jul 29 13:47:33 2019
Mount count: 10
Maximum mount count: -1
Last checked: Sat Jul 27 16:41:21 2019
Check interval: 0 (<none>)
Lifetime writes: 2615 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: XXXXXXXXXXXXXXXXXXXXXXX
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x7f0fb170
BlockDev Activity log:
# iostat -kdt 1 /dev/XXXXXXvg/home
Linux 4.19.62 (XXXXX) 07/29/2019 _x86_64_ (2 CPU)
07/29/2019 02:08:03 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.91 15.40 37.77 19309 47364
07/29/2019 02:08:04 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:05 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:06 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:07 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:08 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:09 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:10 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:11 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:12 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:13 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:14 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:15 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:16 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:17 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:18 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:19 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:20 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:21 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:22 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
07/29/2019 02:08:23 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 0.00 0.00 0.00 0 0
07/29/2019 02:08:24 PM
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dm-5 1.00 0.00 64.00 0 64
Thank you,
---
Dmitrij Gusev
Systems architect
* Re: ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug?
2019-07-29 11:18 ` ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug? Dmitrij Gusev
@ 2019-07-29 12:55 ` Theodore Y. Ts'o
2019-07-29 21:12 ` Dmitrij Gusev
0 siblings, 1 reply; 21+ messages in thread
From: Theodore Y. Ts'o @ 2019-07-29 12:55 UTC (permalink / raw)
To: Dmitrij Gusev; +Cc: 'linux-ext4@vger.kernel.org'
On Mon, Jul 29, 2019 at 02:18:22PM +0300, Dmitrij Gusev wrote:
>
> An ext4 file system is constantly writing to the block device with no
> activity from the applications - is it a bug?
>
> The write rate is about 64 kB (almost always exactly 64 kB) every 1-2
> seconds; I discovered it after a RAID sync finished. Please check the
> activity log sample below.
Is this a freshly created file system? It could be the lazy inode
table initialization. You can suppress it using "mount -o
noinit_itable", but it will leave portions of the inode table
unzeroed, which can lead to confusion if the system crashes and e2fsck
has to try to recover the file system.
Or you can not enable lazy inode table initialization when the file
system is created, using "mke2fs -t ext4 -E lazy_itable_init=0
/dev/XXX". (See the manual page for mke2fs.conf for another way to
turn it off by default.)
Turning off lazy inode table initialization causes mke2fs to take
**much** longer, especially on large RAID arrays. The idea is to trade
off mkfs time against background activity that initializes the inode
table once the file system is mounted. The noinit_itable mount option
was added
so that a distro installer can temporarily suppress the background
inode table initialization to speed up the install; but then when the
system is booted, it can run in the background later.
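A quick way to check that hypothesis on a live system (assuming the kernel names the zeroing thread "ext4lazyinit", as recent kernels do):

```shell
# If lazy inode-table zeroing is the writer, an "ext4lazyinit" kernel
# thread will be listed while the filesystem is mounted; otherwise the
# fallback message is printed.
ps -eo comm | grep -x ext4lazyinit || echo "no ext4lazyinit thread running"
```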
If that's not it, try installing the blktrace package and then run
"btrace /dev/<vg>/home", and see what it reports. For example, here's
the output from running "touch /mnt/test" (comments prefixed by '#'):
# here's the touch process reading the inode...
259,0 2 1 37.115679608 6646 Q RM 4232 + 8 [touch]
259,0 2 2 37.115682891 6646 C RM 4232 + 8 [0]
# here's the journal commit, 5 seconds later
259,0 1 11 42.543705759 6570 Q WS 3932216 + 8 [jbd2/pmem0-8]
259,0 1 12 42.543709184 6570 C WS 3932216 + 8 [0]
259,0 1 13 42.543713049 6570 Q WS 3932224 + 8 [jbd2/pmem0-8]
259,0 1 14 42.543714248 6570 C WS 3932224 + 8 [0]
259,0 1 15 42.543717049 6570 Q WS 3932232 + 8 [jbd2/pmem0-8]
259,0 1 16 42.543718193 6570 C WS 3932232 + 8 [0]
259,0 1 17 42.543720895 6570 Q WS 3932240 + 8 [jbd2/pmem0-8]
259,0 1 18 42.543722028 6570 C WS 3932240 + 8 [0]
259,0 1 19 42.543724806 6570 Q WS 3932248 + 8 [jbd2/pmem0-8]
259,0 1 20 42.543725952 6570 C WS 3932248 + 8 [0]
259,0 1 21 42.543728697 6570 Q WS 3932256 + 8 [jbd2/pmem0-8]
259,0 1 22 42.543729799 6570 C WS 3932256 + 8 [0]
259,0 1 23 42.543745380 6570 Q FWFS 3932264 + 8 [jbd2/pmem0-8]
259,0 1 24 42.543746836 6570 C FWFS 3932264 + 8 [0]
# and here's the writeback to the inode table and superblock,
# 30 seconds later
259,0 1 25 72.836967205 91 Q W 0 + 8 [kworker/u8:3]
259,0 1 26 72.836970861 91 C W 0 + 8 [0]
259,0 1 27 72.836984218 91 Q WM 8 + 8 [kworker/u8:3]
259,0 1 28 72.836985929 91 C WM 8 + 8 [0]
259,0 1 29 72.836992108 91 Q WM 4232 + 8 [kworker/u8:3]
259,0 1 30 72.836993953 91 C WM 4232 + 8 [0]
259,0 1 31 72.837001370 91 Q WM 4360 + 8 [kworker/u8:3]
259,0 1 32 72.837003210 91 C WM 4360 + 8 [0]
259,0 1 33 72.837010993 91 Q WM 69896 + 8 [kworker/u8:3]
259,0 1 34 72.837012564 91 C WM 69896 + 8 [0]
Cheers,
- Ted
* Re: ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug?
2019-07-29 12:55 ` Theodore Y. Ts'o
@ 2019-07-29 21:12 ` Dmitrij Gusev
0 siblings, 0 replies; 21+ messages in thread
From: Dmitrij Gusev @ 2019-07-29 21:12 UTC (permalink / raw)
To: Theodore Y. Ts'o; +Cc: 'linux-ext4@vger.kernel.org'
Hello.
Yes, this is the lazy inode table initialization.
After I remounted the FS with the "noinit_itable" option, the activity
stopped (I then remounted without the option to let initialization
continue).
Appreciate your help.
P.S.
Thank you, Theodore, and the whole development team for the great ext4
filesystem.
It is a reliable, high-performance file system. It has served me well
for many years and continues to do so.
I have never lost any data using it, even though some systems have been
through many crashes and power losses.
Sincerely,
Dmitrij
On 2019-07-29 15:55, Theodore Y. Ts'o wrote:
> On Mon, Jul 29, 2019 at 02:18:22PM +0300, Dmitrij Gusev wrote:
>> An ext4 file system is constantly writing to the block device with no
>> activity from the applications - is it a bug?
>>
>> The write rate is about 64 kB (almost always exactly 64 kB) every 1-2
>> seconds; I discovered it after a RAID sync finished. Please check the
>> activity log sample below.
> Is this a freshly created file system? It could be the lazy inode
> table initialization. You can suppress it using "mount -o
> noinit_itable", but it will leave portions of the inode table
> unzeroed, which can lead to confusion if the system crashes and e2fsck
> has to try to recover the file system.
>
> Or you can not enable lazy inode table initialization when the file
> system is created, using "mke2fs -t ext4 -E lazy_itable_init=0
> /dev/XXX". (See the manual page for mke2fs.conf for another way to
> turn it off by default.)
>
> Turning off lazy inode table initialization causes mke2fs to take **much**
> longer, especially on large RAID arrays. The idea is to trade off
> mkfs time with background activity to initialize the inode table when
> the file system is mounted. The noinit_itable mount option was added
> so that a distro installer can temporarily suppress the background
> inode table initialization to speed up the install; but then when the
> system is booted, it can run in the background later.
> [...]
end of thread, other threads:[~2019-07-29 21:12 UTC | newest]
Thread overview: 21+ messages
2018-11-08 17:59 Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Elana Hashman
2018-11-08 18:13 ` Reindl Harald
2018-11-08 18:20 ` Elana Hashman
2018-11-08 18:47 ` Darrick J. Wong
2018-12-05 16:26 ` Elana Hashman
2019-01-23 19:59 ` Thomas Walker
2019-06-26 15:17 ` Thomas Walker
2019-07-11 9:23 ` Jan Kara
2019-07-11 14:40 ` Geoffrey Thomas
2019-07-11 15:23 ` Jan Kara
2019-07-11 17:10 ` Theodore Ts'o
2019-07-12 19:19 ` Thomas Walker
2019-07-12 20:28 ` Theodore Ts'o
2019-07-12 21:47 ` Geoffrey Thomas
2019-07-25 21:22 ` Geoffrey Thomas
2019-07-29 10:09 ` Jan Kara
2019-07-29 11:18 ` ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug? Dmitrij Gusev
2019-07-29 12:55 ` Theodore Y. Ts'o
2019-07-29 21:12 ` Dmitrij Gusev
2019-01-24 1:54 ` Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Liu Bo
2019-01-24 14:40 ` Elana Hashman