* [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster
@ 2016-08-04 19:47 bugzilla-daemon
  2016-08-06 12:54 ` [Bug 151491] " bugzilla-daemon
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: bugzilla-daemon @ 2016-08-04 19:47 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

            Bug ID: 151491
           Summary: free space lossage on busy system with bigalloc
                    enabled and 128KB cluster
           Product: File System
           Version: 2.5
    Kernel Version: 4.1.8
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: mlmartin@clearsky-data.com
        Regression: No

Created attachment 227581
  --> https://bugzilla.kernel.org/attachment.cgi?id=227581&action=edit
details of fault with scripts for reproduction

A file system with bigalloc enabled and a 128KB cluster size, with a large
number of 2MB files being created/overwritten/deleted, loses usable space.

Running du and df gives wildly different usage with df showing much more usage
than du. lsof shows no phantom open files. Using dd to fill the file system
shows that df's version of free space is operative, but unmounting and
remounting the file system returns the free space. There is no difference
between df and du usage after remount.
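
The df/du comparison described above can be scripted. This is only a sketch:
it assumes GNU coreutils (df --output=used, du -x), and the function names
usage_gap_kib and report_gap are made up for illustration. A small gap is
normal because df also counts file system metadata; a gap that keeps growing
is the symptom described here.

```shell
# Pure helper: df-reported usage minus du-reported usage, both in KiB.
usage_gap_kib() {
    echo $(( $1 - $2 ))
}

# Print the gap for one mount point (assumes GNU df and du).
# $1 = mount point, for example /mnt/hdd_sdg
report_gap() {
    _df_kib=$(df -k --output=used "$1" | tail -n 1 | tr -d ' ')
    _du_kib=$(du -skx "$1" | cut -f1)
    echo "$1: df=${_df_kib}K du=${_du_kib}K gap=$(usage_gap_kib "$_df_kib" "$_du_kib")K"
}
```

Calling report_gap on the affected mount point before and after a remount
should show the gap collapsing if the behavior matches this report.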

The fault does not seem to be present in the 4.7 kernel (or it takes a lot more
activity for it to show up).

I will build 4.4.16 and retest to see if it is present there.

We do have a(n obnoxious) workaround of periodically unmounting/remounting
file systems.

Details of configurations and tests are in the attached document.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
@ 2016-08-06 12:54 ` bugzilla-daemon
  2016-08-08 20:30 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2016-08-06 12:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #1 from Matthew L. Martin <mlmartin@clearsky-data.com> ---
The fault is in linux 4.4.16. After running the populate script for a few hours
du and df disagree a great deal:

# df -h /mnt/hdd_sd[gh]; du -hs  /mnt/hdd_sd[gh]
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdg        5.5T  129G  5.3T   3% /mnt/hdd_sdg
/dev/sdh        5.5T   20G  5.4T   1% /mnt/hdd_sdh
20G    /mnt/hdd_sdg
20G    /mnt/hdd_sdh

# lsof | grep -e sdg -e sdh
jbd2/sdg- 32609  root  cwd      DIR  9,127  4096  192 /
jbd2/sdg- 32609  root  rtd      DIR  9,127  4096  192 /
jbd2/sdg- 32609  root  txt  unknown                   /proc/32609/exe
jbd2/sdh- 32614  root  cwd      DIR  9,127  4096  192 /
jbd2/sdh- 32614  root  rtd      DIR  9,127  4096  192 /
jbd2/sdh- 32614  root  txt  unknown                   /proc/32614/exe


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
  2016-08-06 12:54 ` [Bug 151491] " bugzilla-daemon
@ 2016-08-08 20:30 ` bugzilla-daemon
  2016-08-09 18:28 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2016-08-08 20:30 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #2 from Matthew L. Martin <mlmartin@clearsky-data.com> ---
I believe that I have confirmed that this fault is not present in linux 4.7.
After running the reproduction script for over three hours I have not seen a
difference between the usage reported by du and df:

# df -h /mnt/hdd_sd[gh]; du -hs  /mnt/hdd_sd[gh]
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdg        5.5T   20G  5.4T   1% /mnt/hdd_sdg
/dev/sdh        5.5T   20G  5.4T   1% /mnt/hdd_sdh
20G    /mnt/hdd_sdg
20G    /mnt/hdd_sdh


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
  2016-08-06 12:54 ` [Bug 151491] " bugzilla-daemon
  2016-08-08 20:30 ` bugzilla-daemon
@ 2016-08-09 18:28 ` bugzilla-daemon
  2017-11-11 11:04 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2016-08-09 18:28 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #3 from Matthew L. Martin <mlmartin@clearsky-data.com> ---
Unfortunately, after an extended run of the test scripts the fault presented
itself in the 4.7 kernel:

[root@d-ceph01 ~]# df -h /mnt/hdd_sd[gh]; du -hs  /mnt/hdd_sd[gh]
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdg        5.5T  669G  4.8T  13% /mnt/hdd_sdg
/dev/sdh        5.5T   20G  5.4T   1% /mnt/hdd_sdh
20G    /mnt/hdd_sdg
20G    /mnt/hdd_sdh


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (2 preceding siblings ...)
  2016-08-09 18:28 ` bugzilla-daemon
@ 2017-11-11 11:04 ` bugzilla-daemon
  2017-11-11 19:20 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-11 11:04 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

Fischreiher (mfe555@web.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mfe555@web.de

--- Comment #4 from Fischreiher (mfe555@web.de) ---
I have a similar or same problem on a Linux based Enigma2 set-top box

- ext4
- kernel 4.8.3
- bigalloc enabled
- cluster size of 262144

In normal use of the set-top box, the free space lossage is tens of gigabytes
per day. The problem can be easily reproduced:

When creating a fresh file, there is a significant difference in file size (ls
-la) and disk usage (du). When making two copies of the file ..

gbquad:/hdd/test# cp file file.copy1
gbquad:/hdd/test# cp file file.copy2
gbquad:/hdd/test# ls -la
-rw-------    1 root     root     581821460 Nov  1 18:52 file
-rw-------    1 root     root     581821460 Nov  1 18:56 file.copy1
-rw-------    1 root     root     581821460 Nov  1 18:57 file.copy2
gbquad:/hdd/test# du *
607232  file
658176  file.copy1
644864  file.copy2

... all three files show an overhead in the ~10% range, and the overhead is
different for these files although their md5sums are equal.

When deleting a file (rm), the overhead remains occupied on the disk. For
example, after deleting "file", "df" reports approx. 581821460 more bytes free,
not 607232 kbytes more free space. The overhead (607232 kB - 581821460 B =
approx. 40 MB) remains blocked.
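
The overhead arithmetic above can be checked with a short helper. This is a
sketch that assumes du reports 1024-byte units (as BusyBox and GNU du do by
default); the function names are hypothetical. For the values quoted above it
yields 39984108 bytes.

```shell
# Overhead in bytes: allocated space (KiB, as printed by du) minus the
# apparent file size (bytes, as printed by ls -l).
overhead_bytes() {
    echo $(( $1 * 1024 - $2 ))
}

# Same idea for a file on disk, using stat: %b is the number of allocated
# blocks, %B the size of each such block, %s the apparent size in bytes.
# Assumes a stat that supports -c (GNU coreutils or BusyBox).
file_overhead_bytes() {
    _size=$(stat -c %s "$1")
    _blocks=$(stat -c %b "$1")
    _bsize=$(stat -c %B "$1")
    echo $(( _blocks * _bsize - _size ))
}
```

Running file_overhead_bytes on each copy right after it is written, and again
after a remount, would show whether the per-file overhead disappears as
described.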

When unmounting and mounting again, the blocked space becomes free again, and
in addition the overhead of those files that were not deleted also disappears,
so that after a re-mount the 'file size' and 'disk usage' match for all files
(except for rounding up to some block size).

I found that
    echo 3 > /proc/sys/vm/drop_caches
seems to detach the blocked disk space from the files (so that 'du file' no
longer includes the offset), but it does not free the space, 'df' still shows
all file overheads as used disk space.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (3 preceding siblings ...)
  2017-11-11 11:04 ` bugzilla-daemon
@ 2017-11-11 19:20 ` bugzilla-daemon
  2017-11-13 16:27 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-11 19:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

Theodore Tso (tytso@mit.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #5 from Theodore Tso (tytso@mit.edu) ---
Can you try replicating this on an upstream kernel, running in a controlled
environment (e.g., using kvm) and then give us reliable reproduction
instructions --- e.g., using simple shell scripts, which don't depend on
the vagaries of the set-top box software, and random versions of du, df, etc.?

For bonus points, get a copy of kvm-xfstests[1][2], and run the test using
scripts cut and pasted into "kvm-xfstests shell".   That way we will be able to
reproduce *exactly* what you are doing.

[1] https://github.com/tytso/xfstests-bld
[2] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md

Thanks!!


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (4 preceding siblings ...)
  2017-11-11 19:20 ` bugzilla-daemon
@ 2017-11-13 16:27 ` bugzilla-daemon
  2017-11-13 17:27 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-13 16:27 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

Betacentauri (betacentauri@arcor.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |betacentauri@arcor.de

--- Comment #6 from Betacentauri (betacentauri@arcor.de) ---
First result:
When I revert this commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ext4?h=v4.1&id=9d21c9fa2cc24e2a195a79c27b6550e1a96051a4
in a 4.1 ARM kernel the problem doesn't occur any longer.
Scripts for replicating in the kvm-xfstests environment will follow later.


By the way: The set-top boxes Fischreiher and I are talking about use an only
slightly adapted upstream kernel to support Broadcom hardware. The fs directory
in the kernel sources is not changed.

Frank


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (5 preceding siblings ...)
  2017-11-13 16:27 ` bugzilla-daemon
@ 2017-11-13 17:27 ` bugzilla-daemon
  2017-11-13 17:30 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-13 17:27 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #7 from Betacentauri (betacentauri@arcor.de) ---
Created attachment 260631
  --> https://bugzilla.kernel.org/attachment.cgi?id=260631&action=edit
Test script


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (6 preceding siblings ...)
  2017-11-13 17:27 ` bugzilla-daemon
@ 2017-11-13 17:30 ` bugzilla-daemon
  2017-11-13 17:39 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-13 17:30 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #8 from Betacentauri (betacentauri@arcor.de) ---
Created attachment 260633
  --> https://bugzilla.kernel.org/attachment.cgi?id=260633&action=edit
Test script output

I have used a 4.9 32 bit kernel (config from kernel-configs folder).

The output shows how the used space reported by df increases. Also, files
generated by dd and cp have different sizes in the ls output.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (7 preceding siblings ...)
  2017-11-13 17:30 ` bugzilla-daemon
@ 2017-11-13 17:39 ` bugzilla-daemon
  2017-11-16 23:54 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-13 17:39 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #9 from Betacentauri (betacentauri@arcor.de) ---
Forgot to say that the script is for the kvm-xfstests environment. And the
output already shows the output of that environment.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (8 preceding siblings ...)
  2017-11-13 17:39 ` bugzilla-daemon
@ 2017-11-16 23:54 ` bugzilla-daemon
  2017-11-17 16:00 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-16 23:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

Eric Whitney (enwlinux@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enwlinux@gmail.com

--- Comment #10 from Eric Whitney (enwlinux@gmail.com) ---
I've been able to reproduce the reported problem on my test system running a
4.14 x86-64 kernel with the supplied test script.  Thanks for supplying it!

The block reporting errors from du and df are likely caused by delayed
allocation accounting bugs.  Experiments with an instrumented kernel show that
the number of delayed allocated blocks is occasionally overcounted as the test
files are physically allocated, leaving a residual value behind once allocation
is complete.  This residual value remains once a file has been fully written
out or deleted, and distorts the results reported by du or df.  Interestingly,
the overcounting isn't deterministic and varies from run to run.

Part of the overcounting appears due to code in ext4_ext_map_blocks() that
increases i_reserved_data_blocks when new clusters are allocated.  This code
has been previously implicated in other observed failures and in this case
appears to contribute some but not always all of the overcounted clusters seen
when running the test script.  Kernel traces indicate that there is usually
another as yet unknown contributor to the overcount.

Ted has suggested a temporary workaround which can be used to avoid the
reported problems, though it may have a significant workload-dependent
performance impact.  Delayed allocation can simply be disabled by using the
nodelalloc mount option.  I've tested this with repeated runs of the supplied
test script, and it avoids the reported problems as expected.
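
For anyone applying the workaround, the following sketch checks whether a
mount's option string contains nodelalloc. The helper name has_mount_opt is
hypothetical, and the device and mount point in the comments are placeholders;
nodelalloc itself is the ext4 mount option named above, and findmnt is assumed
to come from util-linux.

```shell
# Return 0 if the comma-separated option list $1 contains the option $2.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Mounting with delayed allocation disabled (placeholder device and path):
#   mount -t ext4 -o nodelalloc /dev/sdX /mnt/test
#
# Checking an existing mount:
#   opts=$(findmnt -n -o OPTIONS /mnt/test)
#   has_mount_opt "$opts" nodelalloc && echo "delayed allocation is disabled"
```

Matching on whole comma-delimited fields avoids treating "nodelalloc" as a hit
for "delalloc".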

Reverting "ext4: don't release reserved space for previously allocated cluster"
(9d21c9fa2cc2) isn't an attractive option because doing so would expose users
to potential data loss.  The purpose of the patch was to fix cases where the
number of outstanding delayed allocation blocks was undercounted.
Undercounting can lead to unexpected free space exhaustion at writeback time,
among other things.

I'll see what more I can learn from some additional experimentation.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (9 preceding siblings ...)
  2017-11-16 23:54 ` bugzilla-daemon
@ 2017-11-17 16:00 ` bugzilla-daemon
  2017-11-20 15:50 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-17 16:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #11 from Betacentauri (betacentauri@arcor.de) ---
The nodelalloc mount option works around the problem in the test environment.
But I also checked with a real ARM system with a 4.1.37 kernel:

root@sf4008:/media# mount 
...
/dev/sda on /media/sda type ext4 (rw,relatime,nodelalloc,data=ordered)
root@sf4008:/media# ls -las sda/testfiles/
 10240 -rw-r--r--    1 root     root      10485760 Nov 17 16:47 test
 10304 -rw-r--r--    1 root     root      10485760 Nov 17 16:47 test1

The test is a little bit different: only 10 MB files are generated. In most
cases the allocated size (first column) is equal, but in some cases it still
differs, as in the above example. It's not deterministic for me when it
happens. But I only see two sizes: 10240 or 10304. With delalloc the allocated
size was much more random.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (10 preceding siblings ...)
  2017-11-17 16:00 ` bugzilla-daemon
@ 2017-11-20 15:50 ` bugzilla-daemon
  2017-11-20 17:59 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-20 15:50 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #12 from Eric Whitney (enwlinux@gmail.com) ---
After many attempts, I'm unable to reproduce the newly reported behavior for
the nodelalloc workaround in comment 11 on my x86-64 test system (which is not an
xfstests-bld test appliance) running either a current 4.14 kernel or an older
Debian Jessie 4.8 kernel.  I consistently get a reported value of 10240 1k
units, which is correct for the reported size.

However, in the process of running my trials I arrived at a simpler reproducer
that should be helpful in identifying the source of the original space
reporting problem.  There's no need to copy the first test file if the test
system's free memory is sufficiently limited relative to the size of the test
file - a simple sequential write of a single test file suffices.  In fact, the
tighter the free memory, the more likely the problem occurs and the likelihood
of larger reporting errors increases.  A test system with ample free memory
won't exhibit the problem at all.
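
The simpler reproducer is not included in this thread, so the following is
only a guess at its shape: one sequential write of a single file, followed by a
comparison of apparent and allocated size. The function name, the file name,
and the use of dd and stat -c are assumptions; limiting free memory (for
example by sizing a VM down) has to be arranged separately.

```shell
# Write one file sequentially, flush it, and print "<size> <allocated>",
# both in bytes. $1 = target directory (ideally on a bigalloc ext4 file
# system), $2 = file size in MiB.
write_and_report() {
    _f="$1/seqtest.$$"
    dd if=/dev/zero of="$_f" bs=1M count="$2" 2>/dev/null || return 1
    sync    # delayed blocks are physically allocated at writeback
    echo "$(stat -c %s "$_f") $(( $(stat -c %b "$_f") * $(stat -c %B "$_f") ))"
    rm -f "$_f"
}
```

On an unaffected system the second number should exceed the first only by
rounding up to the cluster size; a larger excess would correspond to the
overcount discussed here.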

I'm getting workable kernel traces with the simpler reproducer, and the free
memory-related behavior suggests a direction, so I'll see where that takes me.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (11 preceding siblings ...)
  2017-11-20 15:50 ` bugzilla-daemon
@ 2017-11-20 17:59 ` bugzilla-daemon
  2017-11-30 16:08 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-20 17:59 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #13 from Betacentauri (betacentauri@arcor.de) ---
The ARM machine I use has very little free memory. So that fits your
analysis. 
With your information I could also reproduce it in the xfstests environment
with nodelalloc. But only 2 times. I set memory in the virtual machine to 256MB
(in config.kvm). Then I mounted the filesystem with nodelalloc and executed
this little script several times:

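#!/bin/bash
# Write ten 200 MB files, copy each one, and sync after every copy.

i=0
while [ "$i" -lt 10 ]; do
 # dd prints its statistics on stderr, so silence that stream
 dd if=/dev/zero of="./test$i" bs=1M count=200 2> /dev/null
 cp "test$i" "testx_$i"
 sync
 i=$((i + 1))
 echo "$i"
done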

Result was this:
root@kvm-xfstests:/media/test# ls -las
total 4097028
   256 drwxr-xr-x 3 root root      4096 Nov 20 17:43 .
     4 drwxr-xr-x 3 root root      4096 Nov 20 17:33 ..
   256 -rwxr-xr-x 1 root root       155 Nov 20 17:48 h.sh
   256 drwx------ 2 root root     16384 Nov 20 17:26 lost+found
204800 -rw-r--r-- 1 root root 209715200 Nov 20 17:48 test0
205056 -rw-r--r-- 1 root root 209715200 Nov 20 17:48 test1
204800 -rw-r--r-- 1 root root 209715200 Nov 20 17:48 test2
204800 -rw-r--r-- 1 root root 209715200 Nov 20 17:48 test3
204800 -rw-r--r-- 1 root root 209715200 Nov 20 17:48 test4
...

One file has size 205056. But I cannot reliably reproduce it.

So better focus on fixing the bug than on trying to reproduce it with
nodelalloc. If I find a way to reproduce, I'll let you know ;-)


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (12 preceding siblings ...)
  2017-11-20 17:59 ` bugzilla-daemon
@ 2017-11-30 16:08 ` bugzilla-daemon
  2017-12-02 11:35 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-11-30 16:08 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #14 from Eric Whitney (enwlinux@gmail.com) ---
I've been able to identify the unknown contributor to the delalloc accounting
error as discussed previously (in comment 10).

When ext4 is writing back a previously delalloc'ed extent that contains just a
portion of a cluster, and then delallocs an extent that contains another
disjoint portion of that same cluster, the count of delalloc'ed clusters
(i_reserved_data_blocks) is incorrectly incremented.  The cluster has been
physically allocated during writeback, but the subsequent delalloc write does
not discover that allocation.  This is because the code in ext4_da_map_blocks()
checks for a previously physically allocated block at the point of allocation
rather than a previously physically allocated cluster spanning the point of
allocation.

The effect is to bump the delalloc'ed cluster count for clusters that will
never be allocated (since they've already been allocated), and the overcount
will therefore never be reduced.
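
The difference between a block-level and a cluster-level check can be shown
with a toy model. This is not kernel code and does not mirror
ext4_da_map_blocks(); it assumes 4 KiB blocks (not stated in this thread)
with the 128 KiB cluster size from the bug title, giving 32 blocks per cluster,
and all function names are made up for illustration.

```shell
# Toy model only. Assumed geometry: 128 KiB clusters / 4 KiB blocks.
BLOCKS_PER_CLUSTER=32

# Map a logical block number to its cluster number.
cluster_of() {
    echo $(( $1 / BLOCKS_PER_CLUSTER ))
}

# Block-level check: is block $1 itself in the allocated list $2?
block_is_allocated() {
    for _b in $2; do
        [ "$_b" -eq "$1" ] && return 0
    done
    return 1
}

# Cluster-level check: does any block in list $2 share a cluster with $1?
cluster_is_allocated() {
    _t=$(cluster_of "$1")
    for _b in $2; do
        [ "$(cluster_of "$_b")" -eq "$_t" ] && return 0
    done
    return 1
}
```

With blocks 0 to 3 already written back, a new delayed write to block 8 fails
the block-level check (so a cluster would be reserved again) but passes the
cluster-level check (so no new reservation would be needed), which is the
mismatch described above.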

It's more likely this problem would occur when writing files sequentially if
the test system was under memory pressure, resulting in writeback activity in
parallel with delalloc writes.  The magnitude of the overcount is also likely
to be larger in this situation.  This correlates well with the observation that
the reproducer for the accounting errors is more likely to reproduce the
problem on a test system with little free memory.

I've been testing a prototype patch that appears to fix this problem.  However,
I've also identified at least two other unrelated delalloc accounting problems
for bigalloc file systems whose effects are masked by the other contributor to
overcounting in ext4_ext_map_blocks().  Fixing it results in failures caused by
these other problems when running xfstests-bld on bigalloc.  So, there's a lot
of work yet to be done before it's time to post patches.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (13 preceding siblings ...)
  2017-11-30 16:08 ` bugzilla-daemon
@ 2017-12-02 11:35 ` bugzilla-daemon
  2018-01-27  9:25 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2017-12-02 11:35 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #15 from Fischreiher (mfe555@web.de) ---
Thanks a lot for your investigation, Eric, and for describing these details. It
is great to know that this issue is in competent hands.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (14 preceding siblings ...)
  2017-12-02 11:35 ` bugzilla-daemon
@ 2018-01-27  9:25 ` bugzilla-daemon
  2018-01-27 14:20 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-01-27  9:25 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #16 from Fischreiher (mfe555@web.de) ---
Hi Eric, I got the message that this is neither an easy nor a quick fix, but is
it still on your list?


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (15 preceding siblings ...)
  2018-01-27  9:25 ` bugzilla-daemon
@ 2018-01-27 14:20 ` bugzilla-daemon
  2018-01-27 19:28 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-01-27 14:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #17 from Theodore Tso (tytso@mit.edu) ---
Eric has still been working on it (he has been reporting on it on our weekly
ext4 concalls, and we've been discussing it).   He's identified an approach and
has patches which he is refining, perfecting, and testing.   Hopefully there
will be something that can be released for users to test in the near future.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (16 preceding siblings ...)
  2018-01-27 14:20 ` bugzilla-daemon
@ 2018-01-27 19:28 ` bugzilla-daemon
  2018-12-03 17:10 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-01-27 19:28 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #18 from Fischreiher (mfe555@web.de) ---
Great, thank you, I'll be patient.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (17 preceding siblings ...)
  2018-01-27 19:28 ` bugzilla-daemon
@ 2018-12-03 17:10 ` bugzilla-daemon
  2018-12-11 16:44 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-12-03 17:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #19 from Betacentauri (betacentauri@arcor.de) ---
Any news regarding this ticket? Has the problem been fixed in the meantime?


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (18 preceding siblings ...)
  2018-12-03 17:10 ` bugzilla-daemon
@ 2018-12-11 16:44 ` bugzilla-daemon
  2018-12-11 18:57 ` bugzilla-daemon
  2018-12-13 16:12 ` bugzilla-daemon
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-12-11 16:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #20 from Eric Whitney (enwlinux@gmail.com) ---
4.20 contains patches that correct delalloc cluster accounting for bigalloc
file systems.  They were merged at the beginning of the release cycle. See
"ext4: generalize extents status tree search functions" (ad431025aecd) and the
following five patches.  They should address all the problems described in this
bugzilla.
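
A rough way to tell whether a running kernel is new enough to include these
patches is to compare its version against 4.20. This is a heuristic sketch
only: it assumes a sort that supports -V (GNU coreutils), the helper name
version_ge is made up, and a vendor kernel with an older version number may
carry backports that this check cannot see.

```shell
# Return 0 if version $1 is greater than or equal to version $2,
# using version-aware sorting (assumes sort -V is available).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   if version_ge "$(uname -r)" 4.20; then
#       echo "mainline delalloc accounting fixes should be present"
#   fi
```

Version-aware sorting matters here because a plain string comparison would
rank 4.9 above 4.20.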

Any independent testing would be appreciated.  I'd recommend working with the
latest mainline rc available at this time, which is 4.20-rc6.


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (19 preceding siblings ...)
  2018-12-11 16:44 ` bugzilla-daemon
@ 2018-12-11 18:57 ` bugzilla-daemon
  2018-12-13 16:12 ` bugzilla-daemon
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-12-11 18:57 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #21 from Betacentauri (betacentauri@arcor.de) ---
Thanks for the patches!

In kvm-xfstests environment with a 4.20-rc6 kernel I cannot reproduce the bug
anymore. I also created a tmpfs with several big files to reduce free memory,
but still there are no problems :-)


* [Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster
  2016-08-04 19:47 [Bug 151491] New: free space lossage on busy system with bigalloc enabled and 128KB cluster bugzilla-daemon
                   ` (20 preceding siblings ...)
  2018-12-11 18:57 ` bugzilla-daemon
@ 2018-12-13 16:12 ` bugzilla-daemon
  21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2018-12-13 16:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=151491

--- Comment #22 from Eric Whitney (enwlinux@gmail.com) ---
Very good - thanks for the testing!  I'll close this bug out at the end of the
4.20 release cycle if no negative test results are reported by then.


