Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* [Bug 207729] New: Mounting EXT4 with data_err=abort does not abort journal on data block write failure
@ 2020-05-13  8:15 bugzilla-daemon
  2020-07-27  9:41 ` [Bug 207729] " bugzilla-daemon
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-05-13  8:15 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=207729

            Bug ID: 207729
           Summary: Mounting EXT4 with data_err=abort does not abort
                    journal on data block write failure
           Product: File System
           Version: 2.5
    Kernel Version: 5.3.2
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: rebello.anthony@gmail.com
        Regression: No

Created attachment 289111
  --> https://bugzilla.kernel.org/attachment.cgi?id=289111&action=edit
TGZ containing shell script and c code to reproduce the bug

Hi,

I've been performing some experiments that involve fault injection in the ext4
file system and noticed something that might be a bug.

According to [ext4
Documentation](https://www.kernel.org/doc/Documentation/filesystems/ext4.txt), 
`data_err=abort` should abort the journal if an error occurs in a file data
buffer in ordered mode.

The assumption I'm making is that the mount option `data_err=abort` will abort
the journal if the file system fails to write a data block to disk (I'm not
sure about my assumption since the documentation mentions a "file data
buffer").

### Overview

I have noticed that when fsync fails due to a failed data block write, the
journal is not aborted even when the file system is mounted with
`data_err=abort`. The file system continues to accept and handle future writes
and fsyncs.

I've attached a tgz which contains a small C program along with a shell script
to reproduce the problem. The C file is a simple workload that writes and
fsyncs a file. The shell script uses losetup to create a loop device,
configures the device mapper to fail certain sectors, mounts the ext4 file
system, and runs the compiled C file to demonstrate the issue.

Usage

```
tar -xvzf ext4-data-err-abort-bug.tgz
cd ext4-data-err-abort-bug
bash run_bug.sh
```

### Steps to reproduce independently

1. Create a 1GB loopdevice and mkfs -t ext4
2. Create a file with 12K data on the mounted loopdevice
3. Using debugfs, identify the 3 blocks for the file
4. Create a device mapper device using the dmsetup create command, configure
the 2nd block to use error target while everything else is a linear target
5. Open the file, write to the 2nd page, fsyncs (this fsync should fail)
6. Write to the 3rd page, fsync (This passes even though we expect it to fail
because of the `data_err=abort` mount option)
7. close file, unmount, dmsetup remove, etc

### Actual results

I observed that the first fsync fails because of the write failure. However,
the journal is not aborted. The second write and fsync succeed.

### Expected results

The first fsync should fail and the journal should be aborted. Any future
write/fsync would cause the file system to detect an aborted journal and
probably remount in read only mode.


### Other information

I used an unmodified linux 5.3.2 from
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/snapshot/linux-5.3.2.tar.gz
on an Intel Xeon Silver 4114 deca core processor.

```
$ awk -f scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux my-hostname 5.3.2 #1 SMP Mon May 11 14:13:56 CDT 2020 x86_64 x86_64
x86_64 GNU/Linux

GNU Make                4.1
Binutils                2.30
Util-linux              2.31.1
Mount                   2.31.1
Module-init-tools       24
E2fsprogs               1.44.1
Xfsprogs                4.9.0
Nfs-utils               1.3.3
Linux C Library         2.27
Dynamic linker (ldd)    2.27
Linux C++ Library       6.0.25
Procps                  3.3.12
Net-tools               2.10
Kbd                     2.0.4
Console-tools           2.0.4
Sh-utils                8.28
Udev                    237
Modules Loaded          acpi_pad acpi_power_meter aesni_intel aes_x86_64 ahci
autofs4 binfmt_misc coretemp crc32_pclmul crct10dif_pclmul cryptd crypto_simd
dca drm drm_kms_helper drm_vram_helper fb_sys_fops fscache ghash_clmulni_intel
glue_helper grace hid hid_generic i2c_algo_bit i40e ib_cm ib_core ib_iser
input_leds intel_cstate intel_powerclamp intel_rapl_common intel_rapl_msr
intel_rapl_perf ioatdma ipmi_devintf ipmi_msghandler ipmi_si ipmi_ssif ipod
ip_tables irqbypass iscsi_tcp iw_cm ixgbe joydev kvm kvm_intel libahci
libcrc32c libiscsi libiscsi_tcp lockd lpc_ich mac_hid mdio mei mei_me mgag200
mpt3sas nfit nfs nfs_acl nfsv3 pps_core ptp raid_class rdma_cm sch_fq_codel
scsi_transport_iscsi scsi_transport_sas skx_edac sunrpc syscopyarea sysfillrect
sysimgblt ttm usbhid wmi x86_pkg_temp_thermal xfrm_algo xfs x_tables
```

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 207729] Mounting EXT4 with data_err=abort does not abort journal on data block write failure
  2020-05-13  8:15 [Bug 207729] New: Mounting EXT4 with data_err=abort does not abort journal on data block write failure bugzilla-daemon
@ 2020-07-27  9:41 ` bugzilla-daemon
  2020-07-27 20:31 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-07-27  9:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=207729

Jan Kara (jack@suse.cz) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |jack@suse.cz

--- Comment #1 from Jan Kara (jack@suse.cz) ---
Thanks for the report! So it is actually a description of data_err=abort mount
option that is somewhat misleading. Let me explain a bit more: Ext4 in
data=ordered mode guarantees to write contents of newly allocated data blocks
during transaction commit, after which changes that make these blocks visible
will become durable. In practice, whenever a new blocks are allocated for a
file, we write out the range of a file that covers all the newly allocated
blocks but that's just an implementation detail. And data_err=abort controls
the behavior when this particular writeout of data blocks fail. In your test
there are no newly allocated blocks in the transaction so the data_err=abort
does not apply.

To explain some rationale, such data writeback errors are indeed more serious
because if we just committed the transaction despite these errors, the newly
allocated blocks could expose stale, potentially security sensitive, data from
other files. So that's why the option was introduced.

But I agree that the documentation is misleading and the semantics of the
option is somewhat peculiar. I'll talk to other ext4 developers how we could
possibly improve the situation.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 207729] Mounting EXT4 with data_err=abort does not abort journal on data block write failure
  2020-05-13  8:15 [Bug 207729] New: Mounting EXT4 with data_err=abort does not abort journal on data block write failure bugzilla-daemon
  2020-07-27  9:41 ` [Bug 207729] " bugzilla-daemon
@ 2020-07-27 20:31 ` bugzilla-daemon
  2020-07-27 20:41 ` bugzilla-daemon
  2020-07-28  8:32 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-07-27 20:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=207729

--- Comment #2 from Anthony Rebello (rebello.anthony@gmail.com) ---
Created attachment 290633
  --> https://bugzilla.kernel.org/attachment.cgi?id=290633&action=edit
TGZ containing code to reproduce bug for APPEND workload

The previous attachment modified a file in place, which did not allocate any
new blocks.
This attachment contains a workload that allocates new blocks by appending to
the file.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 207729] Mounting EXT4 with data_err=abort does not abort journal on data block write failure
  2020-05-13  8:15 [Bug 207729] New: Mounting EXT4 with data_err=abort does not abort journal on data block write failure bugzilla-daemon
  2020-07-27  9:41 ` [Bug 207729] " bugzilla-daemon
  2020-07-27 20:31 ` bugzilla-daemon
@ 2020-07-27 20:41 ` bugzilla-daemon
  2020-07-28  8:32 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-07-27 20:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=207729

--- Comment #3 from Anthony Rebello (rebello.anthony@gmail.com) ---
Hi Jan,

Thank you for your reply.

I've changed the workload to perform appends rather than in-place updates.

Appending sufficient data will cause a new block to be allocated.

On fsync, this newly allocated data block will be written out and in ordered
mode,
the journal entry will contain the block bitmap (with the bit set for the newly
allocated block) and the inode table that maps the offset to this block.

In this particular case, if `data_err=abort` is enabled, the journal should
abort when it fails to write the newly allocated data block.

However, that doesn't seem to be the case. I've attached an updated TGZ
containing the append workload.

The bitmap has the block set, the inode table still points to the block, but
the block has not been overwritten with the intended data. It still contains
it's old contents.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 207729] Mounting EXT4 with data_err=abort does not abort journal on data block write failure
  2020-05-13  8:15 [Bug 207729] New: Mounting EXT4 with data_err=abort does not abort journal on data block write failure bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-07-27 20:41 ` bugzilla-daemon
@ 2020-07-28  8:32 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2020-07-28  8:32 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=207729

--- Comment #4 from Jan Kara (jack@suse.cz) ---
Thanks for the reproducer! Good spotting! This is indeed broken. The problem is
that the write to the second file block happens, data is written to page cache.
Then fsync(2) happens. It starts writeback of the second file block - allocates
block, extends file size, submits write of the second file block, and waits for
this write to complete. Because the write fails with EIO, waiting for the write
to complete returns EIO which then bubbles up to userspace. But this also
"consumes" the IO error and so the journalling layer which commits transaction
later does not know there was IO error before and so it happily commits the
transaction. As I've verified, this scenario indeed leads to stale data
exposure that data_err=abort mount option is meant to prevent.

I have to think how to fix this properly...

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-13  8:15 [Bug 207729] New: Mounting EXT4 with data_err=abort does not abort journal on data block write failure bugzilla-daemon
2020-07-27  9:41 ` [Bug 207729] " bugzilla-daemon
2020-07-27 20:31 ` bugzilla-daemon
2020-07-27 20:41 ` bugzilla-daemon
2020-07-28  8:32 ` bugzilla-daemon

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git