Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* Data exposure on IO error
@ 2020-07-31 22:56 Jan Kara
  2020-08-01  7:32 ` Amir Goldstein
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2020-07-31 22:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: rebello.anthony

Hello!

In bug 207729, Anthony reported a bug that can actually lead to a stale
data exposure on IO error. The problem is relatively simple: Suppose we
do:

  fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  write(fd, buf, 4096);
  fsync(fd);

And IO error happens when fsync writes the block of "file". The IO error
gets properly reported to userspace but otherwise the filesystem keeps
running. So the transaction creating "file" and allocating block to it can
commit. Then when page cache of "file" gets evicted, the user can read
stale block contents (provided the IO error was just temporary or involving
only writes).

Now I understand in face of IO errors the behavior is really undefined but
potential exposure of stale data seems worse than strictly necessary. Also
if we run in data=ordered mode, especially if also data_err=abort is set,
user would rightfully expect that the filesystem gets aborted when such IO
error happens but that's not the case. Generally data_err=abort seems a bit
misnamed (and the manpage is wrong about this mount option) since what it
really does is that if jbd2 thread encounters error when writing back
ordered data, the filesystem is aborted. However the ordered data can be
written back by other processes as well and in that case the error is just
lost / reported to userspace but the filesystem doesn't get aborted.

As I was thinking about it, it seems to me that in data=ordered mode, we
should just always abort the filesystem when writeback of newly allocated
block fails to avoid the stale data exposure mentioned above. And then, we
could just deprecate data_err= mount option because it wouldn't be any
useful anymore... What do people think?

								Honza

[1] https://bugzilla.kernel.org/show_bug.cgi?id=207729
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Data exposure on IO error
  2020-07-31 22:56 Data exposure on IO error Jan Kara
@ 2020-08-01  7:32 ` Amir Goldstein
  2020-08-03  7:57   ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Amir Goldstein @ 2020-08-01  7:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4, rebello.anthony

On Sat, Aug 1, 2020 at 1:59 AM Jan Kara <jack@suse.cz> wrote:
>
> Hello!
>
> In bug 207729, Anthony reported a bug that can actually lead to a stale
> data exposure on IO error. The problem is relatively simple: Suppose we
> do:
>
>   fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
>   write(fd, buf, 4096);
>   fsync(fd);
>
> And IO error happens when fsync writes the block of "file". The IO error
> gets properly reported to userspace but otherwise the filesystem keeps
> running. So the transaction creating "file" and allocating block to it can
> commit. Then when page cache of "file" gets evicted, the user can read
> stale block contents (provided the IO error was just temporary or involving
> only writes).
>
> Now I understand in face of IO errors the behavior is really undefined but
> potential exposure of stale data seems worse than strictly necessary. Also
> if we run in data=ordered mode, especially if also data_err=abort is set,
> user would rightfully expect that the filesystem gets aborted when such IO
> error happens but that's not the case. Generally data_err=abort seems a bit
> misnamed (and the manpage is wrong about this mount option) since what it
> really does is that if jbd2 thread encounters error when writing back
> ordered data, the filesystem is aborted. However the ordered data can be
> written back by other processes as well and in that case the error is just
> lost / reported to userspace but the filesystem doesn't get aborted.
>
> As I was thinking about it, it seems to me that in data=ordered mode, we
> should just always abort the filesystem when writeback of newly allocated
> block fails to avoid the stale data exposure mentioned above. And then, we
> could just deprecate data_err= mount option because it wouldn't be any
> useful anymore... What do people think?
>

It sounds worse than strictly necessary.

In what way is that use case different from writing into a punched hole
in the middle of the file and getting an IO error on writeback?

It looks like ext4 already goes into a great deal of trouble to handle
extent conversion to init at io end.

So couldn't the described case be handled as a private case of
filling a hole at the end of the file?

Am I missing something beyond the fact that traditionally, extending
a file enjoyed the protection of i_disksize, so did not need to worry
about unwritten extents?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Data exposure on IO error
  2020-08-01  7:32 ` Amir Goldstein
@ 2020-08-03  7:57   ` Jan Kara
  0 siblings, 0 replies; 3+ messages in thread
From: Jan Kara @ 2020-08-03  7:57 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Ext4, rebello.anthony

Hello Amir!

On Sat 01-08-20 10:32:53, Amir Goldstein wrote:
> On Sat, Aug 1, 2020 at 1:59 AM Jan Kara <jack@suse.cz> wrote:
> >
> > Hello!
> >
> > In bug 207729, Anthony reported a bug that can actually lead to a stale
> > data exposure on IO error. The problem is relatively simple: Suppose we
> > do:
> >
> >   fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
> >   write(fd, buf, 4096);
> >   fsync(fd);
> >
> > And IO error happens when fsync writes the block of "file". The IO error
> > gets properly reported to userspace but otherwise the filesystem keeps
> > running. So the transaction creating "file" and allocating block to it can
> > commit. Then when page cache of "file" gets evicted, the user can read
> > stale block contents (provided the IO error was just temporary or involving
> > only writes).
> >
> > Now I understand in face of IO errors the behavior is really undefined but
> > potential exposure of stale data seems worse than strictly necessary. Also
> > if we run in data=ordered mode, especially if also data_err=abort is set,
> > user would rightfully expect that the filesystem gets aborted when such IO
> > error happens but that's not the case. Generally data_err=abort seems a bit
> > misnamed (and the manpage is wrong about this mount option) since what it
> > really does is that if jbd2 thread encounters error when writing back
> > ordered data, the filesystem is aborted. However the ordered data can be
> > written back by other processes as well and in that case the error is just
> > lost / reported to userspace but the filesystem doesn't get aborted.
> >
> > As I was thinking about it, it seems to me that in data=ordered mode, we
> > should just always abort the filesystem when writeback of newly allocated
> > block fails to avoid the stale data exposure mentioned above. And then, we
> > could just deprecate data_err= mount option because it wouldn't be any
> > useful anymore... What do people think?
> >
> 
> It sounds worse than strictly necessary.
> 
> In what way is that use case different from writing into a punched hole
> in the middle of the file and getting an IO error on writeback?

This is exactly the same.

> It looks like ext4 already goes into a great deal of trouble to handle
> extent conversion to init at io end.

So ext4 has currently two modes of operation controlled by dioread_nolock
mount option. Since about two kernel releases, ext4 defaults to actually
creating extents as unwritten and converting them on IO end. In this mode
ext4 actually doesn't have an issue because when IO error happens, we just
don't convert extents to written ones and so stale data is not exposed. But
we still do support the "legacy" mode of operation where extents are
created as written ones from the start and we just make sure the commit
with block allocation waits for data writeback to complete. And in this
mode there's this possibility of stale data exposure on IO error.

> So couldn't the described case be handled as a private case of
> filling a hole at the end of the file?
> 
> Am I missing something beyond the fact that traditionally, extending
> a file enjoyed the protection of i_disksize, so did not need to worry
> about unwritten extents?

The question really is what to do with the legacy mode of operation...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-31 22:56 Data exposure on IO error Jan Kara
2020-08-01  7:32 ` Amir Goldstein
2020-08-03  7:57   ` Jan Kara

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git