linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/1] fs: fix race between llseek SEEK_END and write
@ 2018-11-16  8:37 Eiichi Tsukata
  2018-11-16  8:37 ` [RFC PATCH 1/1] ext4: " Eiichi Tsukata
  2018-11-16 15:44 ` [RFC PATCH 0/1] fs: " Andi Kleen
  0 siblings, 2 replies; 4+ messages in thread
From: Eiichi Tsukata @ 2018-11-16  8:37 UTC (permalink / raw)
  To: andi, tytso, adilger.kernel, linux-ext4, linux-fsdevel; +Cc: Eiichi Tsukata

Hello,

Some file systems (including ext4, xfs, ramfs ...) have the following
problem as I've described in the commit message of the sample patch
[PATCH 1/1] for ext4.

  The commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
  removed almost all locks in llseek() including SEEK_END. It based on the
  idea that write() updates size atomically. But in fact, write() can be
  divided into two or more parts in generic_perform_write() when pos
  straddles over the PAGE_SIZE, which results in updating size multiple
  times in one write(). It means that llseek() can see the size being
  updated during write().

  This race changes behavior of some applications. 'tail' is one of those
  applications. It reads range [pos, pos_end] where pos_end is obtained
  via llseek() SEEK_END. Sometimes, a read line could be broken.

  reproducer:

    $ while true; do echo 123456 >> out; done
    $ while true; do tail out | grep -v 123456 ; done

  example output(take 30 secs):

    12345
    1
    1234
    1
    12
    1234

Note: Some file systems which indivisually implements llseek() and hold
      inode mutex lock in it are not affected. ex:) btrfs, ocfs2


I would like to ask you the following questions;

Q1. Do you consider this behavior as a bug in kernel?
    or userspace applications are responseible for it?

Q2. If it is a bug, how should we fix it?

Currently I'm planning to re-introduce generic_file_llseek_unlocked()
and inode lock in generic_file_llseek() for SEEK_END. Then replace
generic_file_llseek() with generic_file_llseek_unlocked() if it called
with inode lock in individual file systems. Please let me know if the
way is not appropreate or any other better way to fix it.

Thanks

Eiichi Tsukata (1):
  ext4: fix race between llseek SEEK_END and write

 fs/ext4/file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

-- 
2.19.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH 1/1] ext4: fix race between llseek SEEK_END and write
  2018-11-16  8:37 [RFC PATCH 0/1] fs: fix race between llseek SEEK_END and write Eiichi Tsukata
@ 2018-11-16  8:37 ` Eiichi Tsukata
  2018-11-16 15:44 ` [RFC PATCH 0/1] fs: " Andi Kleen
  1 sibling, 0 replies; 4+ messages in thread
From: Eiichi Tsukata @ 2018-11-16  8:37 UTC (permalink / raw)
  To: andi, tytso, adilger.kernel, linux-ext4, linux-fsdevel; +Cc: Eiichi Tsukata

The commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
removed almost all locks in llseek() including SEEK_END. It based on the
idea that write() updates size atomically. But in fact, write() can be
divided into two or more parts in generic_perform_write() when pos
straddles over the PAGE_SIZE, which results in updating size multiple
times in one write(). It means that llseek() can see the size being
updated during write().

This race changes behavior of some applications. 'tail' is one of those
applications. It reads range [pos, pos_end] where pos_end is obtained
via llseek() SEEK_END. Sometimes, a read line could be broken.

reproducer:

  $ while true; do echo 123456 >> out; done
  $ while true; do tail out | grep -v 123456 ; done

example output(take 30 secs):

  12345
  1
  1234
  1
  12
  1234

Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
---
 fs/ext4/file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..6479f3066043 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -477,6 +477,16 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
 	default:
 		return generic_file_llseek_size(file, offset, whence,
 						maxbytes, i_size_read(inode));
+	case SEEK_END:
+		/*
+		 * protects against inode size race with write so that llseek
+		 * doesn't see inode size being updated in generic_perform_write
+		 */
+		inode_lock_shared(inode);
+		offset = generic_file_llseek_size(file, offset, whence,
+						  maxbytes, i_size_read(inode));
+		inode_unlock_shared(inode);
+		return offset;
 	case SEEK_HOLE:
 		inode_lock_shared(inode);
 		offset = iomap_seek_hole(inode, offset, &ext4_iomap_ops);
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH 0/1] fs: fix race between llseek SEEK_END and write
  2018-11-16  8:37 [RFC PATCH 0/1] fs: fix race between llseek SEEK_END and write Eiichi Tsukata
  2018-11-16  8:37 ` [RFC PATCH 1/1] ext4: " Eiichi Tsukata
@ 2018-11-16 15:44 ` Andi Kleen
  2018-11-17  7:29   ` Eiichi Tsukata
  1 sibling, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2018-11-16 15:44 UTC (permalink / raw)
  To: Eiichi Tsukata; +Cc: andi, tytso, adilger.kernel, linux-ext4, linux-fsdevel

> I would like to ask you the following questions;
> 
> Q1. Do you consider this behavior as a bug in kernel?
>     or userspace applications are responseible for it?

Yes I would consider it a bug.

> 
> Q2. If it is a bug, how should we fix it?
> 
> Currently I'm planning to re-introduce generic_file_llseek_unlocked()
> and inode lock in generic_file_llseek() for SEEK_END. Then replace
> generic_file_llseek() with generic_file_llseek_unlocked() if it called
> with inode lock in individual file systems. Please let me know if the
> way is not appropreate or any other better way to fix it.

Sounds reasonable.

-Andi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH 0/1] fs: fix race between llseek SEEK_END and write
  2018-11-16 15:44 ` [RFC PATCH 0/1] fs: " Andi Kleen
@ 2018-11-17  7:29   ` Eiichi Tsukata
  0 siblings, 0 replies; 4+ messages in thread
From: Eiichi Tsukata @ 2018-11-17  7:29 UTC (permalink / raw)
  To: andi; +Cc: Theodore Ts'o, adilger.kernel, linux-ext4, linux-fsdevel

> >
> > Q2. If it is a bug, how should we fix it?
> >
> > Currently I'm planning to re-introduce generic_file_llseek_unlocked()
> > and inode lock in generic_file_llseek() for SEEK_END. Then replace
> > generic_file_llseek() with generic_file_llseek_unlocked() if it called
> > with inode lock in individual file systems. Please let me know if the
> > way is not appropreate or any other better way to fix it.
>
> Sounds reasonable.
>
> -Andi

Thanks for comments. I'll make a patch.

Eiichi

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-11-17 17:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-16  8:37 [RFC PATCH 0/1] fs: fix race between llseek SEEK_END and write Eiichi Tsukata
2018-11-16  8:37 ` [RFC PATCH 1/1] ext4: " Eiichi Tsukata
2018-11-16 15:44 ` [RFC PATCH 0/1] fs: " Andi Kleen
2018-11-17  7:29   ` Eiichi Tsukata

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).