From: Kirill Smelkov <kirr@nexedi.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Arnd Bergmann <arnd@arndb.de>,
Christoph Hellwig <hch@lst.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Kirill Smelkov <kirr@nexedi.com>
Subject: [PATCH 2/2] vfs: use &file->f_pos directly on files that have position
Date: Sat, 13 Apr 2019 16:55:06 +0000 [thread overview]
Message-ID: <20190413165449.11168-2-kirr@nexedi.com> (raw)
In-Reply-To: <20190413165116.GB10314@deco.navytux.spb.ru>
Long ago vfs read/write operations were passing ppos=&file->f_pos directly to
.read / .write file_operations methods. That changed in 2004 in 55f09ec0087c
("read/write: pass down a copy of f_pos, not f_pos itself.") which started to
pass ppos=&local_var trying to avoid simultaneous read/write/lseek stepping
onto each other toes and overwriting file->f_pos racily. That measure was not
complete and in 2014 commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per
POSIX") added file->f_pos_lock to completely disable simultaneous
read/write/lseek runs. After f_pos_lock was introduced the reason to avoid
passing ppos=&file->f_pos directly due to concurrency vanished.
Linus explains[1]:
In fact, we *used* to (long ago) pass in the address of "file->f_pos"
itself to the low-level read/write routines. We then changed it to do
that indirection through a local copy of pos (and
file_pos_read/file_pos_write) because we didn't do the proper locking,
so different read/write versions could mess with each other (and with
lseek).
But one of the things that commit 9c225f2655e36 ("vfs: atomic f_pos
accesses as per POSIX") did was to add the proper locking at least for
the cases that we care about deeply, so we *could* say that we have
three cases:
- FMODE_ATOMIC_POS: properly locked,
- FMODE_STREAM: no pos at all
- otherwise a "mostly don't care - don't mix!"
and so we could go back to not copying the pos at all, and instead do
something like
loff_t *ppos = f.file->f_mode & FMODE_STREAM ? NULL : &file->f_pos;
ret = vfs_write(f.file, buf, count, ppos);
and perhaps have a long-term plan to try to get rid of the "don't mix"
case entirely (ie "if you use f_pos, then we'll do the proper
locking")
(The above is obviously surrounded by the fdget_pos()/fdput_pos() that
implements the locking decision).
Currently for regular files we always set FMODE_ATOMIC_POS and change that to
FMODE_STREAM if stream_open is used explicitly on open. That leaves other
files, like e.g. sockets and pipes, for "mostly don't care - don't mix!" case.
Sockets, for example, always check that on read/write the initial pos they
receive is 0 and don't update it. And if it is !0 they return -ESPIPE.
That suggests that we can do the switch into passing &file->f_pos directly now
and incrementally convert to FMODE_STREAM files that were doing the stream-like
checking manually in their low-level .read/.write handlers.
Note: it is theoretically possible that a driver updates *ppos inside
even if read/write returns error. For such cases the conversion will
change IO semantic a bit. The semantic that is changing here was
introduced in 2013 in commit 5faf153ebf61 "don't call file_pos_write()
if vfs_{read,write}{,v}() fails".
[1] https://lore.kernel.org/linux-fsdevel/CAHk-=whJtZt52SnhBGrNMnuxFn3GE9X_e02x8BPxtkqrfyZukw@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
fs/read_write.c | 36 ++++--------------------------------
1 file changed, 4 insertions(+), 32 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index d62556be6848..13550b65cb2c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -571,14 +571,7 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
ssize_t ret = -EBADF;
if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
- if (ppos) {
- pos = *ppos;
- ppos = &pos;
- }
- ret = vfs_read(f.file, buf, count, ppos);
- if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ ret = vfs_read(f.file, buf, count, file_ppos(f.file));
fdput_pos(f);
}
return ret;
@@ -595,14 +588,7 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
ssize_t ret = -EBADF;
if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
- if (ppos) {
- pos = *ppos;
- ppos = &pos;
- }
- ret = vfs_write(f.file, buf, count, ppos);
- if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ ret = vfs_write(f.file, buf, count, file_ppos(f.file));
fdput_pos(f);
}
@@ -1018,14 +1004,7 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
ssize_t ret = -EBADF;
if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
- if (ppos) {
- pos = *ppos;
- ppos = &pos;
- }
- ret = vfs_readv(f.file, vec, vlen, ppos, flags);
- if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ ret = vfs_readv(f.file, vec, vlen, file_ppos(f.file), flags);
fdput_pos(f);
}
@@ -1042,14 +1021,7 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
ssize_t ret = -EBADF;
if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
- if (ppos) {
- pos = *ppos;
- ppos = &pos;
- }
- ret = vfs_writev(f.file, vec, vlen, ppos, flags);
- if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ ret = vfs_writev(f.file, vec, vlen, file_ppos(f.file), flags);
fdput_pos(f);
}
--
2.21.0.593.g511ec345e1
prev parent reply other threads:[~2019-04-13 17:25 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-26 22:20 [PATCH 1/3] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock Kirill Smelkov
2019-03-26 23:22 ` [PATCH 3/3] fuse: Add FOPEN_STREAM and use stream_open() if filesystem returned that from open handler Kirill Smelkov
2019-04-24 7:13 ` [RESEND, PATCH " Kirill Smelkov
[not found] ` <20190424160611.2A71321900@mail.kernel.org>
2019-04-24 19:16 ` Kirill Smelkov
[not found] ` <8794193f3040b798010970228d978c05ad56ec52.1553637462.git.kirr@nexedi.com>
2019-03-27 6:54 ` [PATCH 2/3] *: convert stream-like files from nonseekable_open -> stream_open Lubomir Rintel
2019-03-27 16:58 ` [PATCH 1/3] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock Juergen Gross
2019-04-06 17:07 ` Linus Torvalds
2019-04-07 20:04 ` Kirill Smelkov
2019-04-08 0:09 ` Linus Torvalds
2019-04-14 7:11 ` Kirill Smelkov
[not found] ` <4c4651e2-167e-bfcc-7b3e-cda118f98a69@rasmusvillemoes.dk>
[not found] ` <20190409203807.GA13855@deco.navytux.spb.ru>
[not found] ` <d8c23d05-8810-13a2-cc50-7a47ff35e90b@rasmusvillemoes.dk>
2019-04-11 12:38 ` Kirill Smelkov
2019-04-11 16:22 ` Linus Torvalds
2019-04-12 12:42 ` Kirill Smelkov
2019-04-13 16:54 ` Kirill Smelkov
2019-04-13 16:54 ` [PATCH 1/2] vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files Kirill Smelkov
2019-04-13 17:27 ` Linus Torvalds
2019-04-13 17:38 ` Al Viro
2019-04-13 18:44 ` Kirill Smelkov
2019-04-13 16:55 ` Kirill Smelkov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190413165449.11168-2-kirr@nexedi.com \
--to=kirr@nexedi.com \
--cc=arnd@arndb.de \
--cc=gregkh@linuxfoundation.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).