From: Kirill Smelkov <kirr@nexedi.com> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>, Arnd Bergmann <arnd@arndb.de>, Christoph Hellwig <hch@lst.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>, Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH 2/2] vfs: use &file->f_pos directly on files that have position Date: Sat, 13 Apr 2019 16:55:07 +0000 [thread overview] Message-ID: <20190413165449.11168-2-kirr@nexedi.com> (raw) In-Reply-To: <20190413165116.GB10314@deco.navytux.spb.ru> Long ago vfs read/write operations were passing ppos=&file->f_pos directly to .read / .write file_operations methods. That changed in 2004 in 55f09ec0087c ("read/write: pass down a copy of f_pos, not f_pos itself.") which started to pass ppos=&local_var trying to avoid simultaneous read/write/lseek stepping onto each other toes and overwriting file->f_pos racily. That measure was not complete and in 2014 commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per POSIX") added file->f_pos_lock to completely disable simultaneous read/write/lseek runs. After f_pos_lock was introduced the reason to avoid passing ppos=&file->f_pos directly due to concurrency vanished. Linus explains[1]: In fact, we *used* to (long ago) pass in the address of "file->f_pos" itself to the low-level read/write routines. We then changed it to do that indirection through a local copy of pos (and file_pos_read/file_pos_write) because we didn't do the proper locking, so different read/write versions could mess with each other (and with lseek). But one of the things that commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per POSIX") did was to add the proper locking at least for the cases that we care about deeply, so we *could* say that we have three cases: - FMODE_ATOMIC_POS: properly locked, - FMODE_STREAM: no pos at all - otherwise a "mostly don't care - don't mix!" and so we could go back to not copying the pos at all, and instead do something like loff_t *ppos = f.file->f_mode & FMODE_STREAM ? NULL : &file->f_pos; ret = vfs_write(f.file, buf, count, ppos); and perhaps have a long-term plan to try to get rid of the "don't mix" case entirely (ie "if you use f_pos, then we'll do the proper locking") (The above is obviously surrounded by the fdget_pos()/fdput_pos() that implements the locking decision). Currently for regular files we always set FMODE_ATOMIC_POS and change that to FMODE_STREAM if stream_open is used explicitly on open. That leaves other files, like e.g. sockets and pipes, for "mostly don't care - don't mix!" case. Sockets, for example, always check that on read/write the initial pos they receive is 0 and don't update it. And if it is !0 they return -ESPIPE. That suggests that we can do the switch into passing &file->f_pos directly now and incrementally convert to FMODE_STREAM files that were doing the stream-like checking manually in their low-level .read/.write handlers. Note: it is theoretically possible that a driver updates *ppos inside even if read/write returns error. For such cases the conversion will change IO semantic a bit. The semantic that is changing here was introduced in 2013 in commit 5faf153ebf61 "don't call file_pos_write() if vfs_{read,write}{,v}() fails". [1] https://lore.kernel.org/linux-fsdevel/CAHk-=whJtZt52SnhBGrNMnuxFn3GE9X_e02x8BPxtkqrfyZukw@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- fs/read_write.c | 36 ++++-------------------------------- 1 file changed, 4 insertions(+), 32 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index d62556be6848..13550b65cb2c 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -571,14 +571,7 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count) ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_read(f.file, buf, count, ppos); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_read(f.file, buf, count, file_ppos(f.file)); fdput_pos(f); } return ret; @@ -595,14 +588,7 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_write(f.file, buf, count, ppos); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_write(f.file, buf, count, file_ppos(f.file)); fdput_pos(f); } @@ -1018,14 +1004,7 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec, ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_readv(f.file, vec, vlen, ppos, flags); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_readv(f.file, vec, vlen, file_ppos(f.file), flags); fdput_pos(f); } @@ -1042,14 +1021,7 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec, ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_writev(f.file, vec, vlen, ppos, flags); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_writev(f.file, vec, vlen, file_ppos(f.file), flags); fdput_pos(f); } -- 2.21.0.593.g511ec345e1
WARNING: multiple messages have this Message-ID (diff)
From: Kirill Smelkov <kirr@nexedi.com> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>, Arnd Bergmann <arnd@arndb.de>, Christoph Hellwig <hch@lst.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>, Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH 2/2] vfs: use &file->f_pos directly on files that have position Date: Sat, 13 Apr 2019 16:55:06 +0000 [thread overview] Message-ID: <20190413165449.11168-2-kirr@nexedi.com> (raw) In-Reply-To: <20190413165116.GB10314@deco.navytux.spb.ru> Long ago vfs read/write operations were passing ppos=&file->f_pos directly to .read / .write file_operations methods. That changed in 2004 in 55f09ec0087c ("read/write: pass down a copy of f_pos, not f_pos itself.") which started to pass ppos=&local_var trying to avoid simultaneous read/write/lseek stepping onto each other toes and overwriting file->f_pos racily. That measure was not complete and in 2014 commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per POSIX") added file->f_pos_lock to completely disable simultaneous read/write/lseek runs. After f_pos_lock was introduced the reason to avoid passing ppos=&file->f_pos directly due to concurrency vanished. Linus explains[1]: In fact, we *used* to (long ago) pass in the address of "file->f_pos" itself to the low-level read/write routines. We then changed it to do that indirection through a local copy of pos (and file_pos_read/file_pos_write) because we didn't do the proper locking, so different read/write versions could mess with each other (and with lseek). But one of the things that commit 9c225f2655e36 ("vfs: atomic f_pos accesses as per POSIX") did was to add the proper locking at least for the cases that we care about deeply, so we *could* say that we have three cases: - FMODE_ATOMIC_POS: properly locked, - FMODE_STREAM: no pos at all - otherwise a "mostly don't care - don't mix!" and so we could go back to not copying the pos at all, and instead do something like loff_t *ppos = f.file->f_mode & FMODE_STREAM ? NULL : &file->f_pos; ret = vfs_write(f.file, buf, count, ppos); and perhaps have a long-term plan to try to get rid of the "don't mix" case entirely (ie "if you use f_pos, then we'll do the proper locking") (The above is obviously surrounded by the fdget_pos()/fdput_pos() that implements the locking decision). Currently for regular files we always set FMODE_ATOMIC_POS and change that to FMODE_STREAM if stream_open is used explicitly on open. That leaves other files, like e.g. sockets and pipes, for "mostly don't care - don't mix!" case. Sockets, for example, always check that on read/write the initial pos they receive is 0 and don't update it. And if it is !0 they return -ESPIPE. That suggests that we can do the switch into passing &file->f_pos directly now and incrementally convert to FMODE_STREAM files that were doing the stream-like checking manually in their low-level .read/.write handlers. Note: it is theoretically possible that a driver updates *ppos inside even if read/write returns error. For such cases the conversion will change IO semantic a bit. The semantic that is changing here was introduced in 2013 in commit 5faf153ebf61 "don't call file_pos_write() if vfs_{read,write}{,v}() fails". [1] https://lore.kernel.org/linux-fsdevel/CAHk-=whJtZt52SnhBGrNMnuxFn3GE9X_e02x8BPxtkqrfyZukw@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- fs/read_write.c | 36 ++++-------------------------------- 1 file changed, 4 insertions(+), 32 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index d62556be6848..13550b65cb2c 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -571,14 +571,7 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count) ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_read(f.file, buf, count, ppos); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_read(f.file, buf, count, file_ppos(f.file)); fdput_pos(f); } return ret; @@ -595,14 +588,7 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_write(f.file, buf, count, ppos); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_write(f.file, buf, count, file_ppos(f.file)); fdput_pos(f); } @@ -1018,14 +1004,7 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec, ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_readv(f.file, vec, vlen, ppos, flags); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_readv(f.file, vec, vlen, file_ppos(f.file), flags); fdput_pos(f); } @@ -1042,14 +1021,7 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec, ssize_t ret = -EBADF; if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); - if (ppos) { - pos = *ppos; - ppos = &pos; - } - ret = vfs_writev(f.file, vec, vlen, ppos, flags); - if (ret >= 0 && ppos) - f.file->f_pos = pos; + ret = vfs_writev(f.file, vec, vlen, file_ppos(f.file), flags); fdput_pos(f); } -- 2.21.0.593.g511ec345e1
next prev parent reply other threads:[~2019-04-13 17:24 UTC|newest] Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-03-26 22:20 [PATCH 1/3] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock Kirill Smelkov 2019-03-26 22:20 ` Kirill Smelkov 2019-03-26 23:22 ` [PATCH 3/3] fuse: Add FOPEN_STREAM and use stream_open() if filesystem returned that from open handler Kirill Smelkov 2019-04-24 7:13 ` [RESEND, PATCH " Kirill Smelkov [not found] ` <20190424160611.2A71321900@mail.kernel.org> 2019-04-24 19:16 ` Kirill Smelkov [not found] ` <8794193f3040b798010970228d978c05ad56ec52.1553637462.git.kirr@nexedi.com> 2019-03-27 6:54 ` [PATCH 2/3] *: convert stream-like files from nonseekable_open -> stream_open Lubomir Rintel 2019-03-27 16:58 ` [PATCH 1/3] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock Juergen Gross 2019-04-06 17:07 ` Linus Torvalds 2019-04-07 20:04 ` Kirill Smelkov 2019-04-08 0:09 ` Linus Torvalds 2019-04-14 7:11 ` Kirill Smelkov [not found] ` <4c4651e2-167e-bfcc-7b3e-cda118f98a69@rasmusvillemoes.dk> [not found] ` <20190409203807.GA13855@deco.navytux.spb.ru> [not found] ` <d8c23d05-8810-13a2-cc50-7a47ff35e90b@rasmusvillemoes.dk> 2019-04-11 12:38 ` Kirill Smelkov 2019-04-11 16:22 ` Linus Torvalds 2019-04-12 12:42 ` Kirill Smelkov 2019-04-13 16:54 ` Kirill Smelkov 2019-04-13 16:54 ` [PATCH 1/2] vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files Kirill Smelkov 2019-04-13 17:27 ` Linus Torvalds 2019-04-13 17:38 ` Al Viro 2019-04-13 18:44 ` Kirill Smelkov 2019-04-13 16:55 ` Kirill Smelkov [this message] 2019-04-13 16:55 ` [PATCH 2/2] vfs: use &file->f_pos directly on files that have position Kirill Smelkov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190413165449.11168-2-kirr@nexedi.com \ --to=kirr@nexedi.com \ --cc=arnd@arndb.de \ --cc=gregkh@linuxfoundation.org \ --cc=hch@lst.de \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=torvalds@linux-foundation.org \ --cc=viro@zeniv.linux.org.uk \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.