* [PATCH] reducing overheads in fget/fput @ 2003-04-28 16:52 Dipankar Sarma 2003-04-28 19:32 ` viro 0 siblings, 1 reply; 8+ messages in thread From: Dipankar Sarma @ 2003-04-28 16:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Sometime in the recent past there were complaints about smp overheads doubling system time taken for a 1MB "dd" from /dev/zero. This path is very sensitive to fget()/fput() costs predominantly from acquiring the rwlock in files_struct and atomic increment/decrement of file->f_count (I experimented to determine this). It seems that this can be avoided in some of the most common cases resulting in significant benefits. The idea is that if we are not sharing the file descriptor table, there is no need to increase the struct file refcnt or even take the files_struct reader-writer lock while lookup up an fd. open/close gurantees that atleast one refcount for file struct is held and while forking (but not sharing fd table), an additional refcount is added for the child for each open fd. By avoiding both the rwlock and atomic operation on file->f_count, we shave off about 0.35 seconds of system time from the following simple test - time dd if=/dev/zero of=foo bs=1 count=1M Results on a 4-CPU P3 Xeon box with 512MB memory 1MB L2 cache - (sys time in seconds) kernel sys time std-dev ------------ -------- ------- UP - vanilla 2.032 0.018 SMP - vanilla 2.996 0.013 UP - file 1.991 0.034 SMP - file 2.646 0.042 I have included my experimental patch here (not meant for inclusion). I added fget_light() and fput_light() which avoid the rwlock and ->f_count if the fd table is not shared. It seems safe provided you are using fget_light() and fput_light() within the same system call and are not forking after an fget_light(). To test it I modified some common syscalls in fs/read_write.c. It of course doesn't optimize the case when we share fd tables using CLONE_FILES. That case can probably be optimized using an old RCU-based patch that we have. Does this approach make sense ? Or am I missing some big gotcha somewhere ? Oh, btw the patch survives LTP. Thanks Dipankar diff -urN linux-2.5.66-base/fs/file_table.c linux-2.5.66-file/fs/file_table.c --- linux-2.5.66-base/fs/file_table.c 2003-03-25 03:29:55.000000000 +0530 +++ linux-2.5.66-file/fs/file_table.c 2003-04-28 15:36:01.000000000 +0530 @@ -147,6 +147,15 @@ __fput(file); } +void fput_light(struct file * file) +{ + struct files_struct *files = current->files; + + if (unlikely(atomic_read(&files->count) != 1)) + if (atomic_dec_and_test(&file->f_count)) + __fput(file); +} + /* __fput is called from task context when aio completion releases the last * last use of a struct file *. Do not use otherwise. */ @@ -190,6 +199,29 @@ return file; } +/* + * Lightweight file lookup - no refcnt increment if fd table isn't shared. + * You can use this only if it is guranteed that the current task already + * holds a refcnt to that file + */ +struct file *fget_light(unsigned int fd) +{ + struct file *file; + struct files_struct *files = current->files; + + if (likely((atomic_read(&files->count) == 1))) { + file = fcheck(fd); + } else { + read_lock(&files->file_lock); + file = fcheck(fd); + if (file) + get_file(file); + read_unlock(&files->file_lock); + } + return file; +} + + void put_filp(struct file *file) { if (atomic_dec_and_test(&file->f_count)) { diff -urN linux-2.5.66-base/fs/read_write.c linux-2.5.66-file/fs/read_write.c --- linux-2.5.66-base/fs/read_write.c 2003-03-25 03:30:51.000000000 +0530 +++ linux-2.5.66-file/fs/read_write.c 2003-04-28 12:14:34.000000000 +0530 @@ -117,7 +117,7 @@ struct file * file; retval = -EBADF; - file = fget(fd); + file = fget_light(fd); if (!file) goto bad; @@ -128,7 +128,7 @@ if (res != (loff_t)retval) retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */ } - fput(file); + fput_light(file); bad: return retval; } @@ -143,7 +143,7 @@ loff_t offset; retval = -EBADF; - file = fget(fd); + file = fget_light(fd); if (!file) goto bad; @@ -161,7 +161,7 @@ retval = 0; } out_putf: - fput(file); + fput_light(file); bad: return retval; } @@ -252,10 +252,10 @@ struct file *file; ssize_t ret = -EBADF; - file = fget(fd); + file = fget_light(fd); if (file) { ret = vfs_read(file, buf, count, &file->f_pos); - fput(file); + fput_light(file); } return ret; @@ -266,10 +266,10 @@ struct file *file; ssize_t ret = -EBADF; - file = fget(fd); + file = fget_light(fd); if (file) { ret = vfs_write(file, buf, count, &file->f_pos); - fput(file); + fput_light(file); } return ret; @@ -284,10 +284,10 @@ if (pos < 0) return -EINVAL; - file = fget(fd); + file = fget_light(fd); if (file) { ret = vfs_read(file, buf, count, &pos); - fput(file); + fput_light(file); } return ret; @@ -302,10 +302,10 @@ if (pos < 0) return -EINVAL; - file = fget(fd); + file = fget_light(fd); if (file) { ret = vfs_write(file, buf, count, &pos); - fput(file); + fput_light(file); } return ret; @@ -480,10 +480,10 @@ struct file *file; ssize_t ret = -EBADF; - file = fget(fd); + file = fget_light(fd); if (file) { ret = vfs_readv(file, vec, vlen, &file->f_pos); - fput(file); + fput_light(file); } return ret; @@ -495,10 +495,10 @@ struct file *file; ssize_t ret = -EBADF; - file = fget(fd); + file = fget_light(fd); if (file) { ret = vfs_writev(file, vec, vlen, &file->f_pos); - fput(file); + fput_light(file); } return ret; @@ -516,7 +516,7 @@ * Get input file, and verify that it is ok.. */ retval = -EBADF; - in_file = fget(in_fd); + in_file = fget_light(in_fd); if (!in_file) goto out; if (!(in_file->f_mode & FMODE_READ)) @@ -539,7 +539,7 @@ * Get output file, and verify that it is ok.. */ retval = -EBADF; - out_file = fget(out_fd); + out_file = fget_light(out_fd); if (!out_file) goto fput_in; if (!(out_file->f_mode & FMODE_WRITE)) @@ -579,9 +579,9 @@ retval = -EOVERFLOW; fput_out: - fput(out_file); + fput_light(out_file); fput_in: - fput(in_file); + fput_light(in_file); out: return retval; } diff -urN linux-2.5.66-base/include/linux/file.h linux-2.5.66-file/include/linux/file.h --- linux-2.5.66-base/include/linux/file.h 2003-03-25 03:29:52.000000000 +0530 +++ linux-2.5.66-file/include/linux/file.h 2003-04-28 12:15:21.000000000 +0530 @@ -35,7 +35,9 @@ extern void FASTCALL(__fput(struct file *)); extern void FASTCALL(fput(struct file *)); +extern void FASTCALL(fput_light(struct file *)); extern struct file * FASTCALL(fget(unsigned int fd)); +extern struct file * FASTCALL(fget_light(unsigned int fd)); extern void FASTCALL(set_close_on_exec(unsigned int fd, int flag)); extern void put_filp(struct file *); extern int get_unused_fd(void); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-04-28 16:52 [PATCH] reducing overheads in fget/fput Dipankar Sarma @ 2003-04-28 19:32 ` viro 2003-04-28 19:58 ` Dipankar Sarma 0 siblings, 1 reply; 8+ messages in thread From: viro @ 2003-04-28 19:32 UTC (permalink / raw) To: Dipankar Sarma; +Cc: Andrew Morton, linux-kernel On Mon, Apr 28, 2003 at 10:22:40PM +0530, Dipankar Sarma wrote: > Does this approach make sense ? Or am I missing some big gotcha > somewhere ? Oh, btw the patch survives LTP. You are. Have a process share file table at the time of call and have its sibling die in the middle. Oops - condition that had been true at time of fget_light() (->files->count > 1) is false at the time we fput_light(). Have fun - we had just leaked a reference to struct file. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-04-28 19:32 ` viro @ 2003-04-28 19:58 ` Dipankar Sarma 2003-05-02 17:17 ` Dipankar Sarma 0 siblings, 1 reply; 8+ messages in thread From: Dipankar Sarma @ 2003-04-28 19:58 UTC (permalink / raw) To: viro; +Cc: Andrew Morton, linux-kernel On Mon, Apr 28, 2003 at 08:32:28PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > You are. Have a process share file table at the time of call and > have its sibling die in the middle. Oops - condition that had > been true at time of fget_light() (->files->count > 1) is false > at the time we fput_light(). Have fun - we had just leaked a > reference to struct file. That shouldn't be very difficult to fix. For the fget_light/fput_light pair in a syscall, we make the files->count == 1 check only once at the beginning. Do you see a problem with that ? Thanks Dipankar ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-04-28 19:58 ` Dipankar Sarma @ 2003-05-02 17:17 ` Dipankar Sarma 2003-05-02 20:54 ` Andrew Morton 0 siblings, 1 reply; 8+ messages in thread From: Dipankar Sarma @ 2003-05-02 17:17 UTC (permalink / raw) To: viro; +Cc: Andrew Morton, linux-kernel On Tue, Apr 29, 2003 at 01:28:36AM +0530, Dipankar Sarma wrote: > On Mon, Apr 28, 2003 at 08:32:28PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > > You are. Have a process share file table at the time of call and > > have its sibling die in the middle. Oops - condition that had > > been true at time of fget_light() (->files->count > 1) is false > > at the time we fput_light(). Have fun - we had just leaked a > > reference to struct file. > > That shouldn't be very difficult to fix. For the fget_light/fput_light > pair in a syscall, we make the files->count == 1 check only once at the > beginning. Do you see a problem with that ? Here is a patch that fixes that. I re-did the measurements with Andrew's experiment (dd if=/dev/zero of=foo bs=1 count=1M). [4CPU P3 xeon with 1MB L2 cache and 512MB ram] kernel sys time std-dev ------------ -------- ------- UP - vanilla 2.104 0.028 SMP - vanilla 2.976 0.023 UP - file 1.867 0.019 SMP - file 2.719 0.026 Thanks Dipankar diff -urN linux-2.5.66-base/fs/file_table.c linux-2.5.66-file/fs/file_table.c --- linux-2.5.66-base/fs/file_table.c 2003-03-25 03:29:55.000000000 +0530 +++ linux-2.5.66-file/fs/file_table.c 2003-05-02 21:23:27.000000000 +0530 @@ -147,6 +147,13 @@ __fput(file); } +void fput_light(struct file * file, int flag) +{ + if (unlikely(flag)) + if (atomic_dec_and_test(&file->f_count)) + __fput(file); +} + /* __fput is called from task context when aio completion releases the last * last use of a struct file *. Do not use otherwise. */ @@ -190,6 +197,34 @@ return file; } +/* + * Lightweight file lookup - no refcnt increment if fd table isn't shared. + * You can use this only if it is guranteed that the current task already + * holds a refcnt to that file. That check has to be done at fget() only + * and a flag is returned to be passed to the corresponding fput_light(). + * There must not be a cloning between an fget_light/fput_light pair. + */ +struct file *fget_light(unsigned int fd, int *flag) +{ + struct file *file; + struct files_struct *files = current->files; + + *flag = 0; + if (likely((atomic_read(&files->count) == 1))) { + file = fcheck(fd); + } else { + read_lock(&files->file_lock); + file = fcheck(fd); + if (file) { + get_file(file); + *flag = 1; + } + read_unlock(&files->file_lock); + } + return file; +} + + void put_filp(struct file *file) { if (atomic_dec_and_test(&file->f_count)) { diff -urN linux-2.5.66-base/fs/read_write.c linux-2.5.66-file/fs/read_write.c --- linux-2.5.66-base/fs/read_write.c 2003-03-25 03:30:51.000000000 +0530 +++ linux-2.5.66-file/fs/read_write.c 2003-05-02 13:42:53.000000000 +0530 @@ -115,9 +115,10 @@ { off_t retval; struct file * file; + int flag; retval = -EBADF; - file = fget(fd); + file = fget_light(fd, &flag); if (!file) goto bad; @@ -128,7 +129,7 @@ if (res != (loff_t)retval) retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */ } - fput(file); + fput_light(file, flag); bad: return retval; } @@ -141,9 +142,10 @@ int retval; struct file * file; loff_t offset; + int flag; retval = -EBADF; - file = fget(fd); + file = fget_light(fd, &flag); if (!file) goto bad; @@ -161,7 +163,7 @@ retval = 0; } out_putf: - fput(file); + fput_light(file, flag); bad: return retval; } @@ -251,11 +253,12 @@ { struct file *file; ssize_t ret = -EBADF; + int flag; - file = fget(fd); + file = fget_light(fd, &flag); if (file) { ret = vfs_read(file, buf, count, &file->f_pos); - fput(file); + fput_light(file, flag); } return ret; @@ -265,11 +268,12 @@ { struct file *file; ssize_t ret = -EBADF; + int flag; - file = fget(fd); + file = fget_light(fd, &flag); if (file) { ret = vfs_write(file, buf, count, &file->f_pos); - fput(file); + fput_light(file, flag); } return ret; @@ -280,14 +284,15 @@ { struct file *file; ssize_t ret = -EBADF; + int flag; if (pos < 0) return -EINVAL; - file = fget(fd); + file = fget_light(fd, &flag); if (file) { ret = vfs_read(file, buf, count, &pos); - fput(file); + fput_light(file, flag); } return ret; @@ -298,14 +303,15 @@ { struct file *file; ssize_t ret = -EBADF; + int flag; if (pos < 0) return -EINVAL; - file = fget(fd); + file = fget_light(fd, &flag); if (file) { ret = vfs_write(file, buf, count, &pos); - fput(file); + fput_light(file, flag); } return ret; @@ -479,11 +485,12 @@ { struct file *file; ssize_t ret = -EBADF; + int flag; - file = fget(fd); + file = fget_light(fd, &flag); if (file) { ret = vfs_readv(file, vec, vlen, &file->f_pos); - fput(file); + fput_light(file, flag); } return ret; @@ -494,11 +501,12 @@ { struct file *file; ssize_t ret = -EBADF; + int flag; - file = fget(fd); + file = fget_light(fd, &flag); if (file) { ret = vfs_writev(file, vec, vlen, &file->f_pos); - fput(file); + fput_light(file, flag); } return ret; @@ -511,12 +519,13 @@ struct inode * in_inode, * out_inode; loff_t pos; ssize_t retval; + int flag_in, flag_out; /* * Get input file, and verify that it is ok.. */ retval = -EBADF; - in_file = fget(in_fd); + in_file = fget_light(in_fd, &flag_in); if (!in_file) goto out; if (!(in_file->f_mode & FMODE_READ)) @@ -539,7 +548,7 @@ * Get output file, and verify that it is ok.. */ retval = -EBADF; - out_file = fget(out_fd); + out_file = fget_light(out_fd, &flag_out); if (!out_file) goto fput_in; if (!(out_file->f_mode & FMODE_WRITE)) @@ -579,9 +588,9 @@ retval = -EOVERFLOW; fput_out: - fput(out_file); + fput_light(out_file, flag_out); fput_in: - fput(in_file); + fput_light(in_file, flag_in); out: return retval; } diff -urN linux-2.5.66-base/include/linux/file.h linux-2.5.66-file/include/linux/file.h --- linux-2.5.66-base/include/linux/file.h 2003-03-25 03:29:52.000000000 +0530 +++ linux-2.5.66-file/include/linux/file.h 2003-05-02 13:43:31.000000000 +0530 @@ -35,7 +35,9 @@ extern void FASTCALL(__fput(struct file *)); extern void FASTCALL(fput(struct file *)); +extern void FASTCALL(fput_light(struct file *, int)); extern struct file * FASTCALL(fget(unsigned int fd)); +extern struct file * FASTCALL(fget_light(unsigned int fd, int *flag)); extern void FASTCALL(set_close_on_exec(unsigned int fd, int flag)); extern void put_filp(struct file *); extern int get_unused_fd(void); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-05-02 17:17 ` Dipankar Sarma @ 2003-05-02 20:54 ` Andrew Morton 2003-05-03 3:53 ` Dipankar Sarma 0 siblings, 1 reply; 8+ messages in thread From: Andrew Morton @ 2003-05-02 20:54 UTC (permalink / raw) To: dipankar; +Cc: viro, linux-kernel Dipankar Sarma <dipankar@in.ibm.com> wrote: > > > That shouldn't be very difficult to fix. For the fget_light/fput_light > > pair in a syscall, we make the files->count == 1 check only once at the > > beginning. Do you see a problem with that ? > > Here is a patch that fixes that. This patch is fairly foul. > kernel sys time std-dev > ------------ -------- ------- > UP - vanilla 2.104 0.028 > SMP - vanilla 2.976 0.023 > UP - file 1.867 0.019 > SMP - file 2.719 0.026 But it is localised, and makes a substantial difference. I inlined fput_light: diff -puN fs/file_table.c~fget-speedup-inline-fput_light fs/file_table.c --- 25/fs/file_table.c~fget-speedup-inline-fput_light Fri May 2 13:51:45 2003 +++ 25-akpm/fs/file_table.c Fri May 2 13:52:23 2003 @@ -141,19 +141,12 @@ void close_private_file(struct file *fil security_file_free(file); } -void fput(struct file * file) +void fput(struct file *file) { if (atomic_dec_and_test(&file->f_count)) __fput(file); } -void fput_light(struct file * file, int flag) -{ - if (unlikely(flag)) - if (atomic_dec_and_test(&file->f_count)) - __fput(file); -} - /* __fput is called from task context when aio completion releases the last * last use of a struct file *. Do not use otherwise. */ diff -puN include/linux/file.h~fget-speedup-inline-fput_light include/linux/file.h --- 25/include/linux/file.h~fget-speedup-inline-fput_light Fri May 2 13:51:45 2003 +++ 25-akpm/include/linux/file.h Fri May 2 13:52:52 2003 @@ -35,7 +35,13 @@ struct files_struct { extern void FASTCALL(__fput(struct file *)); extern void FASTCALL(fput(struct file *)); -extern void FASTCALL(fput_light(struct file *, int)); + +static inline void fput_light(struct file *file, int flag) +{ + if (unlikely(flag)) + fput(file); +} + extern struct file * FASTCALL(fget(unsigned int fd)); extern struct file * FASTCALL(fget_light(unsigned int fd, int *flag)); extern void FASTCALL(set_close_on_exec(unsigned int fd, int flag)); _ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-05-02 20:54 ` Andrew Morton @ 2003-05-03 3:53 ` Dipankar Sarma 2003-05-03 4:00 ` Andrew Morton 0 siblings, 1 reply; 8+ messages in thread From: Dipankar Sarma @ 2003-05-03 3:53 UTC (permalink / raw) To: Andrew Morton; +Cc: viro, linux-kernel On Fri, May 02, 2003 at 01:54:04PM -0700, Andrew Morton wrote: > Dipankar Sarma <dipankar@in.ibm.com> wrote: > > Here is a patch that fixes that. > > This patch is fairly foul. Not sure what your grouse is, but I don't like the fget_ligth()/fput_light semantics myself. They don't seem natural, but I can't think of better way to do this. > > > kernel sys time std-dev > > ------------ -------- ------- > > UP - vanilla 2.104 0.028 > > SMP - vanilla 2.976 0.023 > > UP - file 1.867 0.019 > > SMP - file 2.719 0.026 > > But it is localised, and makes a substantial difference. If I haven't broken too many things, then I will try to get some performance results from large machines. > > I inlined fput_light: Looks good. Thanks Dipankar ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-05-03 3:53 ` Dipankar Sarma @ 2003-05-03 4:00 ` Andrew Morton 2003-05-03 4:24 ` Dipankar Sarma 0 siblings, 1 reply; 8+ messages in thread From: Andrew Morton @ 2003-05-03 4:00 UTC (permalink / raw) To: dipankar; +Cc: viro, linux-kernel Dipankar Sarma <dipankar@in.ibm.com> wrote: > > On Fri, May 02, 2003 at 01:54:04PM -0700, Andrew Morton wrote: > > Dipankar Sarma <dipankar@in.ibm.com> wrote: > > > Here is a patch that fixes that. > > > > This patch is fairly foul. > > Not sure what your grouse is, but I don't like the fget_ligth()/fput_light > semantics myself. They don't seem natural, but I can't think of > better way to do this. Precisely. > > > > > kernel sys time std-dev > > > ------------ -------- ------- > > > UP - vanilla 2.104 0.028 > > > SMP - vanilla 2.976 0.023 > > > UP - file 1.867 0.019 > > > SMP - file 2.719 0.026 > > > > But it is localised, and makes a substantial difference. > > If I haven't broken too many things, then I will try to get some > performance results from large machines. P4's will like it more. > > > > I inlined fput_light: > > Looks good. I also renamed `flag' to `fput_needed' everywhere. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] reducing overheads in fget/fput 2003-05-03 4:00 ` Andrew Morton @ 2003-05-03 4:24 ` Dipankar Sarma 0 siblings, 0 replies; 8+ messages in thread From: Dipankar Sarma @ 2003-05-03 4:24 UTC (permalink / raw) To: Andrew Morton; +Cc: viro, linux-kernel On Fri, May 02, 2003 at 09:00:03PM -0700, Andrew Morton wrote: > Dipankar Sarma <dipankar@in.ibm.com> wrote: > > Not sure what your grouse is, but I don't like the fget_ligth()/fput_light > > semantics myself. They don't seem natural, but I can't think of > > better way to do this. > > Precisely. I thought of doing something like - static inline is_fds_shared(void) { struct files_struct *files = current->files; return atomic_read(&files->count) != 1; } sys_blah(..) { int fds_shared = is_fds_shared(); file = fget_light(fd, fds_shared); ... ... fput_light(file, fds_shared); } It still didn't look very natural. We leave open the possibility of users doing is_fds_shared() for both fget_light and fput_light. With fput_needed flag, atleast we force them to use what is returned by fget_light. Thanks Dipankar ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2003-05-03 4:09 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-04-28 16:52 [PATCH] reducing overheads in fget/fput Dipankar Sarma 2003-04-28 19:32 ` viro 2003-04-28 19:58 ` Dipankar Sarma 2003-05-02 17:17 ` Dipankar Sarma 2003-05-02 20:54 ` Andrew Morton 2003-05-03 3:53 ` Dipankar Sarma 2003-05-03 4:00 ` Andrew Morton 2003-05-03 4:24 ` Dipankar Sarma
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).