Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Jan Ziak @ 2020-07-05  2:06 UTC
To: gregkh
Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man,
    mtk.manpages, shuah, viro

Hello

At first, I thought that the proposed system call was capable of reading
*multiple* small files using a single system call - which would help
increase HDD/SSD queue utilization and IOPS (I/O operations per second) -
but that isn't the case: the proposed system call can read just a single
file.

Without the ability to read multiple small files using a single system
call, it is impossible to increase IOPS (unless an application uses
multiple reader threads or somehow instructs the kernel to prefetch
multiple files into memory).

While you are at it, why not also add a readfiles system call to read
multiple, presumably small, files? The initial unoptimized implementation
of the readfiles syscall could simply call readfile sequentially.

Sincerely
Jan (atomsymbol)
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Matthew Wilcox @ 2020-07-05  2:16 UTC
To: Jan Ziak
Cc: gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote:
> Without the ability to read multiple small files using a single system
> call, it is impossible to increase IOPS (unless an application is
> using multiple reader threads or somehow instructs the kernel to
> prefetch multiple files into memory).

What API would you use for this?

	ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens);

I pretty much hate this interface, so I hope you have something better
in mind.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Jan Ziak @ 2020-07-05  2:46 UTC
To: Matthew Wilcox
Cc: gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Sun, Jul 5, 2020 at 4:16 AM Matthew Wilcox <willy@infradead.org> wrote:
> What API would you use for this?
>
> 	ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens);
>
> I pretty much hate this interface, so I hope you have something better
> in mind.

I am proposing the following:

	struct readfile_t {
		int dirfd;
		const char *pathname;
		void *buf;
		size_t count;
		int flags;
		ssize_t retval; // set by kernel
		int reserved;   // not used by kernel
	};

	int readfiles(struct readfile_t *requests, size_t count);

Returns zero if all requests succeeded; otherwise the returned value is
non-zero (glibc wrapper: -1) and user-space is expected to check which
requests have succeeded and which have failed. retval in readfile_t is
set to what the single-file readfile syscall would return if it was
called with the contents of the corresponding readfile_t struct.

The glibc library wrapper of this system call is expected to store the
errno in the "reserved" field. Thus, a programmer using glibc sees:

	struct readfile_t {
		int dirfd;
		const char *pathname;
		void *buf;
		size_t count;
		int flags;
		ssize_t retval; // set by glibc (-1 on error)
		int errno;      // set by glibc if retval is -1
	};

retval and errno in glibc's readfile_t are set to what the single-file
glibc readfile would return (retval) and set (errno) if it was called
with the contents of the corresponding readfile_t struct. In case of an
error, glibc will pick one readfile_t which failed (such as the first
failed one) and use it to set glibc's errno.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Matthew Wilcox @ 2020-07-05  3:12 UTC
To: Jan Ziak
Cc: gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Sun, Jul 05, 2020 at 04:46:04AM +0200, Jan Ziak wrote:
> I am proposing the following:
>
> 	int readfiles(struct readfile_t *requests, size_t count);
>
> Returns zero if all requests succeeded, otherwise the returned value
> is non-zero (glibc wrapper: -1) and user-space is expected to check
> which requests have succeeded and which have failed. retval in
> readfile_t is set to what the single-file readfile syscall would
> return if it was called with the contents of the corresponding
> readfile_t struct.

You should probably take a look at io_uring. That has the level of
complexity of this proposal and supports open/read/close along with many
other opcodes.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Jan Ziak @ 2020-07-05  3:18 UTC
To: Matthew Wilcox
Cc: gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Sun, Jul 5, 2020 at 5:12 AM Matthew Wilcox <willy@infradead.org> wrote:
> You should probably take a look at io_uring. That has the level of
> complexity of this proposal and supports open/read/close along with many
> other opcodes.

Then glibc can implement readfile using io_uring and there is no need
for a new single-file readfile syscall.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Matthew Wilcox @ 2020-07-05  3:27 UTC
To: Jan Ziak
Cc: gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Sun, Jul 05, 2020 at 05:18:58AM +0200, Jan Ziak wrote:
> Then glibc can implement readfile using io_uring and there is no need
> for a new single-file readfile syscall.

It could, sure. But there's also a value in having a simple interface
to accomplish a simple task. Your proposed API added a very complex
interface to satisfy needs that clearly aren't part of the problem space
that Greg is looking to address.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Jan Ziak @ 2020-07-05  4:09 UTC
To: Matthew Wilcox
Cc: gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Sun, Jul 5, 2020 at 5:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> It could, sure. But there's also a value in having a simple interface
> to accomplish a simple task. Your proposed API added a very complex
> interface to satisfy needs that clearly aren't part of the problem space
> that Greg is looking to address.

I believe that we should look at the single-file readfile syscall from a
performance viewpoint. If an application expects to read only a couple of
small/medium-size files per second, then neither readfile nor readfiles
makes sense in terms of improving performance. The benefits start to show
up only when an application expects to read at least a hundred files per
second. The "per second" part is important; it cannot be left out.
Because readfile only improves performance for many-file reads, the
syscall that applications performing many-file reads actually want is the
multi-file version, not the single-file version.

I am not sure I understand why you think that a pointer to an array of
readfile_t structures is very complex. If it were very complex, it would
be a deep tree or a large graph.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Greg KH @ 2020-07-05 11:58 UTC
To: Jan Ziak
Cc: Matthew Wilcox, linux-api, linux-fsdevel, linux-kernel,
    linux-kselftest, linux-man, mtk.manpages, shuah, viro

On Sun, Jul 05, 2020 at 06:09:03AM +0200, Jan Ziak wrote:
> Because readfile only improves performance for many-file reads, the
> syscall that applications performing many-file reads actually want is
> the multi-file version, not the single-file version.

It also is a measurable increase over reading just a single file.
Here's my really really fast AMD system doing just one call to readfile
vs. one call sequence to open/read/close:

	$ ./readfile_speed -l 1
	Running readfile test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
	Took 3410 ns
	Running open/read/close test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
	Took 3780 ns

370ns isn't all that much, yes, but it is 370ns that could have been
used for something else :)

Look at the overhead these days of a syscall, using something like perf,
to see just how bad things have gotten on Intel-based systems (the above
was AMD, which doesn't suffer all the syscall slowdowns, only some).

I'm going to have to now dig up my old rpi to get the stats on that
thing, as well as some Intel boxes, to show the problem I'm trying to
help out with here. I'll post that for the next round of this patch
series.

> I am not sure I understand why you think that a pointer to an array of
> readfile_t structures is very complex. If it was very complex then it
> would be a deep tree or a large graph.

Of course you can make it more complex if you want, but look at the
existing tools that currently do many open/read/close sequences. The
apis there don't lend themselves very well to knowing the larger list of
files ahead of time. But I could be looking at the wrong thing; what
userspace programs are you thinking of that could be easily converted
into using something like this?

thanks,

greg k-h
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Jan Ziak @ 2020-07-06  6:07 UTC
To: Greg KH
Cc: Matthew Wilcox, linux-api, linux-fsdevel, linux-kernel,
    linux-kselftest, linux-man, mtk.manpages, shuah, viro

On Sun, Jul 5, 2020 at 1:58 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> It also is a measurable increase over reading just a single file.
> Here's my really really fast AMD system doing just one call to readfile
> vs. one call sequence to open/read/close:
>
> 	$ ./readfile_speed -l 1
> 	Running readfile test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
> 	Took 3410 ns
> 	Running open/read/close test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
> 	Took 3780 ns
>
> 370ns isn't all that much, yes, but it is 370ns that could have been
> used for something else :)

I am curious as to how you amortized or accounted for the fact that
readfile() first needs to open the dirfd and then close it later.

From a performance viewpoint, only code where readfile() is called
multiple times from within a loop makes sense:

	dirfd = open();
	for (...) {
		readfile(dirfd, ...);
	}
	close(dirfd);

> Of course you can make it more complex if you want, but look at the
> existing tools that currently do many open/read/close sequences. The
> apis there don't lend themselves very well to knowing the larger list
> of files ahead of time. But I could be looking at the wrong thing, what
> userspace programs are you thinking of that could be easily converted
> into using something like this?

Perhaps passing multiple filenames to tools via the command line is a
valid and quite general use case where it is known ahead of time that
multiple files are going to be read, such as "gcc *.o", which is commonly
used to link shared libraries and executables. Although, in the case of
"gcc *.o", some of the object files are likely to be cached in memory and
thus unlikely to need fetching from HDD/SSD. So the valid use case where
we could see a speedup (if gcc were to use the multi-file readfiles()
syscall) is when the programmer/Makefile invokes "gcc *.o" after
rebuilding a small subset of the object files, while the object files
that did not have to be rebuilt are stored on HDD/SSD - basically, the
first use of a project's Makefile on a particular day.
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Matthew Wilcox @ 2020-07-06 11:11 UTC
To: Jan Ziak
Cc: Greg KH, linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
    linux-man, mtk.manpages, shuah, viro

On Mon, Jul 06, 2020 at 08:07:46AM +0200, Jan Ziak wrote:
> I am curious as to how you amortized or accounted for the fact that
> readfile() first needs to open the dirfd and then close it later.
>
> From a performance viewpoint, only code where readfile() is called
> multiple times from within a loop makes sense:
>
> 	dirfd = open();
> 	for (...) {
> 		readfile(dirfd, ...);
> 	}
> 	close(dirfd);

dirfd can be AT_FDCWD, or if the path is absolute, dirfd will be ignored,
so one does not have to open anything. It would be an optimisation if
one wanted to read several files relating to the same process:

	char dir[50];
	sprintf(dir, "/proc/%d", pid);
	dirfd = open(dir);
	readfile(dirfd, "maps", ...);
	readfile(dirfd, "stack", ...);
	readfile(dirfd, "comm", ...);
	readfile(dirfd, "environ", ...);
	close(dirfd);

but one would not have to do that.
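[Editor's note: since readfile(2) was never merged, the dirfd-relative pattern Matthew sketches can be approximated today with openat(). The helper name read_relative() below is invented for illustration; only standard POSIX calls are used:]

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `len` bytes of the file `name`, resolved relative to
 * `dirfd` (or AT_FDCWD / an absolute path, mirroring readfile(2)'s
 * proposed dirfd semantics). Returns bytes read, or -1 on error.
 * This costs three syscalls where readfile(2) would cost one. */
static ssize_t read_relative(int dirfd, const char *name,
			     char *buf, size_t len)
{
	int fd = openat(dirfd, name, O_RDONLY);
	if (fd < 0)
		return -1;
	ssize_t n = read(fd, buf, len);
	close(fd);
	return n;
}

/* Usage mirroring Matthew's /proc example:
 *   int dirfd = open("/proc/1234", O_RDONLY);
 *   read_relative(dirfd, "comm", buf, sizeof(buf));
 *   read_relative(dirfd, "maps", buf, sizeof(buf));
 *   close(dirfd);
 */
```

The dirfd amortizes the directory lookup across the per-process files exactly as in Matthew's sketch; only the open/read/close triple per file remains, which is the overhead readfile(2) aimed to remove.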
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Greg KH @ 2020-07-06 11:18 UTC
To: Jan Ziak
Cc: Matthew Wilcox, linux-api, linux-fsdevel, linux-kernel,
    linux-kselftest, linux-man, mtk.manpages, shuah, viro

On Mon, Jul 06, 2020 at 08:07:46AM +0200, Jan Ziak wrote:
> I am curious as to how you amortized or accounted for the fact that
> readfile() first needs to open the dirfd and then close it later.

I do not open a dirfd; look at the benchmark code in the patch, it's all
right there. I can make it simpler, and will do that for the next round,
as I want to make it really obvious for people to test on their hardware.

> From a performance viewpoint, only code where readfile() is called
> multiple times from within a loop makes sense:
>
> 	dirfd = open();
> 	for (...) {
> 		readfile(dirfd, ...);
> 	}
> 	close(dirfd);

No need to open a dirfd at all; my benchmarks did not do that - just pass
in an absolute path if you don't want to. But if you want to, because you
want to read a bunch of files, you can, faster than you could if you
wanted to read a number of individual files without it :)

thanks,

greg k-h
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Vito Caputo @ 2020-07-05  8:07 UTC
To: Matthew Wilcox
Cc: Jan Ziak, gregkh, linux-api, linux-fsdevel, linux-kernel,
    linux-kselftest, linux-man, mtk.manpages, shuah, viro

On Sun, Jul 05, 2020 at 04:27:32AM +0100, Matthew Wilcox wrote:
> It could, sure. But there's also a value in having a simple interface
> to accomplish a simple task. Your proposed API added a very complex
> interface to satisfy needs that clearly aren't part of the problem space
> that Greg is looking to address.

I disagree re: "aren't part of the problem space".

Reading small files from procfs was specifically called out in the
rationale for the syscall.

In my experience you're rarely monitoring a single proc file in any
situation where you care about the syscall overhead. You're monitoring
many of them, and any serious effort to do this efficiently in a
repeatedly sampled situation has cached the open fds and already uses
pread() to simply restart from 0 on every sample, not repeatedly paying
for the name lookup.

Basically, anything optimally using the existing interfaces for sampling
proc files needs a way to read multiple open file descriptors in a
single syscall to move the needle. This syscall doesn't provide that.
It doesn't really give any advantage over what we can achieve already.
It seems basically pointless to me, from a monitoring-proc-files
perspective.

Regards,
Vito Caputo
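[Editor's note: the cached-fd plus pread() sampling pattern Vito describes looks roughly like this. It is a sketch, not code from any of the tools discussed in the thread, and the helper name is hypothetical:]

```c
#include <fcntl.h>
#include <unistd.h>

/* Reread a proc file from the beginning using an fd opened once at
 * startup. pread() does not move the file offset, and reading a proc
 * file from offset 0 regenerates its contents, so each call is a fresh
 * sample with no per-sample path lookup or open/close pair. */
static ssize_t sample_proc_file(int fd, char *buf, size_t len)
{
	return pread(fd, buf, len, 0);
}

/* Typical monitor loop:
 *   int fd = open("/proc/meminfo", O_RDONLY);  // once, at startup
 *   for (;;) {
 *       sample_proc_file(fd, buf, sizeof(buf));
 *       sleep(1);
 *   }
 */
```

This is the baseline Vito measures readfile(2) against: one syscall per file per sample, which a single-file readfile() (open + read + close internally) cannot beat.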
Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Greg KH @ 2020-07-05 11:44 UTC
To: Vito Caputo
Cc: Matthew Wilcox, Jan Ziak, linux-api, linux-fsdevel, linux-kernel,
    linux-kselftest, linux-man, mtk.manpages, shuah, viro

On Sun, Jul 05, 2020 at 01:07:14AM -0700, Vito Caputo wrote:
> In my experience you're rarely monitoring a single proc file in any
> situation where you care about the syscall overhead. You're
> monitoring many of them, and any serious effort to do this efficiently
> in a repeatedly sampled situation has cached the open fds and already
> uses pread() to simply restart from 0 on every sample and not
> repeatedly pay for the name lookup.

That's your use case, but many other use cases are just "read a bunch of
sysfs files in one shot". Examples of that are tools that monitor
uevents and lots of hardware-information gathering tools.

Also, not all tools seem to be as smart as you think they are; look at
util-linux for loads of the "open/read/close lots of files" pattern. I
had a half-baked patch to convert it to use readfile, which I need to
polish off and post with the next series to show how this can be used to
both make userspace simpler and use less cpu time.

> Basically anything optimally using the existing interfaces for
> sampling proc files needs a way to read multiple open file descriptors
> in a single syscall to move the needle.

Is psutils using this type of interface, or do they constantly open
different files?

What about fun tools like bashtop:
	https://github.com/aristocratos/bashtop.git
which thankfully now relies on python's psutil package to parse proc in
semi-sane ways, but that package does loads of constant open/read/close
of proc files all the time, from what I can tell. And lots of people
rely on python's psutil, right?

> This syscall doesn't provide that. It doesn't really give any
> advantage over what we can achieve already. It seems basically
> pointless to me, from a monitoring proc files perspective.

What "good" monitoring programs do you suggest follow the pattern you
recommend?

thanks,

greg k-h
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-05 11:44 ` Greg KH @ 2020-07-05 20:34 ` Vito Caputo 0 siblings, 0 replies; 31+ messages in thread From: Vito Caputo @ 2020-07-05 20:34 UTC (permalink / raw) To: Greg KH Cc: Matthew Wilcox, Jan Ziak, linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, mtk.manpages, shuah, viro [-- Attachment #1: Type: text/plain, Size: 5991 bytes --] On Sun, Jul 05, 2020 at 01:44:54PM +0200, Greg KH wrote: > On Sun, Jul 05, 2020 at 01:07:14AM -0700, Vito Caputo wrote: > > On Sun, Jul 05, 2020 at 04:27:32AM +0100, Matthew Wilcox wrote: > > > On Sun, Jul 05, 2020 at 05:18:58AM +0200, Jan Ziak wrote: > > > > On Sun, Jul 5, 2020 at 5:12 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > > > > > You should probably take a look at io_uring. That has the level of > > > > > complexity of this proposal and supports open/read/close along with many > > > > > other opcodes. > > > > > > > > Then glibc can implement readfile using io_uring and there is no need > > > > for a new single-file readfile syscall. > > > > > > It could, sure. But there's also a value in having a simple interface > > > to accomplish a simple task. Your proposed API added a very complex > > > interface to satisfy needs that clearly aren't part of the problem space > > > that Greg is looking to address. > > > > I disagree re: "aren't part of the problem space". > > > > Reading small files from procfs was specifically called out in the > > rationale for the syscall. > > > > In my experience you're rarely monitoring a single proc file in any > > situation where you care about the syscall overhead. You're > > monitoring many of them, and any serious effort to do this efficiently > > in a repeatedly sampled situation has cached the open fds and already > > uses pread() to simply restart from 0 on every sample and not > > repeatedly pay for the name lookup. 
> > That's your use case, but many other use cases are just "read a bunch of > sysfs files in one shot". Examples of that are tools that monitor > uevents and lots of hardware-information gathering tools. > > Also, not all tools seem to be as smart as you think they are; look at > util-linux for loads of the "open/read/close lots of files" pattern. I > had a half-baked patch to convert it to use readfile which I need to > polish off and post with the next series to show how this can be used to > both make userspace simpler as well as use less CPU time. > > > Basically anything optimally using the existing interfaces for > > sampling proc files needs a way to read multiple open file descriptors > > in a single syscall to move the needle. > > Is psutils using this type of interface, or do they constantly open > different files? > When I last checked, psutils was not an optimal example, nor did I suggest it was. > What about fun tools like bashtop: > https://github.com/aristocratos/bashtop.git > which thankfully now relies on python's psutil package to parse proc in > semi-sane ways, but that package does loads of constant open/read/close > of proc files all the time from what I can tell. > > And lots of people rely on python's psutil, right? If python's psutil is constantly reopening the same files in /proc, this is an argument to go improve python's psutil, especially if it's popular. Your proposed syscall doesn't magically make everything suboptimally sampling proc more efficient. It still requires going out and modifying everything to use the new syscall. To actually realize a gain comparable to what can already be done with the existing interfaces, code that wasn't already reusing the open fd would still need a refactor to do so with your syscall, to eliminate the directory lookup on every sample. 
At the end of the day, if you did all this work, you'd have code that only works on kernels with the new syscall, didn't enjoy a significant performance gain over what could have been achieved using the existing interfaces, and still required basically the same amount of work as optimizing for the existing interfaces would have. For what gain? > > > This syscall doesn't provide that. It doesn't really give any > > advantage over what we can achieve already. It seems basically > > pointless to me, from a monitoring proc files perspective. > > What "good" monitoring programs do you suggest follow the pattern you > recommend? > "Good" is not generally a word I'd use to describe software; surely that's not me you're quoting... but I assume you mean "optimal". I'm sure sysprof is at least reusing open files when sampling proc, because we discussed the issue when Christian took over maintenance. It appears he's currently using the lseek()->read() sequence: https://gitlab.gnome.org/GNOME/sysprof/-/blob/master/src/libsysprof/sysprof-netdev-source.c#L223 https://gitlab.gnome.org/GNOME/sysprof/-/blob/master/src/libsysprof/sysprof-memory-source.c#L210 https://gitlab.gnome.org/GNOME/sysprof/-/blob/master/src/libsysprof/sysprof-diskstat-source.c#L185 It'd be more efficient to just use pread() and lose the lseek(), at which point it'd be just a single pread() call per sample per proc file. Nothing your proposed syscall would improve upon, not that it'd be eligible for software that wants to work on existing kernels from distros like Debian and CentOS/RHEL anyways. If this were a conversation about providing something like a better scatter-gather interface akin to p{read,write}v but with the fd in the iovec, then we'd be talking about something very lucrative for proc sampling. But like you've said elsewhere in this thread, io_uring may suffice as an alternative solution in that vein. 
My personal interest in this topic stems from an experimental window manager I made, and still use, which monitors every descendant process for the X session at frequencies up to 60HZ. The code opens a bunch of proc files for every process, and keeps them open until the process goes away or falls out of scope. See the attachment for some idea of what /proc/$(pidof wm)/fd looks like. All those proc files are read at up to 60HZ continuously. All top-like tools are really no different, and already shouldn't be reopening things on every sample. They should be fixed if not - with or without your syscall, it's equal effort, but the existing interfaces... exist. Regards, Vito Caputo [-- Attachment #2: vwm-fds.txt --] [-- Type: text/plain, Size: 18703 bytes --] total 0 lrwx------ 1 vcaputo vcaputo 64 Jul 5 13:16 0 -> /dev/tty1 l-wx------ 1 vcaputo vcaputo 64 Jul 5 13:16 1 -> /home/vcaputo/.xsession-errors lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 10 -> /proc/829/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 100 -> /proc/8427/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 101 -> /proc/8428/task/8428/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 102 -> /proc/8428/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 103 -> /proc/8428/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 104 -> /proc/8428/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 105 -> /proc/8428/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 106 -> /proc/8430/task/8430/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 107 -> /proc/8430/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 108 -> /proc/8430/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 109 -> /proc/8430/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 11 -> /proc/830/task/830/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 110 -> /proc/8430/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 111 -> /proc/8433/task/8433/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 112 -> /proc/8433/comm lr-x------ 1 vcaputo 
vcaputo 64 Jul 5 13:16 113 -> /proc/8433/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 114 -> /proc/8433/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 115 -> /proc/8433/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 116 -> /proc/8434/task/8434/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 117 -> /proc/8434/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 118 -> /proc/8434/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 119 -> /proc/8434/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 12 -> /proc/830/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 120 -> /proc/8434/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 121 -> /proc/12400/task/12400/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 122 -> /proc/12400/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 123 -> /proc/12400/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 124 -> /proc/12400/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 125 -> /proc/12400/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 126 -> /proc/11921/task/11921/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 127 -> /proc/11921/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 128 -> /proc/11921/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 129 -> /proc/11921/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 13 -> /proc/830/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 130 -> /proc/11921/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 131 -> /proc/30440/task/30440/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 132 -> /proc/30440/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 133 -> /proc/30440/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 134 -> /proc/30440/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 135 -> /proc/30440/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 136 -> /proc/5841/task/5841/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 137 -> /proc/5841/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 138 -> /proc/5841/cmdline lr-x------ 1 vcaputo vcaputo 64 
Jul 5 13:16 139 -> /proc/5841/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 14 -> /proc/830/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 140 -> /proc/5841/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 141 -> /proc/25853/task/25853/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 142 -> /proc/25853/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 143 -> /proc/25853/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 144 -> /proc/25853/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 145 -> /proc/25853/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 146 -> /proc/25854/task/25854/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 147 -> /proc/25854/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 148 -> /proc/25854/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 149 -> /proc/25854/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 15 -> /proc/830/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 150 -> /proc/25854/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 151 -> /proc/25856/task/25856/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 152 -> /proc/25856/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 153 -> /proc/25856/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 154 -> /proc/25856/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 155 -> /proc/25856/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 156 -> /proc/25859/task/25859/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 157 -> /proc/25859/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 158 -> /proc/25859/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 159 -> /proc/25859/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 16 -> /proc/831/task/831/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 160 -> /proc/25859/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 161 -> /proc/5843/task/5843/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 162 -> /proc/5843/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 163 -> /proc/5843/cmdline lr-x------ 1 vcaputo vcaputo 64 
Jul 5 13:16 164 -> /proc/5843/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 165 -> /proc/5843/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 166 -> /proc/5848/task/5848/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 167 -> /proc/5848/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 168 -> /proc/5848/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 169 -> /proc/5848/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 17 -> /proc/831/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 170 -> /proc/5848/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 171 -> /proc/5848/task lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 172 -> /proc/5848/task/5848/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 173 -> /proc/5848/task/5848/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 174 -> /proc/5848/task/5848/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 175 -> /proc/5848/task/5848/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 176 -> /proc/5849/task/5849/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 177 -> /proc/5849/task/5849/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 178 -> /proc/5849/task/5849/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 179 -> /proc/5849/task/5849/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 18 -> /proc/831/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 180 -> /proc/5850/task/5850/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 181 -> /proc/30441/task/30441/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 182 -> /proc/30441/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 183 -> /proc/30441/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 184 -> /proc/30441/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 185 -> /proc/30441/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 186 -> /proc/30443/task/30443/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 187 -> /proc/30443/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 188 -> /proc/30443/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 189 -> 
/proc/30443/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 19 -> /proc/831/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 190 -> /proc/30443/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 191 -> /proc/30446/task/30446/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 192 -> /proc/30446/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 193 -> /proc/30446/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 194 -> /proc/30446/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 195 -> /proc/30446/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 196 -> /proc/30447/task/30447/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 197 -> /proc/30447/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 198 -> /proc/30447/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 199 -> /proc/30447/wchan l-wx------ 1 vcaputo vcaputo 64 Jul 5 13:16 2 -> /home/vcaputo/.xsession-errors lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 20 -> /proc/831/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 200 -> /proc/30447/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 201 -> /proc/30448/task/30448/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 202 -> /proc/30448/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 203 -> /proc/30448/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 204 -> /proc/30448/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 205 -> /proc/30448/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 206 -> /proc/30451/task/30451/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 207 -> /proc/30451/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 208 -> /proc/30451/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 209 -> /proc/30451/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 21 -> /proc/832/task/832/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 210 -> /proc/30451/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 211 -> /proc/30451/task lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 212 -> /proc/30451/task/30451/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 
13:16 213 -> /proc/30451/task/30451/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 214 -> /proc/30451/task/30451/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 215 -> /proc/30451/task/30451/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 216 -> /proc/30452/task/30452/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 217 -> /proc/30452/task/30452/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 218 -> /proc/30452/task/30452/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 219 -> /proc/30452/task/30452/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 22 -> /proc/832/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 220 -> /proc/30453/task/30453/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 221 -> /proc/30453/task/30453/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 222 -> /proc/30453/task/30453/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 223 -> /proc/30453/task/30453/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 224 -> /proc/30454/task/30454/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 225 -> /proc/30454/task/30454/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 226 -> /proc/30454/task/30454/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 227 -> /proc/30454/task/30454/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 228 -> /proc/30455/task/30455/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 229 -> /proc/30455/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 23 -> /proc/832/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 230 -> /proc/30455/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 231 -> /proc/30455/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 232 -> /proc/30455/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 233 -> /proc/30458/task/30458/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 234 -> /proc/30458/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 235 -> /proc/30458/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 236 -> /proc/30458/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 237 -> 
/proc/30458/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 238 -> /proc/5850/task/5850/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 239 -> /proc/5850/task/5850/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 24 -> /proc/832/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 240 -> /proc/5850/task/5850/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 241 -> /proc/5851/task/5851/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 242 -> /proc/5851/task/5851/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 243 -> /proc/5851/task/5851/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 244 -> /proc/5851/task/5851/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 245 -> /proc/5853/task/5853/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 246 -> /proc/5853/task/5853/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 247 -> /proc/5853/task/5853/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 248 -> /proc/5853/task/5853/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 249 -> /proc/5856/task/5856/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 25 -> /proc/832/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 250 -> /proc/5856/task/5856/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 251 -> /proc/5856/task/5856/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 252 -> /proc/5856/task/5856/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 253 -> /proc/6844/task/6844/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 254 -> /proc/6844/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 255 -> /proc/6844/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 256 -> /proc/6844/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 257 -> /proc/6844/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 26 -> /proc/833/task/833/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 27 -> /proc/833/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 28 -> /proc/833/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 29 -> /proc/833/wchan lrwx------ 1 vcaputo vcaputo 64 Jul 5 13:16 3 -> 
socket:[19590] lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 30 -> /proc/833/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 31 -> /proc/839/task/839/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 32 -> /proc/839/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 33 -> /proc/839/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 34 -> /proc/839/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 35 -> /proc/839/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 36 -> /proc/840/task/840/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 37 -> /proc/840/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 38 -> /proc/840/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 39 -> /proc/840/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 4 -> /proc lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 40 -> /proc/840/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 41 -> /proc/842/task/842/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 42 -> /proc/842/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 43 -> /proc/842/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 44 -> /proc/842/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 45 -> /proc/842/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 46 -> /proc/5858/task/5858/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 47 -> /proc/5858/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 48 -> /proc/5858/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 49 -> /proc/5858/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 5 -> /proc/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 50 -> /proc/5858/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 51 -> /proc/6841/task/6841/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 52 -> /proc/6841/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 53 -> /proc/6841/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 54 -> /proc/6841/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 55 -> /proc/6841/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 56 -> /proc/6842/task/6842/children 
lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 57 -> /proc/6842/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 58 -> /proc/6842/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 59 -> /proc/6842/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 6 -> /proc/829/task/829/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 60 -> /proc/6842/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 61 -> /proc/5840/task/5840/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 62 -> /proc/5840/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 63 -> /proc/5840/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 64 -> /proc/5840/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 65 -> /proc/5840/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 66 -> /proc/896/task/896/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 67 -> /proc/896/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 68 -> /proc/896/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 69 -> /proc/896/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 7 -> /proc/829/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 70 -> /proc/896/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 71 -> /proc/897/task/897/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 72 -> /proc/897/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 73 -> /proc/897/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 74 -> /proc/897/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 75 -> /proc/897/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 76 -> /proc/899/task/899/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 77 -> /proc/899/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 78 -> /proc/899/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 79 -> /proc/899/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 8 -> /proc/829/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 80 -> /proc/899/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 81 -> /proc/2293/task/2293/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 82 -> /proc/2293/comm 
lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 83 -> /proc/2293/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 84 -> /proc/2293/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 85 -> /proc/2293/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 86 -> /proc/2294/task/2294/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 87 -> /proc/2294/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 88 -> /proc/2294/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 89 -> /proc/2294/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 9 -> /proc/829/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 90 -> /proc/2294/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 91 -> /proc/2296/task/2296/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 92 -> /proc/2296/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 93 -> /proc/2296/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 94 -> /proc/2296/wchan lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 95 -> /proc/2296/stat lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 96 -> /proc/8427/task/8427/children lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 97 -> /proc/8427/comm lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 98 -> /proc/8427/cmdline lr-x------ 1 vcaputo vcaputo 64 Jul 5 13:16 99 -> /proc/8427/wchan ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-05 2:46 ` Jan Ziak 2020-07-05 3:12 ` Matthew Wilcox @ 2020-07-05 6:32 ` Andreas Dilger 2020-07-05 7:25 ` Jan Ziak 1 sibling, 1 reply; 31+ messages in thread From: Andreas Dilger @ 2020-07-05 6:32 UTC (permalink / raw) To: Jan Ziak Cc: Matthew Wilcox, gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, mtk.manpages, shuah, viro [-- Attachment #1: Type: text/plain, Size: 3451 bytes --] On Jul 4, 2020, at 8:46 PM, Jan Ziak <0xe2.0x9a.0x9b@gmail.com> wrote: > > On Sun, Jul 5, 2020 at 4:16 AM Matthew Wilcox <willy@infradead.org> wrote: >> >> On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote: >>> Hello >>> >>> At first, I thought that the proposed system call is capable of >>> reading *multiple* small files using a single system call - which >>> would help increase HDD/SSD queue utilization and increase IOPS (I/O >>> operations per second) - but that isn't the case and the proposed >>> system call can read just a single file. >>> >>> Without the ability to read multiple small files using a single system >>> call, it is impossible to increase IOPS (unless an application is >>> using multiple reader threads or somehow instructs the kernel to >>> prefetch multiple files into memory). >> >> What API would you use for this? >> >> ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens); >> >> I pretty much hate this interface, so I hope you have something better >> in mind. 
> > I am proposing the following: > > struct readfile_t { > int dirfd; > const char *pathname; > void *buf; > size_t count; > int flags; > ssize_t retval; // set by kernel > int reserved; // not used by kernel > }; If you are going to pass a struct from userspace to the kernel, it should not mix int and pointer types (which may be 64-bit values), so that there are no structure packing issues, like: struct readfile { int dirfd; int flags; const char *pathname; void *buf; size_t count; ssize_t retval; }; It would be better if "retval" was returned in "count", so that the structure fits nicely into 32 bytes on a 64-bit system, instead of being 40 bytes per entry, which adds up over many entries, like: struct readfile { int dirfd; int flags; const char *pathname; void *buf; ssize_t count; /* input: bytes requested, output: bytes read or -errno */ }; However, there is still an issue with passing pointers from userspace, since they may be 32-bit userspace pointers on a 64-bit kernel. > int readfiles(struct readfile_t *requests, size_t count); It's not clear why count is a "size_t" since it is not a size. An unsigned int is fine here, since it should never be negative. > Returns zero if all requests succeeded, otherwise the returned value > is non-zero (glibc wrapper: -1) and user-space is expected to check > which requests have succeeded and which have failed. retval in > readfile_t is set to what the single-file readfile syscall would > return if it was called with the contents of the corresponding > readfile_t struct. > > The glibc library wrapper of this system call is expected to store the > errno in the "reserved" field. Thus, a programmer using glibc sees: > > struct readfile_t { > int dirfd; > const char *pathname; > void *buf; > size_t count; > int flags; > ssize_t retval; // set by glibc (-1 on error) > int errno; // set by glibc if retval is -1 > }; Why not just return the errno directly in "retval", or in "count" as proposed? 
That avoids further bloating the structure by another field. > retval and errno in glibc's readfile_t are set to what the single-file > glibc readfile would return (retval) and set (errno) if it was called > with the contents of the corresponding readfile_t struct. In case of > an error, glibc will pick one readfile_t which failed (such as: the > 1st failed one) and use it to set glibc's errno. Cheers, Andreas [-- Attachment #2: Message signed with OpenPGP --] [-- Type: application/pgp-signature, Size: 873 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-05 6:32 ` Andreas Dilger @ 2020-07-05 7:25 ` Jan Ziak 2020-07-05 12:00 ` Greg KH 0 siblings, 1 reply; 31+ messages in thread From: Jan Ziak @ 2020-07-05 7:25 UTC (permalink / raw) To: Andreas Dilger Cc: Matthew Wilcox, gregkh, linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, mtk.manpages, shuah, viro On Sun, Jul 5, 2020 at 8:32 AM Andreas Dilger <adilger@dilger.ca> wrote: > > On Jul 4, 2020, at 8:46 PM, Jan Ziak <0xe2.0x9a.0x9b@gmail.com> wrote: > > > > On Sun, Jul 5, 2020 at 4:16 AM Matthew Wilcox <willy@infradead.org> wrote: > >> > >> On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote: > >>> Hello > >>> > >>> At first, I thought that the proposed system call is capable of > >>> reading *multiple* small files using a single system call - which > >>> would help increase HDD/SSD queue utilization and increase IOPS (I/O > >>> operations per second) - but that isn't the case and the proposed > >>> system call can read just a single file. > >>> > >>> Without the ability to read multiple small files using a single system > >>> call, it is impossible to increase IOPS (unless an application is > >>> using multiple reader threads or somehow instructs the kernel to > >>> prefetch multiple files into memory). > >> > >> What API would you use for this? > >> > >> ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens); > >> > >> I pretty much hate this interface, so I hope you have something better > >> in mind. 
> > > > I am proposing the following: > > > > struct readfile_t { > > int dirfd; > > const char *pathname; > > void *buf; > > size_t count; > > int flags; > > ssize_t retval; // set by kernel > > int reserved; // not used by kernel > > }; > > If you are going to pass a struct from userspace to the kernel, it > should not mix int and pointer types (which may be 64-bit values, > so that there are not structure packing issues, like: > > struct readfile { > int dirfd; > int flags; > const char *pathname; > void *buf; > size_t count; > ssize_t retval; > }; > > It would be better if "retval" was returned in "count", so that > the structure fits nicely into 32 bytes on a 64-bit system, instead > of being 40 bytes per entry, which adds up over many entries, like. I know what you mean and it is a valid point, but in my opinion it shouldn't (in most cases) be left to the programmer to decide what the binary layout of a data structure is - instead it should be left to an optimizing compiler to decide it. Just like code optimization, determining the physical layout of data structures can be subject to automatic optimizations as well. It is kind of unfortunate that in C/C++, and in many other statically compiled languages (even recent ones), the physical layout of all data structures is determined by the programmer rather than the compiler. Also, tagging fields as "input", "output", or both (the default) would be helpful in obtaining smaller sizes: struct readfile_t { input int dirfd; input const char *pathname; input void *buf; input size_t count; input int flags; output ssize_t retval; // set by kernel output int reserved; // not used by kernel }; int readfiles(struct readfile_t *requests, size_t count); struct readfile_t r[10]; // Write r[i] inputs int status = readfiles(r, nelem(r)); // Read r[i] outputs A data-layout optimizing compiler should be able to determine that the optimal layout of readfile_t is UNION(INPUT: 2*int+2*pointer+1*size_t, OUTPUT: 1*ssize_t+1*int). 
In the unfortunate case of the non-optimizing C language and if it is just a micro-optimization (optimizing readfile_t is a micro-optimization), it is better to leave the data structure in a form that is appropriate for being efficiently readable by programmers rather than to micro-optimize it and make it confusing to programmers. > struct readfile { > int dirfd; > int flags; > const char *pathname; > void *buf; > ssize_t count; /* input: bytes requested, output: bytes read or -errno */ > }; > > > However, there is still an issue with passing pointers from userspace, > since they may be 32-bit userspace pointers on a 64-bit kernel. > > > int readfiles(struct readfile_t *requests, size_t count); > > It's not clear why count is a "size_t" since it is not a size. > An unsigned int is fine here, since it should never be negative. Generally speaking, size_t reflects the size of the address space while unsigned int doesn't and therefore it is easier for unsigned int to overflow on very large data sets. > > Returns zero if all requests succeeded, otherwise the returned value > > is non-zero (glibc wrapper: -1) and user-space is expected to check > > which requests have succeeded and which have failed. retval in > > readfile_t is set to what the single-file readfile syscall would > > return if it was called with the contents of the corresponding > > readfile_t struct. > > > > The glibc library wrapper of this system call is expected to store the > > errno in the "reserved" field. Thus, a programmer using glibc sees: > > > > struct readfile_t { > > int dirfd; > > const char *pathname; > > void *buf; > > size_t count; > > int flags; > > ssize_t retval; // set by glibc (-1 on error) > > int errno; // set by glibc if retval is -1 > > }; > > Why not just return the errno directly in "retval", or in "count" as > proposed? That avoids further bloating the structure by another field. 
> > > retval and errno in glibc's readfile_t are set to what the single-file > > glibc readfile would return (retval) and set (errno) if it was called > > with the contents of the corresponding readfile_t struct. In case of > > an error, glibc will pick one readfile_t which failed (such as: the > > 1st failed one) and use it to set glibc's errno. > > > Cheers, Andreas ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-05 7:25 ` Jan Ziak @ 2020-07-05 12:00 ` Greg KH 0 siblings, 0 replies; 31+ messages in thread From: Greg KH @ 2020-07-05 12:00 UTC (permalink / raw) To: Jan Ziak Cc: Andreas Dilger, Matthew Wilcox, linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, mtk.manpages, shuah, viro On Sun, Jul 05, 2020 at 09:25:39AM +0200, Jan Ziak wrote: > On Sun, Jul 5, 2020 at 8:32 AM Andreas Dilger <adilger@dilger.ca> wrote: > > > > On Jul 4, 2020, at 8:46 PM, Jan Ziak <0xe2.0x9a.0x9b@gmail.com> wrote: > > > > > > On Sun, Jul 5, 2020 at 4:16 AM Matthew Wilcox <willy@infradead.org> wrote: > > >> > > >> On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote: > > >>> Hello > > >>> > > >>> At first, I thought that the proposed system call is capable of > > >>> reading *multiple* small files using a single system call - which > > >>> would help increase HDD/SSD queue utilization and increase IOPS (I/O > > >>> operations per second) - but that isn't the case and the proposed > > >>> system call can read just a single file. > > >>> > > >>> Without the ability to read multiple small files using a single system > > >>> call, it is impossible to increase IOPS (unless an application is > > >>> using multiple reader threads or somehow instructs the kernel to > > >>> prefetch multiple files into memory). > > >> > > >> What API would you use for this? > > >> > > >> ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens); > > >> > > >> I pretty much hate this interface, so I hope you have something better > > >> in mind. 
> > > > > > I am proposing the following: > > > > > > struct readfile_t { > > > int dirfd; > > > const char *pathname; > > > void *buf; > > > size_t count; > > > int flags; > > > ssize_t retval; // set by kernel > > > int reserved; // not used by kernel > > > }; > > > > If you are going to pass a struct from userspace to the kernel, it > > should not mix int and pointer types (which may be 64-bit values, > > so that there are not structure packing issues, like: > > > > struct readfile { > > int dirfd; > > int flags; > > const char *pathname; > > void *buf; > > size_t count; > > ssize_t retval; > > }; > > > > It would be better if "retval" was returned in "count", so that > > the structure fits nicely into 32 bytes on a 64-bit system, instead > > of being 40 bytes per entry, which adds up over many entries, like. > > I know what you mean and it is a valid point, but in my opinion it > shouldn't (in most cases) be left to the programmer to decide what the > binary layout of a data structure is - instead it should be left to an > optimizing compiler to decide it. We don't get that luxury when creating user/kernel apis in C, sorry. I suggest using the pahole tool if you are interested in seeing the "best" way a structure can be laid out; it can perform that optimization for you so that you know how to fix your code. thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-05 2:06 [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster Jan Ziak 2020-07-05 2:16 ` Matthew Wilcox @ 2020-07-05 11:50 ` Greg KH 2020-07-14 6:51 ` Pavel Machek 1 sibling, 1 reply; 31+ messages in thread From: Greg KH @ 2020-07-05 11:50 UTC (permalink / raw) To: Jan Ziak Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, mtk.manpages, shuah, viro On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote: > Hello > > At first, I thought that the proposed system call is capable of > reading *multiple* small files using a single system call - which > would help increase HDD/SSD queue utilization and increase IOPS (I/O > operations per second) - but that isn't the case and the proposed > system call can read just a single file. If you want to do this for multiple files, use io_uring, that's what it was designed for. I think Jens was going to be adding support for the open/read/close pattern to it as well, after some other more pressing features/fixes were finished. > Without the ability to read multiple small files using a single system > call, it is impossible to increase IOPS (unless an application is > using multiple reader threads or somehow instructs the kernel to > prefetch multiple files into memory). There's not much (but it is measurable) need to prefetch virtual files into memory first, which is primarily what this syscall is for (procfs, sysfs, securityfs, etc.) If you are dealing with real disks, then yes, the overhead of the syscall might be in the noise compared to the i/o path of the data. > While you are at it, why not also add a readfiles system call to read > multiple, presumably small, files? The initial unoptimized > implementation of readfiles syscall can simply call readfile > sequentially. Again, that's what io_uring is for. thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-05 11:50 ` Greg KH @ 2020-07-14 6:51 ` Pavel Machek 2020-07-14 8:07 ` Miklos Szeredi 0 siblings, 1 reply; 31+ messages in thread From: Pavel Machek @ 2020-07-14 6:51 UTC (permalink / raw) To: Greg KH Cc: Jan Ziak, linux-api, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, mtk.manpages, shuah, viro [-- Attachment #1: Type: text/plain, Size: 919 bytes --] Hi! > > At first, I thought that the proposed system call is capable of > > reading *multiple* small files using a single system call - which > > would help increase HDD/SSD queue utilization and increase IOPS (I/O > > operations per second) - but that isn't the case and the proposed > > system call can read just a single file. > > If you want to do this for multple files, use io_ring, that's what it > was designed for. I think Jens was going to be adding support for the > open/read/close pattern to it as well, after some other more pressing > features/fixes were finished. What about... just using io_uring for single file, too? I'm pretty sure it can be wrapped in a library that is simple to use, avoiding need for new syscall. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-14 6:51 ` Pavel Machek @ 2020-07-14 8:07 ` Miklos Szeredi 2020-07-14 11:34 ` Pavel Begunkov 0 siblings, 1 reply; 31+ messages in thread From: Miklos Szeredi @ 2020-07-14 8:07 UTC (permalink / raw) To: Pavel Machek Cc: Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: > > Hi! > > > > At first, I thought that the proposed system call is capable of > > > reading *multiple* small files using a single system call - which > > > would help increase HDD/SSD queue utilization and increase IOPS (I/O > > > operations per second) - but that isn't the case and the proposed > > > system call can read just a single file. > > > > If you want to do this for multple files, use io_ring, that's what it > > was designed for. I think Jens was going to be adding support for the > > open/read/close pattern to it as well, after some other more pressing > > features/fixes were finished. > > What about... just using io_uring for single file, too? I'm pretty > sure it can be wrapped in a library that is simple to use, avoiding > need for new syscall. Just wondering: is there a plan to add strace support to io_uring? And I don't just mean the syscalls associated with io_uring, but tracing the ring itself. I think that's quite important as io_uring becomes mainstream. Thanks, Miklos ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-14 8:07 ` Miklos Szeredi @ 2020-07-14 11:34 ` Pavel Begunkov 2020-07-14 11:55 ` Miklos Szeredi 0 siblings, 1 reply; 31+ messages in thread From: Pavel Begunkov @ 2020-07-14 11:34 UTC (permalink / raw) To: Miklos Szeredi, Pavel Machek Cc: Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On 14/07/2020 11:07, Miklos Szeredi wrote: > On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: >> >> Hi! >> >>>> At first, I thought that the proposed system call is capable of >>>> reading *multiple* small files using a single system call - which >>>> would help increase HDD/SSD queue utilization and increase IOPS (I/O >>>> operations per second) - but that isn't the case and the proposed >>>> system call can read just a single file. >>> >>> If you want to do this for multple files, use io_ring, that's what it >>> was designed for. I think Jens was going to be adding support for the >>> open/read/close pattern to it as well, after some other more pressing >>> features/fixes were finished. >> >> What about... just using io_uring for single file, too? I'm pretty >> sure it can be wrapped in a library that is simple to use, avoiding >> need for new syscall. > > Just wondering: is there a plan to add strace support to io_uring? > And I don't just mean the syscalls associated with io_uring, but > tracing the ring itself. What kind of support do you mean? io_uring is asynchronous in nature with all intrinsic tracing/debugging/etc. problems of such APIs. And there are a lot of handy trace points, are those not enough? Though, this can be an interesting project to rethink how async APIs are worked with. > > I think that's quite important as io_uring becomes mainstream. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-14 11:34 ` Pavel Begunkov @ 2020-07-14 11:55 ` Miklos Szeredi 2020-07-15 8:31 ` Pavel Begunkov 0 siblings, 1 reply; 31+ messages in thread From: Miklos Szeredi @ 2020-07-14 11:55 UTC (permalink / raw) To: Pavel Begunkov Cc: Pavel Machek, Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On Tue, Jul 14, 2020 at 1:36 PM Pavel Begunkov <asml.silence@gmail.com> wrote: > > On 14/07/2020 11:07, Miklos Szeredi wrote: > > On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: > >> > >> Hi! > >> > >>>> At first, I thought that the proposed system call is capable of > >>>> reading *multiple* small files using a single system call - which > >>>> would help increase HDD/SSD queue utilization and increase IOPS (I/O > >>>> operations per second) - but that isn't the case and the proposed > >>>> system call can read just a single file. > >>> > >>> If you want to do this for multple files, use io_ring, that's what it > >>> was designed for. I think Jens was going to be adding support for the > >>> open/read/close pattern to it as well, after some other more pressing > >>> features/fixes were finished. > >> > >> What about... just using io_uring for single file, too? I'm pretty > >> sure it can be wrapped in a library that is simple to use, avoiding > >> need for new syscall. > > > > Just wondering: is there a plan to add strace support to io_uring? > > And I don't just mean the syscalls associated with io_uring, but > > tracing the ring itself. > > What kind of support do you mean? io_uring is asynchronous in nature > with all intrinsic tracing/debugging/etc. problems of such APIs. > And there are a lot of handy trace points, are those not enough? > > Though, this can be an interesting project to rethink how async > APIs are worked with. Yeah, it's an interesting problem. 
The uring has the same events, as far as I understand, that are recorded in a multithreaded strace output (syscall entry, syscall exit); nothing more is needed. I do think this needs to be integrated into strace(1), otherwise the usefulness of that tool (which I think is *very* high) would go down drastically as io_uring usage goes up. Thanks, Miklos ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-14 11:55 ` Miklos Szeredi @ 2020-07-15 8:31 ` Pavel Begunkov 2020-07-15 8:41 ` Miklos Szeredi 0 siblings, 1 reply; 31+ messages in thread From: Pavel Begunkov @ 2020-07-15 8:31 UTC (permalink / raw) To: Miklos Szeredi Cc: Pavel Machek, Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On 14/07/2020 14:55, Miklos Szeredi wrote: > On Tue, Jul 14, 2020 at 1:36 PM Pavel Begunkov <asml.silence@gmail.com> wrote: >> >> On 14/07/2020 11:07, Miklos Szeredi wrote: >>> On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: >>>> >>>> Hi! >>>> >>>>>> At first, I thought that the proposed system call is capable of >>>>>> reading *multiple* small files using a single system call - which >>>>>> would help increase HDD/SSD queue utilization and increase IOPS (I/O >>>>>> operations per second) - but that isn't the case and the proposed >>>>>> system call can read just a single file. >>>>> >>>>> If you want to do this for multple files, use io_ring, that's what it >>>>> was designed for. I think Jens was going to be adding support for the >>>>> open/read/close pattern to it as well, after some other more pressing >>>>> features/fixes were finished. >>>> >>>> What about... just using io_uring for single file, too? I'm pretty >>>> sure it can be wrapped in a library that is simple to use, avoiding >>>> need for new syscall. >>> >>> Just wondering: is there a plan to add strace support to io_uring? >>> And I don't just mean the syscalls associated with io_uring, but >>> tracing the ring itself. >> >> What kind of support do you mean? io_uring is asynchronous in nature >> with all intrinsic tracing/debugging/etc. problems of such APIs. >> And there are a lot of handy trace points, are those not enough? >> >> Though, this can be an interesting project to rethink how async >> APIs are worked with. 
> > Yeah, it's an interesting problem. The uring has the same events, as > far as I understand, that are recorded in a multithreaded strace > output (syscall entry, syscall exit); nothing more is needed> > I do think this needs to be integrated into strace(1), otherwise the > usefulness of that tool (which I think is *very* high) would go down > drastically as io_uring usage goes up. Not touching the topic of usefulness of strace + io_uring, but I'd rather have a tool that solves a problem, than a problem that created and honed for a tool. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-15 8:31 ` Pavel Begunkov @ 2020-07-15 8:41 ` Miklos Szeredi 2020-07-15 8:49 ` Pavel Begunkov 0 siblings, 1 reply; 31+ messages in thread From: Miklos Szeredi @ 2020-07-15 8:41 UTC (permalink / raw) To: Pavel Begunkov Cc: Pavel Machek, Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On Wed, Jul 15, 2020 at 10:33 AM Pavel Begunkov <asml.silence@gmail.com> wrote: > > On 14/07/2020 14:55, Miklos Szeredi wrote: > > On Tue, Jul 14, 2020 at 1:36 PM Pavel Begunkov <asml.silence@gmail.com> wrote: > >> > >> On 14/07/2020 11:07, Miklos Szeredi wrote: > >>> On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: > >>>> > >>>> Hi! > >>>> > >>>>>> At first, I thought that the proposed system call is capable of > >>>>>> reading *multiple* small files using a single system call - which > >>>>>> would help increase HDD/SSD queue utilization and increase IOPS (I/O > >>>>>> operations per second) - but that isn't the case and the proposed > >>>>>> system call can read just a single file. > >>>>> > >>>>> If you want to do this for multple files, use io_ring, that's what it > >>>>> was designed for. I think Jens was going to be adding support for the > >>>>> open/read/close pattern to it as well, after some other more pressing > >>>>> features/fixes were finished. > >>>> > >>>> What about... just using io_uring for single file, too? I'm pretty > >>>> sure it can be wrapped in a library that is simple to use, avoiding > >>>> need for new syscall. > >>> > >>> Just wondering: is there a plan to add strace support to io_uring? > >>> And I don't just mean the syscalls associated with io_uring, but > >>> tracing the ring itself. > >> > >> What kind of support do you mean? io_uring is asynchronous in nature > >> with all intrinsic tracing/debugging/etc. problems of such APIs. 
> >> And there are a lot of handy trace points, are those not enough? > >> > >> Though, this can be an interesting project to rethink how async > >> APIs are worked with. > > > > Yeah, it's an interesting problem. The uring has the same events, as > > far as I understand, that are recorded in a multithreaded strace > > output (syscall entry, syscall exit); nothing more is needed> > > I do think this needs to be integrated into strace(1), otherwise the > > usefulness of that tool (which I think is *very* high) would go down > > drastically as io_uring usage goes up. > > Not touching the topic of usefulness of strace + io_uring, but I'd rather > have a tool that solves a problem, than a problem that created and honed > for a tool. Sorry, I'm not getting the metaphor. Can you please elaborate? Thanks, Miklos ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-15 8:41 ` Miklos Szeredi @ 2020-07-15 8:49 ` Pavel Begunkov 2020-07-15 9:00 ` Pavel Begunkov 0 siblings, 1 reply; 31+ messages in thread From: Pavel Begunkov @ 2020-07-15 8:49 UTC (permalink / raw) To: Miklos Szeredi Cc: Pavel Machek, Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On 15/07/2020 11:41, Miklos Szeredi wrote: > On Wed, Jul 15, 2020 at 10:33 AM Pavel Begunkov <asml.silence@gmail.com> wrote: >> >> On 14/07/2020 14:55, Miklos Szeredi wrote: >>> On Tue, Jul 14, 2020 at 1:36 PM Pavel Begunkov <asml.silence@gmail.com> wrote: >>>> >>>> On 14/07/2020 11:07, Miklos Szeredi wrote: >>>>> On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: >>>>>> >>>>>> Hi! >>>>>> >>>>>>>> At first, I thought that the proposed system call is capable of >>>>>>>> reading *multiple* small files using a single system call - which >>>>>>>> would help increase HDD/SSD queue utilization and increase IOPS (I/O >>>>>>>> operations per second) - but that isn't the case and the proposed >>>>>>>> system call can read just a single file. >>>>>>> >>>>>>> If you want to do this for multple files, use io_ring, that's what it >>>>>>> was designed for. I think Jens was going to be adding support for the >>>>>>> open/read/close pattern to it as well, after some other more pressing >>>>>>> features/fixes were finished. >>>>>> >>>>>> What about... just using io_uring for single file, too? I'm pretty >>>>>> sure it can be wrapped in a library that is simple to use, avoiding >>>>>> need for new syscall. >>>>> >>>>> Just wondering: is there a plan to add strace support to io_uring? >>>>> And I don't just mean the syscalls associated with io_uring, but >>>>> tracing the ring itself. >>>> >>>> What kind of support do you mean? io_uring is asynchronous in nature >>>> with all intrinsic tracing/debugging/etc. 
problems of such APIs. >>>> And there are a lot of handy trace points, are those not enough? >>>> >>>> Though, this can be an interesting project to rethink how async >>>> APIs are worked with. >>> >>> Yeah, it's an interesting problem. The uring has the same events, as >>> far as I understand, that are recorded in a multithreaded strace >>> output (syscall entry, syscall exit); nothing more is needed> >>> I do think this needs to be integrated into strace(1), otherwise the >>> usefulness of that tool (which I think is *very* high) would go down >>> drastically as io_uring usage goes up. >> >> Not touching the topic of usefulness of strace + io_uring, but I'd rather >> have a tool that solves a problem, than a problem that created and honed >> for a tool. > > Sorry, I'm not getting the metaphor. Can you please elaborate? Sure, I mean _if_ there are tools that conceptually suit better, I'd prefer to work with them, then trying to shove a new and possibly alien infrastructure into strace. But my knowledge of strace is very limited, so can't tell whether that's the case. E.g. can it utilise static trace points? -- Pavel Begunkov ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-15 8:49 ` Pavel Begunkov @ 2020-07-15 9:00 ` Pavel Begunkov 2020-07-15 11:17 ` Miklos Szeredi 0 siblings, 1 reply; 31+ messages in thread From: Pavel Begunkov @ 2020-07-15 9:00 UTC (permalink / raw) To: Miklos Szeredi Cc: Pavel Machek, Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On 15/07/2020 11:49, Pavel Begunkov wrote: > On 15/07/2020 11:41, Miklos Szeredi wrote: >> On Wed, Jul 15, 2020 at 10:33 AM Pavel Begunkov <asml.silence@gmail.com> wrote: >>> >>> On 14/07/2020 14:55, Miklos Szeredi wrote: >>>> On Tue, Jul 14, 2020 at 1:36 PM Pavel Begunkov <asml.silence@gmail.com> wrote: >>>>> >>>>> On 14/07/2020 11:07, Miklos Szeredi wrote: >>>>>> On Tue, Jul 14, 2020 at 8:51 AM Pavel Machek <pavel@denx.de> wrote: >>>>>>> >>>>>>> Hi! >>>>>>> >>>>>>>>> At first, I thought that the proposed system call is capable of >>>>>>>>> reading *multiple* small files using a single system call - which >>>>>>>>> would help increase HDD/SSD queue utilization and increase IOPS (I/O >>>>>>>>> operations per second) - but that isn't the case and the proposed >>>>>>>>> system call can read just a single file. >>>>>>>> >>>>>>>> If you want to do this for multple files, use io_ring, that's what it >>>>>>>> was designed for. I think Jens was going to be adding support for the >>>>>>>> open/read/close pattern to it as well, after some other more pressing >>>>>>>> features/fixes were finished. >>>>>>> >>>>>>> What about... just using io_uring for single file, too? I'm pretty >>>>>>> sure it can be wrapped in a library that is simple to use, avoiding >>>>>>> need for new syscall. >>>>>> >>>>>> Just wondering: is there a plan to add strace support to io_uring? >>>>>> And I don't just mean the syscalls associated with io_uring, but >>>>>> tracing the ring itself. >>>>> >>>>> What kind of support do you mean? 
io_uring is asynchronous in nature >>>>> with all intrinsic tracing/debugging/etc. problems of such APIs. >>>>> And there are a lot of handy trace points, are those not enough? >>>>> >>>>> Though, this can be an interesting project to rethink how async >>>>> APIs are worked with. >>>> >>>> Yeah, it's an interesting problem. The uring has the same events, as >>>> far as I understand, that are recorded in a multithreaded strace >>>> output (syscall entry, syscall exit); nothing more is needed> >>>> I do think this needs to be integrated into strace(1), otherwise the >>>> usefulness of that tool (which I think is *very* high) would go down >>>> drastically as io_uring usage goes up. >>> >>> Not touching the topic of usefulness of strace + io_uring, but I'd rather >>> have a tool that solves a problem, than a problem that created and honed >>> for a tool. >> >> Sorry, I'm not getting the metaphor. Can you please elaborate? > > Sure, I mean _if_ there are tools that conceptually suit better, I'd > prefer to work with them, then trying to shove a new and possibly alien > infrastructure into strace. > > But my knowledge of strace is very limited, so can't tell whether that's > the case. E.g. can it utilise static trace points? I think, if you're going to push this idea, we should start a new thread CC'ing strace devs. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-15 9:00 ` Pavel Begunkov @ 2020-07-15 11:17 ` Miklos Szeredi 0 siblings, 0 replies; 31+ messages in thread From: Miklos Szeredi @ 2020-07-15 11:17 UTC (permalink / raw) To: Pavel Begunkov Cc: Pavel Machek, Greg KH, Jan Ziak, Linux API, linux-fsdevel, linux-kernel, linux-kselftest, linux-man, Michael Kerrisk, shuah, Al Viro, io-uring On Wed, Jul 15, 2020 at 11:02 AM Pavel Begunkov <asml.silence@gmail.com> wrote: > I think, if you're going to push this idea, we should start a new thread > CC'ing strace devs. Makes sense. I've pruned the Cc list, so here's the link for reference: https://lore.kernel.org/linux-fsdevel/CAJfpegu3EwbBFTSJiPhm7eMyTK2MzijLUp1gcboOo3meMF_+Qg@mail.gmail.com/T/#u Thanks, Miklos ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster @ 2020-07-04 14:02 Greg Kroah-Hartman 2020-07-04 19:30 ` Al Viro 2020-07-06 17:25 ` Dave Martin 0 siblings, 2 replies; 31+ messages in thread From: Greg Kroah-Hartman @ 2020-07-04 14:02 UTC (permalink / raw) To: viro, mtk.manpages, shuah, linux-api Cc: linux-fsdevel, linux-kernel, linux-man, linux-kselftest, Greg Kroah-Hartman Here is a tiny new syscall, readfile, that makes it simpler to read small/medium sized files all in one shot, no need to do open/read/close. This is especially helpful for tools that poke around in procfs or sysfs, creating a little less system load than before, especially as syscall overheads go up over time due to various CPU bugs being addressed. There are 4 patches in this series: the first 3 are against the kernel tree, adding the syscall logic, wiring up the syscall, and adding some tests for it. The last patch is against the man-pages project, adding a tiny man page to try to describe the new syscall. 
Greg Kroah-Hartman (3): readfile: implement readfile syscall arch: wire up the readfile syscall selftests: add readfile(2) selftests arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + fs/open.c | 50 +++ include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 4 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/readfile/.gitignore | 3 + tools/testing/selftests/readfile/Makefile | 7 + tools/testing/selftests/readfile/readfile.c | 285 +++++++++++++++++ .../selftests/readfile/readfile_speed.c | 301 ++++++++++++++++++ 26 files changed, 671 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/readfile/.gitignore create mode 100644 tools/testing/selftests/readfile/Makefile create mode 100644 tools/testing/selftests/readfile/readfile.c create mode 100644 tools/testing/selftests/readfile/readfile_speed.c -- 2.27.0 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-04 14:02 Greg Kroah-Hartman @ 2020-07-04 19:30 ` Al Viro 2020-07-05 11:47 ` Greg Kroah-Hartman 2020-07-06 17:25 ` Dave Martin 1 sibling, 1 reply; 31+ messages in thread From: Al Viro @ 2020-07-04 19:30 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: mtk.manpages, shuah, linux-api, linux-fsdevel, linux-kernel, linux-man, linux-kselftest On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote: > Here is a tiny new syscall, readfile, that makes it simpler to read > small/medium sized files all in one shot, no need to do open/read/close. > This is especially helpful for tools that poke around in procfs or > sysfs, making a little bit of a less system load than before, especially > as syscall overheads go up over time due to various CPU bugs being > addressed. Nice series, but you are 3 months late with it... Next AFD, perhaps? Seriously, the rationale is bollocks. If the overhead of 2 extra syscalls is anywhere near the costs of the real work being done by that thing, we have already lost and the best thing to do is to throw the system away and start with saner hardware. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-04 19:30 ` Al Viro @ 2020-07-05 11:47 ` Greg Kroah-Hartman 0 siblings, 0 replies; 31+ messages in thread From: Greg Kroah-Hartman @ 2020-07-05 11:47 UTC (permalink / raw) To: Al Viro Cc: mtk.manpages, shuah, linux-api, linux-fsdevel, linux-kernel, linux-man, linux-kselftest On Sat, Jul 04, 2020 at 08:30:40PM +0100, Al Viro wrote: > On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote: > > Here is a tiny new syscall, readfile, that makes it simpler to read > > small/medium sized files all in one shot, no need to do open/read/close. > > This is especially helpful for tools that poke around in procfs or > > sysfs, making a little bit of a less system load than before, especially > > as syscall overheads go up over time due to various CPU bugs being > > addressed. > > Nice series, but you are 3 months late with it... Next AFD, perhaps? Perhaps :) > Seriously, the rationale is bollocks. If the overhead of 2 extra > syscalls is anywhere near the costs of the real work being done by > that thing, we have already lost and the best thing to do is to > throw the system away and start with saner hardware. The real work the kernel does is almost negligible compared to the open/close overhead of the syscalls on some platforms today. I'll post benchmarks with the next version of this patch series to hopefully show that. If not, then yeah, this isn't worth it, but it was fun to write. thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster 2020-07-04 14:02 Greg Kroah-Hartman 2020-07-04 19:30 ` Al Viro @ 2020-07-06 17:25 ` Dave Martin 1 sibling, 0 replies; 31+ messages in thread From: Dave Martin @ 2020-07-06 17:25 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: viro, mtk.manpages, shuah, linux-api, linux-fsdevel, linux-kernel, linux-man, linux-kselftest On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote: > Here is a tiny new syscall, readfile, that makes it simpler to read > small/medium sized files all in one shot, no need to do open/read/close. > This is especially helpful for tools that poke around in procfs or > sysfs, making a little bit of a less system load than before, especially > as syscall overheads go up over time due to various CPU bugs being > addressed. > > There are 4 patches in this series, the first 3 are against the kernel > tree, adding the syscall logic, wiring up the syscall, and adding some > tests for it. > > The last patch is agains the man-pages project, adding a tiny man page > to try to describe the new syscall. General question, using this series as an illustration only: At the risk of starting a flamewar, why is this needed? Is there a realistic usecase that would get significant benefit from this? A lot of syscalls seem to get added that combine or refactor the functionality of existing syscalls without justifying why this is needed (or even wise). This case feels like a solution, not a primitive, so I wonder if the long-term ABI fragmentation is worth the benefit. I ask because I'd like to get an idea of the policy on what is and is not considered a frivolous ABI extension. (I'm sure a usecase must be in mind, but it isn't mentioned here. Certainly the time it takes top to dump the contents of /proc leaves something to be desired.) Cheers ---Dave ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2020-07-15 11:17 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-05 2:06 [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster Jan Ziak 2020-07-05 2:16 ` Matthew Wilcox 2020-07-05 2:46 ` Jan Ziak 2020-07-05 3:12 ` Matthew Wilcox 2020-07-05 3:18 ` Jan Ziak 2020-07-05 3:27 ` Matthew Wilcox 2020-07-05 4:09 ` Jan Ziak 2020-07-05 11:58 ` Greg KH 2020-07-06 6:07 ` Jan Ziak 2020-07-06 11:11 ` Matthew Wilcox 2020-07-06 11:18 ` Greg KH 2020-07-05 8:07 ` Vito Caputo 2020-07-05 11:44 ` Greg KH 2020-07-05 20:34 ` Vito Caputo 2020-07-05 6:32 ` Andreas Dilger 2020-07-05 7:25 ` Jan Ziak 2020-07-05 12:00 ` Greg KH 2020-07-05 11:50 ` Greg KH 2020-07-14 6:51 ` Pavel Machek 2020-07-14 8:07 ` Miklos Szeredi 2020-07-14 11:34 ` Pavel Begunkov 2020-07-14 11:55 ` Miklos Szeredi 2020-07-15 8:31 ` Pavel Begunkov 2020-07-15 8:41 ` Miklos Szeredi 2020-07-15 8:49 ` Pavel Begunkov 2020-07-15 9:00 ` Pavel Begunkov 2020-07-15 11:17 ` Miklos Szeredi -- strict thread matches above, loose matches on Subject: below -- 2020-07-04 14:02 Greg Kroah-Hartman 2020-07-04 19:30 ` Al Viro 2020-07-05 11:47 ` Greg Kroah-Hartman 2020-07-06 17:25 ` Dave Martin