* splice(-> FIFO) never wakes up inotify IN_MODIFY? @ 2023-06-26 3:04 Ahelenia Ziemiańska 2023-06-26 6:11 ` Amir Goldstein 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 3:04 UTC (permalink / raw) To: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2173 bytes --] Hi! Consider the following programs: -- >8 -- ==> ino.c <== #define _GNU_SOURCE #include <stdio.h> #include <sys/inotify.h> #include <unistd.h> int main() { int ino = inotify_init1(IN_CLOEXEC); inotify_add_watch(ino, "/dev/fd/0", IN_MODIFY); char buf[64 * 1024]; struct inotify_event ev; while (read(ino, &ev, sizeof(ev)) > 0) { fprintf(stderr, "%d: mask=%x, cook=%x, len=%x, name=%.*s\n", ev.wd, ev.mask, ev.cookie, ev.len, (int)ev.len, ev.name); fprintf(stderr, "rd=%zd\n", read(0, buf, sizeof(buf))); } } ==> se.c <== #define _GNU_SOURCE #include <stdio.h> #include <sys/sendfile.h> int main() { ssize_t rd, acc = 0; while ((rd = sendfile(1, 0, 0, 128 * 1024 * 1024)) > 0) acc += rd; fprintf(stderr, "se=%zd: %m\n", acc); } ==> sp.c <== #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> int main() { ssize_t rd, acc = 0; while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) > 0) acc += rd; fprintf(stderr, "sp=%zd: %m\n", acc); } -- >8 -- By all means, ./sp | ./ino and ./se | ./ino should be equivalent, right? -- >8 -- $ make se sp ino $ mkfifo fifo $ ./ino < fifo & [1] 230 $ echo a > fifo $ echo a > fifo 1: mask=2, cook=0, len=0, name= rd=4 $ echo c > fifo 1: mask=2, cook=0, len=0, name= rd=2 $ ./se > fifo abcdef 1: mask=2, cook=0, len=0, name= asd ^D se=11: Success rd=11 1: mask=2, cook=0, len=0, name= rd=0 $ ./sp > fifo abcdefg asd dsasdadadad sp=24: Success $ < sp ./sp > fifo sp=25856: Success $ < sp ./sp > fifo ^C $ echo sp > fifo ^C -- >8 -- Note how in all ./sp > fifo cases, ./ino doesn't wake up! Note also how, thus, we've managed to fill the pipe buffer with ./sp (when it transferred 25856), and now we can't /ever/ write there again (both splicing and normal writes block, since there's no space left in the pipe; ./ino hasn't seen this and will never wake up or service the pipe): so we've effectively "denied service" by slickily using a different syscall to do the write, right? I consider this to be unexpected behaviour because (a) obviously and (b) sendfile() sends the inotify event. Happens on my linus checkout (v6.4-rc7-234-g547cc9be86f4) and bookworm (6.1.27-1). [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 3:04 splice(-> FIFO) never wakes up inotify IN_MODIFY? Ahelenia Ziemiańska @ 2023-06-26 6:11 ` Amir Goldstein 2023-06-26 12:19 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-26 6:11 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara On Mon, Jun 26, 2023 at 6:54 AM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > Hi! > > Consider the following programs: > -- >8 -- > ==> ino.c <== > #define _GNU_SOURCE > #include <stdio.h> > #include <sys/inotify.h> > #include <unistd.h> > int main() { > int ino = inotify_init1(IN_CLOEXEC); > inotify_add_watch(ino, "/dev/fd/0", IN_MODIFY); > > char buf[64 * 1024]; > struct inotify_event ev; > while (read(ino, &ev, sizeof(ev)) > 0) { > fprintf(stderr, "%d: mask=%x, cook=%x, len=%x, name=%.*s\n", ev.wd, ev.mask, > ev.cookie, ev.len, (int)ev.len, ev.name); > fprintf(stderr, "rd=%zd\n", read(0, buf, sizeof(buf))); > } > } > That's a very odd (and wrong) way to implement poll(2). This is not a documented way to use pipes, so it may happen to work with sendfile(2), but there is no guarantee. > ==> se.c <== > #define _GNU_SOURCE > #include <stdio.h> > #include <sys/sendfile.h> > int main() { > ssize_t rd, acc = 0; > while ((rd = sendfile(1, 0, 0, 128 * 1024 * 1024)) > 0) > acc += rd; > fprintf(stderr, "se=%zd: %m\n", acc); > } > > ==> sp.c <== > #define _GNU_SOURCE > #include <fcntl.h> > #include <stdio.h> > int main() { > ssize_t rd, acc = 0; > while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) > 0) > acc += rd; > fprintf(stderr, "sp=%zd: %m\n", acc); > } > -- >8 -- > > By all means, ./sp | ./ino and ./se | ./ino should be equivalent, > right? > Maybe it should, but it's not. > -- >8 -- > $ make se sp ino > $ mkfifo fifo > $ ./ino < fifo & > [1] 230 > $ echo a > fifo > $ echo a > fifo > 1: mask=2, cook=0, len=0, name= > rd=4 > $ echo c > fifo > 1: mask=2, cook=0, len=0, name= > rd=2 > $ ./se > fifo > abcdef > 1: mask=2, cook=0, len=0, name= > asd > ^D > se=11: Success > rd=11 > 1: mask=2, cook=0, len=0, name= > rd=0 > $ ./sp > fifo > abcdefg > asd > dsasdadadad > sp=24: Success > $ < sp ./sp > fifo > sp=25856: Success > $ < sp ./sp > fifo > ^C > $ echo sp > fifo > ^C > -- >8 -- > > Note how in all ./sp > fifo cases, ./ino doesn't wake up! > Note also how, thus, we've managed to fill the pipe buffer with ./sp > (when it transferred 25856), and now we can't /ever/ write there again > (both splicing and normal writes block, since there's no space left in > the pipe; ./ino hasn't seen this and will never wake up or service the > pipe): > so we've effectively "denied service" by slickily using a different > syscall to do the write, right? > Only applications that do not check for availability of input in the pipe correctly will get "denied service". > I consider this to be unexpected behaviour because (a) obviously and > (b) sendfile() sends the inotify event. > The fact is that relying on inotify IN_MODIFY and IN_ACCESS events for pipes is not a good idea. splice(2) differentiates three different cases: if (ipipe && opipe) { ... if (ipipe) { ... if (opipe) { ... IN_ACCESS will only be generated for non-pipe input IN_MODIFY will only be generated for non-pipe output Similarly FAN_ACCESS_PERM fanotify permission events will only be generated for non-pipe input. sendfile(2) OTOH does not special cases the pipe input case at all and it generates IN_MODIFY for the pipe output case as well. If you would insist on fixing this inconsistency, I would be willing to consider a patch that matches sendfile(2) behavior to that of splice(2) and not the other way around. My general opinion about IN_ACCESS/IN_MODIFY (as well as FAN_ACCESS_PERM) is that they are not very practical, not well defined for pipes and anyway do not cover all the ways that a file can be modified/accessed (i.e. mmap). Therefore, IMO, there is no incentive to fix something that has been broken for decades unless you have a very real use case - not a made up one. Incidentally, I am working on a new set of fanotify permission events (FAN_PRE_ACCESS/MODIFY) that will have better defined semantics - those are not going to be applicable to pipes though. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 6:11 ` Amir Goldstein @ 2023-06-26 12:19 ` Ahelenia Ziemiańska 2023-06-26 12:57 ` Ahelenia Ziemiańska 2023-06-26 14:53 ` Amir Goldstein 0 siblings, 2 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 12:19 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara [-- Attachment #1: Type: text/plain, Size: 4623 bytes --] On Mon, Jun 26, 2023 at 09:11:53AM +0300, Amir Goldstein wrote: > On Mon, Jun 26, 2023 at 6:54 AM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > Hi! > > > > Consider the following programs: > > -- >8 -- > > ==> ino.c <== > > #define _GNU_SOURCE > > #include <stdio.h> > > #include <sys/inotify.h> > > #include <unistd.h> > > int main() { > > int ino = inotify_init1(IN_CLOEXEC); > > inotify_add_watch(ino, "/dev/fd/0", IN_MODIFY); > > > > char buf[64 * 1024]; > > struct inotify_event ev; > > while (read(ino, &ev, sizeof(ev)) > 0) { > > fprintf(stderr, "%d: mask=%x, cook=%x, len=%x, name=%.*s\n", ev.wd, ev.mask, > > ev.cookie, ev.len, (int)ev.len, ev.name); > > fprintf(stderr, "rd=%zd\n", read(0, buf, sizeof(buf))); > > } > > } > > > > That's a very odd (and wrong) way to implement poll(2). > This is not a documented way to use pipes, so it may > happen to work with sendfile(2), but there is no guarantee. That's what I'm trying to do, yes. What's the right way to implement poll here? Because I don't think Linux has poll for pipes that behaves like this and POSIX certainly doesn't guarantee it, and, indeed, requires that polling a pipe that was hanged up instantly returns with POLLHUP forever. > > -- >8 -- > > $ make se sp ino > > $ mkfifo fifo > > $ ./ino < fifo & > > [1] 230 > > $ echo a > fifo > > $ echo a > fifo > > 1: mask=2, cook=0, len=0, name= > > rd=4 > > $ echo c > fifo > > 1: mask=2, cook=0, len=0, name= > > rd=2 > > $ ./se > fifo > > abcdef > > 1: mask=2, cook=0, len=0, name= > > asd > > ^D > > se=11: Success > > rd=11 > > 1: mask=2, cook=0, len=0, name= > > rd=0 > > $ ./sp > fifo > > abcdefg > > asd > > dsasdadadad > > sp=24: Success > > $ < sp ./sp > fifo > > sp=25856: Success > > $ < sp ./sp > fifo > > ^C > > $ echo sp > fifo > > ^C > > -- >8 -- > > > > Note how in all ./sp > fifo cases, ./ino doesn't wake up! > > Note also how, thus, we've managed to fill the pipe buffer with ./sp > > (when it transferred 25856), and now we can't /ever/ write there again > > (both splicing and normal writes block, since there's no space left in > > the pipe; ./ino hasn't seen this and will never wake up or service the > > pipe): > > so we've effectively "denied service" by slickily using a different > > syscall to do the write, right? > > > > I consider this to be unexpected behaviour because (a) obviously and > > (b) sendfile() sends the inotify event. > > > Only applications that do not check for availability > of input in the pipe correctly will get "denied service". > > The fact is that relying on inotify IN_MODIFY and IN_ACCESS events > for pipes is not a good idea. Okay, so how /is/ "correctly" then? Sleep in a loop and read non-blockingly? splice also breaks that (and, well, the pipe it's splicing to in general) https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u but that's beside the point I guess. > splice(2) differentiates three different cases: > if (ipipe && opipe) { > ... > if (ipipe) { > ... > if (opipe) { > ... > > IN_ACCESS will only be generated for non-pipe input > IN_MODIFY will only be generated for non-pipe output > > Similarly FAN_ACCESS_PERM fanotify permission events > will only be generated for non-pipe input. inotify(7) and fanotify(7) don't squeak on that, and imply the *ACCESS stuff is just for reading. > sendfile(2) OTOH does not special cases the pipe input > case at all and it generates IN_MODIFY for the pipe output > case as well. > > My general opinion about IN_ACCESS/IN_MODIFY > (as well as FAN_ACCESS_PERM) is that they are not > very practical, not well defined for pipes and anyway do > not cover all the ways that a file can be modified/accessed > (i.e. mmap). Therefore, IMO, there is no incentive to fix > something that has been broken for decades unless > you have a very real use case - not a made up one. My made-up use-case is tail -f, but I can just request IN_MOFIFY|IN_ACCESS for pipes, so if that's "correctly" then great. If it isn't, then, again, how /do/ you poll pipes. > If you would insist on fixing this inconsistency, I would be > willing to consider a patch that matches sendfile(2) behavior > to that of splice(2) and not the other way around. Meh, platform-specific API, long-standing behaviour, it's whatever; I'll just update *notify(7) to include that *ACCESSses are generated for "wants to/has read OR pipe was written". [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 12:19 ` Ahelenia Ziemiańska @ 2023-06-26 12:57 ` Ahelenia Ziemiańska 2023-06-26 13:51 ` Jan Kara 2023-06-26 14:53 ` Amir Goldstein 1 sibling, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 12:57 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara [-- Attachment #1: Type: text/plain, Size: 680 bytes --] On Mon, Jun 26, 2023 at 02:19:42PM +0200, Ahelenia Ziemiańska wrote: > > splice(2) differentiates three different cases: > > if (ipipe && opipe) { > > ... > > if (ipipe) { > > ... > > if (opipe) { > > ... > > > > IN_ACCESS will only be generated for non-pipe input > > IN_MODIFY will only be generated for non-pipe output > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > will only be generated for non-pipe input. Sorry, I must've misunderstood this as "splicing to a pipe generates *ACCESS". Testing reveals this is not the case. So is it really true that the only way to poll a pipe is a sleep()/read(O_NONBLOCK) loop? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 12:57 ` Ahelenia Ziemiańska @ 2023-06-26 13:51 ` Jan Kara 2023-06-26 14:25 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Jan Kara @ 2023-06-26 13:51 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara On Mon 26-06-23 14:57:55, Ahelenia Ziemiańska wrote: > On Mon, Jun 26, 2023 at 02:19:42PM +0200, Ahelenia Ziemiańska wrote: > > > splice(2) differentiates three different cases: > > > if (ipipe && opipe) { > > > ... > > > if (ipipe) { > > > ... > > > if (opipe) { > > > ... > > > > > > IN_ACCESS will only be generated for non-pipe input > > > IN_MODIFY will only be generated for non-pipe output > > > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > > will only be generated for non-pipe input. > Sorry, I must've misunderstood this as "splicing to a pipe generates > *ACCESS". Testing reveals this is not the case. So is it really true > that the only way to poll a pipe is a sleep()/read(O_NONBLOCK) loop? So why doesn't poll(3) work? AFAIK it should... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 13:51 ` Jan Kara @ 2023-06-26 14:25 ` Ahelenia Ziemiańska 2023-06-26 15:00 ` Jan Kara 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 14:25 UTC (permalink / raw) To: Jan Kara Cc: Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2072 bytes --] On Mon, Jun 26, 2023 at 03:51:59PM +0200, Jan Kara wrote: > On Mon 26-06-23 14:57:55, Ahelenia Ziemiańska wrote: > > On Mon, Jun 26, 2023 at 02:19:42PM +0200, Ahelenia Ziemiańska wrote: > > > > splice(2) differentiates three different cases: > > > > if (ipipe && opipe) { > > > > ... > > > > if (ipipe) { > > > > ... > > > > if (opipe) { > > > > ... > > > > > > > > IN_ACCESS will only be generated for non-pipe input > > > > IN_MODIFY will only be generated for non-pipe output > > > > > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > > > will only be generated for non-pipe input. > > Sorry, I must've misunderstood this as "splicing to a pipe generates > > *ACCESS". Testing reveals this is not the case. So is it really true > > that the only way to poll a pipe is a sleep()/read(O_NONBLOCK) loop? > So why doesn't poll(3) work? AFAIK it should... poll returns instantly with revents=POLLHUP for pipes that were closed by the last writer. Thus, you're either in a hot loop or you have to explicitly detect this and fall back to sleeping, which defeats the point of polling: -- >8 -- #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <poll.h> #include <stdio.h> #include <unistd.h> int main() { char buf[64 * 1024]; struct pollfd pf = {.fd = 0, .events = POLLIN}; size_t consec = 0; for (ssize_t rd;;) { while (poll(&pf, 1, -1) <= 0) ; if (pf.revents & POLLIN) { while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR) ; fprintf(stderr, "\nrd=%zd: %m\n", rd); } if (pf.revents & POLLHUP) { if (!consec++) fprintf(stderr, "\n\tPOLLHUPs"); fprintf(stderr, "\r%zu", consec); } else consec = 0; } } -- >8 -- And -- >8 -- $ ./rdr < fifo rd=12: Success 1779532 POLLHUPs rd=5: Success 945087 POLLHUPs rd=12: Success ^C -- >8 -- corresponding to -- >8 -- $ cat > fifo abc def ghi ^D $ echo zupa > fifo $ cat > fifo as dsaa asd ^C -- >8 -- [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 14:25 ` Ahelenia Ziemiańska @ 2023-06-26 15:00 ` Jan Kara 2023-06-26 15:15 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Jan Kara @ 2023-06-26 15:00 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Jan Kara, Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel On Mon 26-06-23 16:25:41, Ahelenia Ziemiańska wrote: > On Mon, Jun 26, 2023 at 03:51:59PM +0200, Jan Kara wrote: > > On Mon 26-06-23 14:57:55, Ahelenia Ziemiańska wrote: > > > On Mon, Jun 26, 2023 at 02:19:42PM +0200, Ahelenia Ziemiańska wrote: > > > > > splice(2) differentiates three different cases: > > > > > if (ipipe && opipe) { > > > > > ... > > > > > if (ipipe) { > > > > > ... > > > > > if (opipe) { > > > > > ... > > > > > > > > > > IN_ACCESS will only be generated for non-pipe input > > > > > IN_MODIFY will only be generated for non-pipe output > > > > > > > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > > > > will only be generated for non-pipe input. > > > Sorry, I must've misunderstood this as "splicing to a pipe generates > > > *ACCESS". Testing reveals this is not the case. So is it really true > > > that the only way to poll a pipe is a sleep()/read(O_NONBLOCK) loop? > > So why doesn't poll(3) work? AFAIK it should... > poll returns instantly with revents=POLLHUP for pipes that were closed > by the last writer. > > Thus, you're either in a hot loop or you have to explicitly detect this > and fall back to sleeping, which defeats the point of polling: I see. There are two ways around this: a) open the file descriptor with O_RDWR (so there's always at least one writer). b) when you get POLLHUP, just close the fd and open it again. In these cases poll(3) will behave as you need (tested)... Honza > -- >8 -- > #define _GNU_SOURCE > #include <errno.h> > #include <fcntl.h> > #include <poll.h> > #include <stdio.h> > #include <unistd.h> > int main() { > char buf[64 * 1024]; > struct pollfd pf = {.fd = 0, .events = POLLIN}; > size_t consec = 0; > for (ssize_t rd;;) { > while (poll(&pf, 1, -1) <= 0) > ; > if (pf.revents & POLLIN) { > while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR) > ; > fprintf(stderr, "\nrd=%zd: %m\n", rd); > } > if (pf.revents & POLLHUP) { > if (!consec++) > fprintf(stderr, "\n\tPOLLHUPs"); > fprintf(stderr, "\r%zu", consec); > } else > consec = 0; > } > } > -- >8 -- > > And > -- >8 -- > $ ./rdr < fifo > > rd=12: Success > > 1779532 POLLHUPs > rd=5: Success > > 945087 POLLHUPs > rd=12: Success > ^C > -- >8 -- > corresponding to > -- >8 -- > $ cat > fifo > abc > def > ghi > ^D > $ echo zupa > fifo > $ cat > fifo > as > dsaa > asd > ^C > -- >8 -- -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 15:00 ` Jan Kara @ 2023-06-26 15:15 ` Ahelenia Ziemiańska 2023-06-26 16:52 ` Jan Kara 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 15:15 UTC (permalink / raw) To: Jan Kara Cc: Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2342 bytes --] On Mon, Jun 26, 2023 at 05:00:01PM +0200, Jan Kara wrote: > On Mon 26-06-23 16:25:41, Ahelenia Ziemiańska wrote: > > On Mon, Jun 26, 2023 at 03:51:59PM +0200, Jan Kara wrote: > > > On Mon 26-06-23 14:57:55, Ahelenia Ziemiańska wrote: > > > > On Mon, Jun 26, 2023 at 02:19:42PM +0200, Ahelenia Ziemiańska wrote: > > > > > > splice(2) differentiates three different cases: > > > > > > if (ipipe && opipe) { > > > > > > ... > > > > > > if (ipipe) { > > > > > > ... > > > > > > if (opipe) { > > > > > > ... > > > > > > > > > > > > IN_ACCESS will only be generated for non-pipe input > > > > > > IN_MODIFY will only be generated for non-pipe output > > > > > > > > > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > > > > > will only be generated for non-pipe input. > > > > Sorry, I must've misunderstood this as "splicing to a pipe generates > > > > *ACCESS". Testing reveals this is not the case. So is it really true > > > > that the only way to poll a pipe is a sleep()/read(O_NONBLOCK) loop? > > > So why doesn't poll(3) work? AFAIK it should... > > poll returns instantly with revents=POLLHUP for pipes that were closed > > by the last writer. > > > > Thus, you're either in a hot loop or you have to explicitly detect this > > and fall back to sleeping, which defeats the point of polling: > I see. There are two ways around this: > > a) open the file descriptor with O_RDWR (so there's always at least one > writer). Not allowed in the general case, since you need to be able to tail -f files you can't write to. > b) when you get POLLHUP, just close the fd and open it again. Not allowed semantically, since tail -f follows the file, not the name. > In these cases poll(3) will behave as you need (tested)... Alas, those are not applicable to the standard use-case. If only linux exposed a way to see if a file was written to! For reference with other implementations, this just works and is guaranteed to work under kqueue(2) EVFILT_READ (admittedly, kqueue(2) is an epoll(7)-style system and not an inotify(7)-style one, but it solves the issue, and that's what NetBSD tail -f uses). Maybe this is short-sighted but I don't actually really see why inotify is... expected? To only generate file-was-written events only for some writes? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 15:15 ` Ahelenia Ziemiańska @ 2023-06-26 16:52 ` Jan Kara 0 siblings, 0 replies; 55+ messages in thread From: Jan Kara @ 2023-06-26 16:52 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Jan Kara, Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel On Mon 26-06-23 17:15:23, Ahelenia Ziemiańska wrote: > On Mon, Jun 26, 2023 at 05:00:01PM +0200, Jan Kara wrote: > > On Mon 26-06-23 16:25:41, Ahelenia Ziemiańska wrote: > > > On Mon, Jun 26, 2023 at 03:51:59PM +0200, Jan Kara wrote: > > > > On Mon 26-06-23 14:57:55, Ahelenia Ziemiańska wrote: > > > > > On Mon, Jun 26, 2023 at 02:19:42PM +0200, Ahelenia Ziemiańska wrote: > > > > > > > splice(2) differentiates three different cases: > > > > > > > if (ipipe && opipe) { > > > > > > > ... > > > > > > > if (ipipe) { > > > > > > > ... > > > > > > > if (opipe) { > > > > > > > ... > > > > > > > > > > > > > > IN_ACCESS will only be generated for non-pipe input > > > > > > > IN_MODIFY will only be generated for non-pipe output > > > > > > > > > > > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > > > > > > will only be generated for non-pipe input. > > > > > Sorry, I must've misunderstood this as "splicing to a pipe generates > > > > > *ACCESS". Testing reveals this is not the case. So is it really true > > > > > that the only way to poll a pipe is a sleep()/read(O_NONBLOCK) loop? > > > > So why doesn't poll(3) work? AFAIK it should... > > > poll returns instantly with revents=POLLHUP for pipes that were closed > > > by the last writer. > > > > > > Thus, you're either in a hot loop or you have to explicitly detect this > > > and fall back to sleeping, which defeats the point of polling: > > I see. There are two ways around this: > > > > a) open the file descriptor with O_RDWR (so there's always at least one > > writer). > Not allowed in the general case, since you need to be able to tail -f > files you can't write to. Hum, fair point. > > b) when you get POLLHUP, just close the fd and open it again. > Not allowed semantically, since tail -f follows the file, not the name. Well, you can workaround that by using /proc/<pid>/fd/ magic links for reopening. > > In these cases poll(3) will behave as you need (tested)... > Alas, those are not applicable to the standard use-case. > If only linux exposed a way to see if a file was written to! I agree that having to jump through the hoops with poll for this relatively standard usage is annoying. Looking into the code, the kernel actually has extra code to generate these repeated POLLHUPs because apparently that was how the poll was behaving ages ago. Hum, researching some more about this, epoll(7) actually doesn't have this problem. I've tested using epoll(2) (in edge-triggered case) instead of poll(2) and that doesn't return repeated POLLHUP events. > For reference with other implementations, > this just works and is guaranteed to work under kqueue(2) EVFILT_READ > (admittedly, kqueue(2) is an epoll(7)-style system and not an > inotify(7)-style one, but it solves the issue, > and that's what NetBSD tail -f uses). > > Maybe this is short-sighted but I don't actually really see why inotify > is... expected? To only generate file-was-written events only for some > writes? Well, inotify similarly as fanotify have been created as filesystem monitoring APIs. Not as general "file descriptor monitoring" APIs. So they work well with regular files and directories but for other objects such as sockets or pipes or even for these "looking like files" objects in virtual filesystems like /proc, the results are pretty much undefined. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 12:19 ` Ahelenia Ziemiańska 2023-06-26 12:57 ` Ahelenia Ziemiańska @ 2023-06-26 14:53 ` Amir Goldstein 2023-06-26 15:12 ` Ahelenia Ziemiańska 1 sibling, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-26 14:53 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara On Mon, Jun 26, 2023 at 3:19 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Mon, Jun 26, 2023 at 09:11:53AM +0300, Amir Goldstein wrote: > > On Mon, Jun 26, 2023 at 6:54 AM Ahelenia Ziemiańska > > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > > > Hi! > > > > > > Consider the following programs: > > > -- >8 -- > > > ==> ino.c <== > > > #define _GNU_SOURCE > > > #include <stdio.h> > > > #include <sys/inotify.h> > > > #include <unistd.h> > > > int main() { > > > int ino = inotify_init1(IN_CLOEXEC); > > > inotify_add_watch(ino, "/dev/fd/0", IN_MODIFY); > > > > > > char buf[64 * 1024]; > > > struct inotify_event ev; > > > while (read(ino, &ev, sizeof(ev)) > 0) { > > > fprintf(stderr, "%d: mask=%x, cook=%x, len=%x, name=%.*s\n", ev.wd, ev.mask, > > > ev.cookie, ev.len, (int)ev.len, ev.name); > > > fprintf(stderr, "rd=%zd\n", read(0, buf, sizeof(buf))); > > > } > > > } > > > > > > > That's a very odd (and wrong) way to implement poll(2). > > This is not a documented way to use pipes, so it may > > happen to work with sendfile(2), but there is no guarantee. > That's what I'm trying to do, yes. > What's the right way to implement poll here? Because I don't think Linux > has poll for pipes that behaves like this and POSIX certainly doesn't > guarantee it, and, indeed, requires that polling a pipe that was hanged > up instantly returns with POLLHUP forever. > > > > -- >8 -- > > > $ make se sp ino > > > $ mkfifo fifo > > > $ ./ino < fifo & > > > [1] 230 > > > $ echo a > fifo > > > $ echo a > fifo > > > 1: mask=2, cook=0, len=0, name= > > > rd=4 > > > $ echo c > fifo > > > 1: mask=2, cook=0, len=0, name= > > > rd=2 > > > $ ./se > fifo > > > abcdef > > > 1: mask=2, cook=0, len=0, name= > > > asd > > > ^D > > > se=11: Success > > > rd=11 > > > 1: mask=2, cook=0, len=0, name= > > > rd=0 > > > $ ./sp > fifo > > > abcdefg > > > asd > > > dsasdadadad > > > sp=24: Success > > > $ < sp ./sp > fifo > > > sp=25856: Success > > > $ < sp ./sp > fifo > > > ^C > > > $ echo sp > fifo > > > ^C > > > -- >8 -- > > > > > > Note how in all ./sp > fifo cases, ./ino doesn't wake up! > > > Note also how, thus, we've managed to fill the pipe buffer with ./sp > > > (when it transferred 25856), and now we can't /ever/ write there again > > > (both splicing and normal writes block, since there's no space left in > > > the pipe; ./ino hasn't seen this and will never wake up or service the > > > pipe): > > > so we've effectively "denied service" by slickily using a different > > > syscall to do the write, right? > > > > > > I consider this to be unexpected behaviour because (a) obviously and > > > (b) sendfile() sends the inotify event. > > > > > Only applications that do not check for availability > > of input in the pipe correctly will get "denied service". > > > > The fact is that relying on inotify IN_MODIFY and IN_ACCESS events > > for pipes is not a good idea. > Okay, so how /is/ "correctly" then? > Sleep in a loop and read non-blockingly? > splice also breaks that (and, well, the pipe it's splicing to in general) > https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u > but that's beside the point I guess. > > > splice(2) differentiates three different cases: > > if (ipipe && opipe) { > > ... > > if (ipipe) { > > ... > > if (opipe) { > > ... > > > > IN_ACCESS will only be generated for non-pipe input > > IN_MODIFY will only be generated for non-pipe output > > > > Similarly FAN_ACCESS_PERM fanotify permission events > > will only be generated for non-pipe input. > inotify(7) and fanotify(7) don't squeak on that, > and imply the *ACCESS stuff is just for reading. Correct. > > > sendfile(2) OTOH does not special cases the pipe input > > case at all and it generates IN_MODIFY for the pipe output > > case as well. > > > > My general opinion about IN_ACCESS/IN_MODIFY > > (as well as FAN_ACCESS_PERM) is that they are not > > very practical, not well defined for pipes and anyway do > > not cover all the ways that a file can be modified/accessed > > (i.e. mmap). Therefore, IMO, there is no incentive to fix > > something that has been broken for decades unless > > you have a very real use case - not a made up one. > My made-up use-case is tail -f, but I can just request > IN_MOFIFY|IN_ACCESS for pipes, so if that's "correctly" then great. > If it isn't, then, again, how /do/ you poll pipes. > > > If you would insist on fixing this inconsistency, I would be > > willing to consider a patch that matches sendfile(2) behavior > > to that of splice(2) and not the other way around. > Meh, platform-specific API, long-standing behaviour, it's whatever; > I'll just update *notify(7) to include that *ACCESSses are generated > for "wants to/has read OR pipe was written". I guess you already understood that is not what I meant. > So is it really true that the only way to poll a pipe is a > sleep()/read(O_NONBLOCK) loop? I don't think so, but inotify is not the way. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 14:53 ` Amir Goldstein @ 2023-06-26 15:12 ` Ahelenia Ziemiańska 2023-06-26 16:21 ` Amir Goldstein 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 15:12 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara [-- Attachment #1: Type: text/plain, Size: 587 bytes --] On Mon, Jun 26, 2023 at 05:53:46PM +0300, Amir Goldstein wrote: > > So is it really true that the only way to poll a pipe is a > > sleep()/read(O_NONBLOCK) loop? > I don't think so, but inotify is not the way. So what is? What do the kernel developers recommend as a way to see if a file is written to, and that file happens to be a pipe? FTR, I've opened the symmetric Debian#1039488: https://bugs.debian.org/1039488 against coreutils, since, if this is expected, and writing to a pipe should not generate write events on that pipe, then tail -f is currently broken on most systems. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 15:12 ` Ahelenia Ziemiańska @ 2023-06-26 16:21 ` Amir Goldstein 2023-06-26 17:14 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-26 16:21 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng On Mon, Jun 26, 2023 at 6:12 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Mon, Jun 26, 2023 at 05:53:46PM +0300, Amir Goldstein wrote: > > > So is it really true that the only way to poll a pipe is a > > > sleep()/read(O_NONBLOCK) loop? > > I don't think so, but inotify is not the way. > So what is? What do the kernel developers recommend as a way to see if a > file is written to, and that file happens to be a pipe? > > FTR, I've opened the symmetric Debian#1039488: > https://bugs.debian.org/1039488 > against coreutils, since, if this is expected, and writing to a pipe > should not generate write events on that pipe, then tail -f is currently > broken on most systems. First of all, it is better to mention that you are trying to fix a real world use case when you are reporting a kernel misbehavior. What this makes me wonder is, if tail -f <fifo> is broken as you claim it is, how is it that decades go by without anyone noticing this problem? When looking at tail source code I see: /* Mark as '.ignore'd each member of F that corresponds to a pipe or fifo, and return the number of non-ignored members. */ static size_t ignore_fifo_and_pipe (struct File_spec *f, size_t n_files) { /* When there is no FILE operand and stdin is a pipe or FIFO POSIX requires that tail ignore the -f option. Since we allow multiple FILE operands, we extend that to say: with -f, ignore any "-" operand that corresponds to a pipe or FIFO. */ and it looks like tail_forever_inotify() is not being called unless there are non pipes: if (forever && ignore_fifo_and_pipe (F, n_files)) { The semantics of tail -f on a pipe input would be very odd, because the writer would need to close before tail can figure out which are the last lines. So honestly, we could maybe add IN_ACCESS/IN_MODIFY for the splice_pipe_to_pipe() case, but I would really like to know what the real use case is. Another observation is that splice(2) never used to report any inotify events at all until a very recent commit in v6.4 983652c69199 ("splice: report related fsnotify events") but this commit left out the splice_pipe_to_pipe() case. CC the author of the patch to ask why this case was left out and whether he would be interested in fixing that. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 16:21 ` Amir Goldstein @ 2023-06-26 17:14 ` Ahelenia Ziemiańska 2023-06-26 18:57 ` Amir Goldstein 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 17:14 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng [-- Attachment #1: Type: text/plain, Size: 8103 bytes --] On Mon, Jun 26, 2023 at 07:21:16PM +0300, Amir Goldstein wrote: > On Mon, Jun 26, 2023 at 6:12 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > On Mon, Jun 26, 2023 at 05:53:46PM +0300, Amir Goldstein wrote: > > > > So is it really true that the only way to poll a pipe is a > > > > sleep()/read(O_NONBLOCK) loop? > > > I don't think so, but inotify is not the way. > > So what is? What do the kernel developers recommend as a way to see if a > > file is written to, and that file happens to be a pipe? > > > > FTR, I've opened the symmetric Debian#1039488: > > https://bugs.debian.org/1039488 > > against coreutils, since, if this is expected, and writing to a pipe > > should not generate write events on that pipe, then tail -f is currently > > broken on most systems. > First of all, it is better to mention that you are trying to fix a real > world use case when you are reporting a kernel misbehavior. I hadn't actually realised this affected coreutils tail as well before re-testing it today. > What this makes me wonder is, if tail -f <fifo> is broken as you claim > it is, how is it that decades go by without anyone noticing this problem? Most people don't use cat(1) that splice(2)s, I do; even if they do, they probably haven't filled the whole buffer so the missed splice(2) write was potentially covered by a later write(2) write. > When looking at tail source code I see: > > /* Mark as '.ignore'd each member of F that corresponds to a > pipe or fifo, and return the number of non-ignored members. */ > static size_t > ignore_fifo_and_pipe (struct File_spec *f, size_t n_files) > { > /* When there is no FILE operand and stdin is a pipe or FIFO > POSIX requires that tail ignore the -f option. > Since we allow multiple FILE operands, we extend that to say: with -f, > ignore any "-" operand that corresponds to a pipe or FIFO. */ > > and it looks like tail_forever_inotify() is not being called unless > there are non pipes: > > if (forever && ignore_fifo_and_pipe (F, n_files)) > { > > The semantics of tail -f on a pipe input would be very odd, because > the writer would need to close before tail can figure out which are > the last lines. The semantics of tail -f for FIFOs are formalised in POSIX, which says (Issue 8 Draft 3): 115551 −f If the input file is a regular file or if the file operand specifies a FIFO, do not 115552 terminate after the last line of the input file has been copied, but read and copy 115553 further bytes from the input file when they become available. If no file operand is 115554 specified and standard input is a pipe or FIFO, the −f option shall be ignored. If the 115555 input file is not a FIFO, pipe, or regular file, it is unspecified whether or not the −f 115556 option shall be ignored. coreutils sensibly interprets these in accordance with https://www.mail-archive.com/austin-group-l@opengroup.org/msg11402.html There are no special provisions for pipes/FIFOs before the input is exhausted, correct: tail is two programs in one; the first bit reads the input(s) to completion and outputs the bit you wanted, the second bit (-f) keeps reading the inputs from where the first bit left off. (Note that tail with -c +123 and -n +123 doesn't "care" what lines are last, and just copies from byte/line 123, but that's beside the point. Indeed, "tail -c+1 fifo > log" is the idealised log collection program from before: many programs may write to fifo, and all output is collected in log.) But yes: tail -f fifo first reads the entire "contents" of fifo (where for pipes this is defined as "until all writers hang up"), then continues reading fifo and copying whatever it reads. On a strict single-file implementation you can get away with reading and then sleeping when you get 0 (this is what traditional UNIX tails do). When you have multiple files, well, you want to poll them, and since pipes are unpollable, to avoid waking up every second and reading every unpollable input file to see if you got something (regular files, fifos), you use inotify(7) (coreutils) or kqueue(2) (NetBSD, probably others) to tell you when there's data. If inotify(7) for pipes worked, the entire coreutils tail -f semantic is implementable as a single poll(2): * of each individual pollable (sockets, chardevs) * of inotify of unpollables (pipes, regular files) * of pidfd (if --pid) this is very attractive. Naturally, I could also fall back to just a poll of pollables and pidfd with a second timeout if there are pipes in the inputs, but you see how this is sub-optimal for no real good reason. And, well, coreutils tail doesn't do this, so it's vulnerable. > So honestly, we could maybe add IN_ACCESS/IN_MODIFY for the > splice_pipe_to_pipe() case, but I would really like to know what > the real use case is. And splice_file_to_pipe() which is what we're hitting here. The real use case is as I said: I would like to be able to poll pipes with inotify instead of with sleep()/read(). > Another observation is that splice(2) never used to report any > inotify events at all until a very recent commit in v6.4 > 983652c69199 ("splice: report related fsnotify events") > but this commit left out the splice_pipe_to_pipe() case. > > CC the author of the patch to ask why this case was left > out and whether he would be interested in fixing that. I'm reading the discussion following <20230322062519.409752-1-cccheng@synology.com> as "people just forget to add inotify hooks to their I/O routines as a rule", thus my guess on why it was left out was "didn't even cross my mind" (or, perhaps "didn't think we even supported fsnotify for pipes"). Below you'll find a scissor-patch based on current linus HEAD; I've tested it works as-expected for both tty-to-pipe and pipe-to-pipe splices in my original reproducer. -- >8 -- From: =?UTF-8?q?Ahelenia=20Ziemia=C5=84ska?= <nabijaczleweli@nabijaczleweli.xyz> Date: Mon, 26 Jun 2023 19:02:28 +0200 Subject: [PATCH] splice: always fsnotify_access(in), fsnotify_modify(out) on success The current behaviour caused an asymmetry where some write APIs (write, sendfile) would notify the written-to/read-from objects, but splice wouldn't. This affected userspace which used inotify, like coreutils tail -f. Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- fs/splice.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 3e06611d19ae..94fae24f9d54 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1154,7 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, if ((in->f_flags | out->f_flags) & O_NONBLOCK) flags |= SPLICE_F_NONBLOCK; - return splice_pipe_to_pipe(ipipe, opipe, len, flags); + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); + goto notify; } if (ipipe) { @@ -1182,15 +1183,12 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = do_splice_from(ipipe, out, &offset, len, flags); file_end_write(out); - if (ret > 0) - fsnotify_modify(out); - if (!off_out) out->f_pos = offset; else *off_out = offset; - return ret; + goto notify; } if (opipe) { @@ -1209,18 +1207,23 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = splice_file_to_pipe(in, opipe, &offset, len, flags); - if (ret > 0) - fsnotify_access(in); - if (!off_in) in->f_pos = offset; else *off_in = offset; - return ret; + goto notify; } return -EINVAL; + +notify: + if (ret > 0) { + fsnotify_access(in); + fsnotify_modify(out); + } + + return ret; } static long __do_splice(struct file *in, loff_t __user *off_in, -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: splice(-> FIFO) never wakes up inotify IN_MODIFY? 2023-06-26 17:14 ` Ahelenia Ziemiańska @ 2023-06-26 18:57 ` Amir Goldstein 2023-06-26 23:08 ` [PATCH v2 0/3] fanotify accounting for fs/splice.c наб ` (3 more replies) 0 siblings, 4 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-26 18:57 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng On Mon, Jun 26, 2023 at 8:14 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Mon, Jun 26, 2023 at 07:21:16PM +0300, Amir Goldstein wrote: > > On Mon, Jun 26, 2023 at 6:12 PM Ahelenia Ziemiańska > > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > > > On Mon, Jun 26, 2023 at 05:53:46PM +0300, Amir Goldstein wrote: > > > > > So is it really true that the only way to poll a pipe is a > > > > > sleep()/read(O_NONBLOCK) loop? > > > > I don't think so, but inotify is not the way. > > > So what is? What do the kernel developers recommend as a way to see if a > > > file is written to, and that file happens to be a pipe? > > > > > > FTR, I've opened the symmetric Debian#1039488: > > > https://bugs.debian.org/1039488 > > > against coreutils, since, if this is expected, and writing to a pipe > > > should not generate write events on that pipe, then tail -f is currently > > > broken on most systems. > > First of all, it is better to mention that you are trying to fix a real > > world use case when you are reporting a kernel misbehavior. > I hadn't actually realised this affected coreutils tail as well before > re-testing it today. > > > What this makes me wonder is, if tail -f <fifo> is broken as you claim > > it is, how is it that decades go by without anyone noticing this problem? > Most people don't use cat(1) that splice(2)s, I do; > even if they do, they probably haven't filled the whole buffer so the > missed splice(2) write was potentially covered by a later write(2) write. > > > When looking at tail source code I see: > > > > /* Mark as '.ignore'd each member of F that corresponds to a > > pipe or fifo, and return the number of non-ignored members. */ > > static size_t > > ignore_fifo_and_pipe (struct File_spec *f, size_t n_files) > > { > > /* When there is no FILE operand and stdin is a pipe or FIFO > > POSIX requires that tail ignore the -f option. > > Since we allow multiple FILE operands, we extend that to say: with -f, > > ignore any "-" operand that corresponds to a pipe or FIFO. */ > > > > and it looks like tail_forever_inotify() is not being called unless > > there are non pipes: > > > > if (forever && ignore_fifo_and_pipe (F, n_files)) > > { > > > > The semantics of tail -f on a pipe input would be very odd, because > > the writer would need to close before tail can figure out which are > > the last lines. > The semantics of tail -f for FIFOs are formalised in POSIX, which says > (Issue 8 Draft 3): > 115551 −f If the input file is a regular file or if the file operand specifies a FIFO, do not > 115552 terminate after the last line of the input file has been copied, but read and copy > 115553 further bytes from the input file when they become available. If no file operand is > 115554 specified and standard input is a pipe or FIFO, the −f option shall be ignored. If the > 115555 input file is not a FIFO, pipe, or regular file, it is unspecified whether or not the −f > 115556 option shall be ignored. > coreutils sensibly interprets these in accordance with > https://www.mail-archive.com/austin-group-l@opengroup.org/msg11402.html > > There are no special provisions for pipes/FIFOs before the input is > exhausted, correct: tail is two programs in one; the first bit reads the > input(s) to completion and outputs the bit you wanted, the second bit > (-f) keeps reading the inputs from where the first bit left off. > > (Note that tail with -c +123 and -n +123 doesn't "care" what lines are > last, and just copies from byte/line 123, but that's beside the point. > Indeed, "tail -c+1 fifo > log" is the idealised log collection program > from before: many programs may write to fifo, and all output is > collected in log.) > > But yes: tail -f fifo first reads the entire "contents" of fifo > (where for pipes this is defined as "until all writers hang up"), > then continues reading fifo and copying whatever it reads. > On a strict single-file implementation you can get away with reading and > then sleeping when you get 0 (this is what traditional UNIX tails do). > > When you have multiple files, well, you want to poll them, and since > pipes are unpollable, to avoid waking up every second and reading every > unpollable input file to see if you got something (regular files, fifos), > you use inotify(7) (coreutils) or kqueue(2) (NetBSD, probably others) > to tell you when there's data. > > If inotify(7) for pipes worked, the entire coreutils tail -f semantic > is implementable as a single poll(2): > * of each individual pollable (sockets, chardevs) > * of inotify of unpollables (pipes, regular files) > * of pidfd (if --pid) > this is very attractive. Naturally, I could also fall back to just a > poll of pollables and pidfd with a second timeout if there are pipes in > the inputs, but you see how this is sub-optimal for no real good reason. > And, well, coreutils tail doesn't do this, so it's vulnerable. > Thanks for the explanation. > > So honestly, we could maybe add IN_ACCESS/IN_MODIFY for the > > splice_pipe_to_pipe() case, but I would really like to know what > > the real use case is. > And splice_file_to_pipe() which is what we're hitting here. > The real use case is as I said: I would like to be able to poll pipes > with inotify instead of with sleep()/read(). > > > Another observation is that splice(2) never used to report any > > inotify events at all until a very recent commit in v6.4 > > 983652c69199 ("splice: report related fsnotify events") > > but this commit left out the splice_pipe_to_pipe() case. > > > > CC the author of the patch to ask why this case was left > > out and whether he would be interested in fixing that. > I'm reading the discussion following > <20230322062519.409752-1-cccheng@synology.com> as > "people just forget to add inotify hooks to their I/O routines as a rule", > thus my guess on why it was left out was "didn't even cross my mind" > (or, perhaps "didn't think we even supported fsnotify for pipes"). > > Below you'll find a scissor-patch based on current linus HEAD; > I've tested it works as-expected for both tty-to-pipe and pipe-to-pipe > splices in my original reproducer. > -- >8 -- > From: =?UTF-8?q?Ahelenia=20Ziemia=C5=84ska?= > <nabijaczleweli@nabijaczleweli.xyz> > Date: Mon, 26 Jun 2023 19:02:28 +0200 > Subject: [PATCH] splice: always fsnotify_access(in), fsnotify_modify(out) on > success > > The current behaviour caused an asymmetry where some write APIs > (write, sendfile) would notify the written-to/read-from objects, > but splice wouldn't. > > This affected userspace which used inotify, like coreutils tail -f. > > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> (Create dependency in case they should be backported) Fixes: 983652c69199 ("splice: report related fsnotify events") Makes sense. Reviewed-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/splice.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index 3e06611d19ae..94fae24f9d54 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1154,7 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > if ((in->f_flags | out->f_flags) & O_NONBLOCK) > flags |= SPLICE_F_NONBLOCK; > > - return splice_pipe_to_pipe(ipipe, opipe, len, flags); > + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); > + goto notify; > } > > if (ipipe) { > @@ -1182,15 +1183,12 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > ret = do_splice_from(ipipe, out, &offset, len, flags); > file_end_write(out); > > - if (ret > 0) > - fsnotify_modify(out); > - > if (!off_out) > out->f_pos = offset; > else > *off_out = offset; > > - return ret; > + goto notify; > } > > if (opipe) { > @@ -1209,18 +1207,23 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > > ret = splice_file_to_pipe(in, opipe, &offset, len, flags); > > - if (ret > 0) > - fsnotify_access(in); > - > if (!off_in) > in->f_pos = offset; > else > *off_in = offset; > > - return ret; > + goto notify; > } > > return -EINVAL; > + > +notify: > + if (ret > 0) { > + fsnotify_access(in); > + fsnotify_modify(out); > + } > + > + return ret; > } > > static long __do_splice(struct file *in, loff_t __user *off_in, > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 0/3] fanotify accounting for fs/splice.c 2023-06-26 18:57 ` Amir Goldstein @ 2023-06-26 23:08 ` наб 2023-06-27 6:14 ` Amir Goldstein 2023-06-26 23:09 ` [PATCH v2 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska ` (2 subsequent siblings) 3 siblings, 1 reply; 55+ messages in thread From: наб @ 2023-06-26 23:08 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng [-- Attachment #1: Type: text/plain, Size: 1668 bytes --] "people just forget to add inotify hooks to their I/O routines as a rule"? Guess what I did, fully knowing that some are missing in this file :) ==> te.c <== #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> int main() { ssize_t rd, acc = 0; while ((rd = tee(0, 1, 128 * 1024 * 1024, 0)) > 0) acc += rd; fprintf(stderr, "te=%zd: %m\n", acc); } ==> vm.c <== #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <string.h> static char sb[1024 * 1024]; int main() { memcpy(sb, "żupan", sizeof("żupan")); ssize_t rd = vmsplice(1, &(struct iovec){.iov_base = sb, .iov_len = sizeof(sb)}, 1, SPLICE_F_GIFT); fprintf(stderr, "vm=%zd: %m\n", rd); } echo zupa | ./te > fifo tees a few times and then blocks when the pipe fills, at which point we get into the broken state. Similarly, ./vm > fifo (with the default 64k F_GETPIPE_SZ) enters that same state instantly. With 2/3 and 3/3, they instead do 1: mask=2, cook=0, len=0, name= rd=80 1: mask=2, cook=0, len=0, name= rd=80 ... in a loop, as-expected, and # ./vm > fifo vm=65200: Success 1: mask=2, cook=0, len=0, name= rd=65200 I took the liberty of marking 2/3 and 3/3 as Fixes: of the original fanotify-in-splice commit as well, I think they fit the bill. Ahelenia Ziemiańska (3): splice: always fsnotify_access(in), fsnotify_modify(out) on success splice: fsnotify_modify(fd) in vmsplice splice: fsnotify_access(in), fsnotify_modify(out) on success in tee fs/splice.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 0/3] fanotify accounting for fs/splice.c 2023-06-26 23:08 ` [PATCH v2 0/3] fanotify accounting for fs/splice.c наб @ 2023-06-27 6:14 ` Amir Goldstein 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 6:14 UTC (permalink / raw) To: наб Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng On Tue, Jun 27, 2023 at 2:08 AM наб <nabijaczleweli@nabijaczleweli.xyz> wrote: > > "people just forget to add inotify hooks to their I/O routines as a rule"? > Guess what I did, fully knowing that some are missing in this file :) > > ==> te.c <== > #define _GNU_SOURCE > #include <fcntl.h> > #include <stdio.h> > int main() { > ssize_t rd, acc = 0; > while ((rd = tee(0, 1, 128 * 1024 * 1024, 0)) > 0) > acc += rd; > fprintf(stderr, "te=%zd: %m\n", acc); > } > > ==> vm.c <== > #define _GNU_SOURCE > #include <fcntl.h> > #include <stdio.h> > #include <string.h> > static char sb[1024 * 1024]; > int main() { > memcpy(sb, "żupan", sizeof("żupan")); > ssize_t rd = > vmsplice(1, &(struct iovec){.iov_base = sb, .iov_len = sizeof(sb)}, 1, > SPLICE_F_GIFT); > fprintf(stderr, "vm=%zd: %m\n", rd); > } > > > echo zupa | ./te > fifo tees a few times and then blocks when the pipe > fills, at which point we get into the broken state. > > Similarly, ./vm > fifo (with the default 64k F_GETPIPE_SZ) enters that > same state instantly. > > With 2/3 and 3/3, they instead do > 1: mask=2, cook=0, len=0, name= > rd=80 > 1: mask=2, cook=0, len=0, name= > rd=80 > ... > in a loop, as-expected, and > # ./vm > fifo > vm=65200: Success > 1: mask=2, cook=0, len=0, name= > rd=65200 > > I took the liberty of marking 2/3 and 3/3 as Fixes: of the original > fanotify-in-splice commit as well, I think they fit the bill. > Thank you for doing this thorough research on all the variants! It would be great if you could add test coverage for these syscalls. Simplest would be to clone an LTP inotify test, e.g. ltp/testcases/kernel/syscalls/inotify/inotify01 for the different splice syscall variants (sendfile as well). LTP already has other tests for all those syscalls, so there are plenty of examples of how to use the LTP helpers to test those syscalls. You can either clone one inotify test per syscall, or clone one inotify test that creates a fifo instead of a file that inotify watches and use a test cases array for the different syscalls to test (see example of test cases array in inotify10 test). Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v3 0/3+1] fanotify accounting for fs/splice.c 2023-06-27 6:14 ` Amir Goldstein @ 2023-06-27 16:55 ` Ahelenia Ziemiańska 2023-06-27 16:55 ` [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska ` (5 more replies) 0 siblings, 6 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 16:55 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 3694 bytes --] In 1/3 I've applied if/else if/else tree like you said, and expounded a bit in the message. This is less pretty now, however, since it turns out that iter_file_splice_write() already marks the out fd as written because it writes to it via vfs_iter_write(), and that sent a double notification. $ git grep -F .splice_write | grep -v iter_file_splice_write drivers/char/mem.c: .splice_write = splice_write_null, drivers/char/virtio_console.c: .splice_write = port_fops_splice_write, fs/fuse/dev.c: .splice_write = fuse_dev_splice_write, fs/gfs2/file.c: .splice_write = gfs2_file_splice_write, fs/gfs2/file.c: .splice_write = gfs2_file_splice_write, fs/overlayfs/file.c: .splice_write = ovl_splice_write, net/socket.c: .splice_write = generic_splice_sendpage, scripts/coccinelle/api/stream_open.cocci: .splice_write = splice_write_f, Of these, splice_write_null() doesn't mark out as written (but it's for /dev/null so I think this is expected), and I haven't been able to visually confirm whether port_fops_splice_write() and generic_splice_sendpage() do. All the others delegate to iter_file_splice_write(). In 2/3 I fixed the vmsplice notification placement (access from pipe, modify to pipe). I'm following this up with an LTP patch, where only sendfile_file_to_pipe passes on 6.1.27-1 and all tests pass on v6.4 + this patchset. Ahelenia Ziemiańska (3): splice: always fsnotify_access(in), fsnotify_modify(out) on success splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice splice: fsnotify_access(in), fsnotify_modify(out) on success in tee fs/splice.c | 43 +++++++++++++++++++++++++------------------ 1 file changed, 25 insertions(+), 18 deletions(-) Interdiff against v2: diff --git a/fs/splice.c b/fs/splice.c index 3234aaa6e957..0427f0a91c7d 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1155,10 +1155,7 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, flags |= SPLICE_F_NONBLOCK; ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); - goto notify; - } - - if (ipipe) { + } else if (ipipe) { if (off_in) return -ESPIPE; if (off_out) { @@ -1188,10 +1185,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, else *off_out = offset; - goto notify; - } - - if (opipe) { + // ->splice_write already marked out + // as modified via vfs_iter_write() + goto noaccessout; + } else if (opipe) { if (off_out) return -ESPIPE; if (off_in) { @@ -1211,17 +1208,14 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, in->f_pos = offset; else *off_in = offset; + } else + return -EINVAL; - goto notify; - } - - return -EINVAL; - -notify: - if (ret > 0) { - fsnotify_access(in); + if (ret > 0) fsnotify_modify(out); - } +noaccessout: + if (ret > 0) + fsnotify_access(in); return ret; } @@ -1352,6 +1346,9 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter, pipe_unlock(pipe); } + if (ret > 0) + fsnotify_access(file); + return ret; } @@ -1381,8 +1378,10 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, if (!ret) ret = iter_to_pipe(iter, pipe, buf_flag); pipe_unlock(pipe); - if (ret > 0) + if (ret > 0) { wakeup_pipe_readers(pipe); + fsnotify_modify(file); + } return ret; } @@ -1447,9 +1446,6 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov, else error = vmsplice_to_user(f.file, &iter, flags); - if (error > 0) - fsnotify_modify(f.file); - kfree(iov); out_fdput: fdput(f); -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska @ 2023-06-27 16:55 ` Ahelenia Ziemiańska 2023-06-27 18:10 ` Amir Goldstein 2023-06-27 16:55 ` [PATCH v3 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska ` (4 subsequent siblings) 5 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 16:55 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 2608 bytes --] The current behaviour caused an asymmetry where some write APIs (write, sendfile) would notify the written-to/read-from objects, but splice wouldn't. This affected userspace which uses inotify, most notably coreutils tail -f, to monitor pipes. If the pipe buffer had been filled by a splice-family function: * tail wouldn't know and thus wouldn't service the pipe, and * all writes to the pipe would block because it's full, thus service was denied. (For the particular case of tail -f this could be worked around with ---disable-inotify.) Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- fs/splice.c | 31 ++++++++++++++----------------- 1 file changed, 14 insertions(+), 17 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 3e06611d19ae..e16f4f032d2f 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1154,10 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, if ((in->f_flags | out->f_flags) & O_NONBLOCK) flags |= SPLICE_F_NONBLOCK; - return splice_pipe_to_pipe(ipipe, opipe, len, flags); - } - - if (ipipe) { + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); + } else if (ipipe) { if (off_in) return -ESPIPE; if (off_out) { @@ -1182,18 +1180,15 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = do_splice_from(ipipe, out, &offset, len, flags); file_end_write(out); - if (ret > 0) - fsnotify_modify(out); - if (!off_out) out->f_pos = offset; else *off_out = offset; - return ret; - } - - if (opipe) { + // splice_write-> already marked out + // as modified via vfs_iter_write() + goto noaccessout; + } else if (opipe) { if (off_out) return -ESPIPE; if (off_in) { @@ -1209,18 +1204,20 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = splice_file_to_pipe(in, opipe, &offset, len, flags); - if (ret > 0) - fsnotify_access(in); - if (!off_in) in->f_pos = offset; else *off_in = offset; + } else + return -EINVAL; - return ret; - } + if (ret > 0) + fsnotify_modify(out); +noaccessout: + if (ret > 0) + fsnotify_access(in); - return -EINVAL; + return ret; } static long __do_splice(struct file *in, loff_t __user *off_in, -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-27 16:55 ` [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska @ 2023-06-27 18:10 ` Amir Goldstein 2023-06-27 20:13 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 18:10 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Tue, Jun 27, 2023 at 7:55 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The current behaviour caused an asymmetry where some write APIs > (write, sendfile) would notify the written-to/read-from objects, > but splice wouldn't. > > This affected userspace which uses inotify, most notably coreutils > tail -f, to monitor pipes. > If the pipe buffer had been filled by a splice-family function: > * tail wouldn't know and thus wouldn't service the pipe, and > * all writes to the pipe would block because it's full, > thus service was denied. > (For the particular case of tail -f this could be worked around > with ---disable-inotify.) > > Fixes: 983652c69199 ("splice: report related fsnotify events") > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Link: https://bugs.debian.org/1039488 > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > Reviewed-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/splice.c | 31 ++++++++++++++----------------- > 1 file changed, 14 insertions(+), 17 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index 3e06611d19ae..e16f4f032d2f 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1154,10 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > if ((in->f_flags | out->f_flags) & O_NONBLOCK) > flags |= SPLICE_F_NONBLOCK; > > - return splice_pipe_to_pipe(ipipe, opipe, len, flags); > - } > - > - if (ipipe) { > + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); > + } else if (ipipe) { > if (off_in) > return -ESPIPE; > if (off_out) { > @@ -1182,18 +1180,15 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > ret = do_splice_from(ipipe, out, &offset, len, flags); > file_end_write(out); > > - if (ret > 0) > - fsnotify_modify(out); > - > if (!off_out) > out->f_pos = offset; > else > *off_out = offset; > > - return ret; > - } > - > - if (opipe) { > + // splice_write-> already marked out > + // as modified via vfs_iter_write() > + goto noaccessout; > + } else if (opipe) { > if (off_out) > return -ESPIPE; > if (off_in) { > @@ -1209,18 +1204,20 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > > ret = splice_file_to_pipe(in, opipe, &offset, len, flags); > > - if (ret > 0) > - fsnotify_access(in); > - > if (!off_in) > in->f_pos = offset; > else > *off_in = offset; > + } else > + return -EINVAL; > > - return ret; > - } > + if (ret > 0) > + fsnotify_modify(out); > +noaccessout: > + if (ret > 0) > + fsnotify_access(in); > As I wrote, I don't like this special case. I prefer that we generate double IN_MODIFY than having to maintain unreadable code. Let's see what Jan has to say about this. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-27 18:10 ` Amir Goldstein @ 2023-06-27 20:13 ` Ahelenia Ziemiańska 0 siblings, 0 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:13 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 1345 bytes --] On Tue, Jun 27, 2023 at 09:10:09PM +0300, Amir Goldstein wrote: > On Tue, Jun 27, 2023 at 7:55 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > @@ -1209,18 +1204,20 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > > > > ret = splice_file_to_pipe(in, opipe, &offset, len, flags); > > > > - if (ret > 0) > > - fsnotify_access(in); > > - > > if (!off_in) > > in->f_pos = offset; > > else > > *off_in = offset; > > + } else > > + return -EINVAL; > > > > - return ret; > > - } > > + if (ret > 0) > > + fsnotify_modify(out); > > +noaccessout: > > + if (ret > 0) > > + fsnotify_access(in); > > > As I wrote, I don't like this special case. > I prefer that we generate double IN_MODIFY than > having to maintain unreadable code. > > Let's see what Jan has to say about this. Yes, in principle I definitely agree, but I don't know what the official policy is on (effectively-)spurious/duplicate events; neither the kernel documentation nor the manual speak to the reliability of the signal, so I defaulted to the variant I thought to be correcter, if filthy. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v3 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska 2023-06-27 16:55 ` [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska @ 2023-06-27 16:55 ` Ahelenia Ziemiańska 2023-06-27 18:11 ` Amir Goldstein 2023-06-27 16:55 ` [PATCH v3 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska ` (3 subsequent siblings) 5 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 16:55 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 1121 bytes --] Same logic applies here: this can fill up the pipe and pollers that rely on getting IN_MODIFY notifications never wake up. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- fs/splice.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/splice.c b/fs/splice.c index e16f4f032d2f..0eb36e93c030 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1346,6 +1346,9 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter, pipe_unlock(pipe); } + if (ret > 0) + fsnotify_access(file); + return ret; } @@ -1375,8 +1378,10 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, if (!ret) ret = iter_to_pipe(iter, pipe, buf_flag); pipe_unlock(pipe); - if (ret > 0) + if (ret > 0) { wakeup_pipe_readers(pipe); + fsnotify_modify(file); + } return ret; } -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v3 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice 2023-06-27 16:55 ` [PATCH v3 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska @ 2023-06-27 18:11 ` Amir Goldstein 0 siblings, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 18:11 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Tue, Jun 27, 2023 at 7:55 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > Same logic applies here: this can fill up the pipe and pollers that rely > on getting IN_MODIFY notifications never wake up. > > Fixes: 983652c69199 ("splice: report related fsnotify events") > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Link: https://bugs.debian.org/1039488 > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/splice.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/splice.c b/fs/splice.c > index e16f4f032d2f..0eb36e93c030 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1346,6 +1346,9 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter, > pipe_unlock(pipe); > } > > + if (ret > 0) > + fsnotify_access(file); > + > return ret; > } > > @@ -1375,8 +1378,10 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, > if (!ret) > ret = iter_to_pipe(iter, pipe, buf_flag); > pipe_unlock(pipe); > - if (ret > 0) > + if (ret > 0) { > wakeup_pipe_readers(pipe); > + fsnotify_modify(file); > + } > return ret; > } > > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v3 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska 2023-06-27 16:55 ` [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska 2023-06-27 16:55 ` [PATCH v3 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska @ 2023-06-27 16:55 ` Ahelenia Ziemiańska 2023-06-27 16:57 ` [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify Ahelenia Ziemiańska ` (2 subsequent siblings) 5 siblings, 0 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 16:55 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 881 bytes --] Same logic applies here: this can fill up the pipe, and pollers that rely on getting IN_MODIFY notifications never wake up. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- fs/splice.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index 0eb36e93c030..2ecfccbda956 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1815,6 +1815,11 @@ long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags) } } + if (ret > 0) { + fsnotify_access(in); + fsnotify_modify(out); + } + return ret; } -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska ` (2 preceding siblings ...) 2023-06-27 16:55 ` [PATCH v3 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska @ 2023-06-27 16:57 ` Ahelenia Ziemiańska 2023-06-27 18:31 ` Amir Goldstein 2023-06-27 18:03 ` [PATCH v3 0/3+1] fanotify accounting for fs/splice.c Amir Goldstein 2023-06-28 4:51 ` [PATCH v3 0/3+1] " Christoph Hellwig 5 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 16:57 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 10847 bytes --] The only one that passes on 6.1.27-1 is sendfile_file_to_pipe. Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Formatted to clang-format defaults. Put the original Fixes:ed SHA in the metadata, that's probably fine, right? testcases/kernel/syscalls/inotify/.gitignore | 1 + testcases/kernel/syscalls/inotify/inotify13.c | 246 ++++++++++++++++++ 2 files changed, 247 insertions(+) create mode 100644 testcases/kernel/syscalls/inotify/inotify13.c diff --git a/testcases/kernel/syscalls/inotify/.gitignore b/testcases/kernel/syscalls/inotify/.gitignore index f6e5c546a..b597ea63f 100644 --- a/testcases/kernel/syscalls/inotify/.gitignore +++ b/testcases/kernel/syscalls/inotify/.gitignore @@ -10,3 +10,4 @@ /inotify10 /inotify11 /inotify12 +/inotify13 diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c new file mode 100644 index 000000000..c34f1dc9f --- /dev/null +++ b/testcases/kernel/syscalls/inotify/inotify13.c @@ -0,0 +1,246 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/*\ + * Verify splice-family functions (and sendfile) generate IN_ACCESS + * for what they read and IN_MODIFY for what they write. + * + * Regression test for 983652c69199 and + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u + */ + +#define _GNU_SOURCE +#include "config.h" + +#include <stdio.h> +#include <unistd.h> +#include <stdlib.h> +#include <fcntl.h> +#include <stdbool.h> +#include <inttypes.h> +#include <signal.h> +#include <sys/mman.h> +#include <sys/sendfile.h> + +#include "tst_test.h" +#include "tst_safe_macros.h" +#include "inotify.h" + +#if defined(HAVE_SYS_INOTIFY_H) +#include <sys/inotify.h> + + +static int pipes[2] = {-1, -1}; +static int inotify = -1; +static int memfd = -1; +static int data_pipes[2] = {-1, -1}; + +static void watch_rw(int fd) { + char buf[64]; + sprintf(buf, "/proc/self/fd/%d", fd); + SAFE_MYINOTIFY_ADD_WATCH(inotify, buf, IN_ACCESS | IN_MODIFY); +} + +static int compar(const void *l, const void *r) { + const struct inotify_event *lie = l; + const struct inotify_event *rie = r; + return lie->wd - rie->wd; +} + +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) { + struct inotify_event tail, *itr = evs; + for (size_t left = evcnt; left; --left) + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); + + TEST(read(inotify, &tail, sizeof(struct inotify_event))); + if (TST_RET != -1) + tst_brk(TFAIL, "expect %zu events", evcnt); + if (TST_ERR != EAGAIN) + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); + + qsort(evs, evcnt, sizeof(struct inotify_event), compar); +} + +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) { + if (ev->wd != wd) + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); + if (ev->mask != mask) + tst_brk(TFAIL, "expect event with mask %" PRIu32 " got %" PRIu32 "", mask, + ev->mask); +} + +#define F2P(splice) \ + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, __func__, sizeof(__func__)); \ + SAFE_LSEEK(memfd, 0, SEEK_SET); \ + watch_rw(memfd); \ + watch_rw(pipes[0]); \ + TEST(splice); \ + if (TST_RET == -1) \ + tst_brk(TBROK | TERRNO, #splice); \ + if (TST_RET != sizeof(__func__)) \ + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ + \ + /*expecting: IN_ACCESS memfd, IN_MODIFY pipes[0]*/ \ + struct inotify_event events[2]; \ + get_events(ARRAY_SIZE(events), events); \ + expect_event(events + 0, 1, IN_ACCESS); \ + expect_event(events + 1, 2, IN_MODIFY); \ + \ + char buf[sizeof(__func__)]; \ + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ + if (memcmp(buf, __func__, sizeof(__func__))) \ + tst_brk(TFAIL, "buf contents bad"); +static void splice_file_to_pipe(void) { + F2P(splice(memfd, NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); +} +static void sendfile_file_to_pipe(void) { + F2P(sendfile(pipes[1], memfd, NULL, 128 * 1024 * 1024)); +} + +static void splice_pipe_to_file(void) { + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); + watch_rw(pipes[0]); + watch_rw(memfd); + TEST(splice(pipes[0], NULL, memfd, NULL, 128 * 1024 * 1024, 0)); + if(TST_RET == -1) + tst_brk(TBROK | TERRNO, "splice"); + if(TST_RET != sizeof(__func__)) + tst_brk(TBROK, "splice: %" PRId64 "", TST_RET); + + // expecting: IN_ACCESS pipes[0], IN_MODIFY memfd + struct inotify_event events[2]; + get_events(ARRAY_SIZE(events), events); + expect_event(events + 0, 1, IN_ACCESS); + expect_event(events + 1, 2, IN_MODIFY); + + char buf[sizeof(__func__)]; + SAFE_LSEEK(memfd, 0, SEEK_SET); + SAFE_READ(true, memfd, buf, sizeof(__func__)); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +#define P2P(splice) \ + SAFE_WRITE(SAFE_WRITE_RETRY, data_pipes[1], __func__, sizeof(__func__)); \ + watch_rw(data_pipes[0]); \ + watch_rw(pipes[1]); \ + TEST(splice); \ + if (TST_RET == -1) \ + tst_brk(TBROK | TERRNO, #splice); \ + if (TST_RET != sizeof(__func__)) \ + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ + \ + /* expecting: IN_ACCESS data_pipes[0], IN_MODIFY pipes[1] */ \ + struct inotify_event events[2]; \ + get_events(ARRAY_SIZE(events), events); \ + expect_event(events + 0, 1, IN_ACCESS); \ + expect_event(events + 1, 2, IN_MODIFY); \ + \ + char buf[sizeof(__func__)]; \ + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ + if (memcmp(buf, __func__, sizeof(__func__))) \ + tst_brk(TFAIL, "buf contents bad"); +static void splice_pipe_to_pipe(void) { + P2P(splice(data_pipes[0], NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); +} +static void tee_pipe_to_pipe(void) { + P2P(tee(data_pipes[0], pipes[1], 128 * 1024 * 1024, 0)); +} + +static char vmsplice_pipe_to_mem_dt[32 * 1024]; +static void vmsplice_pipe_to_mem(void) { + memcpy(vmsplice_pipe_to_mem_dt, __func__, sizeof(__func__)); + watch_rw(pipes[0]); + TEST(vmsplice(pipes[1], + &(struct iovec){.iov_base = vmsplice_pipe_to_mem_dt, + .iov_len = sizeof(vmsplice_pipe_to_mem_dt)}, + 1, SPLICE_F_GIFT)); + if (TST_RET == -1) + tst_brk(TBROK | TERRNO, "vmsplice"); + if (TST_RET != sizeof(vmsplice_pipe_to_mem_dt)) + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); + + // expecting: IN_MODIFY pipes[0] + struct inotify_event event; + get_events(1, &event); + expect_event(&event, 1, IN_MODIFY); + + char buf[sizeof(__func__)]; + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +static void vmsplice_mem_to_pipe(void) { + char buf[sizeof(__func__)]; + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); + watch_rw(pipes[1]); + TEST(vmsplice(pipes[0], + &(struct iovec){.iov_base = buf, .iov_len = sizeof(buf)}, 1, + 0)); + if (TST_RET == -1) + tst_brk(TBROK | TERRNO, "vmsplice"); + if (TST_RET != sizeof(buf)) + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); + + // expecting: IN_ACCESS pipes[1] + struct inotify_event event; + get_events(1, &event); + expect_event(&event, 1, IN_ACCESS); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +typedef void (*tests_f)(void); +#define TEST_F(f) { f, #f } +static const struct { + tests_f f; + const char *n; +} tests[] = { + TEST_F(splice_file_to_pipe), TEST_F(sendfile_file_to_pipe), + TEST_F(splice_pipe_to_file), TEST_F(splice_pipe_to_pipe), + TEST_F(tee_pipe_to_pipe), TEST_F(vmsplice_pipe_to_mem), + TEST_F(vmsplice_mem_to_pipe), +}; + +static void run_test(unsigned int n) +{ + tst_res(TINFO, "%s", tests[n].n); + + SAFE_PIPE2(pipes, O_CLOEXEC); + SAFE_PIPE2(data_pipes, O_CLOEXEC); + inotify = SAFE_MYINOTIFY_INIT1(IN_NONBLOCK | IN_CLOEXEC); + if((memfd = memfd_create(__func__, MFD_CLOEXEC)) == -1) + tst_brk(TCONF | TERRNO, "memfd"); + tests[n].f(); + tst_res(TPASS, "ок"); +} + +static void cleanup(void) +{ + if (memfd != -1) + SAFE_CLOSE(memfd); + if (inotify != -1) + SAFE_CLOSE(inotify); + if (pipes[0] != -1) + SAFE_CLOSE(pipes[0]); + if (pipes[1] != -1) + SAFE_CLOSE(pipes[1]); + if (data_pipes[0] != -1) + SAFE_CLOSE(data_pipes[0]); + if (data_pipes[1] != -1) + SAFE_CLOSE(data_pipes[1]); +} + +static struct tst_test test = { + .max_runtime = 10, + .cleanup = cleanup, + .test = run_test, + .tcnt = ARRAY_SIZE(tests), + .tags = (const struct tst_tag[]) { + {"linux-git", "983652c69199"}, + {} + }, +}; + +#else + TST_TEST_TCONF("system doesn't have required inotify support"); +#endif -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-27 16:57 ` [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify Ahelenia Ziemiańska @ 2023-06-27 18:31 ` Amir Goldstein 2023-06-27 20:59 ` [LTP RFC PATCH v2] " Ahelenia Ziemiańska 2023-06-27 22:57 ` [LTP PATCH] " Petr Vorel 0 siblings, 2 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 18:31 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp, Petr Vorel On Tue, Jun 27, 2023 at 7:57 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The only one that passes on 6.1.27-1 is sendfile_file_to_pipe. > > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > Formatted to clang-format defaults. Put the original Fixes:ed SHA in the > metadata, that's probably fine, right? No. The git commit is for the commits that fix the problem. This can only be added after your fixes are merged. I will let the LPT developers comment about style, but I think LTP project wants tab indents. I am personally unable to read this patch with so little indentation and so much macroing. > > testcases/kernel/syscalls/inotify/.gitignore | 1 + > testcases/kernel/syscalls/inotify/inotify13.c | 246 ++++++++++++++++++ > 2 files changed, 247 insertions(+) > create mode 100644 testcases/kernel/syscalls/inotify/inotify13.c > > diff --git a/testcases/kernel/syscalls/inotify/.gitignore b/testcases/kernel/syscalls/inotify/.gitignore > index f6e5c546a..b597ea63f 100644 > --- a/testcases/kernel/syscalls/inotify/.gitignore > +++ b/testcases/kernel/syscalls/inotify/.gitignore > @@ -10,3 +10,4 @@ > /inotify10 > /inotify11 > /inotify12 > +/inotify13 > diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c > new file mode 100644 > index 000000000..c34f1dc9f > --- /dev/null > +++ b/testcases/kernel/syscalls/inotify/inotify13.c > @@ -0,0 +1,246 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/*\ > + * Verify splice-family functions (and sendfile) generate IN_ACCESS > + * for what they read and IN_MODIFY for what they write. > + * > + * Regression test for 983652c69199 and > + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > + */ > + > +#define _GNU_SOURCE > +#include "config.h" > + > +#include <stdio.h> > +#include <unistd.h> > +#include <stdlib.h> > +#include <fcntl.h> > +#include <stdbool.h> > +#include <inttypes.h> > +#include <signal.h> > +#include <sys/mman.h> > +#include <sys/sendfile.h> > + > +#include "tst_test.h" > +#include "tst_safe_macros.h" > +#include "inotify.h" > + > +#if defined(HAVE_SYS_INOTIFY_H) > +#include <sys/inotify.h> > + > + > +static int pipes[2] = {-1, -1}; > +static int inotify = -1; > +static int memfd = -1; > +static int data_pipes[2] = {-1, -1}; > + > +static void watch_rw(int fd) { > + char buf[64]; > + sprintf(buf, "/proc/self/fd/%d", fd); > + SAFE_MYINOTIFY_ADD_WATCH(inotify, buf, IN_ACCESS | IN_MODIFY); > +} > + > +static int compar(const void *l, const void *r) { > + const struct inotify_event *lie = l; > + const struct inotify_event *rie = r; > + return lie->wd - rie->wd; > +} > + > +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) { > + struct inotify_event tail, *itr = evs; > + for (size_t left = evcnt; left; --left) > + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); > + > + TEST(read(inotify, &tail, sizeof(struct inotify_event))); > + if (TST_RET != -1) > + tst_brk(TFAIL, "expect %zu events", evcnt); > + if (TST_ERR != EAGAIN) > + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); > + > + qsort(evs, evcnt, sizeof(struct inotify_event), compar); > +} > + > +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) { > + if (ev->wd != wd) > + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); > + if (ev->mask != mask) > + tst_brk(TFAIL, "expect event with mask %" PRIu32 " got %" PRIu32 "", mask, > + ev->mask); > +} > + > +#define F2P(splice) \ > + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, __func__, sizeof(__func__)); \ > + SAFE_LSEEK(memfd, 0, SEEK_SET); \ > + watch_rw(memfd); \ > + watch_rw(pipes[0]); \ > + TEST(splice); \ > + if (TST_RET == -1) \ > + tst_brk(TBROK | TERRNO, #splice); \ > + if (TST_RET != sizeof(__func__)) \ > + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ > + \ > + /*expecting: IN_ACCESS memfd, IN_MODIFY pipes[0]*/ \ > + struct inotify_event events[2]; \ > + get_events(ARRAY_SIZE(events), events); \ > + expect_event(events + 0, 1, IN_ACCESS); \ > + expect_event(events + 1, 2, IN_MODIFY); \ > + \ > + char buf[sizeof(__func__)]; \ > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ > + if (memcmp(buf, __func__, sizeof(__func__))) \ > + tst_brk(TFAIL, "buf contents bad"); > +static void splice_file_to_pipe(void) { > + F2P(splice(memfd, NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); > +} > +static void sendfile_file_to_pipe(void) { > + F2P(sendfile(pipes[1], memfd, NULL, 128 * 1024 * 1024)); > +} > + > +static void splice_pipe_to_file(void) { > + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); > + watch_rw(pipes[0]); > + watch_rw(memfd); > + TEST(splice(pipes[0], NULL, memfd, NULL, 128 * 1024 * 1024, 0)); > + if(TST_RET == -1) > + tst_brk(TBROK | TERRNO, "splice"); > + if(TST_RET != sizeof(__func__)) > + tst_brk(TBROK, "splice: %" PRId64 "", TST_RET); > + > + // expecting: IN_ACCESS pipes[0], IN_MODIFY memfd > + struct inotify_event events[2]; > + get_events(ARRAY_SIZE(events), events); > + expect_event(events + 0, 1, IN_ACCESS); > + expect_event(events + 1, 2, IN_MODIFY); > + > + char buf[sizeof(__func__)]; > + SAFE_LSEEK(memfd, 0, SEEK_SET); > + SAFE_READ(true, memfd, buf, sizeof(__func__)); > + if (memcmp(buf, __func__, sizeof(__func__))) > + tst_brk(TFAIL, "buf contents bad"); > +} > + > +#define P2P(splice) \ > + SAFE_WRITE(SAFE_WRITE_RETRY, data_pipes[1], __func__, sizeof(__func__)); \ > + watch_rw(data_pipes[0]); \ > + watch_rw(pipes[1]); \ > + TEST(splice); \ > + if (TST_RET == -1) \ > + tst_brk(TBROK | TERRNO, #splice); \ > + if (TST_RET != sizeof(__func__)) \ > + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ > + \ > + /* expecting: IN_ACCESS data_pipes[0], IN_MODIFY pipes[1] */ \ > + struct inotify_event events[2]; \ > + get_events(ARRAY_SIZE(events), events); \ > + expect_event(events + 0, 1, IN_ACCESS); \ > + expect_event(events + 1, 2, IN_MODIFY); \ > + \ > + char buf[sizeof(__func__)]; \ > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ > + if (memcmp(buf, __func__, sizeof(__func__))) \ > + tst_brk(TFAIL, "buf contents bad"); > +static void splice_pipe_to_pipe(void) { > + P2P(splice(data_pipes[0], NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); > +} > +static void tee_pipe_to_pipe(void) { > + P2P(tee(data_pipes[0], pipes[1], 128 * 1024 * 1024, 0)); > +} > + > +static char vmsplice_pipe_to_mem_dt[32 * 1024]; > +static void vmsplice_pipe_to_mem(void) { > + memcpy(vmsplice_pipe_to_mem_dt, __func__, sizeof(__func__)); > + watch_rw(pipes[0]); > + TEST(vmsplice(pipes[1], > + &(struct iovec){.iov_base = vmsplice_pipe_to_mem_dt, > + .iov_len = sizeof(vmsplice_pipe_to_mem_dt)}, > + 1, SPLICE_F_GIFT)); > + if (TST_RET == -1) > + tst_brk(TBROK | TERRNO, "vmsplice"); > + if (TST_RET != sizeof(vmsplice_pipe_to_mem_dt)) > + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); > + > + // expecting: IN_MODIFY pipes[0] > + struct inotify_event event; > + get_events(1, &event); > + expect_event(&event, 1, IN_MODIFY); > + > + char buf[sizeof(__func__)]; > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); > + if (memcmp(buf, __func__, sizeof(__func__))) > + tst_brk(TFAIL, "buf contents bad"); > +} > + > +static void vmsplice_mem_to_pipe(void) { > + char buf[sizeof(__func__)]; > + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); > + watch_rw(pipes[1]); > + TEST(vmsplice(pipes[0], > + &(struct iovec){.iov_base = buf, .iov_len = sizeof(buf)}, 1, > + 0)); > + if (TST_RET == -1) > + tst_brk(TBROK | TERRNO, "vmsplice"); > + if (TST_RET != sizeof(buf)) > + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); > + > + // expecting: IN_ACCESS pipes[1] > + struct inotify_event event; > + get_events(1, &event); > + expect_event(&event, 1, IN_ACCESS); > + if (memcmp(buf, __func__, sizeof(__func__))) > + tst_brk(TFAIL, "buf contents bad"); > +} > + > +typedef void (*tests_f)(void); > +#define TEST_F(f) { f, #f } > +static const struct { > + tests_f f; > + const char *n; > +} tests[] = { > + TEST_F(splice_file_to_pipe), TEST_F(sendfile_file_to_pipe), > + TEST_F(splice_pipe_to_file), TEST_F(splice_pipe_to_pipe), > + TEST_F(tee_pipe_to_pipe), TEST_F(vmsplice_pipe_to_mem), > + TEST_F(vmsplice_mem_to_pipe), > +}; > + > +static void run_test(unsigned int n) > +{ > + tst_res(TINFO, "%s", tests[n].n); > + > + SAFE_PIPE2(pipes, O_CLOEXEC); > + SAFE_PIPE2(data_pipes, O_CLOEXEC); > + inotify = SAFE_MYINOTIFY_INIT1(IN_NONBLOCK | IN_CLOEXEC); > + if((memfd = memfd_create(__func__, MFD_CLOEXEC)) == -1) > + tst_brk(TCONF | TERRNO, "memfd"); > + tests[n].f(); Normally, a test cases table would encode things like the number of expected events and type of events. The idea is that the test template has parametrized code and not just a loop for test cases subroutines, but there are many ways to write tests, so as long as it gets the job done and is readable to humans, I don't mind. Right now this test may do the job, but it is not readable for this human ;-) mostly because of the huge macros - LTP is known for pretty large macros, but those are for generic utilities and you have complete test cases written as macros (templates). > + tst_res(TPASS, "ок"); > +} > + > +static void cleanup(void) > +{ > + if (memfd != -1) > + SAFE_CLOSE(memfd); > + if (inotify != -1) > + SAFE_CLOSE(inotify); > + if (pipes[0] != -1) > + SAFE_CLOSE(pipes[0]); > + if (pipes[1] != -1) > + SAFE_CLOSE(pipes[1]); > + if (data_pipes[0] != -1) > + SAFE_CLOSE(data_pipes[0]); > + if (data_pipes[1] != -1) > + SAFE_CLOSE(data_pipes[1]); > +} > + This cleanup does not happen for every test case - it happens only at the end of all the tests IIRC. > +static struct tst_test test = { > + .max_runtime = 10, > + .cleanup = cleanup, > + .test = run_test, > + .tcnt = ARRAY_SIZE(tests), > + .tags = (const struct tst_tag[]) { > + {"linux-git", "983652c69199"}, Leave this out for now. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* [LTP RFC PATCH v2] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-27 18:31 ` Amir Goldstein @ 2023-06-27 20:59 ` Ahelenia Ziemiańska 2023-06-28 0:21 ` [LTP RFC PATCH v3] " Ahelenia Ziemiańska 2023-06-27 22:57 ` [LTP PATCH] " Petr Vorel 1 sibling, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:59 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 10702 bytes --] The only one that passes on 6.1.27-1 is sendfile_file_to_pipe. Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Whitespace-only changes against v1. Marking as RFC since we have no commit SHA to reference. I didn't find any style opinions in the repository, so I just let clang-format rip. Apparently, according to a /github wiki page/: https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines#2-coding-style which is /only referenced/ from a 2020 issue: https://github.com/linux-test-project/ltp/issues/631 LTP uses the kernel style. Could've fooled me, because the other inotify tests are definitely not in the kernel style. Re-formatted the whole file with the linux .clang-format. testcases/kernel/syscalls/inotify/.gitignore | 1 + testcases/kernel/syscalls/inotify/inotify13.c | 260 ++++++++++++++++++ 2 files changed, 261 insertions(+) create mode 100644 testcases/kernel/syscalls/inotify/inotify13.c diff --git a/testcases/kernel/syscalls/inotify/.gitignore b/testcases/kernel/syscalls/inotify/.gitignore index f6e5c546a..b597ea63f 100644 --- a/testcases/kernel/syscalls/inotify/.gitignore +++ b/testcases/kernel/syscalls/inotify/.gitignore @@ -10,3 +10,4 @@ /inotify10 /inotify11 /inotify12 +/inotify13 diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c new file mode 100644 index 000000000..8c2cd4cf1 --- /dev/null +++ b/testcases/kernel/syscalls/inotify/inotify13.c @@ -0,0 +1,260 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/*\ + * Verify splice-family functions (and sendfile) generate IN_ACCESS + * for what they read and IN_MODIFY for what they write. + * + * Regression test for 983652c69199 and + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u + */ + +#define _GNU_SOURCE +#include "config.h" + +#include <stdio.h> +#include <unistd.h> +#include <stdlib.h> +#include <fcntl.h> +#include <stdbool.h> +#include <inttypes.h> +#include <signal.h> +#include <sys/mman.h> +#include <sys/sendfile.h> + +#include "tst_test.h" +#include "tst_safe_macros.h" +#include "inotify.h" + +#if defined(HAVE_SYS_INOTIFY_H) +#include <sys/inotify.h> + +static int pipes[2] = { -1, -1 }; +static int inotify = -1; +static int memfd = -1; +static int data_pipes[2] = { -1, -1 }; + +static void watch_rw(int fd) +{ + char buf[64]; + sprintf(buf, "/proc/self/fd/%d", fd); + SAFE_MYINOTIFY_ADD_WATCH(inotify, buf, IN_ACCESS | IN_MODIFY); +} + +static int compar(const void *l, const void *r) +{ + const struct inotify_event *lie = l; + const struct inotify_event *rie = r; + return lie->wd - rie->wd; +} + +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) +{ + struct inotify_event tail, *itr = evs; + for (size_t left = evcnt; left; --left) + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); + + TEST(read(inotify, &tail, sizeof(struct inotify_event))); + if (TST_RET != -1) + tst_brk(TFAIL, ">%zu events", evcnt); + if (TST_ERR != EAGAIN) + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); + + qsort(evs, evcnt, sizeof(struct inotify_event), compar); +} + +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) +{ + if (ev->wd != wd) + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); + if (ev->mask != mask) + tst_brk(TFAIL, + "expect event with mask %" PRIu32 " got %" PRIu32 "", + mask, ev->mask); +} + +#define F2P(splice) \ + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, __func__, sizeof(__func__)); \ + SAFE_LSEEK(memfd, 0, SEEK_SET); \ + watch_rw(memfd); \ + watch_rw(pipes[0]); \ + TEST(splice); \ + if (TST_RET == -1) \ + tst_brk(TBROK | TERRNO, #splice); \ + if (TST_RET != sizeof(__func__)) \ + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ + \ + /* expecting: IN_ACCESS memfd, IN_MODIFY pipes[0] */ \ + struct inotify_event events[2]; \ + get_events(ARRAY_SIZE(events), events); \ + expect_event(events + 0, 1, IN_ACCESS); \ + expect_event(events + 1, 2, IN_MODIFY); \ + \ + char buf[sizeof(__func__)]; \ + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ + if (memcmp(buf, __func__, sizeof(__func__))) \ + tst_brk(TFAIL, "buf contents bad"); +static void splice_file_to_pipe(void) +{ + F2P(splice(memfd, NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); +} +static void sendfile_file_to_pipe(void) +{ + F2P(sendfile(pipes[1], memfd, NULL, 128 * 1024 * 1024)); +} + +static void splice_pipe_to_file(void) +{ + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); + watch_rw(pipes[0]); + watch_rw(memfd); + TEST(splice(pipes[0], NULL, memfd, NULL, 128 * 1024 * 1024, 0)); + if (TST_RET == -1) + tst_brk(TBROK | TERRNO, "splice"); + if (TST_RET != sizeof(__func__)) + tst_brk(TBROK, "splice: %" PRId64 "", TST_RET); + + /* expecting: IN_ACCESS pipes[0], IN_MODIFY memfd */ + struct inotify_event events[2]; + get_events(ARRAY_SIZE(events), events); + expect_event(events + 0, 1, IN_ACCESS); + expect_event(events + 1, 2, IN_MODIFY); + + char buf[sizeof(__func__)]; + SAFE_LSEEK(memfd, 0, SEEK_SET); + SAFE_READ(true, memfd, buf, sizeof(__func__)); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +#define P2P(splice) \ + SAFE_WRITE(SAFE_WRITE_RETRY, data_pipes[1], __func__, \ + sizeof(__func__)); \ + watch_rw(data_pipes[0]); \ + watch_rw(pipes[1]); \ + TEST(splice); \ + if (TST_RET == -1) \ + tst_brk(TBROK | TERRNO, #splice); \ + if (TST_RET != sizeof(__func__)) \ + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ + \ + /* expecting: IN_ACCESS data_pipes[0], IN_MODIFY pipes[1] */ \ + struct inotify_event events[2]; \ + get_events(ARRAY_SIZE(events), events); \ + expect_event(events + 0, 1, IN_ACCESS); \ + expect_event(events + 1, 2, IN_MODIFY); \ + \ + char buf[sizeof(__func__)]; \ + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ + if (memcmp(buf, __func__, sizeof(__func__))) \ + tst_brk(TFAIL, "buf contents bad"); +static void splice_pipe_to_pipe(void) +{ + P2P(splice(data_pipes[0], NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); +} +static void tee_pipe_to_pipe(void) +{ + P2P(tee(data_pipes[0], pipes[1], 128 * 1024 * 1024, 0)); +} + +static char vmsplice_pipe_to_mem_dt[32 * 1024]; +static void vmsplice_pipe_to_mem(void) +{ + memcpy(vmsplice_pipe_to_mem_dt, __func__, sizeof(__func__)); + watch_rw(pipes[0]); + TEST(vmsplice( + pipes[1], + &(struct iovec){ .iov_base = vmsplice_pipe_to_mem_dt, + .iov_len = sizeof(vmsplice_pipe_to_mem_dt) }, + 1, SPLICE_F_GIFT)); + if (TST_RET == -1) + tst_brk(TBROK | TERRNO, "vmsplice"); + if (TST_RET != sizeof(vmsplice_pipe_to_mem_dt)) + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); + + /* expecting: IN_MODIFY pipes[0] */ + struct inotify_event event; + get_events(1, &event); + expect_event(&event, 1, IN_MODIFY); + + char buf[sizeof(__func__)]; + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +static void vmsplice_mem_to_pipe(void) +{ + char buf[sizeof(__func__)]; + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); + watch_rw(pipes[1]); + TEST(vmsplice(pipes[0], + &(struct iovec){ .iov_base = buf, + .iov_len = sizeof(buf) }, + 1, 0)); + if (TST_RET == -1) + tst_brk(TBROK | TERRNO, "vmsplice"); + if (TST_RET != sizeof(buf)) + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); + + /* expecting: IN_ACCESS pipes[1] */ + struct inotify_event event; + get_events(1, &event); + expect_event(&event, 1, IN_ACCESS); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +typedef void (*tests_f)(void); +#define TEST_F(f) \ + { \ + f, #f, \ + } +static const struct { + tests_f f; + const char *n; +} tests[] = { + TEST_F(splice_file_to_pipe), TEST_F(sendfile_file_to_pipe), + TEST_F(splice_pipe_to_file), TEST_F(splice_pipe_to_pipe), + TEST_F(tee_pipe_to_pipe), TEST_F(vmsplice_pipe_to_mem), + TEST_F(vmsplice_mem_to_pipe), +}; + +static void run_test(unsigned int n) +{ + tst_res(TINFO, "%s", tests[n].n); + + SAFE_PIPE2(pipes, O_CLOEXEC); + SAFE_PIPE2(data_pipes, O_CLOEXEC); + inotify = SAFE_MYINOTIFY_INIT1(IN_NONBLOCK | IN_CLOEXEC); + if ((memfd = memfd_create(__func__, MFD_CLOEXEC)) == -1) + tst_brk(TCONF | TERRNO, "memfd"); + tests[n].f(); + tst_res(TPASS, "ок"); +} + +static void cleanup(void) +{ + if (memfd != -1) + SAFE_CLOSE(memfd); + if (inotify != -1) + SAFE_CLOSE(inotify); + if (pipes[0] != -1) + SAFE_CLOSE(pipes[0]); + if (pipes[1] != -1) + SAFE_CLOSE(pipes[1]); + if (data_pipes[0] != -1) + SAFE_CLOSE(data_pipes[0]); + if (data_pipes[1] != -1) + SAFE_CLOSE(data_pipes[1]); +} + +static struct tst_test test = { + .max_runtime = 10, + .cleanup = cleanup, + .test = run_test, + .tcnt = ARRAY_SIZE(tests), + .tags = (const struct tst_tag[]){ { "linux-git", "983652c69199" }, {} }, +}; + +#else +TST_TEST_TCONF("system doesn't have required inotify support"); +#endif -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* [LTP RFC PATCH v3] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-27 20:59 ` [LTP RFC PATCH v2] " Ahelenia Ziemiańska @ 2023-06-28 0:21 ` Ahelenia Ziemiańska 2023-06-28 5:30 ` Amir Goldstein 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-28 0:21 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp, Petr Vorel [-- Attachment #1: Type: text/plain, Size: 9945 bytes --] The only one that passes on 6.1.27-1 is sendfile_file_to_pipe. Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Sorry, I missed second part of Amir's comments somehow. cleanup is only run at the end by default: run it manually to not leak fds between tests. I've parameterised the tests from the driver, instead of with macros, and removed the tst_tag data. Added the * [Description] tag and full commit subject to the header comment; leaving the lore.k.o link for now, to be turned into a SHA when the kernel behaviour this tests starts having a SHA. Error checking has been lifted out as well. Formatted in kernel style accd'g to clang-format and check-inotify13. I used the wrong address for ltp@ the first time; I've since bounced the patchset, and am sending this, to the correct address. They were all held for moderation for now. testcases/kernel/syscalls/inotify/.gitignore | 1 + testcases/kernel/syscalls/inotify/inotify13.c | 282 ++++++++++++++++++ 2 files changed, 283 insertions(+) create mode 100644 testcases/kernel/syscalls/inotify/inotify13.c diff --git a/testcases/kernel/syscalls/inotify/.gitignore b/testcases/kernel/syscalls/inotify/.gitignore index f6e5c546a..b597ea63f 100644 --- a/testcases/kernel/syscalls/inotify/.gitignore +++ b/testcases/kernel/syscalls/inotify/.gitignore @@ -10,3 +10,4 @@ /inotify10 /inotify11 /inotify12 +/inotify13 diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c new file mode 100644 index 000000000..97f88053e --- /dev/null +++ b/testcases/kernel/syscalls/inotify/inotify13.c @@ -0,0 +1,282 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/*\ + * [Description] + * Verify splice-family functions (and sendfile) generate IN_ACCESS + * for what they read and IN_MODIFY for what they write. + * + * Regression test for 983652c69199 ("splice: report related fsnotify events") and + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u + */ + +#define _GNU_SOURCE +#include "config.h" + +#include <stdio.h> +#include <unistd.h> +#include <stdlib.h> +#include <fcntl.h> +#include <stdbool.h> +#include <inttypes.h> +#include <signal.h> +#include <sys/mman.h> +#include <sys/sendfile.h> + +#include "tst_test.h" +#include "tst_safe_macros.h" +#include "inotify.h" + +#if defined(HAVE_SYS_INOTIFY_H) +#include <sys/inotify.h> + +static int pipes[2] = { -1, -1 }; +static int inotify = -1; +static int memfd = -1; +static int data_pipes[2] = { -1, -1 }; + +static void watch_rw(int fd) +{ + char buf[64]; + + sprintf(buf, "/proc/self/fd/%d", fd); + SAFE_MYINOTIFY_ADD_WATCH(inotify, buf, IN_ACCESS | IN_MODIFY); +} + +static int compar(const void *l, const void *r) +{ + const struct inotify_event *lie = l; + const struct inotify_event *rie = r; + + return lie->wd - rie->wd; +} + +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) +{ + struct inotify_event tail, *itr = evs; + + for (size_t left = evcnt; left; --left) + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); + + TEST(read(inotify, &tail, sizeof(struct inotify_event))); + if (TST_RET != -1) + tst_brk(TFAIL, ">%zu events", evcnt); + if (TST_ERR != EAGAIN) + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); + + qsort(evs, evcnt, sizeof(struct inotify_event), compar); +} + +static void expect_transfer(const char *name, size_t size) +{ + if (TST_RET == -1) + tst_brk(TBROK | TERRNO, "%s", name); + if ((size_t)TST_RET != size) + tst_brk(TBROK, "%s: %ld != %zu", name, TST_RET, size); +} + +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) +{ + if (ev->wd != wd) + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); + if (ev->mask != mask) + tst_brk(TFAIL, + "expect event with mask %" PRIu32 " got %" PRIu32 "", + mask, ev->mask); +} + +// write to file, rewind, transfer accd'g to f2p, read from pipe +// expecting: IN_ACCESS memfd, IN_MODIFY pipes[0] +static void file_to_pipe(const char *name, ssize_t (*f2p)(void)) +{ + struct inotify_event events[2]; + char buf[strlen(name)]; + + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, name, strlen(name)); + SAFE_LSEEK(memfd, 0, SEEK_SET); + watch_rw(memfd); + watch_rw(pipes[0]); + TEST(f2p()); + expect_transfer(name, strlen(name)); + + get_events(ARRAY_SIZE(events), events); + expect_event(events + 0, 1, IN_ACCESS); + expect_event(events + 1, 2, IN_MODIFY); + + SAFE_READ(true, pipes[0], buf, strlen(name)); + if (memcmp(buf, name, strlen(name))) + tst_brk(TFAIL, "buf contents bad"); +} +static ssize_t splice_file_to_pipe(void) +{ + return splice(memfd, NULL, pipes[1], NULL, 128 * 1024 * 1024, 0); +} +static ssize_t sendfile_file_to_pipe(void) +{ + return sendfile(pipes[1], memfd, NULL, 128 * 1024 * 1024); +} + +// write to pipe, transfer with splice, rewind file, read from file +// expecting: IN_ACCESS pipes[0], IN_MODIFY memfd +static void splice_pipe_to_file(const char *name, ssize_t (*param)(void)) +{ + (void)name; + (void)param; + struct inotify_event events[2]; + char buf[sizeof(__func__)]; + + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); + watch_rw(pipes[0]); + watch_rw(memfd); + TEST(splice(pipes[0], NULL, memfd, NULL, 128 * 1024 * 1024, 0)); + expect_transfer(__func__, sizeof(__func__)); + + get_events(ARRAY_SIZE(events), events); + expect_event(events + 0, 1, IN_ACCESS); + expect_event(events + 1, 2, IN_MODIFY); + + SAFE_LSEEK(memfd, 0, SEEK_SET); + SAFE_READ(true, memfd, buf, sizeof(__func__)); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +// write to data_pipe, transfer accd'g to p2p, read from pipe +// expecting: IN_ACCESS data_pipes[0], IN_MODIFY pipes[1] +static void pipe_to_pipe(const char *name, ssize_t (*p2p)(void)) +{ + struct inotify_event events[2]; + char buf[strlen(name)]; + + SAFE_WRITE(SAFE_WRITE_RETRY, data_pipes[1], name, strlen(name)); + watch_rw(data_pipes[0]); + watch_rw(pipes[1]); + TEST(p2p()); + expect_transfer(name, strlen(name)); + + get_events(ARRAY_SIZE(events), events); + expect_event(events + 0, 1, IN_ACCESS); + expect_event(events + 1, 2, IN_MODIFY); + + SAFE_READ(true, pipes[0], buf, strlen(name)); + if (memcmp(buf, name, strlen(name))) + tst_brk(TFAIL, "buf contents bad"); +} +static ssize_t splice_pipe_to_pipe(void) +{ + return splice(data_pipes[0], NULL, pipes[1], NULL, 128 * 1024 * 1024, + 0); +} +static ssize_t tee_pipe_to_pipe(void) +{ + return tee(data_pipes[0], pipes[1], 128 * 1024 * 1024, 0); +} + +// vmsplice to pipe, read from pipe +// expecting: IN_MODIFY pipes[0] +static char vmsplice_pipe_to_mem_dt[32 * 1024]; +static void vmsplice_pipe_to_mem(const char *name, ssize_t (*param)(void)) +{ + (void)name; + (void)param; + struct inotify_event event; + char buf[sizeof(__func__)]; + + memcpy(vmsplice_pipe_to_mem_dt, __func__, sizeof(__func__)); + watch_rw(pipes[0]); + TEST(vmsplice( + pipes[1], + &(struct iovec){ .iov_base = vmsplice_pipe_to_mem_dt, + .iov_len = sizeof(vmsplice_pipe_to_mem_dt) }, + 1, SPLICE_F_GIFT)); + expect_transfer(__func__, sizeof(vmsplice_pipe_to_mem_dt)); + + get_events(1, &event); + expect_event(&event, 1, IN_MODIFY); + + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +// write to pipe, vmsplice from pipe +// expecting: IN_ACCESS pipes[1] +static void vmsplice_mem_to_pipe(const char *name, ssize_t (*param)(void)) +{ + (void)name; + (void)param; + char buf[sizeof(__func__)]; + struct inotify_event event; + + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); + watch_rw(pipes[1]); + TEST(vmsplice(pipes[0], + &(struct iovec){ .iov_base = buf, + .iov_len = sizeof(buf) }, + 1, 0)); + expect_transfer(__func__, sizeof(buf)); + + get_events(1, &event); + expect_event(&event, 1, IN_ACCESS); + + if (memcmp(buf, __func__, sizeof(__func__))) + tst_brk(TFAIL, "buf contents bad"); +} + +#define TEST_F(f, param) \ + { \ + #f, f, param, \ + } +static const struct { + const char *n; + void (*f)(const char *name, ssize_t (*param)(void)); + ssize_t (*param)(void); +} tests[] = { + TEST_F(file_to_pipe, splice_file_to_pipe), + TEST_F(file_to_pipe, sendfile_file_to_pipe), + TEST_F(splice_pipe_to_file, NULL), + TEST_F(pipe_to_pipe, splice_pipe_to_pipe), + TEST_F(pipe_to_pipe, tee_pipe_to_pipe), + TEST_F(vmsplice_pipe_to_mem, NULL), + TEST_F(vmsplice_mem_to_pipe, NULL), +}; + +static void cleanup(void) +{ + if (memfd != -1) + SAFE_CLOSE(memfd); + if (inotify != -1) + SAFE_CLOSE(inotify); + if (pipes[0] != -1) + SAFE_CLOSE(pipes[0]); + if (pipes[1] != -1) + SAFE_CLOSE(pipes[1]); + if (data_pipes[0] != -1) + SAFE_CLOSE(data_pipes[0]); + if (data_pipes[1] != -1) + SAFE_CLOSE(data_pipes[1]); +} + +static void run_test(unsigned int n) +{ + tst_res(TINFO, "%s", tests[n].n); + + SAFE_PIPE2(pipes, O_CLOEXEC); + SAFE_PIPE2(data_pipes, O_CLOEXEC); + inotify = SAFE_MYINOTIFY_INIT1(IN_NONBLOCK | IN_CLOEXEC); + memfd = memfd_create(__func__, MFD_CLOEXEC); + if (memfd == -1) + tst_brk(TCONF | TERRNO, "memfd"); + tests[n].f(tests[n].n, tests[n].param); + tst_res(TPASS, "ок"); + cleanup(); +} + +static struct tst_test test = { + .cleanup = cleanup, + .test = run_test, + .tcnt = ARRAY_SIZE(tests), + .tags = (const struct tst_tag[]){ {} }, +}; + +#else +TST_TEST_TCONF("system doesn't have required inotify support"); +#endif -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [LTP RFC PATCH v3] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-28 0:21 ` [LTP RFC PATCH v3] " Ahelenia Ziemiańska @ 2023-06-28 5:30 ` Amir Goldstein 2023-06-28 16:03 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-28 5:30 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp, Petr Vorel On Wed, Jun 28, 2023 at 3:21 AM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The only one that passes on 6.1.27-1 is sendfile_file_to_pipe. > > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > Sorry, I missed second part of Amir's comments somehow. > cleanup is only run at the end by default: > run it manually to not leak fds between tests. > > I've parameterised the tests from the driver, instead of with macros, > and removed the tst_tag data. > > Added the * [Description] tag and full commit subject to the header > comment; leaving the lore.k.o link for now, to be turned into a SHA > when the kernel behaviour this tests starts having a SHA. > > Error checking has been lifted out as well. > Formatted in kernel style accd'g to clang-format and check-inotify13. > > I used the wrong address for ltp@ the first time; I've since bounced the > patchset, and am sending this, to the correct address. They were all > held for moderation for now. > > testcases/kernel/syscalls/inotify/.gitignore | 1 + > testcases/kernel/syscalls/inotify/inotify13.c | 282 ++++++++++++++++++ > 2 files changed, 283 insertions(+) > create mode 100644 testcases/kernel/syscalls/inotify/inotify13.c > > diff --git a/testcases/kernel/syscalls/inotify/.gitignore b/testcases/kernel/syscalls/inotify/.gitignore > index f6e5c546a..b597ea63f 100644 > --- a/testcases/kernel/syscalls/inotify/.gitignore > +++ b/testcases/kernel/syscalls/inotify/.gitignore > @@ -10,3 +10,4 @@ > /inotify10 > /inotify11 > /inotify12 > +/inotify13 > diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c > new file mode 100644 > index 000000000..97f88053e > --- /dev/null > +++ b/testcases/kernel/syscalls/inotify/inotify13.c > @@ -0,0 +1,282 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/*\ > + * [Description] > + * Verify splice-family functions (and sendfile) generate IN_ACCESS > + * for what they read and IN_MODIFY for what they write. > + * > + * Regression test for 983652c69199 ("splice: report related fsnotify events") and > + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u The process of posting a test for the fix that was not yet merged is indeed a chicken and egg situation. What I usually do is post a draft test (like this) and link to the post of the LTP test (and maybe a branch on github) when posting the fix, to say how I tested the fix. I would then put it in my TODO to re-post the LTP test once the kernel fix has been merged. > + */ > + > +#define _GNU_SOURCE > +#include "config.h" > + > +#include <stdio.h> > +#include <unistd.h> > +#include <stdlib.h> > +#include <fcntl.h> > +#include <stdbool.h> > +#include <inttypes.h> > +#include <signal.h> > +#include <sys/mman.h> > +#include <sys/sendfile.h> > + > +#include "tst_test.h" > +#include "tst_safe_macros.h" > +#include "inotify.h" > + > +#if defined(HAVE_SYS_INOTIFY_H) > +#include <sys/inotify.h> > + > +static int pipes[2] = { -1, -1 }; > +static int inotify = -1; > +static int memfd = -1; > +static int data_pipes[2] = { -1, -1 }; > + > +static void watch_rw(int fd) > +{ > + char buf[64]; > + > + sprintf(buf, "/proc/self/fd/%d", fd); > + SAFE_MYINOTIFY_ADD_WATCH(inotify, buf, IN_ACCESS | IN_MODIFY); > +} > + > +static int compar(const void *l, const void *r) > +{ > + const struct inotify_event *lie = l; > + const struct inotify_event *rie = r; > + > + return lie->wd - rie->wd; > +} > + > +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) > +{ > + struct inotify_event tail, *itr = evs; > + > + for (size_t left = evcnt; left; --left) > + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); > + > + TEST(read(inotify, &tail, sizeof(struct inotify_event))); > + if (TST_RET != -1) > + tst_brk(TFAIL, ">%zu events", evcnt); > + if (TST_ERR != EAGAIN) > + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); > + > + qsort(evs, evcnt, sizeof(struct inotify_event), compar); > +} > + > +static void expect_transfer(const char *name, size_t size) > +{ > + if (TST_RET == -1) > + tst_brk(TBROK | TERRNO, "%s", name); > + if ((size_t)TST_RET != size) > + tst_brk(TBROK, "%s: %ld != %zu", name, TST_RET, size); > +} > + > +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) > +{ > + if (ev->wd != wd) > + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); > + if (ev->mask != mask) > + tst_brk(TFAIL, > + "expect event with mask %" PRIu32 " got %" PRIu32 "", > + mask, ev->mask); > +} > + > +// write to file, rewind, transfer accd'g to f2p, read from pipe > +// expecting: IN_ACCESS memfd, IN_MODIFY pipes[0] > +static void file_to_pipe(const char *name, ssize_t (*f2p)(void)) > +{ > + struct inotify_event events[2]; > + char buf[strlen(name)]; > + > + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, name, strlen(name)); > + SAFE_LSEEK(memfd, 0, SEEK_SET); > + watch_rw(memfd); > + watch_rw(pipes[0]); > + TEST(f2p()); > + expect_transfer(name, strlen(name)); > + > + get_events(ARRAY_SIZE(events), events); > + expect_event(events + 0, 1, IN_ACCESS); > + expect_event(events + 1, 2, IN_MODIFY); So what I meant to say is that if there are double events that usually get merged (unless reader was fast enough to read the first event), this is something that I could live with, but encoding an expectation for a double event, that's not at all what I meant. But anyway, I see that you've found a way to work around this problem, so at least the test can expect and get a single event. I think you are missing expect_no_more_events() here to verify that you won't get double events. See test inotify12 as an example for a test that encodes expect_events per test case and also verifies there are no unexpected extra events. That's also an example of a more generic test template, but your test cases are all a bit different from each other is subtle ways, so I trust you will find the best balance between putting generic parameterized code in the run_test() template and putting code in the test case subroutine. > + > + SAFE_READ(true, pipes[0], buf, strlen(name)); > + if (memcmp(buf, name, strlen(name))) > + tst_brk(TFAIL, "buf contents bad"); > +} > +static ssize_t splice_file_to_pipe(void) > +{ > + return splice(memfd, NULL, pipes[1], NULL, 128 * 1024 * 1024, 0); > +} > +static ssize_t sendfile_file_to_pipe(void) > +{ > + return sendfile(pipes[1], memfd, NULL, 128 * 1024 * 1024); > +} > + > +// write to pipe, transfer with splice, rewind file, read from file > +// expecting: IN_ACCESS pipes[0], IN_MODIFY memfd > +static void splice_pipe_to_file(const char *name, ssize_t (*param)(void)) > +{ > + (void)name; > + (void)param; > + struct inotify_event events[2]; > + char buf[sizeof(__func__)]; > + > + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); > + watch_rw(pipes[0]); > + watch_rw(memfd); > + TEST(splice(pipes[0], NULL, memfd, NULL, 128 * 1024 * 1024, 0)); > + expect_transfer(__func__, sizeof(__func__)); > + > + get_events(ARRAY_SIZE(events), events); > + expect_event(events + 0, 1, IN_ACCESS); > + expect_event(events + 1, 2, IN_MODIFY); > + > + SAFE_LSEEK(memfd, 0, SEEK_SET); > + SAFE_READ(true, memfd, buf, sizeof(__func__)); > + if (memcmp(buf, __func__, sizeof(__func__))) > + tst_brk(TFAIL, "buf contents bad"); > +} > + > +// write to data_pipe, transfer accd'g to p2p, read from pipe > +// expecting: IN_ACCESS data_pipes[0], IN_MODIFY pipes[1] > +static void pipe_to_pipe(const char *name, ssize_t (*p2p)(void)) > +{ > + struct inotify_event events[2]; > + char buf[strlen(name)]; > + > + SAFE_WRITE(SAFE_WRITE_RETRY, data_pipes[1], name, strlen(name)); > + watch_rw(data_pipes[0]); > + watch_rw(pipes[1]); > + TEST(p2p()); > + expect_transfer(name, strlen(name)); > + > + get_events(ARRAY_SIZE(events), events); > + expect_event(events + 0, 1, IN_ACCESS); > + expect_event(events + 1, 2, IN_MODIFY); > + > + SAFE_READ(true, pipes[0], buf, strlen(name)); > + if (memcmp(buf, name, strlen(name))) > + tst_brk(TFAIL, "buf contents bad"); > +} > +static ssize_t splice_pipe_to_pipe(void) > +{ > + return splice(data_pipes[0], NULL, pipes[1], NULL, 128 * 1024 * 1024, > + 0); > +} > +static ssize_t tee_pipe_to_pipe(void) > +{ > + return tee(data_pipes[0], pipes[1], 128 * 1024 * 1024, 0); > +} > + > +// vmsplice to pipe, read from pipe > +// expecting: IN_MODIFY pipes[0] > +static char vmsplice_pipe_to_mem_dt[32 * 1024]; > +static void vmsplice_pipe_to_mem(const char *name, ssize_t (*param)(void)) > +{ > + (void)name; > + (void)param; > + struct inotify_event event; > + char buf[sizeof(__func__)]; > + > + memcpy(vmsplice_pipe_to_mem_dt, __func__, sizeof(__func__)); > + watch_rw(pipes[0]); > + TEST(vmsplice( > + pipes[1], > + &(struct iovec){ .iov_base = vmsplice_pipe_to_mem_dt, > + .iov_len = sizeof(vmsplice_pipe_to_mem_dt) }, > + 1, SPLICE_F_GIFT)); > + expect_transfer(__func__, sizeof(vmsplice_pipe_to_mem_dt)); > + > + get_events(1, &event); > + expect_event(&event, 1, IN_MODIFY); > + > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); > + if (memcmp(buf, __func__, sizeof(__func__))) > + tst_brk(TFAIL, "buf contents bad"); > +} > + > +// write to pipe, vmsplice from pipe > +// expecting: IN_ACCESS pipes[1] > +static void vmsplice_mem_to_pipe(const char *name, ssize_t (*param)(void)) > +{ > + (void)name; > + (void)param; > + char buf[sizeof(__func__)]; > + struct inotify_event event; > + > + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); > + watch_rw(pipes[1]); > + TEST(vmsplice(pipes[0], > + &(struct iovec){ .iov_base = buf, > + .iov_len = sizeof(buf) }, > + 1, 0)); > + expect_transfer(__func__, sizeof(buf)); > + > + get_events(1, &event); > + expect_event(&event, 1, IN_ACCESS); > + > + if (memcmp(buf, __func__, sizeof(__func__))) > + tst_brk(TFAIL, "buf contents bad"); > +} > + > +#define TEST_F(f, param) \ > + { \ > + #f, f, param, \ > + } > +static const struct { > + const char *n; > + void (*f)(const char *name, ssize_t (*param)(void)); > + ssize_t (*param)(void); > +} tests[] = { > + TEST_F(file_to_pipe, splice_file_to_pipe), > + TEST_F(file_to_pipe, sendfile_file_to_pipe), > + TEST_F(splice_pipe_to_file, NULL), > + TEST_F(pipe_to_pipe, splice_pipe_to_pipe), > + TEST_F(pipe_to_pipe, tee_pipe_to_pipe), > + TEST_F(vmsplice_pipe_to_mem, NULL), > + TEST_F(vmsplice_mem_to_pipe, NULL), > +}; > + > +static void cleanup(void) > +{ > + if (memfd != -1) > + SAFE_CLOSE(memfd); > + if (inotify != -1) > + SAFE_CLOSE(inotify); > + if (pipes[0] != -1) > + SAFE_CLOSE(pipes[0]); > + if (pipes[1] != -1) > + SAFE_CLOSE(pipes[1]); > + if (data_pipes[0] != -1) > + SAFE_CLOSE(data_pipes[0]); > + if (data_pipes[1] != -1) > + SAFE_CLOSE(data_pipes[1]); > +} > + > +static void run_test(unsigned int n) > +{ > + tst_res(TINFO, "%s", tests[n].n); > + > + SAFE_PIPE2(pipes, O_CLOEXEC); > + SAFE_PIPE2(data_pipes, O_CLOEXEC); > + inotify = SAFE_MYINOTIFY_INIT1(IN_NONBLOCK | IN_CLOEXEC); > + memfd = memfd_create(__func__, MFD_CLOEXEC); > + if (memfd == -1) > + tst_brk(TCONF | TERRNO, "memfd"); > + tests[n].f(tests[n].n, tests[n].param); > + tst_res(TPASS, "ок"); > + cleanup(); > +} > + > +static struct tst_test test = { > + .cleanup = cleanup, > + .test = run_test, > + .tcnt = ARRAY_SIZE(tests), > + .tags = (const struct tst_tag[]){ {} }, I don't think this is needed for the draft... Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [LTP RFC PATCH v3] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-28 5:30 ` Amir Goldstein @ 2023-06-28 16:03 ` Ahelenia Ziemiańska 0 siblings, 0 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-28 16:03 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp, Petr Vorel [-- Attachment #1: Type: text/plain, Size: 5321 bytes --] On Wed, Jun 28, 2023 at 08:30:15AM +0300, Amir Goldstein wrote: > On Wed, Jun 28, 2023 at 3:21 AM Ahelenia Ziemiańska > > diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c > > new file mode 100644 > > index 000000000..97f88053e > > --- /dev/null > > +++ b/testcases/kernel/syscalls/inotify/inotify13.c > > @@ -0,0 +1,282 @@ > > +// SPDX-License-Identifier: GPL-2.0-or-later > > +/*\ > > + * [Description] > > + * Verify splice-family functions (and sendfile) generate IN_ACCESS > > + * for what they read and IN_MODIFY for what they write. > > + * > > + * Regression test for 983652c69199 ("splice: report related fsnotify events") and > > + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > The process of posting a test for the fix that was not yet merged > is indeed a chicken and egg situation. > > What I usually do is post a draft test (like this) and link > to the post of the LTP test (and maybe a branch on github) > when posting the fix, to say how I tested the fix. https://git.sr.ht/~nabijaczleweli/ltp/commit/v4 for now. > I would then put it in my TODO to re-post the LTP > test once the kernel fix has been merged. Yep. > > +static int compar(const void *l, const void *r) > > +{ > > + const struct inotify_event *lie = l; > > + const struct inotify_event *rie = r; > > + > > + return lie->wd - rie->wd; > > +} > > + > > +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) > > +{ > > + struct inotify_event tail, *itr = evs; > > + > > + for (size_t left = evcnt; left; --left) > > + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); > > + > > + TEST(read(inotify, &tail, sizeof(struct inotify_event))); > > + if (TST_RET != -1) > > + tst_brk(TFAIL, ">%zu events", evcnt); > > + if (TST_ERR != EAGAIN) > > + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); > > + > > + qsort(evs, evcnt, sizeof(struct inotify_event), compar); > > +} > > + > > +static void expect_transfer(const char *name, size_t size) > > +{ > > + if (TST_RET == -1) > > + tst_brk(TBROK | TERRNO, "%s", name); > > + if ((size_t)TST_RET != size) > > + tst_brk(TBROK, "%s: %ld != %zu", name, TST_RET, size); > > +} > > + > > +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) > > +{ > > + if (ev->wd != wd) > > + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); > > + if (ev->mask != mask) > > + tst_brk(TFAIL, > > + "expect event with mask %" PRIu32 " got %" PRIu32 "", > > + mask, ev->mask); > > +} > > + > > +// write to file, rewind, transfer accd'g to f2p, read from pipe > > +// expecting: IN_ACCESS memfd, IN_MODIFY pipes[0] > > +static void file_to_pipe(const char *name, ssize_t (*f2p)(void)) > > +{ > > + struct inotify_event events[2]; > > + char buf[strlen(name)]; > > + > > + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, name, strlen(name)); > > + SAFE_LSEEK(memfd, 0, SEEK_SET); > > + watch_rw(memfd); > > + watch_rw(pipes[0]); > > + TEST(f2p()); > > + expect_transfer(name, strlen(name)); > > + > > + get_events(ARRAY_SIZE(events), events); > > + expect_event(events + 0, 1, IN_ACCESS); > > + expect_event(events + 1, 2, IN_MODIFY); > So what I meant to say is that if there are double events that > usually get merged (unless reader was fast enough to read the > first event), this is something that I could live with, but encoding > an expectation for a double event, that's not at all what I meant. > > But anyway, I see that you've found a way to work around > this problem, so at least the test can expect and get a single event. I've tried (admittedly, not all that hard) to read a double out modify event in this case with the v4 kernel patchset and haven't managed it. > I think you are missing expect_no_more_events() here to > verify that you won't get double events. get_events() reads precisely N events, then tries to read another, and fails if that succeeds. Maybe a better name would be "get_events_exact()". > See test inotify12 as an example for a test that encodes > expect_events per test case and also verifies there are no > unexpected extra events. > > That's also an example of a more generic test template, > but your test cases are all a bit different from each other is > subtle ways, so I trust you will find the best balance between > putting generic parameterized code in the run_test() template > and putting code in the test case subroutine. Yes, that's indeed an optics issue: it looks like there's more, but the only actually "common" bit of the test drivers is that they all read events in the middle: the set-up before is different, and the additional post-conditions are different. We /could/ encode the expected events in the test array, but then that would put the expected events away from the code that generates them, which is more code, and more confusing for no good reason I think. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify 2023-06-27 18:31 ` Amir Goldstein 2023-06-27 20:59 ` [LTP RFC PATCH v2] " Ahelenia Ziemiańska @ 2023-06-27 22:57 ` Petr Vorel 1 sibling, 0 replies; 55+ messages in thread From: Petr Vorel @ 2023-06-27 22:57 UTC (permalink / raw) To: Amir Goldstein Cc: Ahelenia Ziemiańska, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp > On Tue, Jun 27, 2023 at 7:57 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The only one that passes on 6.1.27-1 is sendfile_file_to_pipe. > > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > > --- > > Formatted to clang-format defaults. Put the original Fixes:ed SHA in the > > metadata, that's probably fine, right? > No. The git commit is for the commits that fix the problem. > This can only be added after your fixes are merged. > I will let the LPT developers comment about style, > but I think LTP project wants tab indents. > I am personally unable to read this patch with so little indentation > and so much macroing. Yes, it's hard to read. Style formatting it would improve it little bit (make check-inotify13 is your friend, it complains a lot, also some spaces above if () would make it more readable), but there are other things, e.g. macros F2P(splice) and P2P(splice) should be functions (readability). Please have look at other inotify tests, they are fairly simple and easy to read. Also, this is a patch for LTP, you're supposed to post it also to LTP mailing list (ltp@lists.linux.it, you need to register to https://lists.linux.it/listinfo/ltp first). > > testcases/kernel/syscalls/inotify/.gitignore | 1 + > > testcases/kernel/syscalls/inotify/inotify13.c | 246 ++++++++++++++++++ > > 2 files changed, 247 insertions(+) > > create mode 100644 testcases/kernel/syscalls/inotify/inotify13.c > > diff --git a/testcases/kernel/syscalls/inotify/.gitignore b/testcases/kernel/syscalls/inotify/.gitignore > > index f6e5c546a..b597ea63f 100644 > > --- a/testcases/kernel/syscalls/inotify/.gitignore > > +++ b/testcases/kernel/syscalls/inotify/.gitignore > > @@ -10,3 +10,4 @@ > > /inotify10 > > /inotify11 > > /inotify12 > > +/inotify13 > > diff --git a/testcases/kernel/syscalls/inotify/inotify13.c b/testcases/kernel/syscalls/inotify/inotify13.c > > new file mode 100644 > > index 000000000..c34f1dc9f > > --- /dev/null > > +++ b/testcases/kernel/syscalls/inotify/inotify13.c > > @@ -0,0 +1,246 @@ > > +// SPDX-License-Identifier: GPL-2.0-or-later > > +/*\ You need to add here: * [Description] > > + * Verify splice-family functions (and sendfile) generate IN_ACCESS > > + * for what they read and IN_MODIFY for what they write. > > + * > > + * Regression test for 983652c69199 and I guess there would be only 983652c69199 ("splice: report related fsnotify events"). > > + * https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u This is some discussion, not a real patch. Not sure how much useful it is, until it results to fix accepted in the mainline kernel. > > + */ > > + > > +#define _GNU_SOURCE > > +#include "config.h" > > + > > +#include <stdio.h> > > +#include <unistd.h> > > +#include <stdlib.h> > > +#include <fcntl.h> > > +#include <stdbool.h> > > +#include <inttypes.h> > > +#include <signal.h> > > +#include <sys/mman.h> > > +#include <sys/sendfile.h> > > + > > +#include "tst_test.h" > > +#include "tst_safe_macros.h" > > +#include "inotify.h" > > + > > +#if defined(HAVE_SYS_INOTIFY_H) > > +#include <sys/inotify.h> > > + > > + > > +static int pipes[2] = {-1, -1}; > > +static int inotify = -1; > > +static int memfd = -1; > > +static int data_pipes[2] = {-1, -1}; > > + > > +static void watch_rw(int fd) { > > + char buf[64]; > > + sprintf(buf, "/proc/self/fd/%d", fd); > > + SAFE_MYINOTIFY_ADD_WATCH(inotify, buf, IN_ACCESS | IN_MODIFY); > > +} > > + > > +static int compar(const void *l, const void *r) { > > + const struct inotify_event *lie = l; > > + const struct inotify_event *rie = r; > > + return lie->wd - rie->wd; > > +} > > + > > +static void get_events(size_t evcnt, struct inotify_event evs[static evcnt]) { > > + struct inotify_event tail, *itr = evs; > > + for (size_t left = evcnt; left; --left) > > + SAFE_READ(true, inotify, itr++, sizeof(struct inotify_event)); > > + > > + TEST(read(inotify, &tail, sizeof(struct inotify_event))); > > + if (TST_RET != -1) > > + tst_brk(TFAIL, "expect %zu events", evcnt); > > + if (TST_ERR != EAGAIN) > > + tst_brk(TFAIL | TTERRNO, "expected EAGAIN"); > > + > > + qsort(evs, evcnt, sizeof(struct inotify_event), compar); > > +} > > + > > +static void expect_event(struct inotify_event *ev, int wd, uint32_t mask) { > > + if (ev->wd != wd) > > + tst_brk(TFAIL, "expect event for wd %d got %d", wd, ev->wd); > > + if (ev->mask != mask) > > + tst_brk(TFAIL, "expect event with mask %" PRIu32 " got %" PRIu32 "", mask, > > + ev->mask); > > +} > > + > > +#define F2P(splice) \ > > + SAFE_WRITE(SAFE_WRITE_RETRY, memfd, __func__, sizeof(__func__)); \ > > + SAFE_LSEEK(memfd, 0, SEEK_SET); \ > > + watch_rw(memfd); \ > > + watch_rw(pipes[0]); \ > > + TEST(splice); \ > > + if (TST_RET == -1) \ > > + tst_brk(TBROK | TERRNO, #splice); \ > > + if (TST_RET != sizeof(__func__)) \ > > + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ > > + \ > > + /*expecting: IN_ACCESS memfd, IN_MODIFY pipes[0]*/ \ > > + struct inotify_event events[2]; \ > > + get_events(ARRAY_SIZE(events), events); \ > > + expect_event(events + 0, 1, IN_ACCESS); \ > > + expect_event(events + 1, 2, IN_MODIFY); \ > > + \ > > + char buf[sizeof(__func__)]; \ > > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ > > + if (memcmp(buf, __func__, sizeof(__func__))) \ > > + tst_brk(TFAIL, "buf contents bad"); > > +static void splice_file_to_pipe(void) { > > + F2P(splice(memfd, NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); > > +} > > +static void sendfile_file_to_pipe(void) { > > + F2P(sendfile(pipes[1], memfd, NULL, 128 * 1024 * 1024)); > > +} > > + > > +static void splice_pipe_to_file(void) { > > + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); > > + watch_rw(pipes[0]); > > + watch_rw(memfd); > > + TEST(splice(pipes[0], NULL, memfd, NULL, 128 * 1024 * 1024, 0)); > > + if(TST_RET == -1) > > + tst_brk(TBROK | TERRNO, "splice"); > > + if(TST_RET != sizeof(__func__)) > > + tst_brk(TBROK, "splice: %" PRId64 "", TST_RET); > > + > > + // expecting: IN_ACCESS pipes[0], IN_MODIFY memfd > > + struct inotify_event events[2]; > > + get_events(ARRAY_SIZE(events), events); > > + expect_event(events + 0, 1, IN_ACCESS); > > + expect_event(events + 1, 2, IN_MODIFY); > > + > > + char buf[sizeof(__func__)]; > > + SAFE_LSEEK(memfd, 0, SEEK_SET); > > + SAFE_READ(true, memfd, buf, sizeof(__func__)); > > + if (memcmp(buf, __func__, sizeof(__func__))) > > + tst_brk(TFAIL, "buf contents bad"); > > +} > > + > > +#define P2P(splice) \ > > + SAFE_WRITE(SAFE_WRITE_RETRY, data_pipes[1], __func__, sizeof(__func__)); \ > > + watch_rw(data_pipes[0]); \ > > + watch_rw(pipes[1]); \ > > + TEST(splice); \ > > + if (TST_RET == -1) \ > > + tst_brk(TBROK | TERRNO, #splice); \ > > + if (TST_RET != sizeof(__func__)) \ > > + tst_brk(TBROK, #splice ": %" PRId64 "", TST_RET); \ > > + \ > > + /* expecting: IN_ACCESS data_pipes[0], IN_MODIFY pipes[1] */ \ > > + struct inotify_event events[2]; \ > > + get_events(ARRAY_SIZE(events), events); \ > > + expect_event(events + 0, 1, IN_ACCESS); \ > > + expect_event(events + 1, 2, IN_MODIFY); \ > > + \ > > + char buf[sizeof(__func__)]; \ > > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); \ > > + if (memcmp(buf, __func__, sizeof(__func__))) \ > > + tst_brk(TFAIL, "buf contents bad"); > > +static void splice_pipe_to_pipe(void) { > > + P2P(splice(data_pipes[0], NULL, pipes[1], NULL, 128 * 1024 * 1024, 0)); > > +} > > +static void tee_pipe_to_pipe(void) { > > + P2P(tee(data_pipes[0], pipes[1], 128 * 1024 * 1024, 0)); > > +} > > + > > +static char vmsplice_pipe_to_mem_dt[32 * 1024]; > > +static void vmsplice_pipe_to_mem(void) { > > + memcpy(vmsplice_pipe_to_mem_dt, __func__, sizeof(__func__)); > > + watch_rw(pipes[0]); > > + TEST(vmsplice(pipes[1], > > + &(struct iovec){.iov_base = vmsplice_pipe_to_mem_dt, > > + .iov_len = sizeof(vmsplice_pipe_to_mem_dt)}, > > + 1, SPLICE_F_GIFT)); > > + if (TST_RET == -1) > > + tst_brk(TBROK | TERRNO, "vmsplice"); > > + if (TST_RET != sizeof(vmsplice_pipe_to_mem_dt)) > > + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); > > + > > + // expecting: IN_MODIFY pipes[0] > > + struct inotify_event event; > > + get_events(1, &event); > > + expect_event(&event, 1, IN_MODIFY); > > + > > + char buf[sizeof(__func__)]; > > + SAFE_READ(true, pipes[0], buf, sizeof(__func__)); > > + if (memcmp(buf, __func__, sizeof(__func__))) > > + tst_brk(TFAIL, "buf contents bad"); > > +} > > + > > +static void vmsplice_mem_to_pipe(void) { > > + char buf[sizeof(__func__)]; > > + SAFE_WRITE(SAFE_WRITE_RETRY, pipes[1], __func__, sizeof(__func__)); > > + watch_rw(pipes[1]); > > + TEST(vmsplice(pipes[0], > > + &(struct iovec){.iov_base = buf, .iov_len = sizeof(buf)}, 1, > > + 0)); > > + if (TST_RET == -1) > > + tst_brk(TBROK | TERRNO, "vmsplice"); > > + if (TST_RET != sizeof(buf)) > > + tst_brk(TBROK, "vmsplice: %" PRId64 "", TST_RET); > > + > > + // expecting: IN_ACCESS pipes[1] > > + struct inotify_event event; > > + get_events(1, &event); > > + expect_event(&event, 1, IN_ACCESS); > > + if (memcmp(buf, __func__, sizeof(__func__))) > > + tst_brk(TFAIL, "buf contents bad"); > > +} > > + > > +typedef void (*tests_f)(void); > > +#define TEST_F(f) { f, #f } > > +static const struct { > > + tests_f f; > > + const char *n; > > +} tests[] = { > > + TEST_F(splice_file_to_pipe), TEST_F(sendfile_file_to_pipe), > > + TEST_F(splice_pipe_to_file), TEST_F(splice_pipe_to_pipe), > > + TEST_F(tee_pipe_to_pipe), TEST_F(vmsplice_pipe_to_mem), > > + TEST_F(vmsplice_mem_to_pipe), > > +}; > > + > > +static void run_test(unsigned int n) > > +{ > > + tst_res(TINFO, "%s", tests[n].n); > > + > > + SAFE_PIPE2(pipes, O_CLOEXEC); > > + SAFE_PIPE2(data_pipes, O_CLOEXEC); > > + inotify = SAFE_MYINOTIFY_INIT1(IN_NONBLOCK | IN_CLOEXEC); > > + if((memfd = memfd_create(__func__, MFD_CLOEXEC)) == -1) > > + tst_brk(TCONF | TERRNO, "memfd"); > > + tests[n].f(); > Normally, a test cases table would encode things like > the number of expected events and type of events. > The idea is that the test template has parametrized code > and not just a loop for test cases subroutines, but there > are many ways to write tests, so as long as it gets the job > done and is readable to humans, I don't mind. > Right now this test may do the job, but it is not readable > for this human ;-) > mostly because of the huge macros - > LTP is known for pretty large macros, but those are > for generic utilities and you have complete test cases > written as macros (templates). +100. We strive for simple readable code, which is not this one. Kind regards, Petr > > + tst_res(TPASS, "ок"); > > +} > > + > > +static void cleanup(void) > > +{ > > + if (memfd != -1) > > + SAFE_CLOSE(memfd); > > + if (inotify != -1) > > + SAFE_CLOSE(inotify); > > + if (pipes[0] != -1) > > + SAFE_CLOSE(pipes[0]); > > + if (pipes[1] != -1) > > + SAFE_CLOSE(pipes[1]); > > + if (data_pipes[0] != -1) > > + SAFE_CLOSE(data_pipes[0]); > > + if (data_pipes[1] != -1) > > + SAFE_CLOSE(data_pipes[1]); > > +} > > + > This cleanup does not happen for every test case - > it happens only at the end of all the tests IIRC. > > +static struct tst_test test = { > > + .max_runtime = 10, > > + .cleanup = cleanup, > > + .test = run_test, > > + .tcnt = ARRAY_SIZE(tests), > > + .tags = (const struct tst_tag[]) { > > + {"linux-git", "983652c69199"}, > Leave this out for now. > Thanks, > Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v3 0/3+1] fanotify accounting for fs/splice.c 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska ` (3 preceding siblings ...) 2023-06-27 16:57 ` [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify Ahelenia Ziemiańska @ 2023-06-27 18:03 ` Amir Goldstein 2023-06-27 20:34 ` Ahelenia Ziemiańska 2023-06-28 4:51 ` [PATCH v3 0/3+1] " Christoph Hellwig 5 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 18:03 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Tue, Jun 27, 2023 at 7:55 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > In 1/3 I've applied if/else if/else tree like you said, > and expounded a bit in the message. > > This is less pretty now, however, since it turns out that If my advice turns out to be bad, then please drop it. > iter_file_splice_write() already marks the out fd as written because it > writes to it via vfs_iter_write(), and that sent a double notification. > > $ git grep -F .splice_write | grep -v iter_file_splice_write > drivers/char/mem.c: .splice_write = splice_write_null, > drivers/char/virtio_console.c: .splice_write = port_fops_splice_write, > fs/fuse/dev.c: .splice_write = fuse_dev_splice_write, > fs/gfs2/file.c: .splice_write = gfs2_file_splice_write, > fs/gfs2/file.c: .splice_write = gfs2_file_splice_write, > fs/overlayfs/file.c: .splice_write = ovl_splice_write, > net/socket.c: .splice_write = generic_splice_sendpage, > scripts/coccinelle/api/stream_open.cocci: .splice_write = splice_write_f, > > Of these, splice_write_null() doesn't mark out as written > (but it's for /dev/null so I think this is expected), > and I haven't been able to visually confirm whether > port_fops_splice_write() and generic_splice_sendpage() do. > > All the others delegate to iter_file_splice_write(). > All this is very troubling to me. It translates to a mental model that I cannot remember and cannot maintain for fixes whose value are still questionable. IIUC, the only thing you need to change in do_splice() for making your use case work is to add fsnotify_modify() for the splice_pipe_to_pipe() case. Right? So either make the change that you need, or all the changes that are simple to follow without trying to make the world consistent - these pipe iterators business is really messy. I don't know if avoiding a double event (which is likely not visible) is worth the complicated code that is hard to understand. > In 2/3 I fixed the vmsplice notification placement > (access from pipe, modify to pipe). > > I'm following this up with an LTP patch, where only sendfile_file_to_pipe > passes on 6.1.27-1 and all tests pass on v6.4 + this patchset. > Were these tests able to detect the double event? Maybe it's not visible because double consequent events get merged. > Ahelenia Ziemiańska (3): > splice: always fsnotify_access(in), fsnotify_modify(out) on success > splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice > splice: fsnotify_access(in), fsnotify_modify(out) on success in tee > > fs/splice.c | 43 +++++++++++++++++++++++++------------------ > 1 file changed, 25 insertions(+), 18 deletions(-) > > > Interdiff against v2: > diff --git a/fs/splice.c b/fs/splice.c > index 3234aaa6e957..0427f0a91c7d 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1155,10 +1155,7 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > flags |= SPLICE_F_NONBLOCK; > > ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); > - goto notify; > - } > - > - if (ipipe) { > + } else if (ipipe) { > if (off_in) > return -ESPIPE; > if (off_out) { > @@ -1188,10 +1185,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > else > *off_out = offset; > > - goto notify; > - } > - > - if (opipe) { > + // ->splice_write already marked out > + // as modified via vfs_iter_write() > + goto noaccessout; That's too ugly IMO. Are you claiming that the code in master is wrong? Because in master there is fsnotify_modify(out) for (ipipe) case. > + } else if (opipe) { > if (off_out) > return -ESPIPE; > if (off_in) { > @@ -1211,17 +1208,14 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > in->f_pos = offset; > else > *off_in = offset; > + } else > + return -EINVAL; > > - goto notify; > - } > - > - return -EINVAL; > - > -notify: > - if (ret > 0) { > - fsnotify_access(in); > + if (ret > 0) > fsnotify_modify(out); > - } > +noaccessout: > + if (ret > 0) > + fsnotify_access(in); > Not to mention that it should be nomodifyout, but I dislike this "common" code that it not common at all, so either just handle the pipe_to_pipe case to fix your use case or leave this code completely common ignoring the possible double events. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v3 0/3+1] fanotify accounting for fs/splice.c 2023-06-27 18:03 ` [PATCH v3 0/3+1] fanotify accounting for fs/splice.c Amir Goldstein @ 2023-06-27 20:34 ` Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:34 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 3665 bytes --] On Tue, Jun 27, 2023 at 09:03:17PM +0300, Amir Goldstein wrote: > On Tue, Jun 27, 2023 at 7:55 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > In 1/3 I've applied if/else if/else tree like you said, > > and expounded a bit in the message. > > > > This is less pretty now, however, since it turns out that > If my advice turns out to be bad, then please drop it. The if/else if/else with no goto is better than before; it was made ugly by the special-casing below. > > iter_file_splice_write() already marks the out fd as written because it > > writes to it via vfs_iter_write(), and that sent a double notification. > > > > $ git grep -F .splice_write | grep -v iter_file_splice_write > > drivers/char/mem.c: .splice_write = splice_write_null, > > drivers/char/virtio_console.c: .splice_write = port_fops_splice_write, > > fs/fuse/dev.c: .splice_write = fuse_dev_splice_write, > > fs/gfs2/file.c: .splice_write = gfs2_file_splice_write, > > fs/gfs2/file.c: .splice_write = gfs2_file_splice_write, > > fs/overlayfs/file.c: .splice_write = ovl_splice_write, > > net/socket.c: .splice_write = generic_splice_sendpage, > > scripts/coccinelle/api/stream_open.cocci: .splice_write = splice_write_f, > > > > Of these, splice_write_null() doesn't mark out as written > > (but it's for /dev/null so I think this is expected), > > and I haven't been able to visually confirm whether > > port_fops_splice_write() and generic_splice_sendpage() do. > > > > All the others delegate to iter_file_splice_write(). > All this is very troubling to me. > It translates to a mental model that I cannot remember and > cannot maintain for fixes whose value are still questionable. > > IIUC, the only thing you need to change in do_splice() for > making your use case work is to add fsnotify_modify() > for the splice_pipe_to_pipe() case. Right? No, all splice/tee/vmsplice cases need to generate modify events for the output fd. Really, all I/O syscalls do, but those are for today. > So either make the change that you need, or all the changes > that are simple to follow without trying to make the world > consistent Thus I also originally had all the aforementioned generate access/modify for in/out. > - these pipe iterators business is really messy. > I don't know if avoiding a double event (which is likely not visible) > is worth the complicated code that is hard to understand. > > > In 2/3 I fixed the vmsplice notification placement > > (access from pipe, modify to pipe). > > > > I'm following this up with an LTP patch, where only sendfile_file_to_pipe > > passes on 6.1.27-1 and all tests pass on v6.4 + this patchset. > Were these tests able to detect the double event? > Maybe it's not visible because double consequent events get merged. That's how I discovered it, yes. They aren't merged because we'd generate modify out <- from the VFS callback access in <- from do_splice modify out <- ibid. I agree this got very ugly very fast for a weird edge case ‒ maybe I did get a little over-zealous on having a consistent "one syscall↔one event for each affected file" model. OTOH: I've found that just using if (ret > 0) { fsnotify_modify(out); fsnotify_access(in); } does get the events merged from modify out <- from the VFS callback modify out <- from do_splice access in <- ibid. into modify out access in which solves all issues (reliable wake-up regardless of backing file, no spurious wake-ups) at no cost. I would've done this originally, but I hadn't known inotify events get merged :v [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v4 0/3] fanotify accounting for fs/splice.c 2023-06-27 20:34 ` Ahelenia Ziemiańska @ 2023-06-27 20:50 ` Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska ` (4 more replies) 0 siblings, 5 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:50 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 1226 bytes --] Always generate modify out, access in for splice; this gets automatically merged with no ugly special cases. No changes to 2/3 or 3/3. Ahelenia Ziemiańska (3): splice: always fsnotify_access(in), fsnotify_modify(out) on success splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice splice: fsnotify_access(in), fsnotify_modify(out) on success in tee fs/splice.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) Interdiff against v3: diff --git a/fs/splice.c b/fs/splice.c index 2ecfccbda956..bdbabc2ebfff 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1184,10 +1184,6 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, out->f_pos = offset; else *off_out = offset; - - // splice_write-> already marked out - // as modified via vfs_iter_write() - goto noaccessout; } else if (opipe) { if (off_out) return -ESPIPE; @@ -1211,11 +1207,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, } else return -EINVAL; - if (ret > 0) + if (ret > 0) { fsnotify_modify(out); -noaccessout: - if (ret > 0) fsnotify_access(in); + } return ret; } -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska @ 2023-06-27 20:50 ` Ahelenia Ziemiańska 2023-06-28 6:33 ` Amir Goldstein 2023-06-27 20:50 ` [PATCH v4 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska ` (3 subsequent siblings) 4 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:50 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 2573 bytes --] The current behaviour caused an asymmetry where some write APIs (write, sendfile) would notify the written-to/read-from objects, but splice wouldn't. This affected userspace which uses inotify, most notably coreutils tail -f, to monitor pipes. If the pipe buffer had been filled by a splice-family function: * tail wouldn't know and thus wouldn't service the pipe, and * all writes to the pipe would block because it's full, thus service was denied. (For the particular case of tail -f this could be worked around with ---disable-inotify.) Generate modify out before access in to let inotify merge the modify out events in thr ipipe case. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- fs/splice.c | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 3e06611d19ae..b5c7a5ae0e94 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1154,10 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, if ((in->f_flags | out->f_flags) & O_NONBLOCK) flags |= SPLICE_F_NONBLOCK; - return splice_pipe_to_pipe(ipipe, opipe, len, flags); - } - - if (ipipe) { + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); + } else if (ipipe) { if (off_in) return -ESPIPE; if (off_out) { @@ -1182,18 +1180,11 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = do_splice_from(ipipe, out, &offset, len, flags); file_end_write(out); - if (ret > 0) - fsnotify_modify(out); - if (!off_out) out->f_pos = offset; else *off_out = offset; - - return ret; - } - - if (opipe) { + } else if (opipe) { if (off_out) return -ESPIPE; if (off_in) { @@ -1209,18 +1200,19 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = splice_file_to_pipe(in, opipe, &offset, len, flags); - if (ret > 0) - fsnotify_access(in); - if (!off_in) in->f_pos = offset; else *off_in = offset; + } else + return -EINVAL; - return ret; + if (ret > 0) { + fsnotify_modify(out); + fsnotify_access(in); } - return -EINVAL; + return ret; } static long __do_splice(struct file *in, loff_t __user *off_in, -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-27 20:50 ` [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska @ 2023-06-28 6:33 ` Amir Goldstein 2023-06-28 10:11 ` Jan Kara 2023-06-28 17:09 ` Ahelenia Ziemiańska 0 siblings, 2 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-28 6:33 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The current behaviour caused an asymmetry where some write APIs > (write, sendfile) would notify the written-to/read-from objects, > but splice wouldn't. > > This affected userspace which uses inotify, most notably coreutils > tail -f, to monitor pipes. > If the pipe buffer had been filled by a splice-family function: > * tail wouldn't know and thus wouldn't service the pipe, and > * all writes to the pipe would block because it's full, > thus service was denied. > (For the particular case of tail -f this could be worked around > with ---disable-inotify.) > Is my understanding of the tail code wrong? My understanding was that tail_forever_inotify() is not called for pipes, or is it being called when tailing a mixed collection of pipes and regular files? If there are subtleties like those you need to mention them , otherwise people will not be able to reproduce the problem that you are describing. I need to warn you about something regarding this patch - often there are colliding interests among different kernel users - fsnotify use cases quite often collide with the interest of users tracking performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes specifically have been the source of several performance regression reports in the past and have driven optimizations like: 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when there is no watcher") e43de7f0862b ("fsnotify: optimize the case of no marks of any type") The moral of this story is: even if your patches are accepted by fsnotify reviewers, once they are staged for merging they will be subject to performance regression tests and I can tell you with certainty that performance regression will not be tolerated for the tail -f use case. I will push your v4 patches to a branch in my github, to let the kernel test bots run the performance regressions on it whenever they get to it. Moreover, if coreutils will change tail -f to start setting inotify watches on anonymous pipes (my understanding is that currently does not?), then any tail -f on anonymous pipe can cripple the "no marks on sb" performance optimization for all anonymous pipes and that would be a *very* unfortunate outcome. I think we need to add a rule to fanotify_events_supported() to ban sb/mount marks on SB_KERNMOUNT and backport this fix to LTS kernels (I will look into it) and then we can fine tune the s_fsnotify_connectors optimization in fsnotify_parent() for the SB_KERNMOUNT special case. This may be able to save your patch for the faith of NACKed for performance regression. > Generate modify out before access in to let inotify merge the > modify out events in thr ipipe case. This comment is not clear and does not belong in this context, but it very much belongs near the code in question. Please wait to collect more feedback and specifically to hear what Jan has to say about this hack before posting v5! FYI, we are now in the beginning of the 6.5 "merge window", which means that maintainers may be less responsive for the next two weeks to non-critical patches as this one, which are not targeted for the 6.5 kernel release. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-28 6:33 ` Amir Goldstein @ 2023-06-28 10:11 ` Jan Kara 2023-06-28 17:09 ` Ahelenia Ziemiańska 1 sibling, 0 replies; 55+ messages in thread From: Jan Kara @ 2023-06-28 10:11 UTC (permalink / raw) To: Amir Goldstein Cc: Ahelenia Ziemiańska, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Wed 28-06-23 09:33:43, Amir Goldstein wrote: > On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > The current behaviour caused an asymmetry where some write APIs > > (write, sendfile) would notify the written-to/read-from objects, > > but splice wouldn't. > > > > This affected userspace which uses inotify, most notably coreutils > > tail -f, to monitor pipes. > > If the pipe buffer had been filled by a splice-family function: > > * tail wouldn't know and thus wouldn't service the pipe, and > > * all writes to the pipe would block because it's full, > > thus service was denied. > > (For the particular case of tail -f this could be worked around > > with ---disable-inotify.) > > > > Is my understanding of the tail code wrong? > My understanding was that tail_forever_inotify() is not called for > pipes, or is it being called when tailing a mixed collection of pipes > and regular files? If there are subtleties like those you need to > mention them , otherwise people will not be able to reproduce the > problem that you are describing. Well, on my openSUSE 15.4 at least, tail -f does use inotify on FIFOs and indeed when data is spliced to the FIFO, tail doesn't notice. > I need to warn you about something regarding this patch - > often there are colliding interests among different kernel users - > fsnotify use cases quite often collide with the interest of users tracking > performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes > specifically have been the source of several performance regression reports > in the past and have driven optimizations like: > > 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead > when there is no watcher") > e43de7f0862b ("fsnotify: optimize the case of no marks of any type") > > The moral of this story is: even if your patches are accepted by fsnotify > reviewers, once they are staged for merging they will be subject to > performance regression tests and I can tell you with certainty that > performance regression will not be tolerated for the tail -f use case. > I will push your v4 patches to a branch in my github, to let the kernel > test bots run the performance regressions on it whenever they get to it. > > Moreover, if coreutils will change tail -f to start setting inotify watches > on anonymous pipes (my understanding is that currently does not?), > then any tail -f on anonymous pipe can cripple the "no marks on sb" > performance optimization for all anonymous pipes and that would be > a *very* unfortunate outcome. Do you mean the "s_fsnotify_connectors" check? Yeah, a fsnotify watch on any pipe inode is going to somewhat slow down the fsnotify calls for any pipe. OTOH I don't expect inotify watches on pipe inodes to be common and it is not like the overhead is huge. Also nobody really prevents you from placing watch on pipe inode now with similar consequences, this patch only makes it actually working with splice. So I'm not worried about the performance impact. At least until somebody comes with a realistic complaint ;-). > I think we need to add a rule to fanotify_events_supported() to ban > sb/mount marks on SB_KERNMOUNT and backport this > fix to LTS kernels (I will look into it) and then we can fine tune > the s_fsnotify_connectors optimization in fsnotify_parent() for > the SB_KERNMOUNT special case. Yeah, probably makes sense. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-28 6:33 ` Amir Goldstein 2023-06-28 10:11 ` Jan Kara @ 2023-06-28 17:09 ` Ahelenia Ziemiańska 2023-06-28 18:38 ` Amir Goldstein 1 sibling, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-28 17:09 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 4802 bytes --] On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The current behaviour caused an asymmetry where some write APIs > > (write, sendfile) would notify the written-to/read-from objects, > > but splice wouldn't. > > > > This affected userspace which uses inotify, most notably coreutils > > tail -f, to monitor pipes. > > If the pipe buffer had been filled by a splice-family function: > > * tail wouldn't know and thus wouldn't service the pipe, and > > * all writes to the pipe would block because it's full, > > thus service was denied. > > (For the particular case of tail -f this could be worked around > > with ---disable-inotify.) > Is my understanding of the tail code wrong? > My understanding was that tail_forever_inotify() is not called for > pipes, or is it being called when tailing a mixed collection of pipes > and regular files? If there are subtleties like those you need to > mention them , otherwise people will not be able to reproduce the > problem that you are describing. I can't squeak to the code itself, but it's trivial to check: $ tail -f fifo & [1] 3213996 $ echo zupa > fifo zupa $ echo zupa > fifo zupa $ echo zupa > fifo zupa $ cat /bin/tail > fifo # ... $ cat /bin/tail > fifo hangs: the fifo is being watched with inotify. This happens regardless of other files being specified. tail -f doesn't follow FIFOs or pipes if they're fd 0 (guaranteed by POSIX, coreutils conforms). OTOH, you could theoretically do $ cat | tail -f /dev/fd/3 3<&0 which first reads from the pipe until completion (⇔ hangup, cat died), then hangs, because it's waiting for more data on the pipe. This can never happen under a normal scenario, but doing $ echo zupa > /proc/3238590/fd/3 a few times reveals it's using classic 1/s polling (and splicing to /proc/3238590/fd/3 actually yields that data being output from tail). > I need to warn you about something regarding this patch - > often there are colliding interests among different kernel users - > fsnotify use cases quite often collide with the interest of users tracking > performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes > specifically have been the source of several performance regression reports > in the past and have driven optimizations like: > > 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead > when there is no watcher") > e43de7f0862b ("fsnotify: optimize the case of no marks of any type") > > The moral of this story is: even if your patches are accepted by fsnotify > reviewers, once they are staged for merging they will be subject to > performance regression tests and I can tell you with certainty that > performance regression will not be tolerated for the tail -f use case. > I will push your v4 patches to a branch in my github, to let the kernel > test bots run the performance regressions on it whenever they get to it. > > Moreover, if coreutils will change tail -f to start setting inotify watches > on anonymous pipes (my understanding is that currently does not?), > then any tail -f on anonymous pipe can cripple the "no marks on sb" > performance optimization for all anonymous pipes and that would be > a *very* unfortunate outcome. As seen above, it doesn't set inotify watches on anon pipes, and (since it manages to distinguish "| /dev/fd/3 3<&0" from "fifo", so it must be going further than S_ISFIFO(fstat())) this is an explicit design decision. If you refuse setting inotifies on anon pipes then that likely won't impact any userspace program (it's pathological, and for tail-like cases it'd only be meaningful for magic /proc/$pid/fd/* symlinks), and if it's in the name of performance then no-one'll likely complain, or even notice. > I think we need to add a rule to fanotify_events_supported() to ban > sb/mount marks on SB_KERNMOUNT and backport this > fix to LTS kernels (I will look into it) and then we can fine tune > the s_fsnotify_connectors optimization in fsnotify_parent() for > the SB_KERNMOUNT special case. > This may be able to save your patch for the faith of NACKed > for performance regression. This goes over my head, but if Jan says it makes sense then it must do. > > Generate modify out before access in to let inotify merge the > > modify out events in thr ipipe case. > This comment is not clear and does not belong in this context, > but it very much belongs near the code in question. Turned it into /* * Generate modify out before access in: * do_splice_from() may've already sent modify out, * and this ensures the events get merged. */ for v5. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-28 17:09 ` Ahelenia Ziemiańska @ 2023-06-28 18:38 ` Amir Goldstein 2023-06-28 20:18 ` Ahelenia Ziemiańska 0 siblings, 1 reply; 55+ messages in thread From: Amir Goldstein @ 2023-06-28 18:38 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Wed, Jun 28, 2023 at 8:09 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > > On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska > > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > The current behaviour caused an asymmetry where some write APIs > > > (write, sendfile) would notify the written-to/read-from objects, > > > but splice wouldn't. > > > > > > This affected userspace which uses inotify, most notably coreutils > > > tail -f, to monitor pipes. > > > If the pipe buffer had been filled by a splice-family function: > > > * tail wouldn't know and thus wouldn't service the pipe, and > > > * all writes to the pipe would block because it's full, > > > thus service was denied. > > > (For the particular case of tail -f this could be worked around > > > with ---disable-inotify.) > > Is my understanding of the tail code wrong? > > My understanding was that tail_forever_inotify() is not called for > > pipes, or is it being called when tailing a mixed collection of pipes > > and regular files? If there are subtleties like those you need to > > mention them , otherwise people will not be able to reproduce the > > problem that you are describing. > I can't squeak to the code itself, but it's trivial to check: > $ tail -f fifo & > [1] 3213996 > $ echo zupa > fifo > zupa > $ echo zupa > fifo > zupa > $ echo zupa > fifo > zupa > $ cat /bin/tail > fifo > # ... > $ cat /bin/tail > fifo > hangs: the fifo is being watched with inotify. > > This happens regardless of other files being specified. > > tail -f doesn't follow FIFOs or pipes if they're fd 0 > (guaranteed by POSIX, coreutils conforms). > > OTOH, you could theoretically do > $ cat | tail -f /dev/fd/3 3<&0 > which first reads from the pipe until completion (⇔ hangup, cat died), > then hangs, because it's waiting for more data on the pipe. > > This can never happen under a normal scenario, but doing > $ echo zupa > /proc/3238590/fd/3 > a few times reveals it's using classic 1/s polling > (and splicing to /proc/3238590/fd/3 actually yields that data being > output from tail). > > > I need to warn you about something regarding this patch - > > often there are colliding interests among different kernel users - > > fsnotify use cases quite often collide with the interest of users tracking > > performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes > > specifically have been the source of several performance regression reports > > in the past and have driven optimizations like: > > > > 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead > > when there is no watcher") > > e43de7f0862b ("fsnotify: optimize the case of no marks of any type") > > > > The moral of this story is: even if your patches are accepted by fsnotify > > reviewers, once they are staged for merging they will be subject to > > performance regression tests and I can tell you with certainty that > > performance regression will not be tolerated for the tail -f use case. > > I will push your v4 patches to a branch in my github, to let the kernel > > test bots run the performance regressions on it whenever they get to it. > > > > Moreover, if coreutils will change tail -f to start setting inotify watches > > on anonymous pipes (my understanding is that currently does not?), > > then any tail -f on anonymous pipe can cripple the "no marks on sb" > > performance optimization for all anonymous pipes and that would be > > a *very* unfortunate outcome. > As seen above, it doesn't set inotify watches on anon pipes, and > (since it manages to distinguish "| /dev/fd/3 3<&0" from "fifo", > so it must be going further than S_ISFIFO(fstat())) > this is an explicit design decision. > > If you refuse setting inotifies on anon pipes then that likely won't > impact any userspace program (it's pathological, and for tail-like cases > it'd only be meaningful for magic /proc/$pid/fd/* symlinks), > and if it's in the name of performance then no-one'll likely complain, > or even notice. > Unfortunately, it doesn't work this way - most of the time we are not supposed to break existing applications and I have no way of knowing if those applications exist... > > I think we need to add a rule to fanotify_events_supported() to ban > > sb/mount marks on SB_KERNMOUNT and backport this > > fix to LTS kernels (I will look into it) and then we can fine tune > > the s_fsnotify_connectors optimization in fsnotify_parent() for > > the SB_KERNMOUNT special case. > > This may be able to save your patch for the faith of NACKed > > for performance regression. > This goes over my head, but if Jan says it makes sense > then it must do. > Here you go: https://github.com/amir73il/linux/commits/fsnotify_pipe I ended up using SB_NOUSER which is narrower than SB_KERNMOUNT. Care to test? 1) Functionally - that I did not break your tests. 2) Optimization - that when one anon pipe has an inotify watch write to another anon pipe stops at fsnotify_inode_has_watchers() and does not get to fsnotify(). > > > Generate modify out before access in to let inotify merge the > > > modify out events in thr ipipe case. > > This comment is not clear and does not belong in this context, > > but it very much belongs near the code in question. > Turned it into > /* > * Generate modify out before access in: > * do_splice_from() may've already sent modify out, > * and this ensures the events get merged. > */ > for v5. OK. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-28 18:38 ` Amir Goldstein @ 2023-06-28 20:18 ` Ahelenia Ziemiańska 2023-06-30 11:03 ` Amir Goldstein 0 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-28 20:18 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 2270 bytes --] On Wed, Jun 28, 2023 at 09:38:03PM +0300, Amir Goldstein wrote: > On Wed, Jun 28, 2023 at 8:09 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > > > I think we need to add a rule to fanotify_events_supported() to ban > > > sb/mount marks on SB_KERNMOUNT and backport this > > > fix to LTS kernels (I will look into it) and then we can fine tune > > > the s_fsnotify_connectors optimization in fsnotify_parent() for > > > the SB_KERNMOUNT special case. > > > This may be able to save your patch for the faith of NACKed > > > for performance regression. > > This goes over my head, but if Jan says it makes sense > > then it must do. > Here you go: > https://github.com/amir73il/linux/commits/fsnotify_pipe > > I ended up using SB_NOUSER which is narrower than > SB_KERNMOUNT. > > Care to test? > 1) Functionally - that I did not break your tests. ) | gzip -d > inotify13; chmod +x inotify13; exec ./inotify13 tst_test.c:1560: TINFO: Timeout per run is 0h 00m 30s inotify13.c:260: TINFO: file_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: file_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: splice_pipe_to_file inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: pipe_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: pipe_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: vmsplice_pipe_to_mem inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: vmsplice_mem_to_pipe inotify13.c:269: TPASS: ок Summary: passed 7 failed 0 broken 0 skipped 0 warnings 0 The discrete tests from before also work as expected, both to a fifo and an anon pipe. > 2) Optimization - that when one anon pipe has an inotify watch > write to another anon pipe stops at fsnotify_inode_has_watchers() > and does not get to fsnotify(). Yes, I can confirm this as well: fsnotify_parent() only continues to fsnotify() for the watched pipe; writes to other pipes early-exit. To validate the counterfactual, I reverted "fsnotify: optimize the case of anonymous pipe with no watches" and fsnotify() was being called for each anon pipe write, so long as any anon pipe watches were registered. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-28 20:18 ` Ahelenia Ziemiańska @ 2023-06-30 11:03 ` Amir Goldstein 0 siblings, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-30 11:03 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Wed, Jun 28, 2023 at 11:18 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Wed, Jun 28, 2023 at 09:38:03PM +0300, Amir Goldstein wrote: > > On Wed, Jun 28, 2023 at 8:09 PM Ahelenia Ziemiańska > > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > > > > I think we need to add a rule to fanotify_events_supported() to ban > > > > sb/mount marks on SB_KERNMOUNT and backport this > > > > fix to LTS kernels (I will look into it) and then we can fine tune > > > > the s_fsnotify_connectors optimization in fsnotify_parent() for > > > > the SB_KERNMOUNT special case. > > > > This may be able to save your patch for the faith of NACKed > > > > for performance regression. > > > This goes over my head, but if Jan says it makes sense > > > then it must do. > > Here you go: > > https://github.com/amir73il/linux/commits/fsnotify_pipe > > > > I ended up using SB_NOUSER which is narrower than > > SB_KERNMOUNT. > > > > Care to test? > > 1) Functionally - that I did not break your tests. > ) | gzip -d > inotify13; chmod +x inotify13; exec ./inotify13 > tst_test.c:1560: TINFO: Timeout per run is 0h 00m 30s > inotify13.c:260: TINFO: file_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: file_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: splice_pipe_to_file > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: pipe_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: pipe_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: vmsplice_pipe_to_mem > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: vmsplice_mem_to_pipe > inotify13.c:269: TPASS: ок > > Summary: > passed 7 > failed 0 > broken 0 > skipped 0 > warnings 0 > > The discrete tests from before also work as expected, > both to a fifo and an anon pipe. > > > 2) Optimization - that when one anon pipe has an inotify watch > > write to another anon pipe stops at fsnotify_inode_has_watchers() > > and does not get to fsnotify(). > Yes, I can confirm this as well: fsnotify_parent() only continues to > fsnotify() for the watched pipe; writes to other pipes early-exit. > > To validate the counterfactual, I reverted "fsnotify: optimize the case > of anonymous pipe with no watches" and fsnotify() was being called > for each anon pipe write, so long as any anon pipe watches were registered. Thank you for testing! As Jan suggested, when you post v5, with my Reviewed-by and Jan's Acked-by, please ask Christian to review and consider taking these patches through the vfs tree for the 6.6 release. Please include a link to your LTP test in the cover letter and a link to my performance optimization patches. Unless the kernel test bots detect a performance regression due to your patches, I am not sure whether or not or when we will apply the optimization patches. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v4 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska @ 2023-06-27 20:50 ` Ahelenia Ziemiańska 2023-06-27 20:51 ` [PATCH v4 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska ` (2 subsequent siblings) 4 siblings, 0 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:50 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 1171 bytes --] Same logic applies here: this can fill up the pipe and pollers that rely on getting IN_MODIFY notifications never wake up. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- fs/splice.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/splice.c b/fs/splice.c index b5c7a5ae0e94..d3a7f4d5c078 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1341,6 +1341,9 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter, pipe_unlock(pipe); } + if (ret > 0) + fsnotify_access(file); + return ret; } @@ -1370,8 +1373,10 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, if (!ret) ret = iter_to_pipe(iter, pipe, buf_flag); pipe_unlock(pipe); - if (ret > 0) + if (ret > 0) { wakeup_pipe_readers(pipe); + fsnotify_modify(file); + } return ret; } -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v4 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska @ 2023-06-27 20:51 ` Ahelenia Ziemiańska 2023-06-28 6:02 ` [PATCH v4 0/3] fanotify accounting for fs/splice.c Amir Goldstein 2023-06-28 11:38 ` Jan Kara 4 siblings, 0 replies; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-27 20:51 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 881 bytes --] Same logic applies here: this can fill up the pipe, and pollers that rely on getting IN_MODIFY notifications never wake up. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- fs/splice.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index d3a7f4d5c078..bdbabc2ebfff 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1810,6 +1810,11 @@ long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags) } } + if (ret > 0) { + fsnotify_access(in); + fsnotify_modify(out); + } + return ret; } -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v4 0/3] fanotify accounting for fs/splice.c 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska ` (2 preceding siblings ...) 2023-06-27 20:51 ` [PATCH v4 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska @ 2023-06-28 6:02 ` Amir Goldstein 2023-06-28 11:38 ` Jan Kara 4 siblings, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-28 6:02 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, Mel Gorman On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > Always generate modify out, access in for splice; > this gets automatically merged with no ugly special cases. Ahelenia, Obviously, you are new to sending patches to the kernel and you appear to be a very enthusiastic and fast learner, so I assume you won't mind getting some tips that you won't find in any document. a) CC LTP only on tests, not on the kernel patches b) Please don't post these "diff" patches. Developers (and bots) should be able to understand which upstream commit a patch is based on (git format-patch provides that info). I mean you can send those diff patches as part of a conversation to explain yourself, that's fine, just don't post them as if they are patches for review. c) When there are prospect reviewers that have not reviewed v1 (especially inotify maintainer), it is better to wait at least one day posting v2 and v3 and v4 ;), because: 1. It is better to accumulate review comments from several reviewers 2. Different reviewers may disagree, so if you are just following my advice you may need to go back and forth until everyone is happy 3. It's racy - reviewers may be in the middle of review of v1 without realizing that v2,v3,v4 is already in their inbox, so that's creating extra work for them - not a good outcome Going to review v4.... Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 0/3] fanotify accounting for fs/splice.c 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska ` (3 preceding siblings ...) 2023-06-28 6:02 ` [PATCH v4 0/3] fanotify accounting for fs/splice.c Amir Goldstein @ 2023-06-28 11:38 ` Jan Kara 2023-06-28 13:41 ` Amir Goldstein 2023-06-28 18:54 ` Ahelenia Ziemiańska 4 siblings, 2 replies; 55+ messages in thread From: Jan Kara @ 2023-06-28 11:38 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp Hello! On Tue 27-06-23 22:50:46, Ahelenia Ziemiańska wrote: > Always generate modify out, access in for splice; > this gets automatically merged with no ugly special cases. > > No changes to 2/3 or 3/3. Thanks for the patches Ahelena! The code looks fine to me but to be honest I still have one unresolved question so let me think about it loud here for documentation purposes :). Do we want fsnotify (any filesystem notification framework like inotify or fanotify) to actually generate events on FIFOs? FIFOs are virtual objects and are not part of the filesystem as such (well, the inode itself and the name is), hence *filesystem* notification framework does not seem like a great fit to watch for changes or accesses there. And if we say "yes" for FIFOs, then why not AF_UNIX sockets? Where do we draw the line? And is it all worth the trouble? I understand the convenience of inotify working on FIFOs for the "tail -f" usecase but then wouldn't this better be fixed in tail(1) itself by using epoll(7) for FIFOs which, as I've noted in my other reply, does not have the problem that poll(2) has when there are no writers? Another issue with FIFOs is that they do not have a concept of file position. For hierarchical storage usecase we are introducing events that will report file ranges being modified / accessed and officially supporting FIFOs is one more special case to deal with. What is supporting your changes is that fsnotify mostly works for FIFOs already now (normal reads & writes generate notification) so splice not working could be viewed as an inconsistency. Sockets (although they are visible in the filesystem) cannot be open so for them the illusion of being a file is even weaker. So overall I guess I'm slightly in favor of making fsnotify generate events on FIFOs even with splice, provided Amir does not see a big trouble in supporting this with his upcoming HSM changes. Honza > Ahelenia Ziemiańska (3): > splice: always fsnotify_access(in), fsnotify_modify(out) on success > splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice > splice: fsnotify_access(in), fsnotify_modify(out) on success in tee > > fs/splice.c | 38 ++++++++++++++++++++------------------ > 1 file changed, 20 insertions(+), 18 deletions(-) > > Interdiff against v3: > diff --git a/fs/splice.c b/fs/splice.c > index 2ecfccbda956..bdbabc2ebfff 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1184,10 +1184,6 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > out->f_pos = offset; > else > *off_out = offset; > - > - // splice_write-> already marked out > - // as modified via vfs_iter_write() > - goto noaccessout; > } else if (opipe) { > if (off_out) > return -ESPIPE; > @@ -1211,11 +1207,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > } else > return -EINVAL; > > - if (ret > 0) > + if (ret > 0) { > fsnotify_modify(out); > -noaccessout: > - if (ret > 0) > fsnotify_access(in); > + } > > return ret; > } > -- > 2.39.2 -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 0/3] fanotify accounting for fs/splice.c 2023-06-28 11:38 ` Jan Kara @ 2023-06-28 13:41 ` Amir Goldstein 2023-06-28 18:54 ` Ahelenia Ziemiańska 1 sibling, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-28 13:41 UTC (permalink / raw) To: Jan Kara Cc: Ahelenia Ziemiańska, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Chung-Chiang Cheng, ltp On Wed, Jun 28, 2023 at 2:38 PM Jan Kara <jack@suse.cz> wrote: > > Hello! > > On Tue 27-06-23 22:50:46, Ahelenia Ziemiańska wrote: > > Always generate modify out, access in for splice; > > this gets automatically merged with no ugly special cases. > > > > No changes to 2/3 or 3/3. > > Thanks for the patches Ahelena! The code looks fine to me but to be honest > I still have one unresolved question so let me think about it loud here for > documentation purposes :). Do we want fsnotify (any filesystem > notification framework like inotify or fanotify) to actually generate > events on FIFOs? FIFOs are virtual objects and are not part of the > filesystem as such (well, the inode itself and the name is), hence > *filesystem* notification framework does not seem like a great fit to watch > for changes or accesses there. And if we say "yes" for FIFOs, then why not > AF_UNIX sockets? Where do we draw the line? And is it all worth the > trouble? > > I understand the convenience of inotify working on FIFOs for the "tail -f" > usecase but then wouldn't this better be fixed in tail(1) itself by using > epoll(7) for FIFOs which, as I've noted in my other reply, does not have > the problem that poll(2) has when there are no writers? > > Another issue with FIFOs is that they do not have a concept of file > position. For hierarchical storage usecase we are introducing events that > will report file ranges being modified / accessed and officially supporting > FIFOs is one more special case to deal with. > > What is supporting your changes is that fsnotify mostly works for FIFOs > already now (normal reads & writes generate notification) so splice not > working could be viewed as an inconsistency. Sockets (although they are > visible in the filesystem) cannot be open so for them the illusion of being > a file is even weaker. > > So overall I guess I'm slightly in favor of making fsnotify generate events > on FIFOs even with splice, provided Amir does not see a big trouble in > supporting this with his upcoming HSM changes. > I've also thought about this. The thing about the HSM events is that they are permission events and just like FAN_ACCESS_PERM, they originate from the common access control helpers {rw,remap}_verify_area(), which also happen to have the file range info (with ppos NULL for pipes). Ahelenia's patches do not add any new rw_verify_area() to pipes so no new FAN_ACCESS_PERM events were added. If we could go back to the design of fanotify we would have probably made it explicit that permission events are only allowed on regular files and dirs. For the new HSM events we can (and will) do that. In any case, the new events are supposed to be delivered with file access range records, so delivering HSM events on pipes wouldn't make any sense. So I do not see any problem with these patches wrt upcomping HSM events. However, note that these patches create more inconsistencies between IN_ACCESS and FAN_ACCESS_PERM on pipes. We can leave it at that if we want, but fixing the inconsistencies by adding more FAN_ACCESS_PERM events on pipes - this is not something that I wouldn't be comfortable with. If anything, we can remove FAN_ACCESS_PERM events from special files and see if anybody complains. I don't know of any users of FAN_ACCESS_PERM and even for FAN_OPEN_PERM, I don't think that AV-vendors have anything useful to do with open permission events on special files. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 0/3] fanotify accounting for fs/splice.c 2023-06-28 11:38 ` Jan Kara 2023-06-28 13:41 ` Amir Goldstein @ 2023-06-28 18:54 ` Ahelenia Ziemiańska 2023-06-29 8:45 ` Jan Kara 1 sibling, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-28 18:54 UTC (permalink / raw) To: Jan Kara Cc: Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Chung-Chiang Cheng, ltp [-- Attachment #1: Type: text/plain, Size: 2847 bytes --] Hi! On Wed, Jun 28, 2023 at 01:38:53PM +0200, Jan Kara wrote: > On Tue 27-06-23 22:50:46, Ahelenia Ziemiańska wrote: > > Always generate modify out, access in for splice; > > this gets automatically merged with no ugly special cases. > > > > No changes to 2/3 or 3/3. > Thanks for the patches Ahelena! The code looks fine to me but to be honest > I still have one unresolved question so let me think about it loud here for > documentation purposes :). Do we want fsnotify (any filesystem > notification framework like inotify or fanotify) to actually generate > events on FIFOs? FIFOs are virtual objects and are not part of the > filesystem as such (well, the inode itself and the name is), hence > *filesystem* notification framework does not seem like a great fit to watch > for changes or accesses there. And if we say "yes" for FIFOs, then why not > AF_UNIX sockets? Where do we draw the line? And is it all worth the > trouble? As a relative outsider (I haven't used inotify before this, and have not been subjected to it or its peripheries before), I interpreted inotify as being the Correct solution for: 1. stuff you can find in a normal (non-/dev, you don't want to touch devices) filesystem traversal 2. stuff you can open where, going down the list in inode(7): S_IFSOCK can't open S_IFLNK can't open S_IFREG yes! S_IFBLK it's a device S_IFDIR yes! S_IFCHR it's a device S_IFIFO yes! It appears that I'm not the only one who's interpreted it that way, especially since neither regular files nor pipes are pollable. (Though, under that same categorisation, I wouldn't be surprised if anonymous pipes had been refused, for example, since those are conventionally unnameable.) To this end, I'd say we're leaving the line precisely where it was drawn before, even if by accident. > I understand the convenience of inotify working on FIFOs for the "tail -f" > usecase but then wouldn't this better be fixed in tail(1) itself by using > epoll(7) for FIFOs which, as I've noted in my other reply, does not have > the problem that poll(2) has when there are no writers? Yes, epoll in ET mode returns POLLHUP only once, but you /also/ need the inotify anyway for regular files, which epoll refuses (and, with -F, you may want both epoll for a pipe and inotify for the directory it's contained in). Is it possible to do? yes. Is it more annoying than just having pipes report when they were written to? very much so. inotify actually working(*) is presumably why coreutils tail doesn't use epoll ‒ inotify already provides all required events(*), you can use the same code for regular files and fifos, and with one fewer level of indirection: there's just no need(*). (*: except with a magic syscall only I use apparently) [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v4 0/3] fanotify accounting for fs/splice.c 2023-06-28 18:54 ` Ahelenia Ziemiańska @ 2023-06-29 8:45 ` Jan Kara 0 siblings, 0 replies; 55+ messages in thread From: Jan Kara @ 2023-06-29 8:45 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Jan Kara, Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Chung-Chiang Cheng, ltp Hi! On Wed 28-06-23 20:54:28, Ahelenia Ziemiańska wrote: > On Wed, Jun 28, 2023 at 01:38:53PM +0200, Jan Kara wrote: > > On Tue 27-06-23 22:50:46, Ahelenia Ziemiańska wrote: > > > Always generate modify out, access in for splice; > > > this gets automatically merged with no ugly special cases. > > > > > > No changes to 2/3 or 3/3. > > Thanks for the patches Ahelena! The code looks fine to me but to be honest > > I still have one unresolved question so let me think about it loud here for > > documentation purposes :). Do we want fsnotify (any filesystem > > notification framework like inotify or fanotify) to actually generate > > events on FIFOs? FIFOs are virtual objects and are not part of the > > filesystem as such (well, the inode itself and the name is), hence > > *filesystem* notification framework does not seem like a great fit to watch > > for changes or accesses there. And if we say "yes" for FIFOs, then why not > > AF_UNIX sockets? Where do we draw the line? And is it all worth the > > trouble? > As a relative outsider (I haven't used inotify before this, and have not > been subjected to it or its peripheries before), > I interpreted inotify as being the Correct solution for: > 1. stuff you can find in a normal > (non-/dev, you don't want to touch devices) > filesystem traversal > 2. stuff you can open > where, going down the list in inode(7): > S_IFSOCK can't open > S_IFLNK can't open > S_IFREG yes! > S_IFBLK it's a device > S_IFDIR yes! > S_IFCHR it's a device > S_IFIFO yes! > > It appears that I'm not the only one who's interpreted it that way, > especially since neither regular files nor pipes are pollable. > (Though, under that same categorisation, I wouldn't be surprised > if anonymous pipes had been refused, for example, since those are > conventionally unnameable.) > > To this end, I'd say we're leaving the line precisely where it was drawn > before, even if by accident. I agree, although I'd note that there are S_IFREG inodes under /sys or /proc where it would be too difficult to provide fsnotify events (exactly because the file contents is not "data stored somewhere" but rather something "generated on the fly") so the illusion is not perfect already. > > I understand the convenience of inotify working on FIFOs for the "tail -f" > > usecase but then wouldn't this better be fixed in tail(1) itself by using > > epoll(7) for FIFOs which, as I've noted in my other reply, does not have > > the problem that poll(2) has when there are no writers? > Yes, epoll in ET mode returns POLLHUP only once, but you /also/ need the > inotify anyway for regular files, which epoll refuses > (and, with -F, you may want both epoll for a pipe and inotify for the > directory it's contained in). > Is it possible to do? yes. Is it more annoying than just having pipes > report when they were written to? very much so. > > inotify actually working(*) is presumably why coreutils tail doesn't use > epoll ‒ inotify already provides all required events(*), you can use the > same code for regular files and fifos, and with one fewer level of > indirection: there's just no need(*). > > (*: except with a magic syscall only I use apparently) Yeah, I've slept to this and I still think adding fsnotify events to splice is a nicer option so feel free to add: Acked-by: Jan Kara <jack@suse.cz> to all kernel patches in your series. Since the changes are in splice code, Christian or Al Viro (who you already have on CC list) should be merging this so please make sure to also include them in the v5 submission. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v3 0/3+1] fanotify accounting for fs/splice.c 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska ` (4 preceding siblings ...) 2023-06-27 18:03 ` [PATCH v3 0/3+1] fanotify accounting for fs/splice.c Amir Goldstein @ 2023-06-28 4:51 ` Christoph Hellwig 2023-06-28 10:38 ` Jan Kara 5 siblings, 1 reply; 55+ messages in thread From: Christoph Hellwig @ 2023-06-28 4:51 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp Can you please resend this outside this thread? I really cant't see what's new or old here if you have a reply-to in the old thread. On Tue, Jun 27, 2023 at 06:55:22PM +0200, Ahelenia Ziemiańska wrote: > In 1/3 I've applied if/else if/else tree like you said, > and expounded a bit in the message. > > This is less pretty now, however, since it turns out that > iter_file_splice_write() already marks the out fd as written because it > writes to it via vfs_iter_write(), and that sent a double notification. It seems like vfs_iter_write is the wrong level to implement ->splice_write given that the the ->splice_write caller has already checked f_mode, done the equivalent of rw_verify_area and should do the fsnotify_modify. I'd suggest to just open code the relevant parts of vfs_iocb_iter_write in iter_file_splice_write. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v3 0/3+1] fanotify accounting for fs/splice.c 2023-06-28 4:51 ` [PATCH v3 0/3+1] " Christoph Hellwig @ 2023-06-28 10:38 ` Jan Kara 0 siblings, 0 replies; 55+ messages in thread From: Jan Kara @ 2023-06-28 10:38 UTC (permalink / raw) To: Christoph Hellwig Cc: Ahelenia Ziemiańska, Amir Goldstein, Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng, ltp On Tue 27-06-23 21:51:05, Christoph Hellwig wrote: > Can you please resend this outside this thread? I really cant't see > what's new or old here if you have a reply-to in the old thread. > > On Tue, Jun 27, 2023 at 06:55:22PM +0200, Ahelenia Ziemiańska wrote: > > In 1/3 I've applied if/else if/else tree like you said, > > and expounded a bit in the message. > > > > This is less pretty now, however, since it turns out that > > iter_file_splice_write() already marks the out fd as written because it > > writes to it via vfs_iter_write(), and that sent a double notification. > > It seems like vfs_iter_write is the wrong level to implement > ->splice_write given that the the ->splice_write caller has already > checked f_mode, done the equivalent of rw_verify_area and > should do the fsnotify_modify. I'd suggest to just open code the > relevant parts of vfs_iocb_iter_write in iter_file_splice_write. Yeah, looking into the code I agree (with a small remark that unlike vfs_iocb_iter_write() this particular variant also needs to work with files providing only ->write and not ->write_iter). But we can live with duplicate events for now and this seems like a rather separate cleanup to do. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-26 18:57 ` Amir Goldstein 2023-06-26 23:08 ` [PATCH v2 0/3] fanotify accounting for fs/splice.c наб @ 2023-06-26 23:09 ` Ahelenia Ziemiańska 2023-06-27 6:02 ` Amir Goldstein 2023-06-26 23:09 ` [PATCH v2 2/3] splice: fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska 2023-06-26 23:09 ` [PATCH v2 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska 3 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 23:09 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng [-- Attachment #1: Type: text/plain, Size: 2023 bytes --] The current behaviour caused an asymmetry where some write APIs (write, sendfile) would notify the written-to/read-from objects, but splice wouldn't. This affected userspace which used inotify, like coreutils tail -f. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- No changes since v1 (except in the message). fs/splice.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 3e06611d19ae..94fae24f9d54 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1154,7 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, if ((in->f_flags | out->f_flags) & O_NONBLOCK) flags |= SPLICE_F_NONBLOCK; - return splice_pipe_to_pipe(ipipe, opipe, len, flags); + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); + goto notify; } if (ipipe) { @@ -1182,15 +1183,12 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = do_splice_from(ipipe, out, &offset, len, flags); file_end_write(out); - if (ret > 0) - fsnotify_modify(out); - if (!off_out) out->f_pos = offset; else *off_out = offset; - return ret; + goto notify; } if (opipe) { @@ -1209,18 +1207,23 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = splice_file_to_pipe(in, opipe, &offset, len, flags); - if (ret > 0) - fsnotify_access(in); - if (!off_in) in->f_pos = offset; else *off_in = offset; - return ret; + goto notify; } return -EINVAL; + +notify: + if (ret > 0) { + fsnotify_access(in); + fsnotify_modify(out); + } + + return ret; } static long __do_splice(struct file *in, loff_t __user *off_in, -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success 2023-06-26 23:09 ` [PATCH v2 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska @ 2023-06-27 6:02 ` Amir Goldstein 0 siblings, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 6:02 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng On Tue, Jun 27, 2023 at 2:09 AM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The current behaviour caused an asymmetry where some write APIs > (write, sendfile) would notify the written-to/read-from objects, > but splice wouldn't. > > This affected userspace which used inotify, like coreutils tail -f. > typo: uses? But this comment is not very clear IMO. Please imagine that the reader is a distro maintainer or coreutils maintainer How are they supposed to react to this comment? Is this an important fix for them to backport?? How does this change actually affect tail -f? Does it fix a bug in tail -f? As far as I understand from our last conversation, the answer is that it does not fix a bug in tail -f and it won't affect tail -f at all - it would *allow* tail -f to be changed in a way that could improve some use cases. Right? improve in what way exactly. The simplest way to explain this fix perhaps would be to link to a patch to tail and explain how the kernel+tail fixes improve a use case. > Fixes: 983652c69199 ("splice: report related fsnotify events") > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > Reviewed-by: Amir Goldstein <amir73il@gmail.com> > --- > No changes since v1 (except in the message). > > fs/splice.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index 3e06611d19ae..94fae24f9d54 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1154,7 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > if ((in->f_flags | out->f_flags) & O_NONBLOCK) > flags |= SPLICE_F_NONBLOCK; > > - return splice_pipe_to_pipe(ipipe, opipe, len, flags); > + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); > + goto notify; > } > > if (ipipe) { > @@ -1182,15 +1183,12 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > ret = do_splice_from(ipipe, out, &offset, len, flags); > file_end_write(out); > > - if (ret > 0) > - fsnotify_modify(out); > - > if (!off_out) > out->f_pos = offset; > else > *off_out = offset; > > - return ret; > + goto notify; > } > > if (opipe) { > @@ -1209,18 +1207,23 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > > ret = splice_file_to_pipe(in, opipe, &offset, len, flags); > > - if (ret > 0) > - fsnotify_access(in); > - > if (!off_in) > in->f_pos = offset; > else > *off_in = offset; > > - return ret; > + goto notify; > } > > return -EINVAL; > + > +notify: > + if (ret > 0) { > + fsnotify_access(in); > + fsnotify_modify(out); > + } > + > + return ret; > } > Sorry I haven't noticed this in the first review, but goto is not really needed. We make the three cases if{}else if{}else if{} and return -EINVAL in the else case. It's not really that important, just a bit nicer IMO, so as you wish. Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 2/3] splice: fsnotify_modify(fd) in vmsplice 2023-06-26 18:57 ` Amir Goldstein 2023-06-26 23:08 ` [PATCH v2 0/3] fanotify accounting for fs/splice.c наб 2023-06-26 23:09 ` [PATCH v2 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska @ 2023-06-26 23:09 ` Ahelenia Ziemiańska 2023-06-27 6:27 ` Amir Goldstein 2023-06-26 23:09 ` [PATCH v2 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska 3 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 23:09 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng [-- Attachment #1: Type: text/plain, Size: 822 bytes --] Same logic applies here: this can fill up the pipe and pollers that rely on getting IN_MODIFY notifications never wake up. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- fs/splice.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index 94fae24f9d54..a18274209dc1 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1447,6 +1447,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov, else error = vmsplice_to_user(f.file, &iter, flags); + if (error > 0) + fsnotify_modify(f.file); + kfree(iov); out_fdput: fdput(f); -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 2/3] splice: fsnotify_modify(fd) in vmsplice 2023-06-26 23:09 ` [PATCH v2 2/3] splice: fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska @ 2023-06-27 6:27 ` Amir Goldstein 0 siblings, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 6:27 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng On Tue, Jun 27, 2023 at 2:09 AM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > Same logic applies here: this can fill up the pipe and pollers that rely > on getting IN_MODIFY notifications never wake up. > > Fixes: 983652c69199 ("splice: report related fsnotify events") > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > fs/splice.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/fs/splice.c b/fs/splice.c > index 94fae24f9d54..a18274209dc1 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1447,6 +1447,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov, > else > error = vmsplice_to_user(f.file, &iter, flags); > > + if (error > 0) > + fsnotify_modify(f.file); > + Wow, that is a twisted syscall, it does either write or read on the pipe depending on whether it was open for read or write. so you need to move fsnotify_modify() into vmsplice_to_pipe() and add fsnotify_access() to vmsplice_to_user(). Thanks, Amir. ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee 2023-06-26 18:57 ` Amir Goldstein ` (2 preceding siblings ...) 2023-06-26 23:09 ` [PATCH v2 2/3] splice: fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska @ 2023-06-26 23:09 ` Ahelenia Ziemiańska 2023-06-27 6:20 ` Amir Goldstein 3 siblings, 1 reply; 55+ messages in thread From: Ahelenia Ziemiańska @ 2023-06-26 23:09 UTC (permalink / raw) To: Amir Goldstein Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng [-- Attachment #1: Type: text/plain, Size: 792 bytes --] Same logic applies here: this can fill up the pipe, and pollers that rely on getting IN_MODIFY notifications never wake up. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- fs/splice.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index a18274209dc1..3234aaa6e957 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1819,6 +1819,11 @@ long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags) } } + if (ret > 0) { + fsnotify_access(in); + fsnotify_modify(out); + } + return ret; } -- 2.39.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee 2023-06-26 23:09 ` [PATCH v2 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska @ 2023-06-27 6:20 ` Amir Goldstein 0 siblings, 0 replies; 55+ messages in thread From: Amir Goldstein @ 2023-06-27 6:20 UTC (permalink / raw) To: Ahelenia Ziemiańska Cc: Alexander Viro, Christian Brauner, linux-fsdevel, linux-kernel, Jan Kara, Chung-Chiang Cheng On Tue, Jun 27, 2023 at 2:09 AM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > Same logic applies here: this can fill up the pipe, and pollers that rely > on getting IN_MODIFY notifications never wake up. > > Fixes: 983652c69199 ("splice: report related fsnotify events") > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Makes sense. Reviewed-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/splice.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/splice.c b/fs/splice.c > index a18274209dc1..3234aaa6e957 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1819,6 +1819,11 @@ long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags) > } > } > > + if (ret > 0) { > + fsnotify_access(in); > + fsnotify_modify(out); > + } > + > return ret; > } > > -- > 2.39.2 ^ permalink raw reply [flat|nested] 55+ messages in thread
end of thread, other threads:[~2023-06-30 11:04 UTC | newest] Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-06-26 3:04 splice(-> FIFO) never wakes up inotify IN_MODIFY? Ahelenia Ziemiańska 2023-06-26 6:11 ` Amir Goldstein 2023-06-26 12:19 ` Ahelenia Ziemiańska 2023-06-26 12:57 ` Ahelenia Ziemiańska 2023-06-26 13:51 ` Jan Kara 2023-06-26 14:25 ` Ahelenia Ziemiańska 2023-06-26 15:00 ` Jan Kara 2023-06-26 15:15 ` Ahelenia Ziemiańska 2023-06-26 16:52 ` Jan Kara 2023-06-26 14:53 ` Amir Goldstein 2023-06-26 15:12 ` Ahelenia Ziemiańska 2023-06-26 16:21 ` Amir Goldstein 2023-06-26 17:14 ` Ahelenia Ziemiańska 2023-06-26 18:57 ` Amir Goldstein 2023-06-26 23:08 ` [PATCH v2 0/3] fanotify accounting for fs/splice.c наб 2023-06-27 6:14 ` Amir Goldstein 2023-06-27 16:55 ` [PATCH v3 0/3+1] " Ahelenia Ziemiańska 2023-06-27 16:55 ` [PATCH v3 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska 2023-06-27 18:10 ` Amir Goldstein 2023-06-27 20:13 ` Ahelenia Ziemiańska 2023-06-27 16:55 ` [PATCH v3 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska 2023-06-27 18:11 ` Amir Goldstein 2023-06-27 16:55 ` [PATCH v3 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska 2023-06-27 16:57 ` [LTP PATCH] inotify13: new test for fs/splice.c functions vs pipes vs inotify Ahelenia Ziemiańska 2023-06-27 18:31 ` Amir Goldstein 2023-06-27 20:59 ` [LTP RFC PATCH v2] " Ahelenia Ziemiańska 2023-06-28 0:21 ` [LTP RFC PATCH v3] " Ahelenia Ziemiańska 2023-06-28 5:30 ` Amir Goldstein 2023-06-28 16:03 ` Ahelenia Ziemiańska 2023-06-27 22:57 ` [LTP PATCH] " Petr Vorel 2023-06-27 18:03 ` [PATCH v3 0/3+1] fanotify accounting for fs/splice.c Amir Goldstein 2023-06-27 20:34 ` Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 0/3] " Ahelenia Ziemiańska 2023-06-27 20:50 ` [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska 2023-06-28 6:33 ` Amir Goldstein 2023-06-28 10:11 ` Jan Kara 2023-06-28 17:09 ` Ahelenia Ziemiańska 2023-06-28 18:38 ` Amir Goldstein 2023-06-28 20:18 ` Ahelenia Ziemiańska 2023-06-30 11:03 ` Amir Goldstein 2023-06-27 20:50 ` [PATCH v4 2/3] splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska 2023-06-27 20:51 ` [PATCH v4 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska 2023-06-28 6:02 ` [PATCH v4 0/3] fanotify accounting for fs/splice.c Amir Goldstein 2023-06-28 11:38 ` Jan Kara 2023-06-28 13:41 ` Amir Goldstein 2023-06-28 18:54 ` Ahelenia Ziemiańska 2023-06-29 8:45 ` Jan Kara 2023-06-28 4:51 ` [PATCH v3 0/3+1] " Christoph Hellwig 2023-06-28 10:38 ` Jan Kara 2023-06-26 23:09 ` [PATCH v2 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Ahelenia Ziemiańska 2023-06-27 6:02 ` Amir Goldstein 2023-06-26 23:09 ` [PATCH v2 2/3] splice: fsnotify_modify(fd) in vmsplice Ahelenia Ziemiańska 2023-06-27 6:27 ` Amir Goldstein 2023-06-26 23:09 ` [PATCH v2 3/3] splice: fsnotify_access(in), fsnotify_modify(out) on success in tee Ahelenia Ziemiańska 2023-06-27 6:20 ` Amir Goldstein
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).