TL;DR: - Trivial patches to replace calls to get_unused_fd() by get_unused_fd_flags(0); - Patch to remove get_unused_fd(); - Patch to add support O_CLOEXEC in fanotify_init(). Hi, Please find the eighth revision of my patchset to remove get_unused_fd() macro in order to help subsystems to use get_unused_fd_flags() (or anon_inode_getfd()) with flags either provided by the userspace or set to O_CLOEXEC by default where appropriate. Without get_unused_fd() macro, more subsystems are likely to use get_unused_fd_flags() (or anon_inode_getfd()) and be taught to provide an API that let userspace choose the opening flags of the file descriptor. Not allowing userspace to provide the "open" flags or not using O_CLOEXEC by default should be considered bad practice from security point of view: in most case O_CLOEXEC must be used to not leak file descriptor across exec(). Not allowing userspace to atomically set close-on-exec flag and not using O_CLOEXEC should be avoided to protect multi-threaded program from race condition when it tried to set close-on-exec flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file descriptor. Example: int fd; int ret; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return -1; } /* * window opened for another thread to call fork(), * then the new process can call exec() at any time * and the file descriptor would be inherited */ ret = fcntl(fd, F_SETFD, FD_CLOEXEC) if (ret < 0) { perror("fnctl()"); close(fd); return -1; } vs.: int fd; fd = open(filename, O_RDONLY | O_CLOEXEC); if (fd < 0) { perror("open"); return -1; } Using O_CLOEXEC by default when flags are not (eg. cannot be) provided by userspace is the safest bet as it allows userspace to choose, if, when and where the file descriptor is going to be inherited across exec(): userspace is free to call fcntl() to remove FD_CLOEXEC flag in the child process that will call exec(). Unfortunately, O_CLOEXEC cannot be made the default for most existing system calls as it's not the default behavior for POSIX / Unix. Reader interested in this issue could have a look at "Ghosts of Unix past, part 2: Conflated designs" [1] article by Neil Brown. FAQ: - Why do one want close-on-exec ? Setting close-on-exec flag on file descriptor ensure it won't be inherited silently by child, child of child, etc. when executing another program. If the file descriptor is not closed, some kernel resources can be locked until the last process with the opened file descriptor exit. If the file descriptor is not closed, this can lead to a security issue, eg. making resources available to a less privileged program allowing information leak and/or deny of service. - Why do one need atomic close-on-exec ? Even if it's possible to set close-on-exec flag through call to fcntl() as shown previously, it introduces a race condition in multi-threaded process, where a thread call fork() / exec() while another thread is between call to open() and fcntl(). Additionally, using close-on-exec free the programmer from tracking manually which file descriptor is to be closed in a child before calling exec(): in a program using multiple third-party libraries, it's difficult to know which file descriptor must be closed. AFAIK, while there's a atexit(), pthread_atfork(), there's no atexec() userspace function in libc to allow libraries to register a handler in order to close its opened file descriptor before exec(). - Why default to close-on-exec ? Some kernel interfaces don't allow userspace to pass a O_CLOEXEC-like flag when creating a new file descriptor. In such cases, if possible (see below), O_CLOEXEC must be made the default so that userspace doesn't have to call fcntl() which, as demonstrated previously, is open to race condition in multi-threaded program. - How to choose between flag 0 or O_CLOEXEC in call to get_unused_fd_flags() (or anon_inode_getfd()) ? Short: Will it break existing application ? Will it break kernel ABI ? If answer is no, use O_CLOEXEC. If answer is yes, use 0. Long: If userspace expect to retrieve a file descriptor with plain old Unix(tm) semantics, O_CLOEXEC must not be made the default, as it could break some applications expecting that the file descriptor will be inherited across exec(). But for some subsystems, such as InfiniBand, KVM, VFIO, etc. it makes no sense to have file descriptors inherited across exec() since those are tied to resources that will vanish when a another program will replace the current one by mean of exec(), so it's safe to use O_CLOEXEC in such cases. For others, like XFS, the file descriptor is retrieved by one program and will be used by a different program, executed as a child. In this case, setting O_CLOEXEC would break existing application which do not expect to have to call fcntl(fd, F_SETFD, 0) to make it available across exec(). If file descriptor created by a subsystem is not tied to the current process resources, it's likely legal to use it in a different process context, thus O_CLOEXEC must not be the default. If one, as a subsystem maintainer, cannot tell for sure that no userspace program ever rely current behavior, eg. file descriptor being inherited across exec(), then the default flag *must* be kept 0 to not break application. - This subsystem cannot be turned to use O_CLOEXEC by default: If O_CLOEXEC cannot be made the default, it would be interesting to think to extend the API to have a (set of) function(s) taking a flag parameter so that userspace can atomically request close-on-exec if it need it (and it should need it !). - Background: One might want to read "Secure File Descriptor Handling" [2] by Ulrich Drepper who is responsible of adding O_CLOEXEC flag on open(), and flags alike on other syscalls. One might also want to read PEP-446 "Make newly created file descriptors non-inheritable" [3] by Victor Stinner since it has lot more background information on file descriptor leaking. One would also like to read "Excuse me son, but your code is leaking !!!" [4] by Dan Walsh for advice. [1] http://lwn.net/Articles/412131/ [2] http://udrepper.livejournal.com/20407.html [3] http://www.python.org/dev/peps/pep-0446/ [4] http://danwalsh.livejournal.com/53603.html - Statistics: In linux-next, tag next-20140924, they're currently: - 33 calls to fd_install() with one call part of anon_inode_getfd() - 26 calls to get_unused_fd_flags() with one call part of anon_inode_getfd() with another part of get_unused_fd() macro - 11 calls to anon_inode_getfd() - 8 calls to anon_inode_getfile() with one call part of anon_inode_getfd() - 6 calls to get_unused_fd() Changes from patchset v7 [PATCHSETv7] - Rebased on top of latest -next - Simplified commit message for trivial patches - Proofread commit messages - Addded CC: linux-api@vger.kernel.org Changes from patchset v6 [PATCHSETv6] - Rebased on top of latest -next - Added Cc: trivial@kernel.org for the first trivials patches. Changes from patchset v5 [PATCHSETv5] - perf: introduce a flag to enable close-on-exec in perf_event_open() DROPPED: applied upstream, commit a21b0b354d4a. Changes from patchset v4 [PATCHSETv4]: - rewrote cover letter following discussion with perf maintainer. Thanks to Peter Zijlstra. - modified a bit some commit messages. - events: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - perf: introduce a flag to enable close-on-exec in perf_event_open() NEW: instead of hard coding the flags to 0, this patch allows userspace to specify close-on-exec flag. - fanotify: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - fanotify: enable close-on-exec on events' fd when requested in fanotify_init() NEW: instead of hard coding the flags to 0, this patch enable close-on-exec if userspace request it. Changes from patchset v3 [PATCHSETv3]: - industrialio: use anon_inode_getfd() with O_CLOEXEC flag DROPPED: applied upstream, commit a646fbf0fd11. Changes from patchset v2 [PATCHSETv2]: - android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 45acea57335e. - android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 9c6cd3b39048. - vfio: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied upstream, commit a5d550703d2c. Additionally subsystem maintainer applied another patch on top to set the flags to O_CLOEXEC, commit 5d042fbdbb2d. - industrialio: use anon_inode_getfd() with O_CLOEXEC flag NEW: propose to use O_CLOEXEC as default flag. Changes from patchset v1 [PATCHSETv1]: - explicitly added subsystem maintainers as mail recepients. - infiniband: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: subsystem maintainer applied another patch using get_unused_fd_flags(O_CLOEXEC) as suggested, commit da183c7af844. - android/sw_sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - android/sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - xfs: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 862a62937e76. - sctp: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29. Links: [PATCHSETv7] http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud@opteya.com [PATCHSETv6] http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com [PATCHSETv5] http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud@opteya.com [PATCHSETv4] http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud@opteya.com [PATCHSETv3] http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud@opteya.com [PATCHSETv2] http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com [PATCHSETv1] http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud@opteya.com Yann Droneaud (6): fanotify: enable close-on-exec on events' fd when requested in fanotify_init() ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0) ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0) binfmt_misc: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: remove get_unused_fd() macro arch/ia64/kernel/perfmon.c | 2 +- arch/powerpc/platforms/cell/spufs/inode.c | 4 ++-- fs/binfmt_misc.c | 2 +- fs/file.c | 2 +- fs/notify/fanotify/fanotify_user.c | 2 +- include/linux/file.h | 1 - 6 files changed, 6 insertions(+), 7 deletions(-) -- 1.9.3
TL;DR: - Trivial patches to replace calls to get_unused_fd() by get_unused_fd_flags(0); - Patch to remove get_unused_fd(); - Patch to add support O_CLOEXEC in fanotify_init(). Hi, Please find the eighth revision of my patchset to remove get_unused_fd() macro in order to help subsystems to use get_unused_fd_flags() (or anon_inode_getfd()) with flags either provided by the userspace or set to O_CLOEXEC by default where appropriate. Without get_unused_fd() macro, more subsystems are likely to use get_unused_fd_flags() (or anon_inode_getfd()) and be taught to provide an API that let userspace choose the opening flags of the file descriptor. Not allowing userspace to provide the "open" flags or not using O_CLOEXEC by default should be considered bad practice from security point of view: in most case O_CLOEXEC must be used to not leak file descriptor across exec(). Not allowing userspace to atomically set close-on-exec flag and not using O_CLOEXEC should be avoided to protect multi-threaded program from race condition when it tried to set close-on-exec flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file descriptor. Example: int fd; int ret; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return -1; } /* * window opened for another thread to call fork(), * then the new process can call exec() at any time * and the file descriptor would be inherited */ ret = fcntl(fd, F_SETFD, FD_CLOEXEC) if (ret < 0) { perror("fnctl()"); close(fd); return -1; } vs.: int fd; fd = open(filename, O_RDONLY | O_CLOEXEC); if (fd < 0) { perror("open"); return -1; } Using O_CLOEXEC by default when flags are not (eg. cannot be) provided by userspace is the safest bet as it allows userspace to choose, if, when and where the file descriptor is going to be inherited across exec(): userspace is free to call fcntl() to remove FD_CLOEXEC flag in the child process that will call exec(). Unfortunately, O_CLOEXEC cannot be made the default for most existing system calls as it's not the default behavior for POSIX / Unix. Reader interested in this issue could have a look at "Ghosts of Unix past, part 2: Conflated designs" [1] article by Neil Brown. FAQ: - Why do one want close-on-exec ? Setting close-on-exec flag on file descriptor ensure it won't be inherited silently by child, child of child, etc. when executing another program. If the file descriptor is not closed, some kernel resources can be locked until the last process with the opened file descriptor exit. If the file descriptor is not closed, this can lead to a security issue, eg. making resources available to a less privileged program allowing information leak and/or deny of service. - Why do one need atomic close-on-exec ? Even if it's possible to set close-on-exec flag through call to fcntl() as shown previously, it introduces a race condition in multi-threaded process, where a thread call fork() / exec() while another thread is between call to open() and fcntl(). Additionally, using close-on-exec free the programmer from tracking manually which file descriptor is to be closed in a child before calling exec(): in a program using multiple third-party libraries, it's difficult to know which file descriptor must be closed. AFAIK, while there's a atexit(), pthread_atfork(), there's no atexec() userspace function in libc to allow libraries to register a handler in order to close its opened file descriptor before exec(). - Why default to close-on-exec ? Some kernel interfaces don't allow userspace to pass a O_CLOEXEC-like flag when creating a new file descriptor. In such cases, if possible (see below), O_CLOEXEC must be made the default so that userspace doesn't have to call fcntl() which, as demonstrated previously, is open to race condition in multi-threaded program. - How to choose between flag 0 or O_CLOEXEC in call to get_unused_fd_flags() (or anon_inode_getfd()) ? Short: Will it break existing application ? Will it break kernel ABI ? If answer is no, use O_CLOEXEC. If answer is yes, use 0. Long: If userspace expect to retrieve a file descriptor with plain old Unix(tm) semantics, O_CLOEXEC must not be made the default, as it could break some applications expecting that the file descriptor will be inherited across exec(). But for some subsystems, such as InfiniBand, KVM, VFIO, etc. it makes no sense to have file descriptors inherited across exec() since those are tied to resources that will vanish when a another program will replace the current one by mean of exec(), so it's safe to use O_CLOEXEC in such cases. For others, like XFS, the file descriptor is retrieved by one program and will be used by a different program, executed as a child. In this case, setting O_CLOEXEC would break existing application which do not expect to have to call fcntl(fd, F_SETFD, 0) to make it available across exec(). If file descriptor created by a subsystem is not tied to the current process resources, it's likely legal to use it in a different process context, thus O_CLOEXEC must not be the default. If one, as a subsystem maintainer, cannot tell for sure that no userspace program ever rely current behavior, eg. file descriptor being inherited across exec(), then the default flag *must* be kept 0 to not break application. - This subsystem cannot be turned to use O_CLOEXEC by default: If O_CLOEXEC cannot be made the default, it would be interesting to think to extend the API to have a (set of) function(s) taking a flag parameter so that userspace can atomically request close-on-exec if it need it (and it should need it !). - Background: One might want to read "Secure File Descriptor Handling" [2] by Ulrich Drepper who is responsible of adding O_CLOEXEC flag on open(), and flags alike on other syscalls. One might also want to read PEP-446 "Make newly created file descriptors non-inheritable" [3] by Victor Stinner since it has lot more background information on file descriptor leaking. One would also like to read "Excuse me son, but your code is leaking !!!" [4] by Dan Walsh for advice. [1] http://lwn.net/Articles/412131/ [2] http://udrepper.livejournal.com/20407.html [3] http://www.python.org/dev/peps/pep-0446/ [4] http://danwalsh.livejournal.com/53603.html - Statistics: In linux-next, tag next-20140924, they're currently: - 33 calls to fd_install() with one call part of anon_inode_getfd() - 26 calls to get_unused_fd_flags() with one call part of anon_inode_getfd() with another part of get_unused_fd() macro - 11 calls to anon_inode_getfd() - 8 calls to anon_inode_getfile() with one call part of anon_inode_getfd() - 6 calls to get_unused_fd() Changes from patchset v7 [PATCHSETv7] - Rebased on top of latest -next - Simplified commit message for trivial patches - Proofread commit messages - Addded CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Changes from patchset v6 [PATCHSETv6] - Rebased on top of latest -next - Added Cc: trivial-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org for the first trivials patches. Changes from patchset v5 [PATCHSETv5] - perf: introduce a flag to enable close-on-exec in perf_event_open() DROPPED: applied upstream, commit a21b0b354d4a. Changes from patchset v4 [PATCHSETv4]: - rewrote cover letter following discussion with perf maintainer. Thanks to Peter Zijlstra. - modified a bit some commit messages. - events: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - perf: introduce a flag to enable close-on-exec in perf_event_open() NEW: instead of hard coding the flags to 0, this patch allows userspace to specify close-on-exec flag. - fanotify: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - fanotify: enable close-on-exec on events' fd when requested in fanotify_init() NEW: instead of hard coding the flags to 0, this patch enable close-on-exec if userspace request it. Changes from patchset v3 [PATCHSETv3]: - industrialio: use anon_inode_getfd() with O_CLOEXEC flag DROPPED: applied upstream, commit a646fbf0fd11. Changes from patchset v2 [PATCHSETv2]: - android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 45acea57335e. - android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 9c6cd3b39048. - vfio: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied upstream, commit a5d550703d2c. Additionally subsystem maintainer applied another patch on top to set the flags to O_CLOEXEC, commit 5d042fbdbb2d. - industrialio: use anon_inode_getfd() with O_CLOEXEC flag NEW: propose to use O_CLOEXEC as default flag. Changes from patchset v1 [PATCHSETv1]: - explicitly added subsystem maintainers as mail recepients. - infiniband: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: subsystem maintainer applied another patch using get_unused_fd_flags(O_CLOEXEC) as suggested, commit da183c7af844. - android/sw_sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - android/sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - xfs: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 862a62937e76. - sctp: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29. Links: [PATCHSETv7] http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org [PATCHSETv6] http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org [PATCHSETv5] http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org [PATCHSETv4] http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org [PATCHSETv3] http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org [PATCHSETv2] http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org [PATCHSETv1] http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org Yann Droneaud (6): fanotify: enable close-on-exec on events' fd when requested in fanotify_init() ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0) ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0) binfmt_misc: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: remove get_unused_fd() macro arch/ia64/kernel/perfmon.c | 2 +- arch/powerpc/platforms/cell/spufs/inode.c | 4 ++-- fs/binfmt_misc.c | 2 +- fs/file.c | 2 +- fs/notify/fanotify/fanotify_user.c | 2 +- include/linux/file.h | 1 - 6 files changed, 6 insertions(+), 7 deletions(-) -- 1.9.3
TL;DR: - Trivial patches to replace calls to get_unused_fd() by get_unused_fd_flags(0); - Patch to remove get_unused_fd(); - Patch to add support O_CLOEXEC in fanotify_init(). Hi, Please find the eighth revision of my patchset to remove get_unused_fd() macro in order to help subsystems to use get_unused_fd_flags() (or anon_inode_getfd()) with flags either provided by the userspace or set to O_CLOEXEC by default where appropriate. Without get_unused_fd() macro, more subsystems are likely to use get_unused_fd_flags() (or anon_inode_getfd()) and be taught to provide an API that let userspace choose the opening flags of the file descriptor. Not allowing userspace to provide the "open" flags or not using O_CLOEXEC by default should be considered bad practice from security point of view: in most case O_CLOEXEC must be used to not leak file descriptor across exec(). Not allowing userspace to atomically set close-on-exec flag and not using O_CLOEXEC should be avoided to protect multi-threaded program from race condition when it tried to set close-on-exec flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file descriptor. Example: int fd; int ret; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return -1; } /* * window opened for another thread to call fork(), * then the new process can call exec() at any time * and the file descriptor would be inherited */ ret = fcntl(fd, F_SETFD, FD_CLOEXEC) if (ret < 0) { perror("fnctl()"); close(fd); return -1; } vs.: int fd; fd = open(filename, O_RDONLY | O_CLOEXEC); if (fd < 0) { perror("open"); return -1; } Using O_CLOEXEC by default when flags are not (eg. cannot be) provided by userspace is the safest bet as it allows userspace to choose, if, when and where the file descriptor is going to be inherited across exec(): userspace is free to call fcntl() to remove FD_CLOEXEC flag in the child process that will call exec(). Unfortunately, O_CLOEXEC cannot be made the default for most existing system calls as it's not the default behavior for POSIX / Unix. Reader interested in this issue could have a look at "Ghosts of Unix past, part 2: Conflated designs" [1] article by Neil Brown. FAQ: - Why do one want close-on-exec ? Setting close-on-exec flag on file descriptor ensure it won't be inherited silently by child, child of child, etc. when executing another program. If the file descriptor is not closed, some kernel resources can be locked until the last process with the opened file descriptor exit. If the file descriptor is not closed, this can lead to a security issue, eg. making resources available to a less privileged program allowing information leak and/or deny of service. - Why do one need atomic close-on-exec ? Even if it's possible to set close-on-exec flag through call to fcntl() as shown previously, it introduces a race condition in multi-threaded process, where a thread call fork() / exec() while another thread is between call to open() and fcntl(). Additionally, using close-on-exec free the programmer from tracking manually which file descriptor is to be closed in a child before calling exec(): in a program using multiple third-party libraries, it's difficult to know which file descriptor must be closed. AFAIK, while there's a atexit(), pthread_atfork(), there's no atexec() userspace function in libc to allow libraries to register a handler in order to close its opened file descriptor before exec(). - Why default to close-on-exec ? Some kernel interfaces don't allow userspace to pass a O_CLOEXEC-like flag when creating a new file descriptor. In such cases, if possible (see below), O_CLOEXEC must be made the default so that userspace doesn't have to call fcntl() which, as demonstrated previously, is open to race condition in multi-threaded program. - How to choose between flag 0 or O_CLOEXEC in call to get_unused_fd_flags() (or anon_inode_getfd()) ? Short: Will it break existing application ? Will it break kernel ABI ? If answer is no, use O_CLOEXEC. If answer is yes, use 0. Long: If userspace expect to retrieve a file descriptor with plain old Unix(tm) semantics, O_CLOEXEC must not be made the default, as it could break some applications expecting that the file descriptor will be inherited across exec(). But for some subsystems, such as InfiniBand, KVM, VFIO, etc. it makes no sense to have file descriptors inherited across exec() since those are tied to resources that will vanish when a another program will replace the current one by mean of exec(), so it's safe to use O_CLOEXEC in such cases. For others, like XFS, the file descriptor is retrieved by one program and will be used by a different program, executed as a child. In this case, setting O_CLOEXEC would break existing application which do not expect to have to call fcntl(fd, F_SETFD, 0) to make it available across exec(). If file descriptor created by a subsystem is not tied to the current process resources, it's likely legal to use it in a different process context, thus O_CLOEXEC must not be the default. If one, as a subsystem maintainer, cannot tell for sure that no userspace program ever rely current behavior, eg. file descriptor being inherited across exec(), then the default flag *must* be kept 0 to not break application. - This subsystem cannot be turned to use O_CLOEXEC by default: If O_CLOEXEC cannot be made the default, it would be interesting to think to extend the API to have a (set of) function(s) taking a flag parameter so that userspace can atomically request close-on-exec if it need it (and it should need it !). - Background: One might want to read "Secure File Descriptor Handling" [2] by Ulrich Drepper who is responsible of adding O_CLOEXEC flag on open(), and flags alike on other syscalls. One might also want to read PEP-446 "Make newly created file descriptors non-inheritable" [3] by Victor Stinner since it has lot more background information on file descriptor leaking. One would also like to read "Excuse me son, but your code is leaking !!!" [4] by Dan Walsh for advice. [1] http://lwn.net/Articles/412131/ [2] http://udrepper.livejournal.com/20407.html [3] http://www.python.org/dev/peps/pep-0446/ [4] http://danwalsh.livejournal.com/53603.html - Statistics: In linux-next, tag next-20140924, they're currently: - 33 calls to fd_install() with one call part of anon_inode_getfd() - 26 calls to get_unused_fd_flags() with one call part of anon_inode_getfd() with another part of get_unused_fd() macro - 11 calls to anon_inode_getfd() - 8 calls to anon_inode_getfile() with one call part of anon_inode_getfd() - 6 calls to get_unused_fd() Changes from patchset v7 [PATCHSETv7] - Rebased on top of latest -next - Simplified commit message for trivial patches - Proofread commit messages - Addded CC: linux-api@vger.kernel.org Changes from patchset v6 [PATCHSETv6] - Rebased on top of latest -next - Added Cc: trivial@kernel.org for the first trivials patches. Changes from patchset v5 [PATCHSETv5] - perf: introduce a flag to enable close-on-exec in perf_event_open() DROPPED: applied upstream, commit a21b0b354d4a. Changes from patchset v4 [PATCHSETv4]: - rewrote cover letter following discussion with perf maintainer. Thanks to Peter Zijlstra. - modified a bit some commit messages. - events: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - perf: introduce a flag to enable close-on-exec in perf_event_open() NEW: instead of hard coding the flags to 0, this patch allows userspace to specify close-on-exec flag. - fanotify: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - fanotify: enable close-on-exec on events' fd when requested in fanotify_init() NEW: instead of hard coding the flags to 0, this patch enable close-on-exec if userspace request it. Changes from patchset v3 [PATCHSETv3]: - industrialio: use anon_inode_getfd() with O_CLOEXEC flag DROPPED: applied upstream, commit a646fbf0fd11. Changes from patchset v2 [PATCHSETv2]: - android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 45acea57335e. - android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 9c6cd3b39048. - vfio: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied upstream, commit a5d550703d2c. Additionally subsystem maintainer applied another patch on top to set the flags to O_CLOEXEC, commit 5d042fbdbb2d. - industrialio: use anon_inode_getfd() with O_CLOEXEC flag NEW: propose to use O_CLOEXEC as default flag. Changes from patchset v1 [PATCHSETv1]: - explicitly added subsystem maintainers as mail recepients. - infiniband: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: subsystem maintainer applied another patch using get_unused_fd_flags(O_CLOEXEC) as suggested, commit da183c7af844. - android/sw_sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - android/sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - xfs: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 862a62937e76. - sctp: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29. Links: [PATCHSETv7] http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud@opteya.com [PATCHSETv6] http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com [PATCHSETv5] http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud@opteya.com [PATCHSETv4] http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud@opteya.com [PATCHSETv3] http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud@opteya.com [PATCHSETv2] http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com [PATCHSETv1] http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud@opteya.com Yann Droneaud (6): fanotify: enable close-on-exec on events' fd when requested in fanotify_init() ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0) ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0) binfmt_misc: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: remove get_unused_fd() macro arch/ia64/kernel/perfmon.c | 2 +- arch/powerpc/platforms/cell/spufs/inode.c | 4 ++-- fs/binfmt_misc.c | 2 +- fs/file.c | 2 +- fs/notify/fanotify/fanotify_user.c | 2 +- include/linux/file.h | 1 - 6 files changed, 6 insertions(+), 7 deletions(-) -- 1.9.3
TL;DR: - Trivial patches to replace calls to get_unused_fd() by get_unused_fd_flags(0); - Patch to remove get_unused_fd(); - Patch to add support O_CLOEXEC in fanotify_init(). Hi, Please find the eighth revision of my patchset to remove get_unused_fd() macro in order to help subsystems to use get_unused_fd_flags() (or anon_inode_getfd()) with flags either provided by the userspace or set to O_CLOEXEC by default where appropriate. Without get_unused_fd() macro, more subsystems are likely to use get_unused_fd_flags() (or anon_inode_getfd()) and be taught to provide an API that let userspace choose the opening flags of the file descriptor. Not allowing userspace to provide the "open" flags or not using O_CLOEXEC by default should be considered bad practice from security point of view: in most case O_CLOEXEC must be used to not leak file descriptor across exec(). Not allowing userspace to atomically set close-on-exec flag and not using O_CLOEXEC should be avoided to protect multi-threaded program from race condition when it tried to set close-on-exec flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file descriptor. Example: int fd; int ret; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return -1; } /* * window opened for another thread to call fork(), * then the new process can call exec() at any time * and the file descriptor would be inherited */ ret = fcntl(fd, F_SETFD, FD_CLOEXEC) if (ret < 0) { perror("fnctl()"); close(fd); return -1; } vs.: int fd; fd = open(filename, O_RDONLY | O_CLOEXEC); if (fd < 0) { perror("open"); return -1; } Using O_CLOEXEC by default when flags are not (eg. cannot be) provided by userspace is the safest bet as it allows userspace to choose, if, when and where the file descriptor is going to be inherited across exec(): userspace is free to call fcntl() to remove FD_CLOEXEC flag in the child process that will call exec(). Unfortunately, O_CLOEXEC cannot be made the default for most existing system calls as it's not the default behavior for POSIX / Unix. Reader interested in this issue could have a look at "Ghosts of Unix past, part 2: Conflated designs" [1] article by Neil Brown. FAQ: - Why do one want close-on-exec ? Setting close-on-exec flag on file descriptor ensure it won't be inherited silently by child, child of child, etc. when executing another program. If the file descriptor is not closed, some kernel resources can be locked until the last process with the opened file descriptor exit. If the file descriptor is not closed, this can lead to a security issue, eg. making resources available to a less privileged program allowing information leak and/or deny of service. - Why do one need atomic close-on-exec ? Even if it's possible to set close-on-exec flag through call to fcntl() as shown previously, it introduces a race condition in multi-threaded process, where a thread call fork() / exec() while another thread is between call to open() and fcntl(). Additionally, using close-on-exec free the programmer from tracking manually which file descriptor is to be closed in a child before calling exec(): in a program using multiple third-party libraries, it's difficult to know which file descriptor must be closed. AFAIK, while there's a atexit(), pthread_atfork(), there's no atexec() userspace function in libc to allow libraries to register a handler in order to close its opened file descriptor before exec(). - Why default to close-on-exec ? Some kernel interfaces don't allow userspace to pass a O_CLOEXEC-like flag when creating a new file descriptor. In such cases, if possible (see below), O_CLOEXEC must be made the default so that userspace doesn't have to call fcntl() which, as demonstrated previously, is open to race condition in multi-threaded program. - How to choose between flag 0 or O_CLOEXEC in call to get_unused_fd_flags() (or anon_inode_getfd()) ? Short: Will it break existing application ? Will it break kernel ABI ? If answer is no, use O_CLOEXEC. If answer is yes, use 0. Long: If userspace expect to retrieve a file descriptor with plain old Unix(tm) semantics, O_CLOEXEC must not be made the default, as it could break some applications expecting that the file descriptor will be inherited across exec(). But for some subsystems, such as InfiniBand, KVM, VFIO, etc. it makes no sense to have file descriptors inherited across exec() since those are tied to resources that will vanish when a another program will replace the current one by mean of exec(), so it's safe to use O_CLOEXEC in such cases. For others, like XFS, the file descriptor is retrieved by one program and will be used by a different program, executed as a child. In this case, setting O_CLOEXEC would break existing application which do not expect to have to call fcntl(fd, F_SETFD, 0) to make it available across exec(). If file descriptor created by a subsystem is not tied to the current process resources, it's likely legal to use it in a different process context, thus O_CLOEXEC must not be the default. If one, as a subsystem maintainer, cannot tell for sure that no userspace program ever rely current behavior, eg. file descriptor being inherited across exec(), then the default flag *must* be kept 0 to not break application. - This subsystem cannot be turned to use O_CLOEXEC by default: If O_CLOEXEC cannot be made the default, it would be interesting to think to extend the API to have a (set of) function(s) taking a flag parameter so that userspace can atomically request close-on-exec if it need it (and it should need it !). - Background: One might want to read "Secure File Descriptor Handling" [2] by Ulrich Drepper who is responsible of adding O_CLOEXEC flag on open(), and flags alike on other syscalls. One might also want to read PEP-446 "Make newly created file descriptors non-inheritable" [3] by Victor Stinner since it has lot more background information on file descriptor leaking. One would also like to read "Excuse me son, but your code is leaking !!!" [4] by Dan Walsh for advice. [1] http://lwn.net/Articles/412131/ [2] http://udrepper.livejournal.com/20407.html [3] http://www.python.org/dev/peps/pep-0446/ [4] http://danwalsh.livejournal.com/53603.html - Statistics: In linux-next, tag next-20140924, they're currently: - 33 calls to fd_install() with one call part of anon_inode_getfd() - 26 calls to get_unused_fd_flags() with one call part of anon_inode_getfd() with another part of get_unused_fd() macro - 11 calls to anon_inode_getfd() - 8 calls to anon_inode_getfile() with one call part of anon_inode_getfd() - 6 calls to get_unused_fd() Changes from patchset v7 [PATCHSETv7] - Rebased on top of latest -next - Simplified commit message for trivial patches - Proofread commit messages - Addded CC: linux-api@vger.kernel.org Changes from patchset v6 [PATCHSETv6] - Rebased on top of latest -next - Added Cc: trivial@kernel.org for the first trivials patches. Changes from patchset v5 [PATCHSETv5] - perf: introduce a flag to enable close-on-exec in perf_event_open() DROPPED: applied upstream, commit a21b0b354d4a. Changes from patchset v4 [PATCHSETv4]: - rewrote cover letter following discussion with perf maintainer. Thanks to Peter Zijlstra. - modified a bit some commit messages. - events: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - perf: introduce a flag to enable close-on-exec in perf_event_open() NEW: instead of hard coding the flags to 0, this patch allows userspace to specify close-on-exec flag. - fanotify: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: replaced by following patch. - fanotify: enable close-on-exec on events' fd when requested in fanotify_init() NEW: instead of hard coding the flags to 0, this patch enable close-on-exec if userspace request it. Changes from patchset v3 [PATCHSETv3]: - industrialio: use anon_inode_getfd() with O_CLOEXEC flag DROPPED: applied upstream, commit a646fbf0fd11. Changes from patchset v2 [PATCHSETv2]: - android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 45acea57335e. - android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() DROPPED: applied upstream, commit 9c6cd3b39048. - vfio: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied upstream, commit a5d550703d2c. Additionally subsystem maintainer applied another patch on top to set the flags to O_CLOEXEC, commit 5d042fbdbb2d. - industrialio: use anon_inode_getfd() with O_CLOEXEC flag NEW: propose to use O_CLOEXEC as default flag. Changes from patchset v1 [PATCHSETv1]: - explicitly added subsystem maintainers as mail recepients. - infiniband: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: subsystem maintainer applied another patch using get_unused_fd_flags(O_CLOEXEC) as suggested, commit da183c7af844. - android/sw_sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - android/sync: use get_unused_fd_flags(0) instead of get_unused_fd() MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested. - xfs: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 862a62937e76. - sctp: use get_unused_fd_flags(0) instead of get_unused_fd() DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29. Links: [PATCHSETv7] http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud@opteya.com [PATCHSETv6] http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com [PATCHSETv5] http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud@opteya.com [PATCHSETv4] http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud@opteya.com [PATCHSETv3] http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud@opteya.com [PATCHSETv2] http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com [PATCHSETv1] http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud@opteya.com Yann Droneaud (6): fanotify: enable close-on-exec on events' fd when requested in fanotify_init() ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0) ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0) binfmt_misc: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: trivial: replace get_unused_fd() by get_unused_fd_flags(0) file: remove get_unused_fd() macro arch/ia64/kernel/perfmon.c | 2 +- arch/powerpc/platforms/cell/spufs/inode.c | 4 ++-- fs/binfmt_misc.c | 2 +- fs/file.c | 2 +- fs/notify/fanotify/fanotify_user.c | 2 +- include/linux/file.h | 1 - 6 files changed, 6 insertions(+), 7 deletions(-) -- 1.9.3
According to commit 80af258867648 ('fanotify: groups can specify their f_flags for new fd'), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2). So while it is legal for userspace to call fanotify_init() with O_CLOEXEC as part of its second argument, O_CLOEXEC is currently silently ignored. Indeed event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). More, there's no effective check on event_f_flags value that would catch unknown / unsupported values, unlike the one on f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in include/uapi/linux/fanotify.h). Reading article "Botching up ioctls"[1] by Daniel Vetter might make one feel uncomfortable when trying to add extension to an API that doesn't check for unrecognized values. But it seems logical to set close-on-exec flag on the file descriptor if userspace is allowed to request it with O_CLOEXEC. In fact, according to some lookup on http://codesearch.debian.net/ and various search engine, there's already some userspace code requesting it: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Lookup also returned some wrong usage of the syscall: - in Gonk HAL from Mozilla Firefox OS sources[6]: mFd = fanotify_init(FAN_CLASS_NOTIF, FAN_CLOEXEC); Adding support for O_CLOEXEC in fanotify_init() won't magically enable it for Gonk since FAN_CLOEXEC is defined as 0x1, which is likely equal to O_WRONLY when used in open flag context. In the other hand, it won't hurt it either. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted so that close-on-exec get enabled when requested. [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://hg.mozilla.org/mozilla-central/file/325c74addeba/hal/gonk/GonkDiskSpaceWatcher.cpp#l167 Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Richard Guy Briggs <rgb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: linux-api@vger.kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- fs/notify/fanotify/fanotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index b13992a41bd9..c991616acca9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); - client_fd = get_unused_fd(); + client_fd = get_unused_fd_flags(group->fanotify_data.f_flags); if (client_fd < 0) return client_fd; -- 1.9.3
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: trivial@kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- arch/ia64/kernel/perfmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c index 5845ffea67c3..dc063fe6646a 100644 --- a/arch/ia64/kernel/perfmon.c +++ b/arch/ia64/kernel/perfmon.c @@ -2662,7 +2662,7 @@ pfm_context_create(pfm_context_t *ctx, void *arg, int count, struct pt_regs *reg ret = -ENOMEM; - fd = get_unused_fd(); + fd = get_unused_fd_flags(0); if (fd < 0) return fd; -- 1.9.3
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: trivial@kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- arch/ia64/kernel/perfmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c index 5845ffea67c3..dc063fe6646a 100644 --- a/arch/ia64/kernel/perfmon.c +++ b/arch/ia64/kernel/perfmon.c @@ -2662,7 +2662,7 @@ pfm_context_create(pfm_context_t *ctx, void *arg, int count, struct pt_regs *reg ret = -ENOMEM; - fd = get_unused_fd(); + fd = get_unused_fd_flags(0); if (fd < 0) return fd; -- 1.9.3
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: trivial@kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- arch/powerpc/platforms/cell/spufs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c index 87ba7cf99cd7..51effcec30d8 100644 --- a/arch/powerpc/platforms/cell/spufs/inode.c +++ b/arch/powerpc/platforms/cell/spufs/inode.c @@ -301,7 +301,7 @@ static int spufs_context_open(struct path *path) int ret; struct file *filp; - ret = get_unused_fd(); + ret = get_unused_fd_flags(0); if (ret < 0) return ret; @@ -518,7 +518,7 @@ static int spufs_gang_open(struct path *path) int ret; struct file *filp; - ret = get_unused_fd(); + ret = get_unused_fd_flags(0); if (ret < 0) return ret; -- 1.9.3
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: trivial@kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- arch/powerpc/platforms/cell/spufs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c index 87ba7cf99cd7..51effcec30d8 100644 --- a/arch/powerpc/platforms/cell/spufs/inode.c +++ b/arch/powerpc/platforms/cell/spufs/inode.c @@ -301,7 +301,7 @@ static int spufs_context_open(struct path *path) int ret; struct file *filp; - ret = get_unused_fd(); + ret = get_unused_fd_flags(0); if (ret < 0) return ret; @@ -518,7 +518,7 @@ static int spufs_gang_open(struct path *path) int ret; struct file *filp; - ret = get_unused_fd(); + ret = get_unused_fd_flags(0); if (ret < 0) return ret; -- 1.9.3
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: trivial@kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- fs/binfmt_misc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index fd8beb9657a2..1cc5377ba955 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -153,7 +153,7 @@ static int load_misc_binary(struct linux_binprm *bprm) /* if the binary should be opened on behalf of the * interpreter than keep it open and assign descriptor * to it */ - fd_binary = get_unused_fd(); + fd_binary = get_unused_fd_flags(0); if (fd_binary < 0) { retval = fd_binary; goto _ret; -- 1.9.3
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: trivial@kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- fs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/file.c b/fs/file.c index ab3eb6a88239..ee738ea028fa 100644 --- a/fs/file.c +++ b/fs/file.c @@ -869,7 +869,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes) struct file *file = fget_raw(fildes); if (file) { - ret = get_unused_fd(); + ret = get_unused_fd_flags(0); if (ret >= 0) fd_install(ret, file); else -- 1.9.3
Macro get_unused_fd() is used to allocate a file descriptor with default flags. Those default flags (0) don't enable close-on-exec. This can be seen as an unsafe default: in most case close-on-exec should be enabled to not leak file descriptor across exec(). It would be better to have a "safer" default set of flags, eg. O_CLOEXEC must be used to enable close-on-exec. Instead this patch removes get_unused_fd() so that out of tree modules won't be affect by a runtime behavor change which might introduce other kind of bugs: it's better to catch the change at build time, making it easier to fix. Removing the macro will also promote use of get_unused_fd_flags() (or anon_inode_getfd()) with flags provided by userspace. Or, if flags cannot be given by userspace, with flags set to O_CLOEXEC by default. Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- include/linux/file.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/file.h b/include/linux/file.h index 4d69123377a2..f87d30882a24 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -66,7 +66,6 @@ extern void set_close_on_exec(unsigned int fd, int flag); extern bool get_close_on_exec(unsigned int fd); extern void put_filp(struct file *); extern int get_unused_fd_flags(unsigned flags); -#define get_unused_fd() get_unused_fd_flags(0) extern void put_unused_fd(unsigned int fd); extern void fd_install(unsigned int fd, struct file *file); -- 1.9.3
On 24.09.2014 15:11, Yann Droneaud wrote: > According to commit 80af258867648 ('fanotify: groups can specify > their f_flags for new fd'), file descriptors created as part of > file access notification events inherit flags from the > event_f_flags argument passed to syscall fanotify_init(2). > > So while it is legal for userspace to call fanotify_init() with > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently > silently ignored. > > Indeed event_f_flags are only given to dentry_open(), which only > seems to care about O_ACCMODE and O_PATH in do_dentry_open(), > O_DIRECT in open_check_o_direct() and O_LARGEFILE in > generic_file_open(). I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags. When I called fcnt(event_metadata->fd, F_GETFD) it did not return FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not working as expected. I found this definition #define get_unused_fd() get_unused_fd_flags(0) So essentially when get_unused_fd() is called for a fanotify event O_CLOEXEC is ignored. This is what your patch fixes. > > More, there's no effective check on event_f_flags value that > would catch unknown / unsupported values, unlike the one on > f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in > include/uapi/linux/fanotify.h). The fanotify_init(2) man page describes which flags are allowable in event_f_flags. Could you, please, explain why the following code in fanotify_user.c is not to be considered an effective check: if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS) return -EINVAL; switch (event_f_flags & O_ACCMODE) { case O_RDONLY: case O_RDWR: case O_WRONLY: break; default: return -EINVAL; } I CC Jan Kara as he reviewed the code. Best regards Heinrich Schuchardt
On 24.09.2014 15:11, Yann Droneaud wrote: > According to commit 80af258867648 ('fanotify: groups can specify > their f_flags for new fd'), file descriptors created as part of > file access notification events inherit flags from the > event_f_flags argument passed to syscall fanotify_init(2). > > So while it is legal for userspace to call fanotify_init() with > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently > silently ignored. > > Indeed event_f_flags are only given to dentry_open(), which only > seems to care about O_ACCMODE and O_PATH in do_dentry_open(), > O_DIRECT in open_check_o_direct() and O_LARGEFILE in > generic_file_open(). I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags. When I called fcnt(event_metadata->fd, F_GETFD) it did not return FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not working as expected. I found this definition #define get_unused_fd() get_unused_fd_flags(0) So essentially when get_unused_fd() is called for a fanotify event O_CLOEXEC is ignored. This is what your patch fixes. > > More, there's no effective check on event_f_flags value that > would catch unknown / unsupported values, unlike the one on > f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in > include/uapi/linux/fanotify.h). The fanotify_init(2) man page describes which flags are allowable in event_f_flags. Could you, please, explain why the following code in fanotify_user.c is not to be considered an effective check: if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS) return -EINVAL; switch (event_f_flags & O_ACCMODE) { case O_RDONLY: case O_RDWR: case O_WRONLY: break; default: return -EINVAL; } I CC Jan Kara as he reviewed the code. Best regards Heinrich Schuchardt
Hi, Le jeudi 25 septembre 2014 à 22:57 +0200, Heinrich Schuchardt a écrit : > On 24.09.2014 15:11, Yann Droneaud wrote: > > According to commit 80af258867648 ('fanotify: groups can specify > > their f_flags for new fd'), file descriptors created as part of > > file access notification events inherit flags from the > > event_f_flags argument passed to syscall fanotify_init(2). > > > > So while it is legal for userspace to call fanotify_init() with > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently > > silently ignored. > > > > Indeed event_f_flags are only given to dentry_open(), which only > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(), > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in > > generic_file_open(). > > I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags. > When I called fcnt(event_metadata->fd, F_GETFD) it did not return > FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not > working as expected. > > I found this definition > #define get_unused_fd() get_unused_fd_flags(0) > > So essentially when get_unused_fd() is called for a fanotify event > O_CLOEXEC is ignored. > > This is what your patch fixes. > > > > > More, there's no effective check on event_f_flags value that > > would catch unknown / unsupported values, unlike the one on > > f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in > > include/uapi/linux/fanotify.h). > > The fanotify_init(2) man page describes which flags are allowable in > event_f_flags. > > Could you, please, explain why the following code in fanotify_user.c is > not to be considered an effective check: > > if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS) > return -EINVAL; > > switch (event_f_flags & O_ACCMODE) { > case O_RDONLY: > case O_RDWR: > case O_WRONLY: > break; > default: > return -EINVAL; > } > > I CC Jan Kara as he reviewed the code. > I missed the opportunity to update my commit message. I've sent my initial version of the patch (with the same wording) on 5th, January, 2014: http://mid.gmane.org/3d9591f81e62a78a726721c8052b3910870db35e.1388952061.git.ydroneaud@opteya.com http://lkml.kernel.org/r/3d9591f81e62a78a726721c8052b3910870db35e.1388952061.git.ydroneaud@opteya.com The patch was sent again on March, 11: http://lkml.kernel.org/r/baab31b572b216d13f2149bdf07e0f79a1bae660.1394532336.git.ydroneaud@opteya.com http://mid.gmane.org/baab31b572b216d13f2149bdf07e0f79a1bae660.1394532336.git.ydroneaud@opteya.com And another time, on June, 1st: http://lkml.kernel.org/r/2c6ab28980f0007ea3b9afa7ecd7497806a6a451.1401630396.git.ydroneaud@opteya.com http://mid.gmane.org/2c6ab28980f0007ea3b9afa7ecd7497806a6a451.1401630396.git.ydroneaud@opteya.com So as you can see, my patch predate yours: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=48149e9d3a7e924010a0daab30a6197b7d7b6580 But I have to apologize: I haven't noticed your patch was merged between my previous submission and the current one. My bad. I will update the commit message to remove my obsolete comment on the input parameter check. Thanks again for review and testing. Regards. -- Yann Droneaud OPTEYA
Hello Andrew, please, add the patch described in https://lkml.org/lkml/2014/9/24/967 to the MM tree. I have tested kernel 3.17.0-rc6 with and without the patch and it fixes the described error. As the patch is valid independent of the rest of the patch set, there is no reason to wait for the rest to be merged. You may add Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de> Best regards Heinrich Schuchardt On 26.09.2014 10:58, Yann Droneaud wrote: > Hi, > > Le jeudi 25 septembre 2014 à 22:57 +0200, Heinrich Schuchardt a écrit : >> On 24.09.2014 15:11, Yann Droneaud wrote: >>> According to commit 80af258867648 ('fanotify: groups can specify >>> their f_flags for new fd'), file descriptors created as part of >>> file access notification events inherit flags from the >>> event_f_flags argument passed to syscall fanotify_init(2). >>> >>> So while it is legal for userspace to call fanotify_init() with >>> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently >>> silently ignored. >>> >>> Indeed event_f_flags are only given to dentry_open(), which only >>> seems to care about O_ACCMODE and O_PATH in do_dentry_open(), >>> O_DIRECT in open_check_o_direct() and O_LARGEFILE in >>> generic_file_open(). >> >> I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags. >> When I called fcnt(event_metadata->fd, F_GETFD) it did not return >> FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not >> working as expected. >> >> I found this definition >> #define get_unused_fd() get_unused_fd_flags(0) >> >> So essentially when get_unused_fd() is called for a fanotify event >> O_CLOEXEC is ignored. >> >> This is what your patch fixes. >>
According to commit 80af258867648 ('fanotify: groups can specify their f_flags for new fd'), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2). So while it is legal for userspace to call fanotify_init() with O_CLOEXEC as part of its second argument, O_CLOEXEC is currently silently ignored. Indeed event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). But it seems logical to set close-on-exec flag on the file descriptor if userspace is allowed to request it with O_CLOEXEC. In fact, according to some lookup on http://codesearch.debian.net/ and various search engine, there's already some userspace code requesting it: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Lookup also returned some wrong usage of the syscall: - in Gonk HAL from Mozilla Firefox OS sources[6]: mFd = fanotify_init(FAN_CLASS_NOTIF, FAN_CLOEXEC); Adding support for O_CLOEXEC in fanotify_init() won't magically enable it for Gonk since FAN_CLOEXEC is defined as 0x1, which is likely equal to O_WRONLY when used in open flag context. In the other hand, it won't hurt it either. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted so that close-on-exec get enabled when requested. [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://hg.mozilla.org/mozilla-central/file/325c74addeba/hal/gonk/GonkDiskSpaceWatcher.cpp#l167 Link: http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jan Kara <jack@suse.cz> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Richard Guy Briggs <rgb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: linux-api@vger.kernel.org Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- Hi Andrew and Henrich, Please find an updated patch with a commit message fixed regarding the obsolote comments on code which is now updated, thanks to Heinrich's patch. Changes from v8: - fixed commit message - added Reviewed-by: Regards. Yann Droneaud, OPTEYA fs/notify/fanotify/fanotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index b13992a41bd9..c991616acca9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); - client_fd = get_unused_fd(); + client_fd = get_unused_fd_flags(group->fanotify_data.f_flags); if (client_fd < 0) return client_fd; -- 1.9.3
On Mon 29-09-14 10:49:15, Yann Droneaud wrote: > According to commit 80af258867648 ('fanotify: groups can specify > their f_flags for new fd'), file descriptors created as part of > file access notification events inherit flags from the > event_f_flags argument passed to syscall fanotify_init(2). > > So while it is legal for userspace to call fanotify_init() with > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently > silently ignored. > > Indeed event_f_flags are only given to dentry_open(), which only > seems to care about O_ACCMODE and O_PATH in do_dentry_open(), > O_DIRECT in open_check_o_direct() and O_LARGEFILE in > generic_file_open(). > > But it seems logical to set close-on-exec flag on the file > descriptor if userspace is allowed to request it with O_CLOEXEC. > > In fact, according to some lookup on http://codesearch.debian.net/ > and various search engine, there's already some userspace code > requesting it: > > - in systemd's readahead[2]: > > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); > > - in clsync[3]: > > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) > > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); > > - in examples [4] from "Filesystem monitoring in the Linux > kernel" article[5] by Aleksander Morgado: > > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) > > Lookup also returned some wrong usage of the syscall: > > - in Gonk HAL from Mozilla Firefox OS sources[6]: > > mFd = fanotify_init(FAN_CLASS_NOTIF, FAN_CLOEXEC); > > Adding support for O_CLOEXEC in fanotify_init() won't magically > enable it for Gonk since FAN_CLOEXEC is defined as 0x1, which > is likely equal to O_WRONLY when used in open flag context. In > the other hand, it won't hurt it either. > > So this patch replaces call to macro get_unused_fd() by a call > to function get_unused_fd_flags() with event_f_flags value as > argument. This way O_CLOEXEC flag in the second argument of > fanotify_init(2) syscall is interpreted so that close-on-exec > get enabled when requested. > > [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html > [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 > [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 > https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 > [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c > [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ > [6] http://hg.mozilla.org/mozilla-central/file/325c74addeba/hal/gonk/GonkDiskSpaceWatcher.cpp#l167 > > Link: http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com > Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> > Cc: Jan Kara <jack@suse.cz> > Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> > Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com> > Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> > Cc: Richard Guy Briggs <rgb@redhat.com> > Cc: Eric Paris <eparis@redhat.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: stable@vger.kernel.org > Cc: linux-api@vger.kernel.org > Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de> > Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> The patch looks good. You can add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > Hi Andrew and Henrich, > > Please find an updated patch with a commit message fixed regarding > the obsolote comments on code which is now updated, thanks to > Heinrich's patch. > > Changes from v8: > - fixed commit message > - added Reviewed-by: > > Regards. > > Yann Droneaud, > OPTEYA > > fs/notify/fanotify/fanotify_user.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c > index b13992a41bd9..c991616acca9 100644 > --- a/fs/notify/fanotify/fanotify_user.c > +++ b/fs/notify/fanotify/fanotify_user.c > @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group, > > pr_debug("%s: group=%p event=%p\n", __func__, group, event); > > - client_fd = get_unused_fd(); > + client_fd = get_unused_fd_flags(group->fanotify_data.f_flags); > if (client_fd < 0) > return client_fd; > > -- > 1.9.3 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR
On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
>
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
>
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().
>
> But it seems logical to set close-on-exec flag on the file
> descriptor if userspace is allowed to request it with O_CLOEXEC.
>
> In fact, according to some lookup on http://codesearch.debian.net/
> and various search engine, there's already some userspace code
> requesting it:
>
> - in systemd's readahead[2]:
>
> fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
>
> - in clsync[3]:
>
> #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
>
> int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
>
> - in examples [4] from "Filesystem monitoring in the Linux
> kernel" article[5] by Aleksander Morgado:
>
> if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
So we have a number of apps which are setting O_CLOEXEC, but it doesn't
actually work. With this change it *will* work, so the behaviour of
those apps might change, possibly breaking them?
On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
>
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
>
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().
>
> But it seems logical to set close-on-exec flag on the file
> descriptor if userspace is allowed to request it with O_CLOEXEC.
>
> In fact, according to some lookup on http://codesearch.debian.net/
> and various search engine, there's already some userspace code
> requesting it:
>
> - in systemd's readahead[2]:
>
> fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
>
> - in clsync[3]:
>
> #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
>
> int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
>
> - in examples [4] from "Filesystem monitoring in the Linux
> kernel" article[5] by Aleksander Morgado:
>
> if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
So we have a number of apps which are setting O_CLOEXEC, but it doesn't
actually work. With this change it *will* work, so the behaviour of
those apps might change, possibly breaking them?
Hi,
Le mercredi 01 octobre 2014 à 15:36 -0700, Andrew Morton a écrit :
> On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
>
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> >
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> >
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> >
> > But it seems logical to set close-on-exec flag on the file
> > descriptor if userspace is allowed to request it with O_CLOEXEC.
> >
> > In fact, according to some lookup on http://codesearch.debian.net/
> > and various search engine, there's already some userspace code
> > requesting it:
> >
> > - in systemd's readahead[2]:
> >
> > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> >
> > - in clsync[3]:
> >
> > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> >
> > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> >
> > - in examples [4] from "Filesystem monitoring in the Linux
> > kernel" article[5] by Aleksander Morgado:
> >
> > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
>
> So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> actually work. With this change it *will* work, so the behaviour of
> those apps might change, possibly breaking them?
>
In the other hand, not enabling close-on-exec might expose unwanted file
descriptor to childs, creating security issues. YMMV.
--
Yann Droneaud
OPTEYA
On Thu, 02 Oct 2014 08:20:55 +0200 Yann Droneaud wrote:
> Hi,
>
> Le mercredi 01 octobre 2014 à 15:36 -0700, Andrew Morton a écrit :
> > On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> >
> > > According to commit 80af258867648 ('fanotify: groups can specify
> > > their f_flags for new fd'), file descriptors created as part of
> > > file access notification events inherit flags from the
> > > event_f_flags argument passed to syscall fanotify_init(2).
> > >
> > > So while it is legal for userspace to call fanotify_init() with
> > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > > silently ignored.
> > >
> > > Indeed event_f_flags are only given to dentry_open(), which only
> > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > > generic_file_open().
> > >
> > > But it seems logical to set close-on-exec flag on the file
> > > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > >
> > > In fact, according to some lookup on http://codesearch.debian.net/
> > > and various search engine, there's already some userspace code
> > > requesting it:
> > >
> > > - in systemd's readahead[2]:
> > >
> > > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > >
> > > - in clsync[3]:
> > >
> > > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > >
> > > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > >
> > > - in examples [4] from "Filesystem monitoring in the Linux
> > > kernel" article[5] by Aleksander Morgado:
> > >
> > > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> >
> > So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> > actually work. With this change it *will* work, so the behaviour of
> > those apps might change, possibly breaking them?
> >
>
> In the other hand, not enabling close-on-exec might expose unwanted file
> descriptor to childs, creating security issues. YMMV.
>
As someone who uses fanotify for content introspection, I can say that
I am _explicitly_ marking the fd obtained via read() as O_CLOEXEC,
because I have encountered a situation where a child managed to create
a deadlock because it kept the fd open after the main application
responded with FAN_ALLOW.
--
Mihai Donțu
In order to not potentially break applications which were requesting O_CLOEXEC on event file descriptors but which actually need it to be not effective as the kernel currently ignore the flag, so the file descriptor is inherited accross exec regardless of O_CLOEXEC (please forgive me for the wording), this patch introduces FAN_FD_CLOEXEC flag to fanotify_init() so that application can request O_CLOEXEC to be effective. Newer application would use FAN_FD_CLOEXEC flag along O_CLOEXEC to enable close on exec on newly created file descriptor: fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- Hi Andrew, While I believe fanotify_init() must enable close-on-exec when requested by userspace to prevent unwelcomed security issue, I understand your concerns regarding the possible breakage on userspace application requesting O_CLOEXEC but relying on it not being enable on file descriptor created for the events. So with a new flag to fanotify_init(), we could allow newer applications to really enable O_CLOEXEC. But I feel bad to have to force application to specify twice they want close on exec: - are you sure ? - are you really sure ? - is this your final answer ? ... Regards. Yann Droneaud OPTEYA fs/notify/fanotify/fanotify_user.c | 6 +++++- include/uapi/linux/fanotify.h | 5 ++++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index c991616acca9..3c1fb1412f37 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); - client_fd = get_unused_fd_flags(group->fanotify_data.f_flags); + client_fd = get_unused_fd_flags(group->fanotify_data.f_flags & O_CLOEXEC); if (client_fd < 0) return client_fd; @@ -706,6 +706,10 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return -EINVAL; } + if ((event_f_flags & O_CLOEXEC) && + !(flags & FAN_FD_CLOEXEC)) + event_f_flags ^= O_CLOEXEC; + user = get_current_user(); if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) { free_uid(user); diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 030508d195d3..f2d517be3152 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -36,7 +36,10 @@ #define FAN_UNLIMITED_QUEUE 0x00000010 #define FAN_UNLIMITED_MARKS 0x00000020 -#define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \ +/* flags used for fanotify_init() too */ +#define FAN_FD_CLOEXEC 0x00000100 + +#define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | FAN_FD_CLOEXEC | \ FAN_ALL_CLASS_BITS | FAN_UNLIMITED_QUEUE |\ FAN_UNLIMITED_MARKS) -- 1.9.3
On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignore the flag, so the file descriptor is inherited accross
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
>
> fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
> O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps that would depend on
O_CLOEXEC being ineffective.
please reconsider this one.
thanks,
Pádraig.
On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignore the flag, so the file descriptor is inherited accross
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
>
> fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
> O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps that would depend on
O_CLOEXEC being ineffective.
please reconsider this one.
thanks,
Pádraig.
On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignore the flag, so the file descriptor is inherited accross
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
>
> fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
> O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps that would depend on
O_CLOEXEC being ineffective.
please reconsider this one.
thanks,
P�draig.
On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
>
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> >
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> >
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> >
> > But it seems logical to set close-on-exec flag on the file
> > descriptor if userspace is allowed to request it with O_CLOEXEC.
> >
> > In fact, according to some lookup on http://codesearch.debian.net/
> > and various search engine, there's already some userspace code
> > requesting it:
> >
> > - in systemd's readahead[2]:
> >
> > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> >
> > - in clsync[3]:
> >
> > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> >
> > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> >
> > - in examples [4] from "Filesystem monitoring in the Linux
> > kernel" article[5] by Aleksander Morgado:
> >
> > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
>
> So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> actually work. With this change it *will* work, so the behaviour of
> those apps might change, possibly breaking them?
Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
want that behavior and their security may be weakened by the fact that
O_CLOEXEC is ignored. So we are weighting possible security issues for apps
doing things right (and Mihai mentioned in this thread that at least he has
an application which needs O_CLOEXEC working) against possible breakage for
apps which just randomly set O_CLOEXEC without wanting. So I'm really for
fixing O_CLOEXEC behavior.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
>
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> >
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> >
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> >
> > But it seems logical to set close-on-exec flag on the file
> > descriptor if userspace is allowed to request it with O_CLOEXEC.
> >
> > In fact, according to some lookup on http://codesearch.debian.net/
> > and various search engine, there's already some userspace code
> > requesting it:
> >
> > - in systemd's readahead[2]:
> >
> > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> >
> > - in clsync[3]:
> >
> > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> >
> > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> >
> > - in examples [4] from "Filesystem monitoring in the Linux
> > kernel" article[5] by Aleksander Morgado:
> >
> > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
>
> So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> actually work. With this change it *will* work, so the behaviour of
> those apps might change, possibly breaking them?
Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
want that behavior and their security may be weakened by the fact that
O_CLOEXEC is ignored. So we are weighting possible security issues for apps
doing things right (and Mihai mentioned in this thread that at least he has
an application which needs O_CLOEXEC working) against possible breakage for
apps which just randomly set O_CLOEXEC without wanting. So I'm really for
fixing O_CLOEXEC behavior.
Honza
--
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
On Thu, Oct 2, 2014 at 11:13 AM, Pádraig Brady <P@draigbrady.com> wrote: > On 10/02/2014 08:52 AM, Yann Droneaud wrote: >> In order to not potentially break applications which were >> requesting O_CLOEXEC on event file descriptors but which >> actually need it to be not effective as the kernel currently >> ignore the flag, so the file descriptor is inherited accross >> exec regardless of O_CLOEXEC (please forgive me for the >> wording), this patch introduces FAN_FD_CLOEXEC flag to >> fanotify_init() so that application can request O_CLOEXEC >> to be effective. >> Newer application would use FAN_FD_CLOEXEC flag along >> O_CLOEXEC to enable close on exec on newly created >> file descriptor: >> >> fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC, >> O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); > > Ugh really? > IMHO there should be widespread or at least known breakage with > O_CLOEXEC before adding messiness like this. > It seems surprising to me that apps that would depend on > O_CLOEXEC being ineffective. > > please reconsider this one. Agreed. The number of applications for which there are silent (not *yet* observed) breakages because O_CLOEXEC is not working as expected is likely rather larger than the set of applications that randomly specify O_CLOEXEC and then somehow get broken as a result. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
Hi, Le jeudi 02 octobre 2014 à 10:13 +0100, Pádraig Brady a écrit : > On 10/02/2014 08:52 AM, Yann Droneaud wrote: > > In order to not potentially break applications which were > > requesting O_CLOEXEC on event file descriptors but which > > actually need it to be not effective as the kernel currently > > ignore the flag, so the file descriptor is inherited accross > > exec regardless of O_CLOEXEC (please forgive me for the > > wording), this patch introduces FAN_FD_CLOEXEC flag to > > fanotify_init() so that application can request O_CLOEXEC > > to be effective. > > Newer application would use FAN_FD_CLOEXEC flag along > > O_CLOEXEC to enable close on exec on newly created > > file descriptor: > > > > fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC, > > O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); > > Ugh really? > IMHO there should be widespread or at least known breakage with > O_CLOEXEC before adding messiness like this. You should have read the other part of my message: > > While I believe fanotify_init() must enable close-on-exec > > when requested by userspace to prevent unwelcomed security > > issue, I understand your concerns regarding the possible > > breakage on userspace application requesting O_CLOEXEC > > but relying on it not being enable on file descriptor > > created for the events. > > > So with a new flag to fanotify_init(), we could allow > > newer applications to really enable O_CLOEXEC. > > > But I feel bad to have to force application to specify > > twice they want close on exec: > > - are you sure ? > > - are you really sure ? > > - is this your final answer ? > > ... I'm not really fond of this option. > It seems surprising to me that apps that would depend on > O_CLOEXEC being ineffective. > We have seen userspace developers making mistakes, and those mistakes were mistakenly ignored by the kernel until someone try to fix the mistake on kernel side, which broke the existing userspace application. > please reconsider this one. > I'm not going to promote this patch as it's a quick and dirty hack to demonstrate what would be the other option. Regards. -- Yann Droneaud OPTEYA
Hi, Le jeudi 02 octobre 2014 à 10:13 +0100, Pádraig Brady a écrit : > On 10/02/2014 08:52 AM, Yann Droneaud wrote: > > In order to not potentially break applications which were > > requesting O_CLOEXEC on event file descriptors but which > > actually need it to be not effective as the kernel currently > > ignore the flag, so the file descriptor is inherited accross > > exec regardless of O_CLOEXEC (please forgive me for the > > wording), this patch introduces FAN_FD_CLOEXEC flag to > > fanotify_init() so that application can request O_CLOEXEC > > to be effective. > > Newer application would use FAN_FD_CLOEXEC flag along > > O_CLOEXEC to enable close on exec on newly created > > file descriptor: > > > > fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC, > > O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); > > Ugh really? > IMHO there should be widespread or at least known breakage with > O_CLOEXEC before adding messiness like this. You should have read the other part of my message: > > While I believe fanotify_init() must enable close-on-exec > > when requested by userspace to prevent unwelcomed security > > issue, I understand your concerns regarding the possible > > breakage on userspace application requesting O_CLOEXEC > > but relying on it not being enable on file descriptor > > created for the events. > > > So with a new flag to fanotify_init(), we could allow > > newer applications to really enable O_CLOEXEC. > > > But I feel bad to have to force application to specify > > twice they want close on exec: > > - are you sure ? > > - are you really sure ? > > - is this your final answer ? > > ... I'm not really fond of this option. > It seems surprising to me that apps that would depend on > O_CLOEXEC being ineffective. > We have seen userspace developers making mistakes, and those mistakes were mistakenly ignored by the kernel until someone try to fix the mistake on kernel side, which broke the existing userspace application. > please reconsider this one. > I'm not going to promote this patch as it's a quick and dirty hack to demonstrate what would be the other option. Regards. -- Yann Droneaud OPTEYA
On Thu, 2 Oct 2014 12:44:10 +0200 Jan Kara <jack@suse.cz> wrote:
> On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> > On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> >
> > > According to commit 80af258867648 ('fanotify: groups can specify
> > > their f_flags for new fd'), file descriptors created as part of
> > > file access notification events inherit flags from the
> > > event_f_flags argument passed to syscall fanotify_init(2).
> > >
> > > So while it is legal for userspace to call fanotify_init() with
> > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > > silently ignored.
> > >
> > > Indeed event_f_flags are only given to dentry_open(), which only
> > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > > generic_file_open().
> > >
> > > But it seems logical to set close-on-exec flag on the file
> > > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > >
> > > In fact, according to some lookup on http://codesearch.debian.net/
> > > and various search engine, there's already some userspace code
> > > requesting it:
> > >
> > > - in systemd's readahead[2]:
> > >
> > > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > >
> > > - in clsync[3]:
> > >
> > > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > >
> > > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > >
> > > - in examples [4] from "Filesystem monitoring in the Linux
> > > kernel" article[5] by Aleksander Morgado:
> > >
> > > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> >
> > So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> > actually work. With this change it *will* work, so the behaviour of
> > those apps might change, possibly breaking them?
> Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
> want that behavior and their security may be weakened by the fact that
> O_CLOEXEC is ignored. So we are weighting possible security issues for apps
> doing things right (and Mihai mentioned in this thread that at least he has
> an application which needs O_CLOEXEC working) against possible breakage for
> apps which just randomly set O_CLOEXEC without wanting. So I'm really for
> fixing O_CLOEXEC behavior.
Fair enough, it sounds like the risk is acceptable.
Can we get a new version sent out with all this new info appropriately
changelogged?
On Thu, 2 Oct 2014 12:44:10 +0200 Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote:
> On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> > On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
> >
> > > According to commit 80af258867648 ('fanotify: groups can specify
> > > their f_flags for new fd'), file descriptors created as part of
> > > file access notification events inherit flags from the
> > > event_f_flags argument passed to syscall fanotify_init(2).
> > >
> > > So while it is legal for userspace to call fanotify_init() with
> > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > > silently ignored.
> > >
> > > Indeed event_f_flags are only given to dentry_open(), which only
> > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > > generic_file_open().
> > >
> > > But it seems logical to set close-on-exec flag on the file
> > > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > >
> > > In fact, according to some lookup on http://codesearch.debian.net/
> > > and various search engine, there's already some userspace code
> > > requesting it:
> > >
> > > - in systemd's readahead[2]:
> > >
> > > fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > >
> > > - in clsync[3]:
> > >
> > > #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > >
> > > int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > >
> > > - in examples [4] from "Filesystem monitoring in the Linux
> > > kernel" article[5] by Aleksander Morgado:
> > >
> > > if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > > O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> >
> > So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> > actually work. With this change it *will* work, so the behaviour of
> > those apps might change, possibly breaking them?
> Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
> want that behavior and their security may be weakened by the fact that
> O_CLOEXEC is ignored. So we are weighting possible security issues for apps
> doing things right (and Mihai mentioned in this thread that at least he has
> an application which needs O_CLOEXEC working) against possible breakage for
> apps which just randomly set O_CLOEXEC without wanting. So I'm really for
> fixing O_CLOEXEC behavior.
Fair enough, it sounds like the risk is acceptable.
Can we get a new version sent out with all this new info appropriately
changelogged?
According to commit 80af258867648 ('fanotify: groups can specify their f_flags for new fd'), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2)[1]. Unfortunately O_CLOEXEC is currently silently ignored. Indeed, event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). It's a pity, since, according to some lookup on various search engines and http://codesearch.debian.net/, there's already some userspace code which use O_CLOEXEC: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Additionally, since commit 48149e9d3a7e ('fanotify: check file flags passed in fanotify_init'). having O_CLOEXEC as part of fanotify_init() second argument is expressly allowed. So it seems expected to set close-on-exec flag on the file descriptors if userspace is allowed to request it with O_CLOEXEC. But Andrew Morton raised[6] the concern that enabling now close-on-exec might break existing applications which ask for O_CLOEXEC but expect the file descriptor to be inherited across exec(). In the other hand, as reported by Mihai Donțu[7], not setting close-on-exec on the file descriptor returned as part of file access notify can break applications due to deadlock. So close-on-exec is needed for most applications. More, applications asking for close-on-exec are likely expecting it to be enabled, relying on O_CLOEXEC being effective. If not, it might weaken their security, as noted by Jan Kara[8]. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted and close-on-exec get enabled when requested. [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Cc: Mihai Donțu <mihai.dontu@gmail.com> Cc: Pádraig Brady <P@draigBrady.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jan Kara <jack@suse.cz> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Richard Guy Briggs <rgb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: linux-api@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> --- Hi Andrew, > Fair enough, it sounds like the risk is acceptable. > OK. > Can we get a new version sent out with all this new info appropriately > changelogged? > Of course ! Please find an updated patch with revamped commit message. Changes from v8.1: - added more Cc: - added Reviewed-by: - rewrote commit message. fs/notify/fanotify/fanotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index b13992a41bd9..c991616acca9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); - client_fd = get_unused_fd(); + client_fd = get_unused_fd_flags(group->fanotify_data.f_flags); if (client_fd < 0) return client_fd; -- 1.9.3
According to commit 80af258867648 ('fanotify: groups can specify their f_flags for new fd'), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2)[1]. Unfortunately O_CLOEXEC is currently silently ignored. Indeed, event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). It's a pity, since, according to some lookup on various search engines and http://codesearch.debian.net/, there's already some userspace code which use O_CLOEXEC: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Additionally, since commit 48149e9d3a7e ('fanotify: check file flags passed in fanotify_init'). having O_CLOEXEC as part of fanotify_init() second argument is expressly allowed. So it seems expected to set close-on-exec flag on the file descriptors if userspace is allowed to request it with O_CLOEXEC. But Andrew Morton raised[6] the concern that enabling now close-on-exec might break existing applications which ask for O_CLOEXEC but expect the file descriptor to be inherited across exec(). In the other hand, as reported by Mihai Donțu[7], not setting close-on-exec on the file descriptor returned as part of file access notify can break applications due to deadlock. So close-on-exec is needed for most applications. More, applications asking for close-on-exec are likely expecting it to be enabled, relying on O_CLOEXEC being effective. If not, it might weaken their security, as noted by Jan Kara[8]. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted and close-on-exec get enabled when requested. [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l [8] http://lkml.kernel.org/r/20141002104410.GB19748-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud-RlY5vtjFyJ1hl2p70BpVqQ@public.gmane.orgm Cc: Mihai Donțu <mihai.dontu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: Pádraig Brady <P@draigBrady.com> Cc: Heinrich Schuchardt <xypron.glpk-Mmb7MZpHnFY@public.gmane.org> Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> Cc: Valdis Kletnieks <Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org> Cc: Michael Kerrisk-manpages <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: Lino Sanfilippo <LinoSanfilippo-Mmb7MZpHnFY@public.gmane.org> Cc: Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: Eric Paris <eparis-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> Reviewed by: Heinrich Schuchardt <xypron.glpk-Mmb7MZpHnFY@public.gmane.org> Tested-by: Heinrich Schuchardt <xypron.glpk-Mmb7MZpHnFY@public.gmane.org> Signed-off-by: Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> --- Hi Andrew, > Fair enough, it sounds like the risk is acceptable. > OK. > Can we get a new version sent out with all this new info appropriately > changelogged? > Of course ! Please find an updated patch with revamped commit message. Changes from v8.1: - added more Cc: - added Reviewed-by: - rewrote commit message. fs/notify/fanotify/fanotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index b13992a41bd9..c991616acca9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); - client_fd = get_unused_fd(); + client_fd = get_unused_fd_flags(group->fanotify_data.f_flags); if (client_fd < 0) return client_fd; -- 1.9.3