All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv8 0/6] Getting rid of get_unused_fd() / enable close-on-exec
@ 2014-09-24 13:11 ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Richard Guy Briggs, Eric Paris, Tony Luck, Fenghua Yu,
	linux-ia64, Jeremy Kerr, Arnd Bergmann, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linuxppc-dev, cbe-oss-dev,
	Al Viro, linux-fsdevel, Andrew Morton, Jiri Kosina
  Cc: linux-kernel, Yann Droneaud, linux-api, Ulrich Drepper

TL;DR:

- Trivial patches to replace calls to get_unused_fd() by
  get_unused_fd_flags(0);
- Patch to remove get_unused_fd();
- Patch to add support O_CLOEXEC in fanotify_init().

Hi,

Please find the eighth revision of my patchset to remove
get_unused_fd() macro in order to help subsystems to use
get_unused_fd_flags() (or anon_inode_getfd()) with flags either
provided by the userspace or set to O_CLOEXEC by default where
appropriate.

Without get_unused_fd() macro, more subsystems are likely to use
get_unused_fd_flags() (or anon_inode_getfd()) and be taught to
provide an API that let userspace choose the opening flags of
the file descriptor.

Not allowing userspace to provide the "open" flags or not using
O_CLOEXEC by default should be considered bad practice from
security point of view: in most case O_CLOEXEC must be used to
not leak file descriptor across exec().

Not allowing userspace to atomically set close-on-exec flag and
not using O_CLOEXEC should be avoided to protect multi-threaded
program from race condition when it tried to set close-on-exec
flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file
descriptor.

Example:

    int fd;
    int ret;

    fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /*
     * window opened for another thread to call fork(),
     * then the new process can call exec() at any time
     * and the file descriptor would be inherited
     */

    ret = fcntl(fd, F_SETFD, FD_CLOEXEC)
    if (ret < 0) {
        perror("fnctl()");
        close(fd);
        return -1;
    }

vs.:

    int fd;
    fd = open(filename, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        perror("open");
        return -1;
    }

Using O_CLOEXEC by default when flags are not (eg. cannot be)
provided by userspace is the safest bet as it allows userspace
to choose, if, when and where the file descriptor is going to be
inherited across exec(): userspace is free to call fcntl() to
remove FD_CLOEXEC flag in the child process that will call
exec().

Unfortunately, O_CLOEXEC cannot be made the default for most
existing system calls as it's not the default behavior for
POSIX / Unix. Reader interested in this issue could have a look
at "Ghosts of Unix past, part 2: Conflated designs" [1] article
by Neil Brown.

FAQ:

- Why do one want close-on-exec ?

Setting close-on-exec flag on file descriptor ensure it won't be
inherited silently by child, child of child, etc. when executing
another program.

If the file descriptor is not closed, some kernel resources can
be locked until the last process with the opened file descriptor
exit.

If the file descriptor is not closed, this can lead to a
security issue, eg. making resources available to a less
privileged program allowing information leak and/or deny of
service.

- Why do one need atomic close-on-exec ?

Even if it's possible to set close-on-exec flag through call to
fcntl() as shown previously, it introduces a race condition in
multi-threaded process, where a thread call fork() / exec()
while another thread is between call to open() and fcntl().

Additionally, using close-on-exec free the programmer from
tracking manually which file descriptor is to be closed in a
child before calling exec(): in a program using multiple
third-party libraries, it's difficult to know which file
descriptor must be closed. AFAIK, while there's a atexit(),
pthread_atfork(), there's no atexec() userspace function
in libc to allow libraries to register a handler in order
to close its opened file descriptor before exec().

- Why default to close-on-exec ?

Some kernel interfaces don't allow userspace to pass a
O_CLOEXEC-like flag when creating a new file descriptor.

In such cases, if possible (see below), O_CLOEXEC must be made
the default so that userspace doesn't have to call fcntl()
which, as demonstrated previously, is open to race condition
in multi-threaded program.

- How to choose between flag 0 or O_CLOEXEC in call to
  get_unused_fd_flags() (or anon_inode_getfd()) ?

Short: Will it break existing application ? Will it break kernel
       ABI ?

       If answer is no, use O_CLOEXEC.
       If answer is yes, use 0.

Long: If userspace expect to retrieve a file descriptor with
      plain old Unix(tm) semantics, O_CLOEXEC must not be made
      the default, as it could break some applications expecting
      that the file descriptor will be inherited across exec().

      But for some subsystems, such as InfiniBand, KVM, VFIO,
      etc. it makes no sense to have file descriptors inherited
      across exec() since those are tied to resources that will
      vanish when a another program will replace the current
      one by mean of exec(), so it's safe to use O_CLOEXEC in
      such cases.

      For others, like XFS, the file descriptor is retrieved by
      one program and will be used by a different program,
      executed as a child. In this case, setting O_CLOEXEC would
      break existing application which do not expect to have to
      call fcntl(fd, F_SETFD, 0) to make it available across
      exec().

      If file descriptor created by a subsystem is not tied to
      the current process resources, it's likely legal to use it
      in a different process context, thus O_CLOEXEC must not be
      the default.

      If one, as a subsystem maintainer, cannot tell for sure
      that no userspace program ever rely current behavior, eg.
      file descriptor being inherited across exec(), then the
      default flag *must* be kept 0 to not break application.

- This subsystem cannot be turned to use O_CLOEXEC by default:

If O_CLOEXEC cannot be made the default, it would be interesting
to think to extend the API to have a (set of) function(s) taking
a flag parameter so that userspace can atomically request
close-on-exec if it need it (and it should need it !).

- Background:

One might want to read "Secure File Descriptor Handling" [2] by
Ulrich Drepper who is responsible of adding O_CLOEXEC flag on
open(), and flags alike on other syscalls.

One might also want to read PEP-446 "Make newly created file
descriptors non-inheritable" [3] by Victor Stinner since it has
lot more background information on file descriptor leaking.

One would also like to read "Excuse me son, but your code is
leaking !!!" [4] by Dan Walsh for advice.

[1] http://lwn.net/Articles/412131/
[2] http://udrepper.livejournal.com/20407.html
[3] http://www.python.org/dev/peps/pep-0446/
[4] http://danwalsh.livejournal.com/53603.html

- Statistics:

In linux-next, tag next-20140924, they're currently:

- 33 calls to fd_install()
       with one call part of anon_inode_getfd()
- 26 calls to get_unused_fd_flags()
       with one call part of anon_inode_getfd()
       with another part of get_unused_fd() macro
- 11 calls to anon_inode_getfd()
-  8 calls to anon_inode_getfile()
       with one call part of anon_inode_getfd()
-  6 calls to get_unused_fd()

Changes from patchset v7 [PATCHSETv7]

- Rebased on top of latest -next
- Simplified commit message for trivial patches
- Proofread commit messages
- Addded CC: linux-api@vger.kernel.org

Changes from patchset v6 [PATCHSETv6]

- Rebased on top of latest -next
- Added Cc: trivial@kernel.org for the first trivials
  patches.

Changes from patchset v5 [PATCHSETv5]

- perf: introduce a flag to enable close-on-exec in
  perf_event_open()
  DROPPED: applied upstream, commit a21b0b354d4a.

Changes from patchset v4 [PATCHSETv4]:

- rewrote cover letter following discussion with perf
  maintainer. Thanks to Peter Zijlstra.

- modified a bit some commit messages.

- events: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: replaced by following patch.

- perf: introduce a flag to enable close-on-exec in
  perf_event_open()
  NEW: instead of hard coding the flags to 0, this patch
       allows userspace to specify close-on-exec flag.

- fanotify: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: replaced by following patch.

- fanotify: enable close-on-exec on events' fd when requested in
    fanotify_init()
  NEW: instead of hard coding the flags to 0, this patch
       enable close-on-exec if userspace request it.

Changes from patchset v3 [PATCHSETv3]:

- industrialio: use anon_inode_getfd() with O_CLOEXEC flag
  DROPPED: applied upstream, commit a646fbf0fd11.

Changes from patchset v2 [PATCHSETv2]:

- android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead
  of get_unused_fd()
  DROPPED: applied upstream, commit 45acea57335e.

- android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of
  get_unused_fd()
  DROPPED: applied upstream, commit 9c6cd3b39048.

- vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied upstream, commit a5d550703d2c.
  Additionally subsystem maintainer applied another patch on top
  to set the flags to O_CLOEXEC, commit 5d042fbdbb2d.

- industrialio: use anon_inode_getfd() with O_CLOEXEC flag
  NEW: propose to use O_CLOEXEC as default flag.

Changes from patchset v1 [PATCHSETv1]:

- explicitly added subsystem maintainers as mail recepients.

- infiniband: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  DROPPED: subsystem maintainer applied another patch
           using get_unused_fd_flags(O_CLOEXEC) as suggested,
	   commit da183c7af844.

- android/sw_sync: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested.

- android/sync: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested.

- xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied asis by subsystem maintainer, commit 862a62937e76.

- sctp: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29.

Links:

[PATCHSETv7]
  http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud@opteya.com

[PATCHSETv6]
  http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com

[PATCHSETv5]
  http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud@opteya.com

[PATCHSETv4]
  http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud@opteya.com

[PATCHSETv3]
  http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud@opteya.com

[PATCHSETv2]
  http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com

[PATCHSETv1]
  http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud@opteya.com

Yann Droneaud (6):
  fanotify: enable close-on-exec on events' fd when requested in
    fanotify_init()
  ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  binfmt_misc: trivial: replace get_unused_fd() by
    get_unused_fd_flags(0)
  file: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  file: remove get_unused_fd() macro

 arch/ia64/kernel/perfmon.c                | 2 +-
 arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
 fs/binfmt_misc.c                          | 2 +-
 fs/file.c                                 | 2 +-
 fs/notify/fanotify/fanotify_user.c        | 2 +-
 include/linux/file.h                      | 1 -
 6 files changed, 6 insertions(+), 7 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 0/6] Getting rid of get_unused_fd() / enable close-on-exec
@ 2014-09-24 13:11 ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Richard Guy Briggs, Eric Paris, Tony Luck, Fenghua Yu,
	linux-ia64-u79uwXL29TY76Z2rM5mHXA, Jeremy Kerr, Arnd Bergmann,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	cbe-oss-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, Jiri Kosina
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Yann Droneaud,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Ulrich Drepper

TL;DR:

- Trivial patches to replace calls to get_unused_fd() by
  get_unused_fd_flags(0);
- Patch to remove get_unused_fd();
- Patch to add support O_CLOEXEC in fanotify_init().

Hi,

Please find the eighth revision of my patchset to remove
get_unused_fd() macro in order to help subsystems to use
get_unused_fd_flags() (or anon_inode_getfd()) with flags either
provided by the userspace or set to O_CLOEXEC by default where
appropriate.

Without get_unused_fd() macro, more subsystems are likely to use
get_unused_fd_flags() (or anon_inode_getfd()) and be taught to
provide an API that let userspace choose the opening flags of
the file descriptor.

Not allowing userspace to provide the "open" flags or not using
O_CLOEXEC by default should be considered bad practice from
security point of view: in most case O_CLOEXEC must be used to
not leak file descriptor across exec().

Not allowing userspace to atomically set close-on-exec flag and
not using O_CLOEXEC should be avoided to protect multi-threaded
program from race condition when it tried to set close-on-exec
flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file
descriptor.

Example:

    int fd;
    int ret;

    fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /*
     * window opened for another thread to call fork(),
     * then the new process can call exec() at any time
     * and the file descriptor would be inherited
     */

    ret = fcntl(fd, F_SETFD, FD_CLOEXEC)
    if (ret < 0) {
        perror("fnctl()");
        close(fd);
        return -1;
    }

vs.:

    int fd;
    fd = open(filename, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        perror("open");
        return -1;
    }

Using O_CLOEXEC by default when flags are not (eg. cannot be)
provided by userspace is the safest bet as it allows userspace
to choose, if, when and where the file descriptor is going to be
inherited across exec(): userspace is free to call fcntl() to
remove FD_CLOEXEC flag in the child process that will call
exec().

Unfortunately, O_CLOEXEC cannot be made the default for most
existing system calls as it's not the default behavior for
POSIX / Unix. Reader interested in this issue could have a look
at "Ghosts of Unix past, part 2: Conflated designs" [1] article
by Neil Brown.

FAQ:

- Why do one want close-on-exec ?

Setting close-on-exec flag on file descriptor ensure it won't be
inherited silently by child, child of child, etc. when executing
another program.

If the file descriptor is not closed, some kernel resources can
be locked until the last process with the opened file descriptor
exit.

If the file descriptor is not closed, this can lead to a
security issue, eg. making resources available to a less
privileged program allowing information leak and/or deny of
service.

- Why do one need atomic close-on-exec ?

Even if it's possible to set close-on-exec flag through call to
fcntl() as shown previously, it introduces a race condition in
multi-threaded process, where a thread call fork() / exec()
while another thread is between call to open() and fcntl().

Additionally, using close-on-exec free the programmer from
tracking manually which file descriptor is to be closed in a
child before calling exec(): in a program using multiple
third-party libraries, it's difficult to know which file
descriptor must be closed. AFAIK, while there's a atexit(),
pthread_atfork(), there's no atexec() userspace function
in libc to allow libraries to register a handler in order
to close its opened file descriptor before exec().

- Why default to close-on-exec ?

Some kernel interfaces don't allow userspace to pass a
O_CLOEXEC-like flag when creating a new file descriptor.

In such cases, if possible (see below), O_CLOEXEC must be made
the default so that userspace doesn't have to call fcntl()
which, as demonstrated previously, is open to race condition
in multi-threaded program.

- How to choose between flag 0 or O_CLOEXEC in call to
  get_unused_fd_flags() (or anon_inode_getfd()) ?

Short: Will it break existing application ? Will it break kernel
       ABI ?

       If answer is no, use O_CLOEXEC.
       If answer is yes, use 0.

Long: If userspace expect to retrieve a file descriptor with
      plain old Unix(tm) semantics, O_CLOEXEC must not be made
      the default, as it could break some applications expecting
      that the file descriptor will be inherited across exec().

      But for some subsystems, such as InfiniBand, KVM, VFIO,
      etc. it makes no sense to have file descriptors inherited
      across exec() since those are tied to resources that will
      vanish when a another program will replace the current
      one by mean of exec(), so it's safe to use O_CLOEXEC in
      such cases.

      For others, like XFS, the file descriptor is retrieved by
      one program and will be used by a different program,
      executed as a child. In this case, setting O_CLOEXEC would
      break existing application which do not expect to have to
      call fcntl(fd, F_SETFD, 0) to make it available across
      exec().

      If file descriptor created by a subsystem is not tied to
      the current process resources, it's likely legal to use it
      in a different process context, thus O_CLOEXEC must not be
      the default.

      If one, as a subsystem maintainer, cannot tell for sure
      that no userspace program ever rely current behavior, eg.
      file descriptor being inherited across exec(), then the
      default flag *must* be kept 0 to not break application.

- This subsystem cannot be turned to use O_CLOEXEC by default:

If O_CLOEXEC cannot be made the default, it would be interesting
to think to extend the API to have a (set of) function(s) taking
a flag parameter so that userspace can atomically request
close-on-exec if it need it (and it should need it !).

- Background:

One might want to read "Secure File Descriptor Handling" [2] by
Ulrich Drepper who is responsible of adding O_CLOEXEC flag on
open(), and flags alike on other syscalls.

One might also want to read PEP-446 "Make newly created file
descriptors non-inheritable" [3] by Victor Stinner since it has
lot more background information on file descriptor leaking.

One would also like to read "Excuse me son, but your code is
leaking !!!" [4] by Dan Walsh for advice.

[1] http://lwn.net/Articles/412131/
[2] http://udrepper.livejournal.com/20407.html
[3] http://www.python.org/dev/peps/pep-0446/
[4] http://danwalsh.livejournal.com/53603.html

- Statistics:

In linux-next, tag next-20140924, they're currently:

- 33 calls to fd_install()
       with one call part of anon_inode_getfd()
- 26 calls to get_unused_fd_flags()
       with one call part of anon_inode_getfd()
       with another part of get_unused_fd() macro
- 11 calls to anon_inode_getfd()
-  8 calls to anon_inode_getfile()
       with one call part of anon_inode_getfd()
-  6 calls to get_unused_fd()

Changes from patchset v7 [PATCHSETv7]

- Rebased on top of latest -next
- Simplified commit message for trivial patches
- Proofread commit messages
- Addded CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Changes from patchset v6 [PATCHSETv6]

- Rebased on top of latest -next
- Added Cc: trivial-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org for the first trivials
  patches.

Changes from patchset v5 [PATCHSETv5]

- perf: introduce a flag to enable close-on-exec in
  perf_event_open()
  DROPPED: applied upstream, commit a21b0b354d4a.

Changes from patchset v4 [PATCHSETv4]:

- rewrote cover letter following discussion with perf
  maintainer. Thanks to Peter Zijlstra.

- modified a bit some commit messages.

- events: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: replaced by following patch.

- perf: introduce a flag to enable close-on-exec in
  perf_event_open()
  NEW: instead of hard coding the flags to 0, this patch
       allows userspace to specify close-on-exec flag.

- fanotify: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: replaced by following patch.

- fanotify: enable close-on-exec on events' fd when requested in
    fanotify_init()
  NEW: instead of hard coding the flags to 0, this patch
       enable close-on-exec if userspace request it.

Changes from patchset v3 [PATCHSETv3]:

- industrialio: use anon_inode_getfd() with O_CLOEXEC flag
  DROPPED: applied upstream, commit a646fbf0fd11.

Changes from patchset v2 [PATCHSETv2]:

- android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead
  of get_unused_fd()
  DROPPED: applied upstream, commit 45acea57335e.

- android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of
  get_unused_fd()
  DROPPED: applied upstream, commit 9c6cd3b39048.

- vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied upstream, commit a5d550703d2c.
  Additionally subsystem maintainer applied another patch on top
  to set the flags to O_CLOEXEC, commit 5d042fbdbb2d.

- industrialio: use anon_inode_getfd() with O_CLOEXEC flag
  NEW: propose to use O_CLOEXEC as default flag.

Changes from patchset v1 [PATCHSETv1]:

- explicitly added subsystem maintainers as mail recepients.

- infiniband: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  DROPPED: subsystem maintainer applied another patch
           using get_unused_fd_flags(O_CLOEXEC) as suggested,
	   commit da183c7af844.

- android/sw_sync: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested.

- android/sync: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested.

- xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied asis by subsystem maintainer, commit 862a62937e76.

- sctp: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29.

Links:

[PATCHSETv7]
  http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

[PATCHSETv6]
  http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

[PATCHSETv5]
  http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

[PATCHSETv4]
  http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

[PATCHSETv3]
  http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

[PATCHSETv2]
  http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

[PATCHSETv1]
  http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org

Yann Droneaud (6):
  fanotify: enable close-on-exec on events' fd when requested in
    fanotify_init()
  ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  binfmt_misc: trivial: replace get_unused_fd() by
    get_unused_fd_flags(0)
  file: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  file: remove get_unused_fd() macro

 arch/ia64/kernel/perfmon.c                | 2 +-
 arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
 fs/binfmt_misc.c                          | 2 +-
 fs/file.c                                 | 2 +-
 fs/notify/fanotify/fanotify_user.c        | 2 +-
 include/linux/file.h                      | 1 -
 6 files changed, 6 insertions(+), 7 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 0/6] Getting rid of get_unused_fd() / enable close-on-exec
@ 2014-09-24 13:11 ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Richard Guy Briggs, Eric Paris, Tony Luck, Fenghua Yu,
	linux-ia64, Jeremy Kerr, Arnd Bergmann, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linuxppc-dev, cbe-oss-dev,
	Al Viro, linux-fsdevel, Andrew Morton, Jiri Kosina
  Cc: Yann Droneaud, linux-api, linux-kernel, Ulrich Drepper

TL;DR:

- Trivial patches to replace calls to get_unused_fd() by
  get_unused_fd_flags(0);
- Patch to remove get_unused_fd();
- Patch to add support O_CLOEXEC in fanotify_init().

Hi,

Please find the eighth revision of my patchset to remove
get_unused_fd() macro in order to help subsystems to use
get_unused_fd_flags() (or anon_inode_getfd()) with flags either
provided by the userspace or set to O_CLOEXEC by default where
appropriate.

Without get_unused_fd() macro, more subsystems are likely to use
get_unused_fd_flags() (or anon_inode_getfd()) and be taught to
provide an API that let userspace choose the opening flags of
the file descriptor.

Not allowing userspace to provide the "open" flags or not using
O_CLOEXEC by default should be considered bad practice from
security point of view: in most case O_CLOEXEC must be used to
not leak file descriptor across exec().

Not allowing userspace to atomically set close-on-exec flag and
not using O_CLOEXEC should be avoided to protect multi-threaded
program from race condition when it tried to set close-on-exec
flag using fcntl(fd, F_SETFD, FD_CLOEXEC) after opening the file
descriptor.

Example:

    int fd;
    int ret;

    fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /*
     * window opened for another thread to call fork(),
     * then the new process can call exec() at any time
     * and the file descriptor would be inherited
     */

    ret = fcntl(fd, F_SETFD, FD_CLOEXEC)
    if (ret < 0) {
        perror("fnctl()");
        close(fd);
        return -1;
    }

vs.:

    int fd;
    fd = open(filename, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        perror("open");
        return -1;
    }

Using O_CLOEXEC by default when flags are not (eg. cannot be)
provided by userspace is the safest bet as it allows userspace
to choose, if, when and where the file descriptor is going to be
inherited across exec(): userspace is free to call fcntl() to
remove FD_CLOEXEC flag in the child process that will call
exec().

Unfortunately, O_CLOEXEC cannot be made the default for most
existing system calls as it's not the default behavior for
POSIX / Unix. Reader interested in this issue could have a look
at "Ghosts of Unix past, part 2: Conflated designs" [1] article
by Neil Brown.

FAQ:

- Why do one want close-on-exec ?

Setting close-on-exec flag on file descriptor ensure it won't be
inherited silently by child, child of child, etc. when executing
another program.

If the file descriptor is not closed, some kernel resources can
be locked until the last process with the opened file descriptor
exit.

If the file descriptor is not closed, this can lead to a
security issue, eg. making resources available to a less
privileged program allowing information leak and/or deny of
service.

- Why do one need atomic close-on-exec ?

Even if it's possible to set close-on-exec flag through call to
fcntl() as shown previously, it introduces a race condition in
multi-threaded process, where a thread call fork() / exec()
while another thread is between call to open() and fcntl().

Additionally, using close-on-exec free the programmer from
tracking manually which file descriptor is to be closed in a
child before calling exec(): in a program using multiple
third-party libraries, it's difficult to know which file
descriptor must be closed. AFAIK, while there's a atexit(),
pthread_atfork(), there's no atexec() userspace function
in libc to allow libraries to register a handler in order
to close its opened file descriptor before exec().

- Why default to close-on-exec ?

Some kernel interfaces don't allow userspace to pass a
O_CLOEXEC-like flag when creating a new file descriptor.

In such cases, if possible (see below), O_CLOEXEC must be made
the default so that userspace doesn't have to call fcntl()
which, as demonstrated previously, is open to race condition
in multi-threaded program.

- How to choose between flag 0 or O_CLOEXEC in call to
  get_unused_fd_flags() (or anon_inode_getfd()) ?

Short: Will it break existing application ? Will it break kernel
       ABI ?

       If answer is no, use O_CLOEXEC.
       If answer is yes, use 0.

Long: If userspace expect to retrieve a file descriptor with
      plain old Unix(tm) semantics, O_CLOEXEC must not be made
      the default, as it could break some applications expecting
      that the file descriptor will be inherited across exec().

      But for some subsystems, such as InfiniBand, KVM, VFIO,
      etc. it makes no sense to have file descriptors inherited
      across exec() since those are tied to resources that will
      vanish when a another program will replace the current
      one by mean of exec(), so it's safe to use O_CLOEXEC in
      such cases.

      For others, like XFS, the file descriptor is retrieved by
      one program and will be used by a different program,
      executed as a child. In this case, setting O_CLOEXEC would
      break existing application which do not expect to have to
      call fcntl(fd, F_SETFD, 0) to make it available across
      exec().

      If file descriptor created by a subsystem is not tied to
      the current process resources, it's likely legal to use it
      in a different process context, thus O_CLOEXEC must not be
      the default.

      If one, as a subsystem maintainer, cannot tell for sure
      that no userspace program ever rely current behavior, eg.
      file descriptor being inherited across exec(), then the
      default flag *must* be kept 0 to not break application.

- This subsystem cannot be turned to use O_CLOEXEC by default:

If O_CLOEXEC cannot be made the default, it would be interesting
to think to extend the API to have a (set of) function(s) taking
a flag parameter so that userspace can atomically request
close-on-exec if it need it (and it should need it !).

- Background:

One might want to read "Secure File Descriptor Handling" [2] by
Ulrich Drepper who is responsible of adding O_CLOEXEC flag on
open(), and flags alike on other syscalls.

One might also want to read PEP-446 "Make newly created file
descriptors non-inheritable" [3] by Victor Stinner since it has
lot more background information on file descriptor leaking.

One would also like to read "Excuse me son, but your code is
leaking !!!" [4] by Dan Walsh for advice.

[1] http://lwn.net/Articles/412131/
[2] http://udrepper.livejournal.com/20407.html
[3] http://www.python.org/dev/peps/pep-0446/
[4] http://danwalsh.livejournal.com/53603.html

- Statistics:

In linux-next, tag next-20140924, they're currently:

- 33 calls to fd_install()
       with one call part of anon_inode_getfd()
- 26 calls to get_unused_fd_flags()
       with one call part of anon_inode_getfd()
       with another part of get_unused_fd() macro
- 11 calls to anon_inode_getfd()
-  8 calls to anon_inode_getfile()
       with one call part of anon_inode_getfd()
-  6 calls to get_unused_fd()

Changes from patchset v7 [PATCHSETv7]

- Rebased on top of latest -next
- Simplified commit message for trivial patches
- Proofread commit messages
- Addded CC: linux-api@vger.kernel.org

Changes from patchset v6 [PATCHSETv6]

- Rebased on top of latest -next
- Added Cc: trivial@kernel.org for the first trivials
  patches.

Changes from patchset v5 [PATCHSETv5]

- perf: introduce a flag to enable close-on-exec in
  perf_event_open()
  DROPPED: applied upstream, commit a21b0b354d4a.

Changes from patchset v4 [PATCHSETv4]:

- rewrote cover letter following discussion with perf
  maintainer. Thanks to Peter Zijlstra.

- modified a bit some commit messages.

- events: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: replaced by following patch.

- perf: introduce a flag to enable close-on-exec in
  perf_event_open()
  NEW: instead of hard coding the flags to 0, this patch
       allows userspace to specify close-on-exec flag.

- fanotify: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: replaced by following patch.

- fanotify: enable close-on-exec on events' fd when requested in
    fanotify_init()
  NEW: instead of hard coding the flags to 0, this patch
       enable close-on-exec if userspace request it.

Changes from patchset v3 [PATCHSETv3]:

- industrialio: use anon_inode_getfd() with O_CLOEXEC flag
  DROPPED: applied upstream, commit a646fbf0fd11.

Changes from patchset v2 [PATCHSETv2]:

- android/sw_sync: use get_unused_fd_flags(O_CLOEXEC) instead
  of get_unused_fd()
  DROPPED: applied upstream, commit 45acea57335e.

- android/sync: use get_unused_fd_flags(O_CLOEXEC) instead of
  get_unused_fd()
  DROPPED: applied upstream, commit 9c6cd3b39048.

- vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied upstream, commit a5d550703d2c.
  Additionally subsystem maintainer applied another patch on top
  to set the flags to O_CLOEXEC, commit 5d042fbdbb2d.

- industrialio: use anon_inode_getfd() with O_CLOEXEC flag
  NEW: propose to use O_CLOEXEC as default flag.

Changes from patchset v1 [PATCHSETv1]:

- explicitly added subsystem maintainers as mail recepients.

- infiniband: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  DROPPED: subsystem maintainer applied another patch
           using get_unused_fd_flags(O_CLOEXEC) as suggested,
	   commit da183c7af844.

- android/sw_sync: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested.

- android/sync: use get_unused_fd_flags(0) instead of
  get_unused_fd()
  MODIFIED: use get_unused_fd_flags(O_CLOEXEC) as suggested.

- xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied asis by subsystem maintainer, commit 862a62937e76.

- sctp: use get_unused_fd_flags(0) instead of get_unused_fd()
  DROPPED: applied asis by subsystem maintainer, commit 8a59bd3e9b29.

Links:

[PATCHSETv7]
  http://lkml.kernel.org/r/cover.1401630396.git.ydroneaud@opteya.com

[PATCHSETv6]
  http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com

[PATCHSETv5]
  http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud@opteya.com

[PATCHSETv4]
  http://lkml.kernel.org/r/cover.1383121137.git.ydroneaud@opteya.com

[PATCHSETv3]
  http://lkml.kernel.org/r/cover.1378460926.git.ydroneaud@opteya.com

[PATCHSETv2]
  http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com

[PATCHSETv1]
  http://lkml.kernel.org/r/cover.1372777600.git.ydroneaud@opteya.com

Yann Droneaud (6):
  fanotify: enable close-on-exec on events' fd when requested in
    fanotify_init()
  ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  binfmt_misc: trivial: replace get_unused_fd() by
    get_unused_fd_flags(0)
  file: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  file: remove get_unused_fd() macro

 arch/ia64/kernel/perfmon.c                | 2 +-
 arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
 fs/binfmt_misc.c                          | 2 +-
 fs/file.c                                 | 2 +-
 fs/notify/fanotify/fanotify_user.c        | 2 +-
 include/linux/file.h                      | 1 -
 6 files changed, 6 insertions(+), 7 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 1/6] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-09-24 13:11 ` Yann Droneaud
  (?)
  (?)
@ 2014-09-24 13:11 ` Yann Droneaud
  2014-09-25 20:57     ` Heinrich Schuchardt
  -1 siblings, 1 reply; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Eric Paris, Richard Guy Briggs, Al Viro, Andrew Morton
  Cc: linux-kernel, Yann Droneaud, linux-fsdevel, stable, linux-api

According to commit 80af258867648 ('fanotify: groups can specify
their f_flags for new fd'), file descriptors created as part of
file access notification events inherit flags from the
event_f_flags argument passed to syscall fanotify_init(2).

So while it is legal for userspace to call fanotify_init() with
O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
silently ignored.

Indeed event_f_flags are only given to dentry_open(), which only
seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
O_DIRECT in open_check_o_direct() and O_LARGEFILE in
generic_file_open().

More, there's no effective check on event_f_flags value that
would catch unknown / unsupported values, unlike the one on
f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in
include/uapi/linux/fanotify.h).

Reading article "Botching up ioctls"[1] by Daniel Vetter might
make one feel uncomfortable when trying to add extension to an
API that doesn't check for unrecognized values.

But it seems logical to set close-on-exec flag on the file
descriptor if userspace is allowed to request it with O_CLOEXEC.

In fact, according to some lookup on http://codesearch.debian.net/
and various search engine, there's already some userspace code
requesting it:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Lookup also returned some wrong usage of the syscall:

- in Gonk HAL from Mozilla Firefox OS sources[6]:

    mFd = fanotify_init(FAN_CLASS_NOTIF, FAN_CLOEXEC);

Adding support for O_CLOEXEC in fanotify_init() won't magically
enable it for Gonk since FAN_CLOEXEC is defined as 0x1, which
is likely equal to O_WRONLY when used in open flag context. In
the other hand, it won't hurt it either.

So this patch replaces call to macro get_unused_fd() by a call
to function get_unused_fd_flags() with event_f_flags value as
argument. This way O_CLOEXEC flag in the second argument of
fanotify_init(2) syscall is interpreted so that close-on-exec
get enabled when requested.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://hg.mozilla.org/mozilla-central/file/325c74addeba/hal/gonk/GonkDiskSpaceWatcher.cpp#l167

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 fs/notify/fanotify/fanotify_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b13992a41bd9..c991616acca9 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	client_fd = get_unused_fd();
+	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
 	if (client_fd < 0)
 		return client_fd;
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 2/6] ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  2014-09-24 13:11 ` Yann Droneaud
                   ` (2 preceding siblings ...)
  (?)
@ 2014-09-24 13:11 ` Yann Droneaud
  -1 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Al Viro, Andrew Morton, Jiri Kosina
  Cc: linux-kernel, Yann Droneaud, linux-ia64, linux-fsdevel

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

In a further patch, get_unused_fd() will be removed so that new code
start using get_unused_fd_flags(), with the hope O_CLOEXEC could be
used, either by default or choosen by userspace.

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: trivial@kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 arch/ia64/kernel/perfmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 5845ffea67c3..dc063fe6646a 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2662,7 +2662,7 @@ pfm_context_create(pfm_context_t *ctx, void *arg, int count, struct pt_regs *reg
 
 	ret = -ENOMEM;
 
-	fd = get_unused_fd();
+	fd = get_unused_fd_flags(0);
 	if (fd < 0)
 		return fd;
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 3/6] ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  2014-09-24 13:11 ` Yann Droneaud
@ 2014-09-24 13:11   ` Yann Droneaud
  -1 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Jeremy Kerr, Arnd Bergmann, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Al Viro, Andrew Morton,
	Jiri Kosina
  Cc: linux-kernel, Yann Droneaud, linuxppc-dev, cbe-oss-dev, linux-fsdevel

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

In a further patch, get_unused_fd() will be removed so that new code
start using get_unused_fd_flags(), with the hope O_CLOEXEC could be
used, either by default or choosen by userspace.

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: trivial@kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 87ba7cf99cd7..51effcec30d8 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -301,7 +301,7 @@ static int spufs_context_open(struct path *path)
 	int ret;
 	struct file *filp;
 
-	ret = get_unused_fd();
+	ret = get_unused_fd_flags(0);
 	if (ret < 0)
 		return ret;
 
@@ -518,7 +518,7 @@ static int spufs_gang_open(struct path *path)
 	int ret;
 	struct file *filp;
 
-	ret = get_unused_fd();
+	ret = get_unused_fd_flags(0);
 	if (ret < 0)
 		return ret;
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 3/6] ppc/cell: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
@ 2014-09-24 13:11   ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Jeremy Kerr, Arnd Bergmann, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Al Viro, Andrew Morton,
	Jiri Kosina
  Cc: Yann Droneaud, cbe-oss-dev, linux-fsdevel, linuxppc-dev, linux-kernel

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

In a further patch, get_unused_fd() will be removed so that new code
start using get_unused_fd_flags(), with the hope O_CLOEXEC could be
used, either by default or choosen by userspace.

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: trivial@kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 87ba7cf99cd7..51effcec30d8 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -301,7 +301,7 @@ static int spufs_context_open(struct path *path)
 	int ret;
 	struct file *filp;
 
-	ret = get_unused_fd();
+	ret = get_unused_fd_flags(0);
 	if (ret < 0)
 		return ret;
 
@@ -518,7 +518,7 @@ static int spufs_gang_open(struct path *path)
 	int ret;
 	struct file *filp;
 
-	ret = get_unused_fd();
+	ret = get_unused_fd_flags(0);
 	if (ret < 0)
 		return ret;
 
-- 
1.9.3

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 4/6] binfmt_misc: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  2014-09-24 13:11 ` Yann Droneaud
                   ` (4 preceding siblings ...)
  (?)
@ 2014-09-24 13:11 ` Yann Droneaud
  -1 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Jiri Kosina
  Cc: linux-kernel, Yann Droneaud, linux-fsdevel

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

In a further patch, get_unused_fd() will be removed so that new code
start using get_unused_fd_flags(), with the hope O_CLOEXEC could be
used, either by default or choosen by userspace.

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: trivial@kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 fs/binfmt_misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fd8beb9657a2..1cc5377ba955 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -153,7 +153,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
 		/* if the binary should be opened on behalf of the
 		 * interpreter than keep it open and assign descriptor
 		 * to it */
- 		fd_binary = get_unused_fd();
+		fd_binary = get_unused_fd_flags(0);
  		if (fd_binary < 0) {
  			retval = fd_binary;
  			goto _ret;
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 5/6] file: trivial: replace get_unused_fd() by get_unused_fd_flags(0)
  2014-09-24 13:11 ` Yann Droneaud
                   ` (5 preceding siblings ...)
  (?)
@ 2014-09-24 13:11 ` Yann Droneaud
  -1 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Jiri Kosina
  Cc: linux-kernel, Yann Droneaud, linux-fsdevel

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

In a further patch, get_unused_fd() will be removed so that new code
start using get_unused_fd_flags(), with the hope O_CLOEXEC could be
used, either by default or choosen by userspace.

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: trivial@kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index ab3eb6a88239..ee738ea028fa 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -869,7 +869,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
 	struct file *file = fget_raw(fildes);
 
 	if (file) {
-		ret = get_unused_fd();
+		ret = get_unused_fd_flags(0);
 		if (ret >= 0)
 			fd_install(ret, file);
 		else
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8 6/6] file: remove get_unused_fd() macro
  2014-09-24 13:11 ` Yann Droneaud
                   ` (6 preceding siblings ...)
  (?)
@ 2014-09-24 13:11 ` Yann Droneaud
  -1 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-24 13:11 UTC (permalink / raw)
  To: Al Viro, Andrew Morton; +Cc: linux-kernel, Yann Droneaud, linux-fsdevel

Macro get_unused_fd() is used to allocate a file descriptor
with default flags. Those default flags (0) don't enable
close-on-exec.

This can be seen as an unsafe default: in most case close-on-exec
should be enabled to not leak file descriptor across exec().

It would be better to have a "safer" default set of flags, eg.
O_CLOEXEC must be used to enable close-on-exec.

Instead this patch removes get_unused_fd() so that out of tree
modules won't be affect by a runtime behavor change which might
introduce other kind of bugs: it's better to catch the change at
build time, making it easier to fix.

Removing the macro will also promote use of get_unused_fd_flags()
(or anon_inode_getfd()) with flags provided by userspace. Or, if
flags cannot be given by userspace, with flags set to O_CLOEXEC
by default.

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 include/linux/file.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/file.h b/include/linux/file.h
index 4d69123377a2..f87d30882a24 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -66,7 +66,6 @@ extern void set_close_on_exec(unsigned int fd, int flag);
 extern bool get_close_on_exec(unsigned int fd);
 extern void put_filp(struct file *);
 extern int get_unused_fd_flags(unsigned flags);
-#define get_unused_fd() get_unused_fd_flags(0)
 extern void put_unused_fd(unsigned int fd);
 
 extern void fd_install(unsigned int fd, struct file *file);
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8 1/6] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-09-25 20:57     ` Heinrich Schuchardt
  0 siblings, 0 replies; 33+ messages in thread
From: Heinrich Schuchardt @ 2014-09-25 20:57 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Eric Paris, Richard Guy Briggs, Al Viro, Andrew Morton,
	linux-kernel, linux-fsdevel, stable, linux-api, Jan Kara

On 24.09.2014 15:11, Yann Droneaud wrote:
> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
>
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
>
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().

I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags.
When I called fcnt(event_metadata->fd, F_GETFD) it did not return 
FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not 
working as expected.

I found this definition
#define get_unused_fd() get_unused_fd_flags(0)

So essentially when get_unused_fd() is called for a fanotify event 
O_CLOEXEC is ignored.

This is what your patch fixes.

>
> More, there's no effective check on event_f_flags value that
> would catch unknown / unsupported values, unlike the one on
> f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in
> include/uapi/linux/fanotify.h).

The fanotify_init(2) man page describes which flags are allowable in 
event_f_flags.

Could you, please, explain why the following code in fanotify_user.c is 
not to be considered an effective check:

     if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
         return -EINVAL;

     switch (event_f_flags & O_ACCMODE) {
     case O_RDONLY:
     case O_RDWR:
     case O_WRONLY:
         break;
     default:
         return -EINVAL;
     }

I CC Jan Kara as he reviewed the code.

Best regards

Heinrich Schuchardt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8 1/6] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-09-25 20:57     ` Heinrich Schuchardt
  0 siblings, 0 replies; 33+ messages in thread
From: Heinrich Schuchardt @ 2014-09-25 20:57 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Eric Paris, Richard Guy Briggs, Al Viro, Andrew Morton,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Jan Kara

On 24.09.2014 15:11, Yann Droneaud wrote:
> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
>
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
>
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().

I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags.
When I called fcnt(event_metadata->fd, F_GETFD) it did not return 
FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not 
working as expected.

I found this definition
#define get_unused_fd() get_unused_fd_flags(0)

So essentially when get_unused_fd() is called for a fanotify event 
O_CLOEXEC is ignored.

This is what your patch fixes.

>
> More, there's no effective check on event_f_flags value that
> would catch unknown / unsupported values, unlike the one on
> f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in
> include/uapi/linux/fanotify.h).

The fanotify_init(2) man page describes which flags are allowable in 
event_f_flags.

Could you, please, explain why the following code in fanotify_user.c is 
not to be considered an effective check:

     if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
         return -EINVAL;

     switch (event_f_flags & O_ACCMODE) {
     case O_RDONLY:
     case O_RDWR:
     case O_WRONLY:
         break;
     default:
         return -EINVAL;
     }

I CC Jan Kara as he reviewed the code.

Best regards

Heinrich Schuchardt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8 1/6] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-09-25 20:57     ` Heinrich Schuchardt
  (?)
@ 2014-09-26  8:58     ` Yann Droneaud
  2014-09-27  7:26       ` Heinrich Schuchardt
  -1 siblings, 1 reply; 33+ messages in thread
From: Yann Droneaud @ 2014-09-26  8:58 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: Eric Paris, Richard Guy Briggs, Al Viro, Andrew Morton,
	linux-kernel, linux-fsdevel, stable, linux-api, Jan Kara,
	Lino Sanfilippo

Hi,

Le jeudi 25 septembre 2014 à 22:57 +0200, Heinrich Schuchardt a écrit :
> On 24.09.2014 15:11, Yann Droneaud wrote:
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> >
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> >
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> 
> I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags.
> When I called fcnt(event_metadata->fd, F_GETFD) it did not return 
> FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not 
> working as expected.
> 
> I found this definition
> #define get_unused_fd() get_unused_fd_flags(0)
> 
> So essentially when get_unused_fd() is called for a fanotify event 
> O_CLOEXEC is ignored.
> 
> This is what your patch fixes.
> 
> >
> > More, there's no effective check on event_f_flags value that
> > would catch unknown / unsupported values, unlike the one on
> > f_flags argument of the syscall (see FAN_ALL_INIT_FLAGS in
> > include/uapi/linux/fanotify.h).
> 
> The fanotify_init(2) man page describes which flags are allowable in 
> event_f_flags.
> 
> Could you, please, explain why the following code in fanotify_user.c is 
> not to be considered an effective check:
> 
>      if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
>          return -EINVAL;
> 
>      switch (event_f_flags & O_ACCMODE) {
>      case O_RDONLY:
>      case O_RDWR:
>      case O_WRONLY:
>          break;
>      default:
>          return -EINVAL;
>      }
> 
> I CC Jan Kara as he reviewed the code.
> 

I missed the opportunity to update my commit message.

I've sent my initial version of the patch (with the same wording) on
5th, January, 2014:

http://mid.gmane.org/3d9591f81e62a78a726721c8052b3910870db35e.1388952061.git.ydroneaud@opteya.com
http://lkml.kernel.org/r/3d9591f81e62a78a726721c8052b3910870db35e.1388952061.git.ydroneaud@opteya.com

The patch was sent again on March, 11:

http://lkml.kernel.org/r/baab31b572b216d13f2149bdf07e0f79a1bae660.1394532336.git.ydroneaud@opteya.com
http://mid.gmane.org/baab31b572b216d13f2149bdf07e0f79a1bae660.1394532336.git.ydroneaud@opteya.com

And another time, on June, 1st:

http://lkml.kernel.org/r/2c6ab28980f0007ea3b9afa7ecd7497806a6a451.1401630396.git.ydroneaud@opteya.com
http://mid.gmane.org/2c6ab28980f0007ea3b9afa7ecd7497806a6a451.1401630396.git.ydroneaud@opteya.com

So as you can see, my patch predate yours:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=48149e9d3a7e924010a0daab30a6197b7d7b6580

But I have to apologize: I haven't noticed your patch was merged between
my previous submission and the current one. My bad.

I will update the commit message to remove my obsolete comment on the
input parameter check.

Thanks again for review and testing.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8 1/6] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-09-26  8:58     ` Yann Droneaud
@ 2014-09-27  7:26       ` Heinrich Schuchardt
  2014-09-29  8:49         ` [PATCHv8.1] " Yann Droneaud
  0 siblings, 1 reply; 33+ messages in thread
From: Heinrich Schuchardt @ 2014-09-27  7:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yann Droneaud, Eric Paris, Richard Guy Briggs, Al Viro,
	linux-kernel, linux-fsdevel, stable, linux-api, Jan Kara,
	Lino Sanfilippo

Hello Andrew,

please, add the patch described in
https://lkml.org/lkml/2014/9/24/967
to the MM tree.

I have tested kernel 3.17.0-rc6 with and without the patch and it fixes 
the described error.
As the patch is valid independent of the rest of the patch set, there is 
no reason to wait for the rest to be merged.

You may add
Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>

Best regards

Heinrich Schuchardt




On 26.09.2014 10:58, Yann Droneaud wrote:
> Hi,
>
> Le jeudi 25 septembre 2014 à 22:57 +0200, Heinrich Schuchardt a écrit :
>> On 24.09.2014 15:11, Yann Droneaud wrote:
>>> According to commit 80af258867648 ('fanotify: groups can specify
>>> their f_flags for new fd'), file descriptors created as part of
>>> file access notification events inherit flags from the
>>> event_f_flags argument passed to syscall fanotify_init(2).
>>>
>>> So while it is legal for userspace to call fanotify_init() with
>>> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
>>> silently ignored.
>>>
>>> Indeed event_f_flags are only given to dentry_open(), which only
>>> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
>>> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
>>> generic_file_open().
>>
>> I tested on kernel 3.17.0-rc5. I passed O_CLOEXEC in event_f_flags.
>> When I called fcnt(event_metadata->fd, F_GETFD) it did not return
>> FD_CLOEXEC. So I can confirm your observation that O_CLOEXEC is not
>> working as expected.
>>
>> I found this definition
>> #define get_unused_fd() get_unused_fd_flags(0)
>>
>> So essentially when get_unused_fd() is called for a fanotify event
>> O_CLOEXEC is ignored.
>>
>> This is what your patch fixes.
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-09-27  7:26       ` Heinrich Schuchardt
@ 2014-09-29  8:49         ` Yann Droneaud
  2014-09-29 11:50           ` Jan Kara
  2014-10-01 22:36             ` Andrew Morton
  0 siblings, 2 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-09-29  8:49 UTC (permalink / raw)
  To: Heinrich Schuchardt, Andrew Morton
  Cc: Yann Droneaud, Eric Paris, Richard Guy Briggs, Al Viro,
	linux-kernel, linux-fsdevel, stable, linux-api, Jan Kara,
	Lino Sanfilippo, Valdis Kletnieks, Michael Kerrisk-manpages

According to commit 80af258867648 ('fanotify: groups can specify
their f_flags for new fd'), file descriptors created as part of
file access notification events inherit flags from the
event_f_flags argument passed to syscall fanotify_init(2).

So while it is legal for userspace to call fanotify_init() with
O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
silently ignored.

Indeed event_f_flags are only given to dentry_open(), which only
seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
O_DIRECT in open_check_o_direct() and O_LARGEFILE in
generic_file_open().

But it seems logical to set close-on-exec flag on the file
descriptor if userspace is allowed to request it with O_CLOEXEC.

In fact, according to some lookup on http://codesearch.debian.net/
and various search engine, there's already some userspace code
requesting it:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Lookup also returned some wrong usage of the syscall:

- in Gonk HAL from Mozilla Firefox OS sources[6]:

    mFd = fanotify_init(FAN_CLASS_NOTIF, FAN_CLOEXEC);

Adding support for O_CLOEXEC in fanotify_init() won't magically
enable it for Gonk since FAN_CLOEXEC is defined as 0x1, which
is likely equal to O_WRONLY when used in open flag context. In
the other hand, it won't hurt it either.

So this patch replaces call to macro get_unused_fd() by a call
to function get_unused_fd_flags() with event_f_flags value as
argument. This way O_CLOEXEC flag in the second argument of
fanotify_init(2) syscall is interpreted so that close-on-exec
get enabled when requested.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://hg.mozilla.org/mozilla-central/file/325c74addeba/hal/gonk/GonkDiskSpaceWatcher.cpp#l167

Link: http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: linux-api@vger.kernel.org
Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
Hi Andrew and Henrich,

Please find an updated patch with a commit message fixed regarding
the obsolote comments on code which is now updated, thanks to
Heinrich's patch.

Changes from v8:
- fixed commit message
- added Reviewed-by:

Regards.

Yann Droneaud,
OPTEYA

 fs/notify/fanotify/fanotify_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b13992a41bd9..c991616acca9 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	client_fd = get_unused_fd();
+	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
 	if (client_fd < 0)
 		return client_fd;
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-09-29  8:49         ` [PATCHv8.1] " Yann Droneaud
@ 2014-09-29 11:50           ` Jan Kara
  2014-10-01 22:36             ` Andrew Morton
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Kara @ 2014-09-29 11:50 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Heinrich Schuchardt, Andrew Morton, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On Mon 29-09-14 10:49:15, Yann Droneaud wrote:
> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
> 
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
> 
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().
> 
> But it seems logical to set close-on-exec flag on the file
> descriptor if userspace is allowed to request it with O_CLOEXEC.
> 
> In fact, according to some lookup on http://codesearch.debian.net/
> and various search engine, there's already some userspace code
> requesting it:
> 
> - in systemd's readahead[2]:
> 
>     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> 
> - in clsync[3]:
> 
>     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> 
>     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> 
> - in examples [4] from "Filesystem monitoring in the Linux
>   kernel" article[5] by Aleksander Morgado:
> 
>     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
>                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> 
> Lookup also returned some wrong usage of the syscall:
> 
> - in Gonk HAL from Mozilla Firefox OS sources[6]:
> 
>     mFd = fanotify_init(FAN_CLASS_NOTIF, FAN_CLOEXEC);
> 
> Adding support for O_CLOEXEC in fanotify_init() won't magically
> enable it for Gonk since FAN_CLOEXEC is defined as 0x1, which
> is likely equal to O_WRONLY when used in open flag context. In
> the other hand, it won't hurt it either.
> 
> So this patch replaces call to macro get_unused_fd() by a call
> to function get_unused_fd_flags() with event_f_flags value as
> argument. This way O_CLOEXEC flag in the second argument of
> fanotify_init(2) syscall is interpreted so that close-on-exec
> get enabled when requested.
> 
> [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
> [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
> [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
>     https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
> [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
> [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
> [6] http://hg.mozilla.org/mozilla-central/file/325c74addeba/hal/gonk/GonkDiskSpaceWatcher.cpp#l167
> 
> Link: http://lkml.kernel.org/r/cover.1394532336.git.ydroneaud@opteya.com
> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
> Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Cc: Richard Guy Briggs <rgb@redhat.com>
> Cc: Eric Paris <eparis@redhat.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org
> Cc: linux-api@vger.kernel.org
> Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> Hi Andrew and Henrich,
> 
> Please find an updated patch with a commit message fixed regarding
> the obsolote comments on code which is now updated, thanks to
> Heinrich's patch.
> 
> Changes from v8:
> - fixed commit message
> - added Reviewed-by:
> 
> Regards.
> 
> Yann Droneaud,
> OPTEYA
> 
>  fs/notify/fanotify/fanotify_user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index b13992a41bd9..c991616acca9 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group,
>  
>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>  
> -	client_fd = get_unused_fd();
> +	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
>  	if (client_fd < 0)
>  		return client_fd;
>  
> -- 
> 1.9.3
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-01 22:36             ` Andrew Morton
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2014-10-01 22:36 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Heinrich Schuchardt, Eric Paris, Richard Guy Briggs, Al Viro,
	linux-kernel, linux-fsdevel, stable, linux-api, Jan Kara,
	Lino Sanfilippo, Valdis Kletnieks, Michael Kerrisk-manpages

On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:

> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
> 
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
> 
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().
> 
> But it seems logical to set close-on-exec flag on the file
> descriptor if userspace is allowed to request it with O_CLOEXEC.
> 
> In fact, according to some lookup on http://codesearch.debian.net/
> and various search engine, there's already some userspace code
> requesting it:
> 
> - in systemd's readahead[2]:
> 
>     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> 
> - in clsync[3]:
> 
>     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> 
>     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> 
> - in examples [4] from "Filesystem monitoring in the Linux
>   kernel" article[5] by Aleksander Morgado:
> 
>     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
>                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

So we have a number of apps which are setting O_CLOEXEC, but it doesn't
actually work.  With this change it *will* work, so the behaviour of
those apps might change, possibly breaking them?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-01 22:36             ` Andrew Morton
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2014-10-01 22:36 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Heinrich Schuchardt, Eric Paris, Richard Guy Briggs, Al Viro,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:

> According to commit 80af258867648 ('fanotify: groups can specify
> their f_flags for new fd'), file descriptors created as part of
> file access notification events inherit flags from the
> event_f_flags argument passed to syscall fanotify_init(2).
> 
> So while it is legal for userspace to call fanotify_init() with
> O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> silently ignored.
> 
> Indeed event_f_flags are only given to dentry_open(), which only
> seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> generic_file_open().
> 
> But it seems logical to set close-on-exec flag on the file
> descriptor if userspace is allowed to request it with O_CLOEXEC.
> 
> In fact, according to some lookup on http://codesearch.debian.net/
> and various search engine, there's already some userspace code
> requesting it:
> 
> - in systemd's readahead[2]:
> 
>     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> 
> - in clsync[3]:
> 
>     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> 
>     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> 
> - in examples [4] from "Filesystem monitoring in the Linux
>   kernel" article[5] by Aleksander Morgado:
> 
>     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
>                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

So we have a number of apps which are setting O_CLOEXEC, but it doesn't
actually work.  With this change it *will* work, so the behaviour of
those apps might change, possibly breaking them?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-10-01 22:36             ` Andrew Morton
  (?)
@ 2014-10-02  6:20             ` Yann Droneaud
  2014-10-02  6:50               ` Mihai Donțu
  2014-10-02  7:52               ` [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd Yann Droneaud
  -1 siblings, 2 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-10-02  6:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Heinrich Schuchardt, Eric Paris, Richard Guy Briggs, Al Viro,
	linux-kernel, linux-fsdevel, stable, linux-api, Jan Kara,
	Lino Sanfilippo, Valdis Kletnieks, Michael Kerrisk-manpages,
	Yann Droneaud

Hi,

Le mercredi 01 octobre 2014 à 15:36 -0700, Andrew Morton a écrit :
> On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> 
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> > 
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> > 
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> > 
> > But it seems logical to set close-on-exec flag on the file
> > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > 
> > In fact, according to some lookup on http://codesearch.debian.net/
> > and various search engine, there's already some userspace code
> > requesting it:
> > 
> > - in systemd's readahead[2]:
> > 
> >     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > 
> > - in clsync[3]:
> > 
> >     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > 
> >     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > 
> > - in examples [4] from "Filesystem monitoring in the Linux
> >   kernel" article[5] by Aleksander Morgado:
> > 
> >     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> >                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> 
> So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> actually work.  With this change it *will* work, so the behaviour of
> those apps might change, possibly breaking them?
> 

In the other hand, not enabling close-on-exec might expose unwanted file
descriptor to childs, creating security issues. YMMV.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  2014-10-02  6:20             ` Yann Droneaud
@ 2014-10-02  6:50               ` Mihai Donțu
  2014-10-02  7:52               ` [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd Yann Droneaud
  1 sibling, 0 replies; 33+ messages in thread
From: Mihai Donțu @ 2014-10-02  6:50 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On Thu, 02 Oct 2014 08:20:55 +0200 Yann Droneaud wrote:
> Hi,
> 
> Le mercredi 01 octobre 2014 à 15:36 -0700, Andrew Morton a écrit :
> > On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> > 
> > > According to commit 80af258867648 ('fanotify: groups can specify
> > > their f_flags for new fd'), file descriptors created as part of
> > > file access notification events inherit flags from the
> > > event_f_flags argument passed to syscall fanotify_init(2).
> > > 
> > > So while it is legal for userspace to call fanotify_init() with
> > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > > silently ignored.
> > > 
> > > Indeed event_f_flags are only given to dentry_open(), which only
> > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > > generic_file_open().
> > > 
> > > But it seems logical to set close-on-exec flag on the file
> > > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > > 
> > > In fact, according to some lookup on http://codesearch.debian.net/
> > > and various search engine, there's already some userspace code
> > > requesting it:
> > > 
> > > - in systemd's readahead[2]:
> > > 
> > >     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > > 
> > > - in clsync[3]:
> > > 
> > >     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > > 
> > >     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > > 
> > > - in examples [4] from "Filesystem monitoring in the Linux
> > >   kernel" article[5] by Aleksander Morgado:
> > > 
> > >     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > >                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> > 
> > So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> > actually work.  With this change it *will* work, so the behaviour of
> > those apps might change, possibly breaking them?
> > 
> 
> In the other hand, not enabling close-on-exec might expose unwanted file
> descriptor to childs, creating security issues. YMMV.
> 

As someone who uses fanotify for content introspection, I can say that
I am _explicitly_ marking the fd obtained via read() as O_CLOEXEC,
because I have encountered a situation where a child managed to create
a deadlock because it kept the fd open after the main application
responded with FAN_ALLOW.

-- 
Mihai Donțu

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
  2014-10-02  6:20             ` Yann Droneaud
  2014-10-02  6:50               ` Mihai Donțu
@ 2014-10-02  7:52               ` Yann Droneaud
  2014-10-02  9:13                   ` Pádraig Brady
  1 sibling, 1 reply; 33+ messages in thread
From: Yann Droneaud @ 2014-10-02  7:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yann Droneaud, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

In order to not potentially break applications which were
requesting O_CLOEXEC on event file descriptors but which
actually need it to be not effective as the kernel currently
ignore the flag, so the file descriptor is inherited accross
exec regardless of O_CLOEXEC (please forgive me for the
wording), this patch introduces FAN_FD_CLOEXEC flag to
fanotify_init() so that application can request O_CLOEXEC
to be effective.
Newer application would use FAN_FD_CLOEXEC flag along
O_CLOEXEC to enable close on exec on newly created
file descriptor:

  fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
                     O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---

Hi Andrew,

While I believe fanotify_init() must enable close-on-exec
when requested by userspace to prevent unwelcomed security
issue, I understand your concerns regarding the possible
breakage on userspace application requesting O_CLOEXEC
but relying on it not being enable on file descriptor
created for the events.

So with a new flag to fanotify_init(), we could allow
newer applications to really enable O_CLOEXEC.

But I feel bad to have to force application to specify
twice they want close on exec:
  - are you sure ?
  - are you really sure ?
  - is this your final answer ?
  ...

Regards.

Yann Droneaud
OPTEYA


 fs/notify/fanotify/fanotify_user.c | 6 +++++-
 include/uapi/linux/fanotify.h      | 5 ++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c991616acca9..3c1fb1412f37 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
+	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags & O_CLOEXEC);
 	if (client_fd < 0)
 		return client_fd;
 
@@ -706,6 +706,10 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		return -EINVAL;
 	}
 
+	if ((event_f_flags & O_CLOEXEC) &&
+	    !(flags & FAN_FD_CLOEXEC))
+		event_f_flags ^= O_CLOEXEC;
+
 	user = get_current_user();
 	if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) {
 		free_uid(user);
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 030508d195d3..f2d517be3152 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -36,7 +36,10 @@
 #define FAN_UNLIMITED_QUEUE	0x00000010
 #define FAN_UNLIMITED_MARKS	0x00000020
 
-#define FAN_ALL_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
+/* flags used for fanotify_init() too */
+#define FAN_FD_CLOEXEC		0x00000100
+
+#define FAN_ALL_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | FAN_FD_CLOEXEC | \
 				 FAN_ALL_CLASS_BITS | FAN_UNLIMITED_QUEUE |\
 				 FAN_UNLIMITED_MARKS)
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
@ 2014-10-02  9:13                   ` Pádraig Brady
  0 siblings, 0 replies; 33+ messages in thread
From: Pádraig Brady @ 2014-10-02  9:13 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignore the flag, so the file descriptor is inherited accross
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
> 
>   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
>                      O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps that would depend on
O_CLOEXEC being ineffective.

please reconsider this one.

thanks,
Pádraig.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
@ 2014-10-02  9:13                   ` Pádraig Brady
  0 siblings, 0 replies; 33+ messages in thread
From: Pádraig Brady @ 2014-10-02  9:13 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignore the flag, so the file descriptor is inherited accross
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
> 
>   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
>                      O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps that would depend on
O_CLOEXEC being ineffective.

please reconsider this one.

thanks,
Pádraig.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
@ 2014-10-02  9:13                   ` Pádraig Brady
  0 siblings, 0 replies; 33+ messages in thread
From: Pádraig Brady @ 2014-10-02  9:13 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignore the flag, so the file descriptor is inherited accross
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
> 
>   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
>                      O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps that would depend on
O_CLOEXEC being ineffective.

please reconsider this one.

thanks,
P�draig.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-02 10:44               ` Jan Kara
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2014-10-02 10:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yann Droneaud, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> 
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> > 
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> > 
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> > 
> > But it seems logical to set close-on-exec flag on the file
> > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > 
> > In fact, according to some lookup on http://codesearch.debian.net/
> > and various search engine, there's already some userspace code
> > requesting it:
> > 
> > - in systemd's readahead[2]:
> > 
> >     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > 
> > - in clsync[3]:
> > 
> >     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > 
> >     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > 
> > - in examples [4] from "Filesystem monitoring in the Linux
> >   kernel" article[5] by Aleksander Morgado:
> > 
> >     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> >                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> 
> So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> actually work.  With this change it *will* work, so the behaviour of
> those apps might change, possibly breaking them?
  Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
want that behavior and their security may be weakened by the fact that
O_CLOEXEC is ignored. So we are weighting possible security issues for apps
doing things right (and Mihai mentioned in this thread that at least he has
an application which needs O_CLOEXEC working) against possible breakage for
apps which just randomly set O_CLOEXEC without wanting. So I'm really for
fixing O_CLOEXEC behavior.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-02 10:44               ` Jan Kara
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2014-10-02 10:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yann Droneaud, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
> 
> > According to commit 80af258867648 ('fanotify: groups can specify
> > their f_flags for new fd'), file descriptors created as part of
> > file access notification events inherit flags from the
> > event_f_flags argument passed to syscall fanotify_init(2).
> > 
> > So while it is legal for userspace to call fanotify_init() with
> > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > silently ignored.
> > 
> > Indeed event_f_flags are only given to dentry_open(), which only
> > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > generic_file_open().
> > 
> > But it seems logical to set close-on-exec flag on the file
> > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > 
> > In fact, according to some lookup on http://codesearch.debian.net/
> > and various search engine, there's already some userspace code
> > requesting it:
> > 
> > - in systemd's readahead[2]:
> > 
> >     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > 
> > - in clsync[3]:
> > 
> >     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > 
> >     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > 
> > - in examples [4] from "Filesystem monitoring in the Linux
> >   kernel" article[5] by Aleksander Morgado:
> > 
> >     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> >                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> 
> So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> actually work.  With this change it *will* work, so the behaviour of
> those apps might change, possibly breaking them?
  Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
want that behavior and their security may be weakened by the fact that
O_CLOEXEC is ignored. So we are weighting possible security issues for apps
doing things right (and Mihai mentioned in this thread that at least he has
an application which needs O_CLOEXEC working) against possible breakage for
apps which just randomly set O_CLOEXEC without wanting. So I'm really for
fixing O_CLOEXEC behavior.

								Honza
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
  2014-10-02  9:13                   ` Pádraig Brady
  (?)
  (?)
@ 2014-10-02 11:55                   ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 33+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-10-02 11:55 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Yann Droneaud, Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, lkml, linux-fsdevel, stable,
	Linux API, Jan Kara, Lino Sanfilippo, Valdis Kletnieks

On Thu, Oct 2, 2014 at 11:13 AM, Pádraig Brady <P@draigbrady.com> wrote:
> On 10/02/2014 08:52 AM, Yann Droneaud wrote:
>> In order to not potentially break applications which were
>> requesting O_CLOEXEC on event file descriptors but which
>> actually need it to be not effective as the kernel currently
>> ignore the flag, so the file descriptor is inherited accross
>> exec regardless of O_CLOEXEC (please forgive me for the
>> wording), this patch introduces FAN_FD_CLOEXEC flag to
>> fanotify_init() so that application can request O_CLOEXEC
>> to be effective.
>> Newer application would use FAN_FD_CLOEXEC flag along
>> O_CLOEXEC to enable close on exec on newly created
>> file descriptor:
>>
>>   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
>>                      O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
>
> Ugh really?
> IMHO there should be widespread or at least known breakage with
> O_CLOEXEC before adding messiness like this.
> It seems surprising to me that apps that would depend on
> O_CLOEXEC being ineffective.
>
> please reconsider this one.

Agreed. The number of applications for which there are silent (not
*yet* observed) breakages because O_CLOEXEC is not working as expected
is likely rather larger than the set of applications that randomly
specify O_CLOEXEC and then somehow get broken as a result.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
@ 2014-10-02 14:44                     ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-10-02 14:44 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages, Yann Droneaud

Hi,

Le jeudi 02 octobre 2014 à 10:13 +0100, Pádraig Brady a écrit :
> On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> > In order to not potentially break applications which were
> > requesting O_CLOEXEC on event file descriptors but which
> > actually need it to be not effective as the kernel currently
> > ignore the flag, so the file descriptor is inherited accross
> > exec regardless of O_CLOEXEC (please forgive me for the
> > wording), this patch introduces FAN_FD_CLOEXEC flag to
> > fanotify_init() so that application can request O_CLOEXEC
> > to be effective.
> > Newer application would use FAN_FD_CLOEXEC flag along
> > O_CLOEXEC to enable close on exec on newly created
> > file descriptor:
> > 
> >   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
> >                      O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> 
> Ugh really?
> IMHO there should be widespread or at least known breakage with
> O_CLOEXEC before adding messiness like this.

You should have read the other part of my message:

> > While I believe fanotify_init() must enable close-on-exec
> > when requested by userspace to prevent unwelcomed security
> > issue, I understand your concerns regarding the possible
> > breakage on userspace application requesting O_CLOEXEC
> > but relying on it not being enable on file descriptor
> > created for the events.
> 
> > So with a new flag to fanotify_init(), we could allow
> > newer applications to really enable O_CLOEXEC.
> 
> > But I feel bad to have to force application to specify
> > twice they want close on exec:
> >  - are you sure ?
> >  - are you really sure ?
> >  - is this your final answer ?
> >  ...


I'm not really fond of this option.

> It seems surprising to me that apps that would depend on
> O_CLOEXEC being ineffective.
> 

We have seen userspace developers making mistakes, and those mistakes
were mistakenly ignored by the kernel until someone try to fix the
mistake on kernel side, which broke the existing userspace application.

> please reconsider this one.
> 

I'm not going to promote this patch as it's a quick and dirty hack to
demonstrate what would be the other option.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd
@ 2014-10-02 14:44                     ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-10-02 14:44 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Andrew Morton, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Jan Kara, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages, Yann Droneaud

Hi,

Le jeudi 02 octobre 2014 à 10:13 +0100, Pádraig Brady a écrit :
> On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> > In order to not potentially break applications which were
> > requesting O_CLOEXEC on event file descriptors but which
> > actually need it to be not effective as the kernel currently
> > ignore the flag, so the file descriptor is inherited accross
> > exec regardless of O_CLOEXEC (please forgive me for the
> > wording), this patch introduces FAN_FD_CLOEXEC flag to
> > fanotify_init() so that application can request O_CLOEXEC
> > to be effective.
> > Newer application would use FAN_FD_CLOEXEC flag along
> > O_CLOEXEC to enable close on exec on newly created
> > file descriptor:
> > 
> >   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
> >                      O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> 
> Ugh really?
> IMHO there should be widespread or at least known breakage with
> O_CLOEXEC before adding messiness like this.

You should have read the other part of my message:

> > While I believe fanotify_init() must enable close-on-exec
> > when requested by userspace to prevent unwelcomed security
> > issue, I understand your concerns regarding the possible
> > breakage on userspace application requesting O_CLOEXEC
> > but relying on it not being enable on file descriptor
> > created for the events.
> 
> > So with a new flag to fanotify_init(), we could allow
> > newer applications to really enable O_CLOEXEC.
> 
> > But I feel bad to have to force application to specify
> > twice they want close on exec:
> >  - are you sure ?
> >  - are you really sure ?
> >  - is this your final answer ?
> >  ...


I'm not really fond of this option.

> It seems surprising to me that apps that would depend on
> O_CLOEXEC being ineffective.
> 

We have seen userspace developers making mistakes, and those mistakes
were mistakenly ignored by the kernel until someone try to fix the
mistake on kernel side, which broke the existing userspace application.

> please reconsider this one.
> 

I'm not going to promote this patch as it's a quick and dirty hack to
demonstrate what would be the other option.

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-02 19:46                 ` Andrew Morton
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2014-10-02 19:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yann Droneaud, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages

On Thu, 2 Oct 2014 12:44:10 +0200 Jan Kara <jack@suse.cz> wrote:

> On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> > On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud@opteya.com> wrote:
> > 
> > > According to commit 80af258867648 ('fanotify: groups can specify
> > > their f_flags for new fd'), file descriptors created as part of
> > > file access notification events inherit flags from the
> > > event_f_flags argument passed to syscall fanotify_init(2).
> > > 
> > > So while it is legal for userspace to call fanotify_init() with
> > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > > silently ignored.
> > > 
> > > Indeed event_f_flags are only given to dentry_open(), which only
> > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > > generic_file_open().
> > > 
> > > But it seems logical to set close-on-exec flag on the file
> > > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > > 
> > > In fact, according to some lookup on http://codesearch.debian.net/
> > > and various search engine, there's already some userspace code
> > > requesting it:
> > > 
> > > - in systemd's readahead[2]:
> > > 
> > >     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > > 
> > > - in clsync[3]:
> > > 
> > >     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > > 
> > >     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > > 
> > > - in examples [4] from "Filesystem monitoring in the Linux
> > >   kernel" article[5] by Aleksander Morgado:
> > > 
> > >     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > >                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> > 
> > So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> > actually work.  With this change it *will* work, so the behaviour of
> > those apps might change, possibly breaking them?
>   Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
> want that behavior and their security may be weakened by the fact that
> O_CLOEXEC is ignored. So we are weighting possible security issues for apps
> doing things right (and Mihai mentioned in this thread that at least he has
> an application which needs O_CLOEXEC working) against possible breakage for
> apps which just randomly set O_CLOEXEC without wanting. So I'm really for
> fixing O_CLOEXEC behavior.

Fair enough, it sounds like the risk is acceptable.

Can we get a new version sent out with all this new info appropriately
changelogged?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-02 19:46                 ` Andrew Morton
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2014-10-02 19:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yann Droneaud, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Lino Sanfilippo, Valdis Kletnieks, Michael Kerrisk-manpages

On Thu, 2 Oct 2014 12:44:10 +0200 Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote:

> On Wed 01-10-14 15:36:21, Andrew Morton wrote:
> > On Mon, 29 Sep 2014 10:49:15 +0200 Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > According to commit 80af258867648 ('fanotify: groups can specify
> > > their f_flags for new fd'), file descriptors created as part of
> > > file access notification events inherit flags from the
> > > event_f_flags argument passed to syscall fanotify_init(2).
> > > 
> > > So while it is legal for userspace to call fanotify_init() with
> > > O_CLOEXEC as part of its second argument, O_CLOEXEC is currently
> > > silently ignored.
> > > 
> > > Indeed event_f_flags are only given to dentry_open(), which only
> > > seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
> > > O_DIRECT in open_check_o_direct() and O_LARGEFILE in
> > > generic_file_open().
> > > 
> > > But it seems logical to set close-on-exec flag on the file
> > > descriptor if userspace is allowed to request it with O_CLOEXEC.
> > > 
> > > In fact, according to some lookup on http://codesearch.debian.net/
> > > and various search engine, there's already some userspace code
> > > requesting it:
> > > 
> > > - in systemd's readahead[2]:
> > > 
> > >     fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
> > > 
> > > - in clsync[3]:
> > > 
> > >     #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
> > > 
> > >     int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
> > > 
> > > - in examples [4] from "Filesystem monitoring in the Linux
> > >   kernel" article[5] by Aleksander Morgado:
> > > 
> > >     if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
> > >                                       O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
> > 
> > So we have a number of apps which are setting O_CLOEXEC, but it doesn't
> > actually work.  With this change it *will* work, so the behaviour of
> > those apps might change, possibly breaking them?
>   Possibly. OTOH I'd dare to say that most of the apps specifying O_CLOEXEC
> want that behavior and their security may be weakened by the fact that
> O_CLOEXEC is ignored. So we are weighting possible security issues for apps
> doing things right (and Mihai mentioned in this thread that at least he has
> an application which needs O_CLOEXEC working) against possible breakage for
> apps which just randomly set O_CLOEXEC without wanting. So I'm really for
> fixing O_CLOEXEC behavior.

Fair enough, it sounds like the risk is acceptable.

Can we get a new version sent out with all this new info appropriately
changelogged?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8.2] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-03  8:39                   ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-10-03  8:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yann Droneaud, Jan Kara, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel, linux-fsdevel, stable,
	linux-api, Lino Sanfilippo, Valdis Kletnieks,
	Michael Kerrisk-manpages, Mihai Donțu, Pádraig Brady

According to commit 80af258867648 ('fanotify: groups can specify
their f_flags for new fd'), file descriptors created as part of
file access notification events inherit flags from the
event_f_flags argument passed to syscall fanotify_init(2)[1].

Unfortunately O_CLOEXEC is currently silently ignored.

Indeed, event_f_flags are only given to dentry_open(), which only
seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
O_DIRECT in open_check_o_direct() and O_LARGEFILE in
generic_file_open().

It's a pity, since, according to some lookup on various search
engines and http://codesearch.debian.net/, there's already some
userspace code which use O_CLOEXEC:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Additionally, since commit 48149e9d3a7e ('fanotify: check file
flags passed in fanotify_init'). having O_CLOEXEC as part of
fanotify_init() second argument is expressly allowed.

So it seems expected to set close-on-exec flag on the file
descriptors if userspace is allowed to request it with O_CLOEXEC.

But Andrew Morton raised[6] the concern that enabling now
close-on-exec might break existing applications which ask for
O_CLOEXEC but expect the file descriptor to be inherited
across exec().

In the other hand, as reported by Mihai Donțu[7], not setting
close-on-exec on the file descriptor returned as part of file
access notify can break applications due to deadlock.
So close-on-exec is needed for most applications.

More, applications asking for close-on-exec are likely expecting
it to be enabled, relying on O_CLOEXEC being effective. If not,
it might weaken their security, as noted by Jan Kara[8].

So this patch replaces call to macro get_unused_fd() by a call
to function get_unused_fd_flags() with event_f_flags value as
argument. This way O_CLOEXEC flag in the second argument of
fanotify_init(2) syscall is interpreted and close-on-exec
get enabled when requested.

[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Cc: Mihai Donțu <mihai.dontu@gmail.com>
Cc: Pádraig Brady <P@draigBrady.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: linux-api@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
Hi Andrew,

> Fair enough, it sounds like the risk is acceptable.
>

OK.

> Can we get a new version sent out with all this new info appropriately
> changelogged?
>

Of course !

Please find an updated patch with revamped commit message.

Changes from v8.1:

- added more Cc:
- added Reviewed-by:
- rewrote commit message.

 fs/notify/fanotify/fanotify_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b13992a41bd9..c991616acca9 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	client_fd = get_unused_fd();
+	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
 	if (client_fd < 0)
 		return client_fd;
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv8.2] fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
@ 2014-10-03  8:39                   ` Yann Droneaud
  0 siblings, 0 replies; 33+ messages in thread
From: Yann Droneaud @ 2014-10-03  8:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yann Droneaud, Jan Kara, Heinrich Schuchardt, Eric Paris,
	Richard Guy Briggs, Al Viro, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Lino Sanfilippo, Valdis Kletnieks, Michael Kerrisk-manpages,
	Mihai Donțu, Pádraig Brady

According to commit 80af258867648 ('fanotify: groups can specify
their f_flags for new fd'), file descriptors created as part of
file access notification events inherit flags from the
event_f_flags argument passed to syscall fanotify_init(2)[1].

Unfortunately O_CLOEXEC is currently silently ignored.

Indeed, event_f_flags are only given to dentry_open(), which only
seems to care about O_ACCMODE and O_PATH in do_dentry_open(),
O_DIRECT in open_check_o_direct() and O_LARGEFILE in
generic_file_open().

It's a pity, since, according to some lookup on various search
engines and http://codesearch.debian.net/, there's already some
userspace code which use O_CLOEXEC:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Additionally, since commit 48149e9d3a7e ('fanotify: check file
flags passed in fanotify_init'). having O_CLOEXEC as part of
fanotify_init() second argument is expressly allowed.

So it seems expected to set close-on-exec flag on the file
descriptors if userspace is allowed to request it with O_CLOEXEC.

But Andrew Morton raised[6] the concern that enabling now
close-on-exec might break existing applications which ask for
O_CLOEXEC but expect the file descriptor to be inherited
across exec().

In the other hand, as reported by Mihai Donțu[7], not setting
close-on-exec on the file descriptor returned as part of file
access notify can break applications due to deadlock.
So close-on-exec is needed for most applications.

More, applications asking for close-on-exec are likely expecting
it to be enabled, relying on O_CLOEXEC being effective. If not,
it might weaken their security, as noted by Jan Kara[8].

So this patch replaces call to macro get_unused_fd() by a call
to function get_unused_fd_flags() with event_f_flags value as
argument. This way O_CLOEXEC flag in the second argument of
fanotify_init(2) syscall is interpreted and close-on-exec
get enabled when requested.

[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud-RlY5vtjFyJ1hl2p70BpVqQ@public.gmane.orgm
Cc: Mihai Donțu <mihai.dontu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Pádraig Brady <P@draigBrady.com>
Cc: Heinrich Schuchardt <xypron.glpk-Mmb7MZpHnFY@public.gmane.org>
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Cc: Valdis Kletnieks <Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org>
Cc: Michael Kerrisk-manpages <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Lino Sanfilippo <LinoSanfilippo-Mmb7MZpHnFY@public.gmane.org>
Cc: Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Eric Paris <eparis-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Reviewed by: Heinrich Schuchardt <xypron.glpk-Mmb7MZpHnFY@public.gmane.org>
Tested-by: Heinrich Schuchardt <xypron.glpk-Mmb7MZpHnFY@public.gmane.org>
Signed-off-by: Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
---
Hi Andrew,

> Fair enough, it sounds like the risk is acceptable.
>

OK.

> Can we get a new version sent out with all this new info appropriately
> changelogged?
>

Of course !

Please find an updated patch with revamped commit message.

Changes from v8.1:

- added more Cc:
- added Reviewed-by:
- rewrote commit message.

 fs/notify/fanotify/fanotify_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b13992a41bd9..c991616acca9 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -78,7 +78,7 @@ static int create_fd(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	client_fd = get_unused_fd();
+	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
 	if (client_fd < 0)
 		return client_fd;
 
-- 
1.9.3

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-10-03  8:44 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-24 13:11 [PATCHv8 0/6] Getting rid of get_unused_fd() / enable close-on-exec Yann Droneaud
2014-09-24 13:11 ` Yann Droneaud
2014-09-24 13:11 ` Yann Droneaud
2014-09-24 13:11 ` [PATCHv8 1/6] fanotify: enable close-on-exec on events' fd when requested in fanotify_init() Yann Droneaud
2014-09-25 20:57   ` Heinrich Schuchardt
2014-09-25 20:57     ` Heinrich Schuchardt
2014-09-26  8:58     ` Yann Droneaud
2014-09-27  7:26       ` Heinrich Schuchardt
2014-09-29  8:49         ` [PATCHv8.1] " Yann Droneaud
2014-09-29 11:50           ` Jan Kara
2014-10-01 22:36           ` Andrew Morton
2014-10-01 22:36             ` Andrew Morton
2014-10-02  6:20             ` Yann Droneaud
2014-10-02  6:50               ` Mihai Donțu
2014-10-02  7:52               ` [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd Yann Droneaud
2014-10-02  9:13                 ` Pádraig Brady
2014-10-02  9:13                   ` Pádraig Brady
2014-10-02  9:13                   ` Pádraig Brady
2014-10-02 11:55                   ` Michael Kerrisk (man-pages)
2014-10-02 14:44                   ` Yann Droneaud
2014-10-02 14:44                     ` Yann Droneaud
2014-10-02 10:44             ` [PATCHv8.1] fanotify: enable close-on-exec on events' fd when requested in fanotify_init() Jan Kara
2014-10-02 10:44               ` Jan Kara
2014-10-02 19:46               ` Andrew Morton
2014-10-02 19:46                 ` Andrew Morton
2014-10-03  8:39                 ` [PATCHv8.2] " Yann Droneaud
2014-10-03  8:39                   ` Yann Droneaud
2014-09-24 13:11 ` [PATCHv8 2/6] ia64: trivial: replace get_unused_fd() by get_unused_fd_flags(0) Yann Droneaud
2014-09-24 13:11 ` [PATCHv8 3/6] ppc/cell: " Yann Droneaud
2014-09-24 13:11   ` Yann Droneaud
2014-09-24 13:11 ` [PATCHv8 4/6] binfmt_misc: " Yann Droneaud
2014-09-24 13:11 ` [PATCHv8 5/6] file: " Yann Droneaud
2014-09-24 13:11 ` [PATCHv8 6/6] file: remove get_unused_fd() macro Yann Droneaud

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.