linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Proposal to fix pwrite with O_APPEND via pwritev2 flag
@ 2020-01-24  0:02 Rich Felker
  2020-01-24  9:37 ` Florian Weimer
  2020-02-05  4:42 ` Rich Felker
  0 siblings, 2 replies; 4+ messages in thread
From: Rich Felker @ 2020-01-24  0:02 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-api; +Cc: Alexander Viro

There's a longstanding unfixable (due to API stability) bug in the
pwrite syscall:

http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS

whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
offset. Now that there's a pwritev2 syscall that takes a flags
argument, it's possible to fix this without breaking stability by
adding a new RWF_NOAPPEND flag, which callers that want the fixed
behavior can then pass.

I have a completely untested patch to add such a flag, but would like
to get a feel for whether the concept is acceptable before putting
time into testing it. If so, I'll submit this as a proper patch with
detailed commit message etc. Draft is below.

Rich



diff --git a/include/linux/fs.h b/include/linux/fs.h
index e0d909d35763..3a769a972f79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3397,6 +3397,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 {
 	if (unlikely(flags & ~RWF_SUPPORTED))
 		return -EOPNOTSUPP;
+	if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
+		return -EINVAL;
 
 	if (flags & RWF_NOWAIT) {
 		if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
@@ -3411,6 +3413,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
 	if (flags & RWF_APPEND)
 		ki->ki_flags |= IOCB_APPEND;
+	if (flags & RWF_NOAPPEND)
+		ki->ki_flags &= ~IOCB_APPEND;
 	return 0;
 }
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 379a612f8f1d..591357d9b3c9 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO O_APPEND */
 #define RWF_APPEND	((__force __kernel_rwf_t)0x00000010)
 
+/* per-IO negation of O_APPEND */
+#define RWF_NOAPPEND	((__force __kernel_rwf_t)0x00000020)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-			 RWF_APPEND)
+			 RWF_APPEND | RWF_NOAPPEND)
 
 #endif /* _UAPI_LINUX_FS_H */

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Proposal to fix pwrite with O_APPEND via pwritev2 flag
  2020-01-24  0:02 Proposal to fix pwrite with O_APPEND via pwritev2 flag Rich Felker
@ 2020-01-24  9:37 ` Florian Weimer
  2020-01-24 14:07   ` Rich Felker
  2020-02-05  4:42 ` Rich Felker
  1 sibling, 1 reply; 4+ messages in thread
From: Florian Weimer @ 2020-01-24  9:37 UTC (permalink / raw)
  To: Rich Felker; +Cc: linux-fsdevel, linux-kernel, linux-api, Alexander Viro

* Rich Felker:

> There's a longstanding unfixable (due to API stability) bug in the
> pwrite syscall:
>
> http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS
>
> whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
> offset. Now that there's a pwritev2 syscall that takes a flags
> argument, it's possible to fix this without breaking stability by
> adding a new RWF_NOAPPEND flag, which callers that want the fixed
> behavior can then pass.
>
> I have a completely untested patch to add such a flag, but would like
> to get a feel for whether the concept is acceptable before putting
> time into testing it. If so, I'll submit this as a proper patch with
> detailed commit message etc. Draft is below.

Has this come up before?

I had already written a test case and it turns out that an O_APPEND
descriptor does not protect the previously written data in the file:

openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
write(3, "@", 1)                        = 1
close(3)                                = 0
openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_WRONLY|O_APPEND) = 3
ftruncate(3, 0)                         = 0

So at least it looks like there is no security issue in adding a
RWF_NOAPPEND flag.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Proposal to fix pwrite with O_APPEND via pwritev2 flag
  2020-01-24  9:37 ` Florian Weimer
@ 2020-01-24 14:07   ` Rich Felker
  0 siblings, 0 replies; 4+ messages in thread
From: Rich Felker @ 2020-01-24 14:07 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-fsdevel, linux-kernel, linux-api, Alexander Viro

On Fri, Jan 24, 2020 at 10:37:22AM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > There's a longstanding unfixable (due to API stability) bug in the
> > pwrite syscall:
> >
> > http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS
> >
> > whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
> > offset. Now that there's a pwritev2 syscall that takes a flags
> > argument, it's possible to fix this without breaking stability by
> > adding a new RWF_NOAPPEND flag, which callers that want the fixed
> > behavior can then pass.
> >
> > I have a completely untested patch to add such a flag, but would like
> > to get a feel for whether the concept is acceptable before putting
> > time into testing it. If so, I'll submit this as a proper patch with
> > detailed commit message etc. Draft is below.
> 
> Has this come up before?

I'm not sure if there's an open glibc bug for it or not, but it's come
up in musl community before that the kernel is non-conforming here for
historical reasons (preserving the original bug in case any software
is depending on it) and we've always wanted to have a fix, but
couldn't find one short of just erroring out if O_APPEND is set when
pwrite is called. That's what the fallback will do (rather than
silently write data at the wrong place) if pwritev2+RWF_NOAPPEND is
not supported on the system at runtime.

> I had already written a test case and it turns out that an O_APPEND
> descriptor does not protect the previously written data in the file:
> 
> openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
> write(3, "@", 1)                        = 1
> close(3)                                = 0
> openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_WRONLY|O_APPEND) = 3
> ftruncate(3, 0)                         = 0
> 
> So at least it looks like there is no security issue in adding a
> RWF_NOAPPEND flag.

Indeed, if you have the file open you can just use fcntl to remove
O_APPEND (but of course using that in an emulation would be racy), so
it's not a security boundary. Someone could try to "make it into one"
with seccomp, blocking fcntl that would remove O_APPEND and blocking
ftruncate, mmap, and all other ways you could modify the existing part
of the file, but that sounds fragile, and if they really want to do
that they can block pwritev2 as well (or at least block it with
RWF_NOAPPEND or future/unknown flags).

Rich

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Proposal to fix pwrite with O_APPEND via pwritev2 flag
  2020-01-24  0:02 Proposal to fix pwrite with O_APPEND via pwritev2 flag Rich Felker
  2020-01-24  9:37 ` Florian Weimer
@ 2020-02-05  4:42 ` Rich Felker
  1 sibling, 0 replies; 4+ messages in thread
From: Rich Felker @ 2020-02-05  4:42 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-api; +Cc: Alexander Viro

On Thu, Jan 23, 2020 at 07:02:43PM -0500, Rich Felker wrote:
> There's a longstanding unfixable (due to API stability) bug in the
> pwrite syscall:
> 
> http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS
> 
> whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
> offset. Now that there's a pwritev2 syscall that takes a flags
> argument, it's possible to fix this without breaking stability by
> adding a new RWF_NOAPPEND flag, which callers that want the fixed
> behavior can then pass.
> 
> I have a completely untested patch to add such a flag, but would like
> to get a feel for whether the concept is acceptable before putting
> time into testing it. If so, I'll submit this as a proper patch with
> detailed commit message etc. Draft is below.

I went ahead and tested this, and it works as intended, so I'll post a
proper patch with commit message.

Rich



> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e0d909d35763..3a769a972f79 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3397,6 +3397,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
>  {
>  	if (unlikely(flags & ~RWF_SUPPORTED))
>  		return -EOPNOTSUPP;
> +	if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
> +		return -EINVAL;
>  
>  	if (flags & RWF_NOWAIT) {
>  		if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
> @@ -3411,6 +3413,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
>  		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
>  	if (flags & RWF_APPEND)
>  		ki->ki_flags |= IOCB_APPEND;
> +	if (flags & RWF_NOAPPEND)
> +		ki->ki_flags &= ~IOCB_APPEND;
>  	return 0;
>  }
>  
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 379a612f8f1d..591357d9b3c9 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t;
>  /* per-IO O_APPEND */
>  #define RWF_APPEND	((__force __kernel_rwf_t)0x00000010)
>  
> +/* per-IO negation of O_APPEND */
> +#define RWF_NOAPPEND	((__force __kernel_rwf_t)0x00000020)
> +
>  /* mask of flags supported by the kernel */
>  #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> -			 RWF_APPEND)
> +			 RWF_APPEND | RWF_NOAPPEND)
>  
>  #endif /* _UAPI_LINUX_FS_H */

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-02-05  4:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-24  0:02 Proposal to fix pwrite with O_APPEND via pwritev2 flag Rich Felker
2020-01-24  9:37 ` Florian Weimer
2020-01-24 14:07   ` Rich Felker
2020-02-05  4:42 ` Rich Felker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).