All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC] raw_copy_from_user() semantics
Date: Sun, 19 Jul 2020 04:17:33 +0100	[thread overview]
Message-ID: <20200719031733.GI2786714@ZenIV.linux.org.uk> (raw)

	Back in 2017 I'd made a bogus promise regarding raw_copy_from_user().
 Namely, that in case of short copy it will copy at least something unless
nothing could've been read at all.

	Such property could've been used by code that would want to squeeze
every byte, by doing copy_from_user() in a loop, advancing the source and
destination by the amount copied until we stop getting any progress.

	There are two problems with that.  First of all, the promise was
bogus - there are architectures where it is simply not true.  E.g. ppc
(or alpha, or sparc, or...)  can have copy_from_user() called with source
one word prior to an unmapped page, fetch that word, fail to fetch the next
one and bugger off without doing any stores.

	So any byte-squeezing loop of that sort would break on a bunch
of architectures.  Another problem is that we simply don't have a lot
of such loops.	Looking through the mainline kernel has found only two
such beasts - one in vcs_write() and another in iomap_dio_inline_actor().
Both are in write(2) and write(2) has never made any warranties regarding
the situation when a part of buffer we want to write is unreadable.
In particular, it does *not* squeeze every last byte out - not for
regular files on local filesystems, not for NFS, not for pipes, not for
sockets, etc.

	E.g. in case of generic_perform_write() we treat "nothing has
been copied" as "try to fault in, fail if that's not possible, retry the
attempt to copy otherwise"; we have to, since we are doing the copying
with pagefaults disabled, so a failure to copy anything may be simply
due to a page evicted by memory pressure.  And if we have an evicted page
followed by genuinely unreadable one, we can very well have fault-in fail
(and write stop) while we still have a couple of kilobytes worth of data
yet to be copied from the evicted page.

	We could try to make the promise true; that would mean messing
with a lot of unpleasant asm code (in exception handlers, at that) and
I don't believe it's worth the bother.  We could add a "squeeze every
last byte" helper, but I don't see any valid users for it right now.

	BTW, "do copy_from_user() in a loop" is not a good way to do such
helper anyway - if we stop copying on a fault and lose some data already
fetched, that data is not going to be more than a few registers worth,
so it's better to have this helper do a byte-by-byte get_user() loop
for dealing with that case.

	IMO we should simply remove the bogus promise, document the real
situation and if somebody needs the "squeeze every last byte" helper,
let them say so.  Audit of the current mainline shows only two places
that might be trying something like that and AFAICS both are accidental.

	Does anybody object against the following?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 7bcadca22100..083fda6aa384 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -30,10 +30,11 @@
  * starting at from.  All data past to + size - N must be left unmodified.
  *
  * If copying succeeds, the return value must be 0.  If some data cannot be
- * fetched, it is permitted to copy less than had been fetched; the only
- * hard requirement is that not storing anything at all (i.e. returning size)
- * should happen only when nothing could be copied.  In other words, you don't
- * have to squeeze as much as possible - it is allowed, but not necessary.
+ * fetched, it is permitted to copy less than had been fetched.  In other
+ * words, you don't have to squeeze as much as possible - it is allowed, but
+ * not necessary.  In particular, it is possible to have some bytes fetched
+ * and nothing copied, so do _not_ assume that doing copies until we get to
+ * "nothing copied" will copy every byte that can be fetched.
  *
  * For raw_copy_from_user() to always points to kernel memory and no faults
  * on store should happen.  Interpretation of from is affected by set_fs().

             reply	other threads:[~2020-07-19  3:18 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-19  3:17 Al Viro [this message]
2020-07-19 19:28 ` [RFC] raw_copy_from_user() semantics Linus Torvalds
2020-07-19 19:34   ` Linus Torvalds
2020-07-22 11:37     ` Catalin Marinas
2020-07-22 13:14       ` David Laight
2020-07-22 16:53         ` Catalin Marinas
2020-07-23  8:37           ` David Laight
2020-07-23 10:18             ` Catalin Marinas
2020-07-23 10:34               ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200719031733.GI2786714@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.