All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: netdev@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	linux-kernel@vger.kernel.org
Subject: [RFC] situation with csum_and_copy_... API
Date: Tue, 18 Nov 2014 08:47:45 +0000	[thread overview]
Message-ID: <20141118084745.GT7996@ZenIV.linux.org.uk> (raw)

I've spent the last weekend crawling through the copy-and-calculate-csum
primitives on all architectures.  It came up during iov_iter work; below
are the results of that review and, in the end, several questions about
the short-term tactics of a change that needs to be done for iov_iter-net.

The API apparently consists of 3 functions - present and exported on all
architectures:
	1) csum_and_copy_from_user()
	2) csum_and_copy_to_user()
	3) csum_partial_copy_nocheck().
There are very few places _using_ those, all but one in net stack (the only
exception is in drivers/net).
The minimal implementations would be

__wsum csum_and_copy_from_user(const void __user *src, void *dst, int len,
			       __wsum sum, int *err_ptr)
{
	if (unlikely(copy_from_user(dst, src, len) < 0)) {
		*err_ptr = -EFAULT;
		return <whatever>;
	}
	return csum_partial(dst, len, sum);
}

__wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
			     __wsum sum, int *err_ptr)
{
	sum = csum_partial(src, len, sum);
	if (unlikely(copy_to_user(dst, src, len) < 0)) {
		*err_ptr = -EFAULT;
		return <whatever>;
	}
	return sum;
}

__wsum csum_partial_copy_nocheck(const void *src, void *dst, int len,
				 __wsum sum)
{
	memcpy(dst, src, len);
	return csum_partial(dst, len, sum);
}

Note that we are *not* guaranteed that *err_ptr is touched in case of success
- not all architectures zero it in that case.  Callers must (and do) zero it
before the call of csum_and_copy_{from,to}_user().  Furthermore, return value
in case of error is undefined.  On the "from" side result is really "anything
whatsoever", on the "to" side most of the architectures end up returning ~0U.
However, even that is not guaranteed - e.g. avr32 and xtensa might return 0
in that case.  All existing callers discard the return value in case of error -
most of them by jumping out of the scope of variable it's assigned to.  The
only exception is skb_copy_and_csum_datagram(); in that case return value is
discarded by caller of skb_copy_and_csum_datagram() itself (also by jumping
out of scope of the variable it had been put into).

Generally, error in copying from userland ends up zeroing the rest of
destination.  However, there are exceptions (which might be considered bugs)
*and* callers discard all the copied data in case of error anyway.  There
are several call chains:
	1) from skb_add_data(): calls skb_trim() immediately after the failure.
	2) from skb_do_copy_data_nocache() called from skb_add_data_nocache():
__skb_trim() in skb_add_data_nocache()
	3) from skb_do_copy_data_nocache() called from
skb_copy_to_page_nocache(): we do not increase skb->len until after successful
copying.
	4) from skb_copy_to_page(): we do not increase skb->len until after
successful copying.
	5) from csum_partial_copy_fromiovecend() from ip_generic_getfrag()
or ping_getfrag(): those are passed as getfrag callback to ip_make_skb(),
ip_append_data() or ip6_append_data().  ip_make_skb() and ip_append_data()
just pass it to __ip_append_data(), so we are left with two fairly similar
functions to analyse.
	* __ip_append_data() has 3 places where getfrag() is called directly and
one where it's passed to ip_ufo_append_data().  Direct ones either free skb, or
do __skb_trim() or do not increment skb->len on error.  ip_ufo_append_data()
passes it further to skb_append_datato_frags(), where we do not increment skb->len
and friends in case of getfrag() failure.
	* ip6_append_data() the situation is identical, with ip6_ufo_append_data()
in place of ip_ufo_append_data().

So for all in-tree users of that sucker we are guaranteed to discard the whole
thing in case of error and this zeroing the tail is pointless.

Note also that the order of src and dest is opposite to normal for memcpy-like
functions.  Compared to the rest of the ugliness it's trivial, but it's still
not nice.

IMO the calling conventions are atrocious.

And then there are architecture-specific warts.  A relatively minor one is that
csum_partial_copy_nocheck() has a slightly saner name on some architectures -
there it's called csum_partial_copy().  Of course, those architectures have
#define csum_partial_copy_nocheck csum_partial_copy...  Worse ones are related
to csum_and_copy_from_user() - on everything other than ppc64 it is an
inlined wrapper around csum_partial_copy_from_user(), which is what's actually
exported.  On ppc64 csum_partial_copy_from_user() simply doesn't exist.
The _only_ difference between it and csum_and_copy_from_user() is that the
wrapper does access_ok() check.  However, some architectures repeat that
access_ok() in csum_partial_copy_from_user() (and on x86 we have
#define csum_and_copy_from_user csum_partial_copy_from_user, with no
access_ok() in the wrapper).  As it is, it's architecture-dependent whether
csum_partial_copy_from_user() does or does not access_ok(); on
alpha, frv, m32r, mn10300, parisc, s390, score and x86 it does, on the
rest it doesn't.

To make it even more fun, converting verify_iovec() and its compat equivalent
to use of {,compat_}rw_copy_check_uvector() would guarantee that all those
access_ok() are redundant to start with, both on send and receive side of
things.  Note, BTW, that for read()/write()/readv()/writev() it's already
redundant.  Moreover, we have a very good reason to do that anyway - conversion
of sendmsg/recvmsg to iov_iter primitives would pretty much require that,
since copy_{to,from}_iter() assume that iovec behind the iov_iter has been
validated wrt access_ok().

I do have a patch doing just that; the question is what to do with csum-and-copy
primitives.  Originally I planned to simply strip those access_ok() from those
(both the explicit calls and use of copy_from_user() where we ought to use
__copy_from_user(), etc.), but that's not nice to potential out-of-tree callers
of those suckers.  If any of those exist and manage to cope with the wonderful
calling conventions, that is.  As it is, we have the total of 4 callers of
csum_and_copy_from_user() and 2 callers of csum_and_copy_to_user(), all in
networking code.  Do we care about potential out-of-tree users existing and
getting screwed by such change?  Davem, Linus?

Alternatively, we could introduce __csum_and_copy_{from,to}_user() that would
_not_ do access_ok() (and wouldn't be required to do zeroing the tail, etc.),
convert the existing callers to that and leave csum_and_copy_{to,from}_user()
as architecture-independent wrappers around those - "do access_ok(),
then try to call __csum_and... variant, then fall back to dumb implementation
if that fails".  For almost all architectures csum_and_partial_copy_from_user()
would get stripped of access_ok() and renamed to __csum_and_copy_from_user();
for ppc64 we'd define __csum_and_copy_from_user(src,dst,len,sum,errp) as
csum_and_partial_copy_generic(src,dst,len,sum,errp,NULL) - they already
have that kind of code structure.  This variant still buggers the out-of-tree
code using csum_and_partial_copy_from_user() directly, but users of
csum_and_copy_{from,to}_user() are left intact, such modules were already
broken on ppc64 *and* breakage is of obvious "it won't link" kind.

Comments, suggestions?

PS: there are some really amusing brainos - e.g. mn10300 csum_and_copy_to_user()
starts with this:
	missing = copy_to_user(dst, src, len);
	if (missing) {
		memset(dst + len - missing, 0, missing);
		*err_ptr = -EFAULT;
	}
that's right, if copy_to_user() has returned non-zero, try to do
memset() on userland addresses it has failed to write into...

             reply	other threads:[~2014-11-18  8:47 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-18  8:47 Al Viro [this message]
2014-11-18 19:40 ` [patches][RFC] situation with csum_and_copy_... API Al Viro
2014-11-18 19:41   ` [PATCH 1/5] separate kernel- and userland-side msghdr Al Viro
2014-11-18 19:42   ` [PATCH 2/5] {compat_,}verify_iovec(): switch to generic copying of iovecs Al Viro
2014-11-18 19:42   ` [PATCH 3/5] remove a bunch of now-pointless access_ok() in net Al Viro
2014-11-18 19:43   ` [PATCH 4/5] bury skb_copy_to_page() Al Viro
2014-11-18 19:43   ` [PATCH 5/5] fold verify_iovec() into copy_msghdr_from_user() Al Viro
2014-11-19 20:25   ` [patches][RFC] situation with csum_and_copy_... API David Miller
2014-11-18 20:49 ` [RFC] " Linus Torvalds
2014-11-18 21:23   ` Al Viro
2014-11-18 21:39     ` Linus Torvalds
2014-11-19 20:31     ` David Miller
2014-11-19 20:40       ` Linus Torvalds
2014-11-19 21:17         ` Al Viro
2014-11-19 21:17         ` David Miller
2014-11-19 21:30           ` Al Viro
2014-11-19 21:53             ` David Miller
2014-11-20 21:47               ` Al Viro
2014-11-20 21:55                 ` Eric Dumazet
2014-11-20 22:25                   ` Al Viro
2014-11-20 22:53                     ` Eric Dumazet
2014-11-21  8:49                       ` Al Viro
2014-11-21 15:01                         ` Eric Dumazet
2014-11-21 17:42                         ` David Laight
2014-11-21 19:39                           ` Al Viro
2014-11-21 19:40                             ` Linus Torvalds
2014-11-24 10:03                             ` David Laight
2014-11-22  3:27                         ` Al Viro
2014-11-22  3:36                           ` Al Viro
2014-11-24 10:27                           ` David Laight
2014-11-20 23:23                 ` David Miller
2014-11-21 17:26                   ` David Miller
2014-11-22  4:28                     ` Al Viro
2014-11-22  4:29                       ` [PATCH 01/17] new helper: skb_copy_and_csum_datagram_msg() Al Viro
2014-11-22  4:30                       ` [PATCH 02/17] new helper: memcpy_from_msg() Al Viro
2014-11-22  4:30                       ` [PATCH 03/17] switch ipxrtr_route_packet() from iovec to msghdr Al Viro
2014-11-22  4:31                       ` [PATCH 04/17] new helper: memcpy_to_msg() Al Viro
2014-11-22  4:32                       ` [PATCH 05/17] switch drivers/net/tun.c to ->read_iter() Al Viro
2014-11-22  4:32                       ` [PATCH 06/17] switch macvtap " Al Viro
2014-11-23 23:29                         ` Ben Hutchings
2014-11-22  4:33                       ` [PATCH 07/17] new helpers: skb_copy_datagram_from_iter() and zerocopy_sg_from_iter() Al Viro
2014-11-24  0:02                         ` Ben Hutchings
2014-11-24  0:29                           ` Ben Hutchings
2014-11-24  5:34                           ` Jason Wang
2014-11-24 10:03                             ` Al Viro
2014-11-22  4:33                       ` [PATCH 08/17] {macvtap,tun}_get_user(): switch to iov_iter Al Viro
2014-11-24  0:27                         ` Ben Hutchings
2014-11-24  1:06                           ` Ben Hutchings
2014-11-24 10:15                           ` Al Viro
2014-11-22  4:34                       ` [PATCH 09/17] kill zerocopy_sg_from_iovec() Al Viro
2014-11-22  4:35                       ` [PATCH 10/17] switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter() Al Viro
2014-11-22  4:36                       ` PATCH 11/17] switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter Al Viro
2014-11-22  4:36                       ` [PATCH 12/17] tipc_sendmsg(): pass msghdr instead of its ->msg_iov Al Viro
2014-11-22  4:37                       ` [PATCH 13/17] tipc_msg_build(): " Al Viro
2014-11-22  4:37                       ` [PATCH 14/17] vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr Al Viro
2014-11-22  4:38                       ` [PATCH 15/17] [atm] switch vcc_sendmsg() to copy_from_iter() Al Viro
2014-11-22  4:38                       ` [PATCH 16/17] rds: switch ->inc_copy_to_user() to passing iov_iter Al Viro
2014-11-22  4:39                       ` [PATCH 17/17] rds: switch rds_message_copy_from_user() to iov_iter Al Viro
2014-11-24  2:00                         ` Ben Hutchings
2014-11-24 10:17                           ` Al Viro
2014-11-22  7:24                       ` [RFC] situation with csum_and_copy_... API David Miller
2014-11-25  2:40                         ` Al Viro
2014-11-25 14:02                           ` [PATCH v2 01/17] new helper: skb_copy_and_csum_datagram_msg() Al Viro
2014-11-25 19:28                             ` David Miller
2014-11-25 20:59                               ` Al Viro
2014-11-26 17:27                                 ` David Miller
2014-12-05  5:56                                   ` the next chunk of iov_iter-net stuff for review Al Viro
2014-12-05  5:58                                     ` [PATCH 01/12] raw.c: stick msghdr into raw_frag_vec Al Viro
2014-12-05  5:58                                     ` [PATCH 02/12] ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt" Al Viro
2014-12-05  5:58                                     ` [PATCH 03/12] ip_generic_getfrag, udplite_getfrag: switch to passing msghdr Al Viro
2014-12-05  5:58                                     ` [PATCH 04/12] switch tcp_sock->ucopy from iovec (ucopy.iov) to msghdr (ucopy.msg) Al Viro
2014-12-05  5:58                                     ` [PATCH 05/12] switch l2cap ->memcpy_fromiovec() to msghdr Al Viro
2014-12-05  5:58                                     ` [PATCH 06/12] vmci: propagate msghdr all way down to __qp_memcpy_from_queue() Al Viro
2014-12-05  5:58                                     ` [PATCH 07/12] put iov_iter into msghdr Al Viro
2014-12-05  5:58                                     ` [PATCH 08/12] first fruits - kill l2cap ->memcpy_fromiovec() Al Viro
2014-12-05  5:58                                     ` [PATCH 09/12] switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives Al Viro
2014-12-05  5:58                                     ` [PATCH 10/12] ppp_read(): switch to skb_copy_datagram_iter() Al Viro
2014-12-05  5:58                                     ` [PATCH 11/12] skb_copy_datagram_iovec() can die Al Viro
2014-12-05  5:58                                     ` [PATCH 12/12] bury memcpy_toiovec() Al Viro
2014-12-09 20:07                                     ` the next chunk of iov_iter-net stuff for review David Miller
2014-12-09 21:04                                       ` Al Viro
2014-12-09 21:17                                         ` David Miller
2014-12-09 21:23                                           ` Al Viro
2014-12-09 21:37                                             ` David Miller
2014-12-09 22:49                                               ` Al Viro
2014-12-09 22:50                                                 ` [PATCH 01/25] iov_iter.c: macros for iterating over iov_iter Al Viro
2014-12-09 22:50                                                 ` [PATCH 02/25] iov_iter.c: iterate_and_advance Al Viro
2014-12-09 22:50                                                 ` [PATCH 03/25] iov_iter.c: convert iov_iter_npages() to iterate_all_kinds Al Viro
2014-12-09 22:50                                                 ` [PATCH 04/25] iov_iter.c: convert iov_iter_get_pages() " Al Viro
2014-12-09 22:50                                                 ` [PATCH 05/25] iov_iter.c: convert iov_iter_get_pages_alloc() " Al Viro
2014-12-09 22:50                                                 ` [PATCH 06/25] iov_iter.c: convert iov_iter_zero() to iterate_and_advance Al Viro
2014-12-09 22:50                                                 ` [PATCH 07/25] iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() Al Viro
2014-12-09 22:50                                                 ` [PATCH 08/25] iov_iter.c: convert copy_from_iter() to iterate_and_advance Al Viro
2014-12-09 22:50                                                 ` [PATCH 09/25] iov_iter.c: convert copy_to_iter() " Al Viro
2014-12-09 22:50                                                 ` [PATCH 10/25] iov_iter.c: handle ITER_KVEC directly Al Viro
2014-12-09 22:50                                                 ` [PATCH 11/25] csum_and_copy_..._iter() Al Viro
2014-12-09 22:50                                                 ` [PATCH 12/25] new helper: iov_iter_kvec() Al Viro
2014-12-09 22:50                                                 ` [PATCH 13/25] copy_from_iter_nocache() Al Viro
2014-12-09 22:50                                                 ` [PATCH 14/25] raw.c: stick msghdr into raw_frag_vec Al Viro
2014-12-09 22:50                                                 ` [PATCH 15/25] ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt" Al Viro
2014-12-09 22:50                                                 ` [PATCH 16/25] ip_generic_getfrag, udplite_getfrag: switch to passing msghdr Al Viro
2014-12-09 22:50                                                 ` [PATCH 17/25] switch tcp_sock->ucopy from iovec (ucopy.iov) to msghdr (ucopy.msg) Al Viro
2014-12-09 22:50                                                 ` [PATCH 18/25] switch l2cap ->memcpy_fromiovec() to msghdr Al Viro
2014-12-09 22:50                                                 ` [PATCH 19/25] vmci: propagate msghdr all way down to __qp_memcpy_from_queue() Al Viro
2014-12-09 22:50                                                 ` [PATCH 20/25] put iov_iter into msghdr Al Viro
2014-12-09 22:50                                                 ` [PATCH 21/25] first fruits - kill l2cap ->memcpy_fromiovec() Al Viro
2014-12-09 22:50                                                 ` [PATCH 22/25] switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives Al Viro
2014-12-09 22:50                                                 ` [PATCH 23/25] ppp_read(): switch to skb_copy_datagram_iter() Al Viro
2014-12-09 22:50                                                 ` [PATCH 24/25] skb_copy_datagram_iovec() can die Al Viro
2014-12-09 22:50                                                 ` [PATCH 25/25] bury memcpy_toiovec() Al Viro
2014-12-09 23:13                                                 ` the next chunk of iov_iter-net stuff for review Al Viro
2014-12-10 18:25                                                 ` David Miller
2014-11-25 14:02                           ` [PATCH v2 02/17] new helper: memcpy_from_msg() Al Viro
2014-11-25 14:02                           ` [PATCH v2 03/17] switch ipxrtr_route_packet() from iovec to msghdr Al Viro
2014-11-25 14:02                           ` [PATCH v2 04/17] new helper: memcpy_to_msg() Al Viro
2014-11-25 14:02                           ` [PATCH v2 05/17] switch drivers/net/tun.c to ->read_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 06/17] switch macvtap " Al Viro
2014-11-25 14:02                           ` [PATCH v2 07/17] new helpers: skb_copy_datagram_from_iter() and zerocopy_sg_from_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 08/17] {macvtap,tun}_get_user(): switch to iov_iter Al Viro
2015-02-03 10:10                             ` Michael S. Tsirkin
2015-02-03 14:27                               ` Al Viro
2015-02-03 15:19                                 ` Michael S. Tsirkin
2014-11-25 14:02                           ` [PATCH v2 09/17] kill zerocopy_sg_from_iovec() Al Viro
2014-11-25 14:02                           ` [PATCH v2 10/17] switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 11/17] switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter Al Viro
2014-11-25 14:02                           ` [PATCH v2 12/17] tipc_sendmsg(): pass msghdr instead of its ->msg_iov Al Viro
2014-11-25 14:02                           ` [PATCH v2 13/17] tipc_msg_build(): " Al Viro
2014-11-25 14:02                           ` [PATCH v2 14/17] vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr Al Viro
2014-11-25 14:02                           ` [PATCH v2 15/17] [atm] switch vcc_sendmsg() to copy_from_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 16/17] rds: switch ->inc_copy_to_user() to passing iov_iter Al Viro
2014-11-25 14:02                           ` [PATCH v2 17/17] rds: switch rds_message_copy_from_user() to iov_iter Al Viro
2014-11-22 17:48                       ` [RFC] situation with csum_and_copy_... API Linus Torvalds
2014-11-21  4:17                 ` Nicholas A. Bellinger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141118084745.GT7996@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.