All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	torvalds@linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, target-devel@vger.kernel.org,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC] situation with csum_and_copy_... API
Date: Fri, 21 Nov 2014 08:49:56 +0000	[thread overview]
Message-ID: <20141121084956.GT7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <1416524035.8629.54.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, Nov 20, 2014 at 02:53:55PM -0800, Eric Dumazet wrote:
> > And no, your solution doesn't work.  Sorry.  You'll break e.g. smb_send_kvec()
> > that way.  ceph_tcp_sendmsg() as well, IIRC.
> 
> Nowhere in tcp_sendmsg() the iov had const qualifier.

*nod*

> If it was declared as const, this discussion would not happen,
> we would know we are not allowed to modify it.
>
> iov_iter is nice, but not a single time it is used in net/ 
> 
> Please make sure to add const where appropriate.

The situation is a fairly long-standing mess.  Take a look at e.g.
do_sock_write() - what we have there is
static ssize_t do_sock_write(struct msghdr *msg, struct kiocb *iocb,
                        struct file *file, const struct iovec *iov,
                        unsigned long nr_segs)
...
        msg->msg_iov = (struct iovec *)iov;
...
        return __sock_sendmsg(iocb, sock, msg, size);

That's where const is stripped off.  The reason why memcpy_fromiovec()
is modifying iovec is that it we obviously want subsequent calls to keep
picking new and new parts.  And on sendmsg(2) path iovec is discarded
after the method call anyway, so we get away with "let's do whatever's
more convenient internally, caller won't care anyway".

Unfortunately, it means that other callers (kernel_sendmsg() users, that is)
end up with very unpleasant situation - they can't even predict if iovec
will be fully drained, partially drained or left completely unchanged.
It's protocol-dependent *and* it depends on the codepath taken in ->sendmsg().
Note that "partially drained" is also a real-world case - e.g.
ping_v4_sendmsg() ends up draining the first sizeof(struct icmphdr) and leaves
the rest there.

In some cases it doesn't hurt much - throwaway struct iovec or a short array
of such which is fed to sendmsg and immediately discarded.  The rest either
relies on assumption about the behaviour for (known) protocol, which end
up being incorrect more often than not, or grumbles, makes a throwaway copy
of its iovec array and after the call either drains the original itself or
advances an iov_iter pointing to it.  There's quite a collection of unhappy
comments in those callers about this situation.  On the recvmsg side the
picture is the same.

We would be better off with iov_iter passed to __sock_{send,recv}msg() (as
a part of struct msghdr, instead of ->msg_iov/->msg_iovlen) and always
advanced to match the amount of data actually picked from it.  With iovec
behind it remaining constant.  That would work just as well as the current
variant for sendmsg(2)/recvmsg(2)/etc., be a lot more convenient for
kernel_{send,recv}msg() callers and would allow a lot of other fun stuff.

The problem, of course, is how to get through the conversion without a huge
patch from hell.  I'm reasonably certain that we can do that on the method
side of things - right now the amount of places aware of ->msg_iov in that
branch is much lower than in mainline.  The fun part is keeping the users
of kernel_{send,recv}msg() from breaking while we do the rest of transition.
The ones that use throwaway iovecs are fine - they don't assume any particular
behaviour wrt iovec draining.  A bunch of those will be able to use new
warranties later on, but those are separate patches that should go after the
switch to new rules.  Besides those we have the following:
	iscsit_do_tx_data().  Sends over TCP, assumes iovec drained.
	iscsit_do_rx_data().  Recieves over TCP, assumes iovec drained.
	smb_send_kvec().  Sends, assumes iovec unchanged.
	rxrpc_reject_packets().  Sends, assumes iovec unchanged.
	ceph_tcp_sendmsg().  Sends, assumes iovec unchanged.
The last three are not a problem, obviously - they are not quite correct
right now, but with those changes we get no regressions and the closer
we are to complete conversion, the better off they are.  Which leaves
us with iscsit_do_[rt]x_data().  

For tcp_sendmsg() conversion we need skb_add_data_nocache() and
skb_copy_to_page_nocache() variants that would take iov_iter as data
source.  Then the loop over iovec members and while (seglen > 0) loop
inside it get conflated (with iov_iter_count() instead of seglen).

Another thing is tcp_sendmsg_fastopen() and tcp_send_rcvq().  The latter
should just use copy_from_iter() instead of memcpy_from_iovec(), the former
is dealt with by making tcp_send_syn_data() use the same copy_from_iter()
instead of memcpy_from_iovecend().

All of that depends on ->msg_iter being already introduced and it doesn't break
iscsi_do_tx_data() any worse than it's currently broken.  Moreover, right after
it we'll be able to fix iscsi_do_tx_data() for good, simply by stopping to mess
with ->msg_iter after the first pass through the loop in there.

skb_add_data_nocache() and skb_copy_to_page_nocache() are both wrappers for
skb_do_copy_data_nocache(), which is used only by those two *and* they are
used only by tcp_sendmsg().  So there's no other code to be disrupted by
changes in those.  What skb_do_copy_data_nocache() is doing is a mix of
copy_from_user() (trivial - we just use copy_from_iter() instead),
csum_and_copy_from_user() (new primitive - csum_and_copy_from_iter()) and
__copy_from_user_nocache() (also a new primitive, easily added, but we
are really getting to the point where the amount of boilerplate in iov_iter.c
becomes painful).  Incidentally, xip_file_write() could benefit from the last
one as well, giving us full ->write_iter() for those...

As for the recvmsg() part of story, we'll need a new helper there as well -
csum_and_copy_to_iter(), for use in skb_copy_and_csum_datagram_iovec()
analogue that would take iov_iter.  Which will very shortly replace the
iovec one.  We'll need tp->ucopy.msg instead of tp->ucopy.iov for that;
that can be done as the first step, actually (ucopy.msg is always
->msg_iov of some msghdr with sufficiently long lifetime).  That +
introduction of helpers leaves us with moderately-sized patch converting
tcp_recvmsg() to new semantics.  And unfortunately the same patch will have
to include a (fairly simple) chunk in iscsi_do_rx_data() - same "don't
mess with ->msg_iter on subsequent iterations" thing.

At that point we'll have all kernel_{send,recv}msg() users happy with the
new semantics and are free to switch ->sendmsg()/->recvmsg() instances to
use of iov_iter primitives, not worrying about messing the callers up.
>From that point it's a smooth sailing - a bunch of iovec helpers will become
dead code shortly after that, etc.

One more moderately interesting spot will be around AF_ALG stuff - we'll need
to teach iov_iter_get_pages{,_alloc}() to do the right thing for kvec-backed
iterators.  Not hard to do, fortunately.

Overall, I think I have the whole series plotted in enough details to be
reasonably certain we can pull it off.  Right now I'm dealing with
mm/iov_iter.c stuff; the amount of boilerplate source is already high enough
and with those extra primitives it'll get really unpleasant.

What we need there is something templates-like, as much as I hate C++, and
I'm still not happy with what I have at the moment...  Hopefully I'll get
that in more or less tolerable form today.

  reply	other threads:[~2014-11-21  8:50 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-18  8:47 [RFC] situation with csum_and_copy_... API Al Viro
2014-11-18 19:40 ` [patches][RFC] " Al Viro
2014-11-18 19:41   ` [PATCH 1/5] separate kernel- and userland-side msghdr Al Viro
2014-11-18 19:42   ` [PATCH 2/5] {compat_,}verify_iovec(): switch to generic copying of iovecs Al Viro
2014-11-18 19:42   ` [PATCH 3/5] remove a bunch of now-pointless access_ok() in net Al Viro
2014-11-18 19:43   ` [PATCH 4/5] bury skb_copy_to_page() Al Viro
2014-11-18 19:43   ` [PATCH 5/5] fold verify_iovec() into copy_msghdr_from_user() Al Viro
2014-11-19 20:25   ` [patches][RFC] situation with csum_and_copy_... API David Miller
2014-11-18 20:49 ` [RFC] " Linus Torvalds
2014-11-18 21:23   ` Al Viro
2014-11-18 21:39     ` Linus Torvalds
2014-11-19 20:31     ` David Miller
2014-11-19 20:40       ` Linus Torvalds
2014-11-19 21:17         ` Al Viro
2014-11-19 21:17         ` David Miller
2014-11-19 21:30           ` Al Viro
2014-11-19 21:53             ` David Miller
2014-11-20 21:47               ` Al Viro
2014-11-20 21:55                 ` Eric Dumazet
2014-11-20 22:25                   ` Al Viro
2014-11-20 22:53                     ` Eric Dumazet
2014-11-21  8:49                       ` Al Viro [this message]
2014-11-21 15:01                         ` Eric Dumazet
2014-11-21 17:42                         ` David Laight
2014-11-21 19:39                           ` Al Viro
2014-11-21 19:40                             ` Linus Torvalds
2014-11-24 10:03                             ` David Laight
2014-11-22  3:27                         ` Al Viro
2014-11-22  3:36                           ` Al Viro
2014-11-24 10:27                           ` David Laight
2014-11-20 23:23                 ` David Miller
2014-11-21 17:26                   ` David Miller
2014-11-22  4:28                     ` Al Viro
2014-11-22  4:29                       ` [PATCH 01/17] new helper: skb_copy_and_csum_datagram_msg() Al Viro
2014-11-22  4:30                       ` [PATCH 02/17] new helper: memcpy_from_msg() Al Viro
2014-11-22  4:30                       ` [PATCH 03/17] switch ipxrtr_route_packet() from iovec to msghdr Al Viro
2014-11-22  4:31                       ` [PATCH 04/17] new helper: memcpy_to_msg() Al Viro
2014-11-22  4:32                       ` [PATCH 05/17] switch drivers/net/tun.c to ->read_iter() Al Viro
2014-11-22  4:32                       ` [PATCH 06/17] switch macvtap " Al Viro
2014-11-23 23:29                         ` Ben Hutchings
2014-11-22  4:33                       ` [PATCH 07/17] new helpers: skb_copy_datagram_from_iter() and zerocopy_sg_from_iter() Al Viro
2014-11-24  0:02                         ` Ben Hutchings
2014-11-24  0:29                           ` Ben Hutchings
2014-11-24  5:34                           ` Jason Wang
2014-11-24 10:03                             ` Al Viro
2014-11-22  4:33                       ` [PATCH 08/17] {macvtap,tun}_get_user(): switch to iov_iter Al Viro
2014-11-24  0:27                         ` Ben Hutchings
2014-11-24  1:06                           ` Ben Hutchings
2014-11-24 10:15                           ` Al Viro
2014-11-22  4:34                       ` [PATCH 09/17] kill zerocopy_sg_from_iovec() Al Viro
2014-11-22  4:35                       ` [PATCH 10/17] switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter() Al Viro
2014-11-22  4:36                       ` PATCH 11/17] switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter Al Viro
2014-11-22  4:36                       ` [PATCH 12/17] tipc_sendmsg(): pass msghdr instead of its ->msg_iov Al Viro
2014-11-22  4:37                       ` [PATCH 13/17] tipc_msg_build(): " Al Viro
2014-11-22  4:37                       ` [PATCH 14/17] vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr Al Viro
2014-11-22  4:38                       ` [PATCH 15/17] [atm] switch vcc_sendmsg() to copy_from_iter() Al Viro
2014-11-22  4:38                       ` [PATCH 16/17] rds: switch ->inc_copy_to_user() to passing iov_iter Al Viro
2014-11-22  4:39                       ` [PATCH 17/17] rds: switch rds_message_copy_from_user() to iov_iter Al Viro
2014-11-24  2:00                         ` Ben Hutchings
2014-11-24 10:17                           ` Al Viro
2014-11-22  7:24                       ` [RFC] situation with csum_and_copy_... API David Miller
2014-11-25  2:40                         ` Al Viro
2014-11-25 14:02                           ` [PATCH v2 01/17] new helper: skb_copy_and_csum_datagram_msg() Al Viro
2014-11-25 19:28                             ` David Miller
2014-11-25 20:59                               ` Al Viro
2014-11-26 17:27                                 ` David Miller
2014-12-05  5:56                                   ` the next chunk of iov_iter-net stuff for review Al Viro
2014-12-05  5:58                                     ` [PATCH 01/12] raw.c: stick msghdr into raw_frag_vec Al Viro
2014-12-05  5:58                                     ` [PATCH 02/12] ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt" Al Viro
2014-12-05  5:58                                     ` [PATCH 03/12] ip_generic_getfrag, udplite_getfrag: switch to passing msghdr Al Viro
2014-12-05  5:58                                     ` [PATCH 04/12] switch tcp_sock->ucopy from iovec (ucopy.iov) to msghdr (ucopy.msg) Al Viro
2014-12-05  5:58                                     ` [PATCH 05/12] switch l2cap ->memcpy_fromiovec() to msghdr Al Viro
2014-12-05  5:58                                     ` [PATCH 06/12] vmci: propagate msghdr all way down to __qp_memcpy_from_queue() Al Viro
2014-12-05  5:58                                     ` [PATCH 07/12] put iov_iter into msghdr Al Viro
2014-12-05  5:58                                     ` [PATCH 08/12] first fruits - kill l2cap ->memcpy_fromiovec() Al Viro
2014-12-05  5:58                                     ` [PATCH 09/12] switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives Al Viro
2014-12-05  5:58                                     ` [PATCH 10/12] ppp_read(): switch to skb_copy_datagram_iter() Al Viro
2014-12-05  5:58                                     ` [PATCH 11/12] skb_copy_datagram_iovec() can die Al Viro
2014-12-05  5:58                                     ` [PATCH 12/12] bury memcpy_toiovec() Al Viro
2014-12-09 20:07                                     ` the next chunk of iov_iter-net stuff for review David Miller
2014-12-09 21:04                                       ` Al Viro
2014-12-09 21:17                                         ` David Miller
2014-12-09 21:23                                           ` Al Viro
2014-12-09 21:37                                             ` David Miller
2014-12-09 22:49                                               ` Al Viro
2014-12-09 22:50                                                 ` [PATCH 01/25] iov_iter.c: macros for iterating over iov_iter Al Viro
2014-12-09 22:50                                                 ` [PATCH 02/25] iov_iter.c: iterate_and_advance Al Viro
2014-12-09 22:50                                                 ` [PATCH 03/25] iov_iter.c: convert iov_iter_npages() to iterate_all_kinds Al Viro
2014-12-09 22:50                                                 ` [PATCH 04/25] iov_iter.c: convert iov_iter_get_pages() " Al Viro
2014-12-09 22:50                                                 ` [PATCH 05/25] iov_iter.c: convert iov_iter_get_pages_alloc() " Al Viro
2014-12-09 22:50                                                 ` [PATCH 06/25] iov_iter.c: convert iov_iter_zero() to iterate_and_advance Al Viro
2014-12-09 22:50                                                 ` [PATCH 07/25] iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() Al Viro
2014-12-09 22:50                                                 ` [PATCH 08/25] iov_iter.c: convert copy_from_iter() to iterate_and_advance Al Viro
2014-12-09 22:50                                                 ` [PATCH 09/25] iov_iter.c: convert copy_to_iter() " Al Viro
2014-12-09 22:50                                                 ` [PATCH 10/25] iov_iter.c: handle ITER_KVEC directly Al Viro
2014-12-09 22:50                                                 ` [PATCH 11/25] csum_and_copy_..._iter() Al Viro
2014-12-09 22:50                                                 ` [PATCH 12/25] new helper: iov_iter_kvec() Al Viro
2014-12-09 22:50                                                 ` [PATCH 13/25] copy_from_iter_nocache() Al Viro
2014-12-09 22:50                                                 ` [PATCH 14/25] raw.c: stick msghdr into raw_frag_vec Al Viro
2014-12-09 22:50                                                 ` [PATCH 15/25] ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt" Al Viro
2014-12-09 22:50                                                 ` [PATCH 16/25] ip_generic_getfrag, udplite_getfrag: switch to passing msghdr Al Viro
2014-12-09 22:50                                                 ` [PATCH 17/25] switch tcp_sock->ucopy from iovec (ucopy.iov) to msghdr (ucopy.msg) Al Viro
2014-12-09 22:50                                                 ` [PATCH 18/25] switch l2cap ->memcpy_fromiovec() to msghdr Al Viro
2014-12-09 22:50                                                 ` [PATCH 19/25] vmci: propagate msghdr all way down to __qp_memcpy_from_queue() Al Viro
2014-12-09 22:50                                                 ` [PATCH 20/25] put iov_iter into msghdr Al Viro
2014-12-09 22:50                                                 ` [PATCH 21/25] first fruits - kill l2cap ->memcpy_fromiovec() Al Viro
2014-12-09 22:50                                                 ` [PATCH 22/25] switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives Al Viro
2014-12-09 22:50                                                 ` [PATCH 23/25] ppp_read(): switch to skb_copy_datagram_iter() Al Viro
2014-12-09 22:50                                                 ` [PATCH 24/25] skb_copy_datagram_iovec() can die Al Viro
2014-12-09 22:50                                                 ` [PATCH 25/25] bury memcpy_toiovec() Al Viro
2014-12-09 23:13                                                 ` the next chunk of iov_iter-net stuff for review Al Viro
2014-12-10 18:25                                                 ` David Miller
2014-11-25 14:02                           ` [PATCH v2 02/17] new helper: memcpy_from_msg() Al Viro
2014-11-25 14:02                           ` [PATCH v2 03/17] switch ipxrtr_route_packet() from iovec to msghdr Al Viro
2014-11-25 14:02                           ` [PATCH v2 04/17] new helper: memcpy_to_msg() Al Viro
2014-11-25 14:02                           ` [PATCH v2 05/17] switch drivers/net/tun.c to ->read_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 06/17] switch macvtap " Al Viro
2014-11-25 14:02                           ` [PATCH v2 07/17] new helpers: skb_copy_datagram_from_iter() and zerocopy_sg_from_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 08/17] {macvtap,tun}_get_user(): switch to iov_iter Al Viro
2015-02-03 10:10                             ` Michael S. Tsirkin
2015-02-03 14:27                               ` Al Viro
2015-02-03 15:19                                 ` Michael S. Tsirkin
2014-11-25 14:02                           ` [PATCH v2 09/17] kill zerocopy_sg_from_iovec() Al Viro
2014-11-25 14:02                           ` [PATCH v2 10/17] switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 11/17] switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter Al Viro
2014-11-25 14:02                           ` [PATCH v2 12/17] tipc_sendmsg(): pass msghdr instead of its ->msg_iov Al Viro
2014-11-25 14:02                           ` [PATCH v2 13/17] tipc_msg_build(): " Al Viro
2014-11-25 14:02                           ` [PATCH v2 14/17] vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr Al Viro
2014-11-25 14:02                           ` [PATCH v2 15/17] [atm] switch vcc_sendmsg() to copy_from_iter() Al Viro
2014-11-25 14:02                           ` [PATCH v2 16/17] rds: switch ->inc_copy_to_user() to passing iov_iter Al Viro
2014-11-25 14:02                           ` [PATCH v2 17/17] rds: switch rds_message_copy_from_user() to iov_iter Al Viro
2014-11-22 17:48                       ` [RFC] situation with csum_and_copy_... API Linus Torvalds
2014-11-21  4:17                 ` Nicholas A. Bellinger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141121084956.GT7996@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nab@linux-iscsi.org \
    --cc=netdev@vger.kernel.org \
    --cc=target-devel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.