From: David Howells <dhowells@redhat.com>
To: netdev@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	David Ahern <dsahern@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next 00/12] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 3
Date: Wed, 24 May 2023 16:32:59 +0100
Message-ID: <20230524153311.3625329-1-dhowells@redhat.com>

Here's the third tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg().  MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can and copy them if not.

The primary focus of this tranche is to allow data held in slab memory to be
copied into page fragments (appending it to existing free space within an
sk_buff might also be possible), thereby allowing a single sendmsg() to mix
data held in the slab (such as higher-level protocol pieces) and data held
in pages (such as content for a network filesystem).  This puts the copying
in (mostly) one place: skb_splice_from_iter().
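
As a rough illustration of the splice-or-copy idea described above, here is a
userspace model in C.  It is only a sketch under stated assumptions: every
name in it (model_splice_from_iter(), struct segment, struct out_frag) is
hypothetical and is not the kernel's skb_splice_from_iter() or any kernel
API.  Page-backed segments are referenced in place, and slab-backed segments
are copied into a caller-supplied fragment buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum seg_backing { SEG_PAGE, SEG_SLAB };

struct segment {
	enum seg_backing backing;	/* where the caller's data lives */
	const char *data;
	size_t len;
};

struct out_frag {
	const char *data;	/* points at caller data or at the copy */
	size_t len;
	int copied;		/* 1 if the bytes were copied */
};

/*
 * Walk the segments.  Page-backed ones are referenced in place (the
 * "splice" case); slab-backed ones are copied into frag_buf (the "copy"
 * case).  Returns the number of output fragments, or -1 if frag_buf or
 * the output array is too small.
 */
static int model_splice_from_iter(const struct segment *segs, size_t nsegs,
				  char *frag_buf, size_t frag_buf_len,
				  struct out_frag *out, size_t nout)
{
	size_t used = 0;

	assert(segs && frag_buf && out);
	if (nsegs > nout)
		return -1;

	for (size_t i = 0; i < nsegs; i++) {
		if (segs[i].backing == SEG_PAGE) {
			/* Zero-copy: just point at the caller's data. */
			out[i].data = segs[i].data;
			out[i].copied = 0;
		} else {
			/* Slab data: copy it into fragment space. */
			if (segs[i].len > frag_buf_len - used)
				return -1;
			memcpy(frag_buf + used, segs[i].data, segs[i].len);
			out[i].data = frag_buf + used;
			out[i].copied = 1;
			used += segs[i].len;
		}
		out[i].len = segs[i].len;
	}
	return (int)nsegs;
}
```

A caller would pass, say, a slab-backed protocol header followed by a
page-backed payload in one array; only the header is copied.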

To make this work, some sort of locking is needed around the allocator.  I've
chosen to give the allocator a separate bucket per cpu internally, as
the netdev and napi allocators already do - and then to share the allocated
pages amongst those services that were using their own allocators.  I'm not
sure that the existing usage of the allocator is completely thread safe.

TLS is also converted here because it does things differently: it uses
sk_msg rather than sk_buff and so can't use skb_splice_from_iter().

So, firstly the page_frag_alloc_align() allocator is overhauled:

 (1) Split it out from mm/page_alloc.c into its own file,
     mm/page_frag_alloc.c.

 (2) Add a common function to clear an allocator.

 (3) Make the alignment specification consistent with some of the wrapper
     functions.

 (4) Make it use multipage folios rather than compound pages.

 (5) Make it handle __GFP_ZERO, rather than devolving this to the page
     allocator.

     Note that the current behaviour is potentially broken as the page may
     get reused if all refs have been dropped, but it doesn't then get
     cleared.  This might mean that the NVMe over TCP driver, for example,
     will malfunction under some circumstances.

 (6) Give it per-cpu buckets to allocate from to avoid the need for locking
     against users on other cpus.

 (7) The netdev_alloc_cache and the napi fragment cache are then recast
     in terms of this and some private allocators are removed.
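
To make points (3), (5) and (6) above more concrete, here is a small
userspace model in C.  It is a sketch only, not the code in
mm/page_frag_alloc.c: the names (model_frag_alloc(), model_frag_recycle(),
struct model_frag_bucket) and the sizes are assumptions made for
illustration, and the "cpu" is just an array index supplied by the caller.
It shows a power-of-2 alignment parameter, one bucket per cpu so that
buckets never need locking against each other, and zeroing done by the
fragment allocator itself so that a recycled page cannot hand back stale
bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS   4	/* assumed, for the model only */
#define MODEL_PAGE_SIZE 4096	/* assumed, for the model only */

/* Hypothetical stand-in for one per-cpu bucket; not the kernel struct. */
struct model_frag_bucket {
	_Alignas(64) unsigned char page[MODEL_PAGE_SIZE];
	size_t offset;		/* next free byte in page */
};

static struct model_frag_bucket model_buckets[MODEL_NR_CPUS];

/*
 * Carve len bytes out of the bucket belonging to cpu.  align must be a
 * power of two and is applied relative to the start of the page.  If zero
 * is set, the fragment is cleared here rather than relying on the page
 * having been cleared when it was first obtained.  Returns NULL when the
 * request does not fit in what is left of the page.
 */
static void *model_frag_alloc(unsigned int cpu, size_t len, size_t align,
			      int zero)
{
	struct model_frag_bucket *b;
	size_t start;

	assert(cpu < MODEL_NR_CPUS);
	assert(align != 0 && (align & (align - 1)) == 0);

	b = &model_buckets[cpu];
	start = (b->offset + align - 1) & ~(align - 1);
	if (start > MODEL_PAGE_SIZE || len > MODEL_PAGE_SIZE - start)
		return NULL;

	b->offset = start + len;
	if (zero)
		memset(b->page + start, 0, len);
	return b->page + start;
}

/* Model of "all refs dropped": the page is reused from the start, uncleared. */
static void model_frag_recycle(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_buckets[cpu].offset = 0;
}
```

In this model, allocating without the zero flag after model_frag_recycle()
would return whatever the previous user left behind, which is the hazard
noted under (5).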

We can then make use of the page fragment allocator to copy data that is
resident in the slab rather than returning EIO:

 (8) Make skb_splice_from_iter() copy data provided in the slab to page
     fragments.

 (9) Implement MSG_SPLICE_PAGES support in the AF_TLS-sw sendmsg and make
     tls_sw_sendpage() just a wrapper around sendmsg().

(10) Implement MSG_SPLICE_PAGES support in AF_TLS-device and make
     tls_device_sendpage() just a wrapper around sendmsg().
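
The "sendpage as a wrapper around sendmsg" shape in (9) and (10) can be
modelled roughly as below.  This is a userspace sketch with hypothetical
names and flag values (model_sendpage(), model_sendmsg(),
MODEL_MSG_SPLICE_PAGES); it does not reproduce tls_sw_sendpage() or
tls_device_sendpage(), and the real functions take different arguments.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MSG_SPLICE_PAGES 0x1u	/* hypothetical value, model only */
#define MODEL_MSG_MORE         0x2u	/* hypothetical value, model only */

/* Hypothetical message descriptor: one buffer plus flags. */
struct model_msg {
	const void *base;
	size_t len;
	unsigned int flags;
};

/* Hypothetical socket that only records what was queued on it. */
struct model_sock {
	size_t bytes_queued;
	unsigned int last_flags;
};

/* The single transmit path; everything funnels through here. */
static long model_sendmsg(struct model_sock *sk, const struct model_msg *msg)
{
	assert(sk && msg);
	sk->bytes_queued += msg->len;
	sk->last_flags = msg->flags;
	return (long)msg->len;
}

/*
 * The old page-based entry point reduced to a thin wrapper: describe the
 * page range as a message, add the splice hint, and call sendmsg.
 */
static long model_sendpage(struct model_sock *sk, const void *page,
			   size_t offset, size_t size, unsigned int flags)
{
	struct model_msg msg = {
		.base  = (const char *)page + offset,
		.len   = size,
		.flags = flags | MODEL_MSG_SPLICE_PAGES,
	};

	return model_sendmsg(sk, &msg);
}
```

The point of the arrangement is that any splice-or-copy decision lives in
one function, with the page-based entry point carrying no logic of its own.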

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-3

David

Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1

David Howells (12):
  mm: Move the page fragment allocator from page_alloc.c into its own
    file
  mm: Provide a page_frag_cache allocator cleanup function
  mm: Make the page_frag_cache allocator alignment param a pow-of-2
  mm: Make the page_frag_cache allocator use multipage folios
  mm: Make the page_frag_cache allocator handle __GFP_ZERO itself
  mm: Make the page_frag_cache allocator use per-cpu
  net: Clean up users of netdev_alloc_cache and napi_frag_cache
  net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)
  tls/sw: Support MSG_SPLICE_PAGES
  tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
  tls/device: Support MSG_SPLICE_PAGES
  tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES

 drivers/net/ethernet/google/gve/gve.h      |   1 -
 drivers/net/ethernet/google/gve/gve_main.c |  16 --
 drivers/net/ethernet/google/gve/gve_rx.c   |   2 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c |  19 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.h |   2 -
 drivers/nvme/host/tcp.c                    |  19 +-
 drivers/nvme/target/tcp.c                  |  22 +-
 include/linux/gfp.h                        |  17 +-
 include/linux/mm_types.h                   |  13 +-
 include/linux/skbuff.h                     |  28 +--
 mm/Makefile                                |   2 +-
 mm/page_alloc.c                            | 126 ------------
 mm/page_frag_alloc.c                       | 206 +++++++++++++++++++
 net/core/skbuff.c                          |  94 +++++----
 net/tls/tls_device.c                       |  93 ++++-----
 net/tls/tls_sw.c                           | 221 ++++++++-------------
 16 files changed, 418 insertions(+), 463 deletions(-)
 create mode 100644 mm/page_frag_alloc.c



Thread overview: 29+ messages
2023-05-24 15:32 David Howells [this message]
2023-05-24 15:33 ` [PATCH net-next 01/12] mm: Move the page fragment allocator from page_alloc.c into its own file David Howells
2023-05-24 15:33 ` [PATCH net-next 02/12] mm: Provide a page_frag_cache allocator cleanup function David Howells
2023-05-24 15:33 ` [PATCH net-next 03/12] mm: Make the page_frag_cache allocator alignment param a pow-of-2 David Howells
2023-05-27 15:54   ` Alexander H Duyck
2023-11-30  9:00     ` Yunsheng Lin
2023-06-16 15:28   ` David Howells
2023-06-16 16:06     ` Alexander Duyck
2023-05-24 15:33 ` [PATCH net-next 04/12] mm: Make the page_frag_cache allocator use multipage folios David Howells
2023-05-26 11:56   ` Yunsheng Lin
2023-05-27 15:47     ` Alexander H Duyck
2023-05-26 12:47   ` David Howells
2023-05-26 14:06     ` Mika Penttilä
2023-05-27  0:50   ` Jakub Kicinski
2023-05-24 15:33 ` [PATCH net-next 05/12] mm: Make the page_frag_cache allocator handle __GFP_ZERO itself David Howells
2023-05-27  0:57   ` Jakub Kicinski
2023-05-27 15:54     ` Alexander Duyck
2023-05-24 15:33 ` [PATCH net-next 06/12] mm: Make the page_frag_cache allocator use per-cpu David Howells
2023-05-27  1:02   ` Jakub Kicinski
2023-05-24 15:33 ` [PATCH net-next 07/12] net: Clean up users of netdev_alloc_cache and napi_frag_cache David Howells
2023-05-24 15:33 ` [PATCH net-next 08/12] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES) David Howells
2023-05-24 15:33 ` [PATCH net-next 09/12] tls/sw: Support MSG_SPLICE_PAGES David Howells
2023-05-27  1:08   ` Jakub Kicinski
2023-05-30 22:26   ` Bug in short splice to socket? David Howells
2023-05-31  0:32     ` Jakub Kicinski
2023-05-24 15:33 ` [PATCH net-next 10/12] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-05-27  1:13   ` Jakub Kicinski
2023-05-24 15:33 ` [PATCH net-next 11/12] tls/device: Support MSG_SPLICE_PAGES David Howells
2023-05-24 15:33 ` [PATCH net-next 12/12] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
