All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] tls: cap the output scatter list to something reasonable
@ 2022-02-02 22:20 Jakub Kicinski
  2022-02-04 10:40 ` patchwork-bot+netdevbpf
  2022-02-07 17:15 ` Al Viro
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2022-02-02 22:20 UTC (permalink / raw)
  To: davem
  Cc: netdev, viro, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe, Jakub Kicinski

TLS recvmsg() passes user pages as destination for decrypt.
The decrypt operation is repeated record by record, each
record being 16kB, max. TLS allocates an sg_table and uses
iov_iter_get_pages() to populate it with enough pages to
fit the decrypted record.

Even though we decrypt a single message at a time we size
the sg_table based on the entire length of the iovec.
This leads to unnecessarily large allocations, risking
triggering OOM conditions.

Use iov_iter_truncate() / iov_iter_reexpand() to construct
a "capped" version of iov_iter_npages(). Alternatively we
could parametrize iov_iter_npages() to take the size as
arg instead of using i->count, or do something else..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/uio.h | 17 +++++++++++++++++
 net/tls/tls_sw.c    |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1198a2bfc9bf..739285fe5a2f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -273,6 +273,23 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
 	i->count = count;
 }
 
+static inline int
+iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes)
+{
+	size_t shorted = 0;
+	int npages;
+
+	if (iov_iter_count(i) > max_bytes) {
+		shorted = iov_iter_count(i) - max_bytes;
+		iov_iter_truncate(i, max_bytes);
+	}
+	npages = iov_iter_npages(i, INT_MAX);
+	if (shorted)
+		iov_iter_reexpand(i, iov_iter_count(i) + shorted);
+
+	return npages;
+}
+
 struct csum_state {
 	__wsum csum;
 	size_t off;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index efc84845bb6b..0024a692f0f8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1433,7 +1433,8 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 
 	if (*zc && (out_iov || out_sg)) {
 		if (out_iov)
-			n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
+			n_sgout = 1 +
+				iov_iter_npages_cap(out_iov, INT_MAX, data_len);
 		else
 			n_sgout = sg_nents(out_sg);
 		n_sgin = skb_nsg(skb, rxm->offset + prot->prepend_size,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] tls: cap the output scatter list to something reasonable
  2022-02-02 22:20 [PATCH net-next] tls: cap the output scatter list to something reasonable Jakub Kicinski
@ 2022-02-04 10:40 ` patchwork-bot+netdevbpf
  2022-02-07 17:15 ` Al Viro
  1 sibling, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-04 10:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, viro, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe

Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed,  2 Feb 2022 14:20:31 -0800 you wrote:
> TLS recvmsg() passes user pages as destination for decrypt.
> The decrypt operation is repeated record by record, each
> record being 16kB, max. TLS allocates an sg_table and uses
> iov_iter_get_pages() to populate it with enough pages to
> fit the decrypted record.
> 
> Even though we decrypt a single message at a time we size
> the sg_table based on the entire length of the iovec.
> This leads to unnecessarily large allocations, risking
> triggering OOM conditions.
> 
> [...]

Here is the summary with links:
  - [net-next] tls: cap the output scatter list to something reasonable
    https://git.kernel.org/netdev/net-next/c/b93235e68921

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] tls: cap the output scatter list to something reasonable
  2022-02-02 22:20 [PATCH net-next] tls: cap the output scatter list to something reasonable Jakub Kicinski
  2022-02-04 10:40 ` patchwork-bot+netdevbpf
@ 2022-02-07 17:15 ` Al Viro
  2022-02-07 17:26   ` Jakub Kicinski
  1 sibling, 1 reply; 7+ messages in thread
From: Al Viro @ 2022-02-07 17:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe

On Wed, Feb 02, 2022 at 02:20:31PM -0800, Jakub Kicinski wrote:
> TLS recvmsg() passes user pages as destination for decrypt.
> The decrypt operation is repeated record by record, each
> record being 16kB, max. TLS allocates an sg_table and uses
> iov_iter_get_pages() to populate it with enough pages to
> fit the decrypted record.
> 
> Even though we decrypt a single message at a time we size
> the sg_table based on the entire length of the iovec.
> This leads to unnecessarily large allocations, risking
> triggering OOM conditions.
> 
> Use iov_iter_truncate() / iov_iter_reexpand() to construct
> a "capped" version of iov_iter_npages(). Alternatively we
> could parametrize iov_iter_npages() to take the size as
> arg instead of using i->count, or do something else..

Er...  Would simply passing 16384/PAGE_SIZE instead of MAX_INT work
for your purposes?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] tls: cap the output scatter list to something reasonable
  2022-02-07 17:15 ` Al Viro
@ 2022-02-07 17:26   ` Jakub Kicinski
  2022-02-07 21:20     ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-02-07 17:26 UTC (permalink / raw)
  To: Al Viro
  Cc: davem, netdev, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe

On Mon, 7 Feb 2022 17:15:30 +0000 Al Viro wrote:
> On Wed, Feb 02, 2022 at 02:20:31PM -0800, Jakub Kicinski wrote:
> > TLS recvmsg() passes user pages as destination for decrypt.
> > The decrypt operation is repeated record by record, each
> > record being 16kB, max. TLS allocates an sg_table and uses
> > iov_iter_get_pages() to populate it with enough pages to
> > fit the decrypted record.
> > 
> > Even though we decrypt a single message at a time we size
> > the sg_table based on the entire length of the iovec.
> > This leads to unnecessarily large allocations, risking
> > triggering OOM conditions.
> > 
> > Use iov_iter_truncate() / iov_iter_reexpand() to construct
> > a "capped" version of iov_iter_npages(). Alternatively we
> > could parametrize iov_iter_npages() to take the size as
> > arg instead of using i->count, or do something else..  
> 
> Er...  Would simply passing 16384/PAGE_SIZE instead of MAX_INT work
> for your purposes?

The last arg is maxpages, I want maxbytes, no?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] tls: cap the output scatter list to something reasonable
  2022-02-07 17:26   ` Jakub Kicinski
@ 2022-02-07 21:20     ` Al Viro
  2022-02-07 23:34       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2022-02-07 21:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe

On Mon, Feb 07, 2022 at 09:26:19AM -0800, Jakub Kicinski wrote:
> On Mon, 7 Feb 2022 17:15:30 +0000 Al Viro wrote:
> > On Wed, Feb 02, 2022 at 02:20:31PM -0800, Jakub Kicinski wrote:
> > > TLS recvmsg() passes user pages as destination for decrypt.
> > > The decrypt operation is repeated record by record, each
> > > record being 16kB, max. TLS allocates an sg_table and uses
> > > iov_iter_get_pages() to populate it with enough pages to
> > > fit the decrypted record.
> > > 
> > > Even though we decrypt a single message at a time we size
> > > the sg_table based on the entire length of the iovec.
> > > This leads to unnecessarily large allocations, risking
> > > triggering OOM conditions.
> > > 
> > > Use iov_iter_truncate() / iov_iter_reexpand() to construct
> > > a "capped" version of iov_iter_npages(). Alternatively we
> > > could parametrize iov_iter_npages() to take the size as
> > > arg instead of using i->count, or do something else..  
> > 
> > Er...  Would simply passing 16384/PAGE_SIZE instead of MAX_INT work
> > for your purposes?
> 
> The last arg is maxpages, I want maxbytes, no?

What's the point of pass maxpages as argument to that, seeing that
you ignore the value you've got?  I'm just trying to understand what
semantics do you really intend for that thing.

Another thing: looking at that bunch now, for pipe-backed ones
that won't work.  It's a bug, strictly speaking, even though
the actual primitives that grab those pages *will* honour the
truncation/reexpand.

Frankly, I wonder if we would be better off with making iov_iter_npages()
a wrapper for that one, passing SIZE_MAX as maxbytes.  How does the following
(completely untested) look for you?

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1198a2bfc9bfc..07e4ebae7c6fa 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -236,11 +236,16 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
-int iov_iter_npages(const struct iov_iter *i, int maxpages);
+int __iov_iter_npages(const struct iov_iter *i, size_t maxsize, int maxpages);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
 
 const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags);
 
+static inline int iov_iter_npages(const struct iov_iter *i, int maxpages)
+{
+	return __iov_iter_npages(i, SIZE_MAX, maxpages);
+}
+
 static inline size_t iov_iter_count(const struct iov_iter *i)
 {
 	return i->count;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b0e0acdf96c15..35a86d68f7073 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1763,9 +1763,9 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
 }
 EXPORT_SYMBOL(hash_and_copy_to_iter);
 
-static int iov_npages(const struct iov_iter *i, int maxpages)
+static int iov_npages(const struct iov_iter *i, size_t size, int maxpages)
 {
-	size_t skip = i->iov_offset, size = i->count;
+	size_t skip = i->iov_offset;
 	const struct iovec *p;
 	int npages = 0;
 
@@ -1783,9 +1783,9 @@ static int iov_npages(const struct iov_iter *i, int maxpages)
 	return npages;
 }
 
-static int bvec_npages(const struct iov_iter *i, int maxpages)
+static int bvec_npages(const struct iov_iter *i, size_t size, int maxpages)
 {
-	size_t skip = i->iov_offset, size = i->count;
+	size_t skip = i->iov_offset;
 	const struct bio_vec *p;
 	int npages = 0;
 
@@ -1801,36 +1801,43 @@ static int bvec_npages(const struct iov_iter *i, int maxpages)
 	return npages;
 }
 
-int iov_iter_npages(const struct iov_iter *i, int maxpages)
+int __iov_iter_npages(const struct iov_iter *i, size_t maxsize, int maxpages)
 {
-	if (unlikely(!i->count))
+	if (i->count < maxsize)
+		maxsize = i->count;
+	if (unlikely(!maxsize))
 		return 0;
 	/* iovec and kvec have identical layouts */
 	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
-		return iov_npages(i, maxpages);
+		return iov_npages(i, maxsize, maxpages);
 	if (iov_iter_is_bvec(i))
-		return bvec_npages(i, maxpages);
+		return bvec_npages(i, maxsize, maxpages);
 	if (iov_iter_is_pipe(i)) {
 		unsigned int iter_head;
 		int npages;
 		size_t off;
+		size_t n;
 
 		if (!sanity(i))
 			return 0;
 
 		data_start(i, &iter_head, &off);
+		n = DIV_ROUND_UP(off + maxsize, PAGE_SIZE);
+		if (maxpages > n)
+			maxpages = n;
+
 		/* some of this one + all after this one */
 		npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe);
 		return min(npages, maxpages);
 	}
 	if (iov_iter_is_xarray(i)) {
 		unsigned offset = (i->xarray_start + i->iov_offset) % PAGE_SIZE;
-		int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE);
+		int npages = DIV_ROUND_UP(offset + maxsize, PAGE_SIZE);
 		return min(npages, maxpages);
 	}
 	return 0;
 }
-EXPORT_SYMBOL(iov_iter_npages);
+EXPORT_SYMBOL(__iov_iter_npages);
 
 const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 {

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] tls: cap the output scatter list to something reasonable
  2022-02-07 21:20     ` Al Viro
@ 2022-02-07 23:34       ` Jakub Kicinski
  2022-02-18  1:48         ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-02-07 23:34 UTC (permalink / raw)
  To: Al Viro
  Cc: davem, netdev, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe

On Mon, 7 Feb 2022 21:20:04 +0000 Al Viro wrote:
> > > > Alternatively we could parametrize iov_iter_npages() to take
> > > > the size as arg instead of using i->count, or do something
> > > > else..    
> > > 
> > > Er...  Would simply passing 16384/PAGE_SIZE instead of MAX_INT
> > > work for your purposes?  
> > 
> > The last arg is maxpages, I want maxbytes, no?  
> 
> What's the point of pass maxpages as argument to that, seeing that
> you ignore the value you've got?  I'm just trying to understand what
> semantics do you really intend for that thing.
> 
> Another thing: looking at that bunch now, for pipe-backed ones
> that won't work.  It's a bug, strictly speaking, even though
> the actual primitives that grab those pages *will* honour the
> truncation/reexpand.
> 
> Frankly, I wonder if we would be better off with making
> iov_iter_npages() a wrapper for that one, passing SIZE_MAX as
> maxbytes.  How does the following (completely untested) look for you?

That looks cleaner to me as well! Will you submit officially or should 
I take care of the conversion?  My v1 has already made its way to
net-next.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] tls: cap the output scatter list to something reasonable
  2022-02-07 23:34       ` Jakub Kicinski
@ 2022-02-18  1:48         ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2022-02-18  1:48 UTC (permalink / raw)
  To: Al Viro
  Cc: davem, netdev, borisp, john.fastabend, daniel, vfedorenko,
	kernel-team, axboe

On Mon, 7 Feb 2022 15:34:50 -0800 Jakub Kicinski wrote:
> > What's the point of pass maxpages as argument to that, seeing that
> > you ignore the value you've got?  I'm just trying to understand what
> > semantics do you really intend for that thing.
> > 
> > Another thing: looking at that bunch now, for pipe-backed ones
> > that won't work.  It's a bug, strictly speaking, even though
> > the actual primitives that grab those pages *will* honour the
> > truncation/reexpand.
> > 
> > Frankly, I wonder if we would be better off with making
> > iov_iter_npages() a wrapper for that one, passing SIZE_MAX as
> > maxbytes.  How does the following (completely untested) look for you?  
> 
> That looks cleaner to me as well! Will you submit officially or should 
> I take care of the conversion?  My v1 has already made its way to
> net-next.

Gentle reminder that I need at least your sign-off tag to make progress
here.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-02-18  1:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-02 22:20 [PATCH net-next] tls: cap the output scatter list to something reasonable Jakub Kicinski
2022-02-04 10:40 ` patchwork-bot+netdevbpf
2022-02-07 17:15 ` Al Viro
2022-02-07 17:26   ` Jakub Kicinski
2022-02-07 21:20     ` Al Viro
2022-02-07 23:34       ` Jakub Kicinski
2022-02-18  1:48         ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.