linux-kernel.vger.kernel.org archive mirror
* [PATCH] af_unix: utilize skb's fragment list for sending large datagrams
@ 2019-08-22 10:38 Jan Dakinevich
  2019-08-22 19:04 ` David Miller
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Dakinevich @ 2019-08-22 10:38 UTC
  To: linux-kernel
  Cc: Denis Lunev, Konstantin Khorenko, Jan Dakinevich,
	David S. Miller, Paolo Abeni, Al Viro, Jens Axboe,
	Hannes Reinecke, Karsten Graul, Kyeongdon Kim, Thomas Gleixner,
	netdev

When somebody tries to send a big datagram, the kernel tries to avoid
a high-order allocation by splitting the datagram between the skb's
data buffer and the skb's paged part (->frags).

However, the paged part cannot exceed MAX_SKB_FRAGS * PAGE_SIZE, so a
larger datagram grows the skb's data buffer instead. Thus, any
user-space program that sets its send buffer (by calling
setsockopt(SO_SNDBUF, ...)) to the maximum allowed size (wmem_max) can
cause any number of uncontrolled high-order kernel allocations.
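
For illustration only, a minimal user-space sketch of the scenario
described above: raise SO_SNDBUF on an AF_UNIX datagram socket pair and
push one datagram through it. The helper name send_recv_datagram() and
its parameters are hypothetical and not part of this patch; the send
buffer actually granted depends on net.core.wmem_max on the test machine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): request a send buffer of
 * want_sndbuf bytes on an AF_UNIX datagram socket pair, send a single
 * datagram of len bytes and read it back.
 * Returns the number of bytes received, or -1 on any failure.
 */
ssize_t send_recv_datagram(int want_sndbuf, size_t len)
{
	int sv[2];
	ssize_t ret = -1;
	char *buf;

	/* Zero-length datagrams are outside the scope of this sketch. */
	assert(len > 0);

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	buf = malloc(len);
	if (!buf)
		goto out_close;
	memset(buf, 'x', len);

	/* The kernel silently caps the requested value at net.core.wmem_max. */
	if (setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF,
		       &want_sndbuf, sizeof(want_sndbuf)) < 0)
		goto out_free;

	/* One send() is one datagram, so the kernel builds one skb of len bytes. */
	if (send(sv[0], buf, len, 0) != (ssize_t)len)
		goto out_free;

	ret = recv(sv[1], buf, len, 0);

out_free:
	free(buf);
out_close:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling it with a len close to the granted send buffer exercises the
large-datagram path in unix_dgram_sendmsg(); a len above the limit is
expected to make send() fail, in which case the helper returns -1.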

To avoid this, do not pass more than SKB_MAX_ALLOC for the skb's data
buffer and, for huge datagrams, use the skb's fragment list
(->frag_list) in addition to the paged part.

Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
---
 net/unix/af_unix.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)
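
A rough user-space sketch of the size split performed by the hunk
below, for readers who want to check the arithmetic. The DEMO_*
constants are illustrative stand-ins (4 KiB pages, 17 page fragments,
SKB_MAX_ALLOC approximated as four pages with the shared-info overhead
ignored); real values depend on architecture and kernel configuration.
The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for kernel constants; assumptions, not real values. */
#define DEMO_PAGE_SIZE		4096UL
#define DEMO_MAX_SKB_FRAGS	17UL
#define DEMO_SKB_MAX_ALLOC	(4 * DEMO_PAGE_SIZE)	/* overhead ignored */

struct dgram_split {
	unsigned long header_len;	/* linear data buffer of the head skb */
	unsigned long paged_len;	/* page fragments of the head skb */
	unsigned long frag_len;		/* remainder, carried on ->frag_list */
	unsigned long nr_frags;		/* number of extra skbs on ->frag_list */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Mirror the header/paged/frag_list split computed in unix_dgram_sendmsg(). */
struct dgram_split split_datagram(unsigned long len)
{
	struct dgram_split s;

	s.header_len = min_ul(len, DEMO_SKB_MAX_ALLOC);
	s.paged_len = min_ul(len - s.header_len,
			     DEMO_MAX_SKB_FRAGS * DEMO_PAGE_SIZE);
	s.frag_len = len - s.header_len - s.paged_len;

	/* Each frag_list skb holds at most SKB_MAX_ALLOC bytes: round up. */
	s.nr_frags = (s.frag_len + DEMO_SKB_MAX_ALLOC - 1) / DEMO_SKB_MAX_ALLOC;

	/* The three parts must always add up to the datagram length. */
	assert(s.header_len + s.paged_len + s.frag_len == len);
	return s;
}
```

With these assumed constants, a 200000-byte datagram splits into
16384 bytes of linear data, 69632 bytes of page fragments and 113984
bytes spread over 7 skbs on the fragment list.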

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 67e87db..0c13937 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1580,7 +1580,9 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	struct sk_buff *skb;
 	long timeo;
 	struct scm_cookie scm;
-	int data_len = 0;
+	unsigned long frag_len;
+	unsigned long paged_len;
+	unsigned long header_len;
 	int sk_locked;
 
 	wait_for_unix_gc();
@@ -1613,27 +1615,41 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (len > sk->sk_sndbuf - 32)
 		goto out;
 
-	if (len > SKB_MAX_ALLOC) {
-		data_len = min_t(size_t,
-				 len - SKB_MAX_ALLOC,
-				 MAX_SKB_FRAGS * PAGE_SIZE);
-		data_len = PAGE_ALIGN(data_len);
+	BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
 
-		BUILD_BUG_ON(SKB_MAX_ALLOC < PAGE_SIZE);
-	}
+	header_len = min(len, SKB_MAX_ALLOC);
+	paged_len = min(len - header_len, MAX_SKB_FRAGS * PAGE_SIZE);
+	frag_len = len - header_len - paged_len;
 
-	skb = sock_alloc_send_pskb(sk, len - data_len, data_len,
+	skb = sock_alloc_send_pskb(sk, header_len, paged_len,
 				   msg->msg_flags & MSG_DONTWAIT, &err,
 				   PAGE_ALLOC_COSTLY_ORDER);
 	if (skb == NULL)
 		goto out;
 
+	while (frag_len) {
+		unsigned long size = min(SKB_MAX_ALLOC, frag_len);
+		struct sk_buff *frag;
+
+		frag = sock_alloc_send_pskb(sk, size, 0,
+					    msg->msg_flags & MSG_DONTWAIT,
+					    &err, 0);
+		if (!frag)
+			goto out_free;
+
+		skb_put(frag, size);
+		frag->next = skb_shinfo(skb)->frag_list;
+		skb_shinfo(skb)->frag_list = frag;
+
+		frag_len -= size;
+	}
+
 	err = unix_scm_to_skb(&scm, skb, true);
 	if (err < 0)
 		goto out_free;
 
-	skb_put(skb, len - data_len);
-	skb->data_len = data_len;
+	skb_put(skb, header_len);
+	skb->data_len = len - header_len;
 	skb->len = len;
 	err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, len);
 	if (err)
-- 
2.1.4



* Re: [PATCH] af_unix: utilize skb's fragment list for sending large datagrams
  2019-08-22 10:38 [PATCH] af_unix: utilize skb's fragment list for sending large datagrams Jan Dakinevich
@ 2019-08-22 19:04 ` David Miller
  2019-08-24 20:38   ` Denis Lunev
  0 siblings, 1 reply; 3+ messages in thread
From: David Miller @ 2019-08-22 19:04 UTC
  To: jan.dakinevich
  Cc: linux-kernel, den, khorenko, pabeni, viro, axboe, hare, kgraul,
	kyeongdon.kim, tglx, netdev

From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Date: Thu, 22 Aug 2019 10:38:39 +0000

> However, the paged part cannot exceed MAX_SKB_FRAGS * PAGE_SIZE, so a
> larger datagram grows the skb's data buffer instead. Thus, any
> user-space program that sets its send buffer (by calling
> setsockopt(SO_SNDBUF, ...)) to the maximum allowed size (wmem_max) can
> cause any number of uncontrolled high-order kernel allocations.

So?  You want huge SKBs you get the high order allocations, seems
rather reasonable to me.

SKBs using fragment lists are the most difficult and cpu intensive
geometry for an SKB to have and we should avoid using it where
feasible.

I don't want to apply this, sorry.


* Re: [PATCH] af_unix: utilize skb's fragment list for sending large datagrams
  2019-08-22 19:04 ` David Miller
@ 2019-08-24 20:38   ` Denis Lunev
  0 siblings, 0 replies; 3+ messages in thread
From: Denis Lunev @ 2019-08-24 20:38 UTC
  To: David Miller, Jan Dakinevich
  Cc: linux-kernel, Konstantin Khorenko, pabeni, viro, axboe, hare,
	kgraul, kyeongdon.kim, tglx, netdev

On 8/22/19 9:04 PM, David Miller wrote:
> From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> Date: Thu, 22 Aug 2019 10:38:39 +0000
>
>> However, the paged part cannot exceed MAX_SKB_FRAGS * PAGE_SIZE, so a
>> larger datagram grows the skb's data buffer instead. Thus, any
>> user-space program that sets its send buffer (by calling
>> setsockopt(SO_SNDBUF, ...)) to the maximum allowed size (wmem_max) can
>> cause any number of uncontrolled high-order kernel allocations.
> So?  You want huge SKBs you get the high order allocations, seems
> rather reasonable to me.
>
> SKBs using fragment lists are the most difficult and cpu intensive
> geometry for an SKB to have and we should avoid using it where
> feasible.
>
> I don't want to apply this, sorry.
Under even moderate memory pressure this will either take seconds or
fail, which does not look good. We could try the high-order allocation
first, but without trying too hard, and switch to fragments when
possible.

Please also note that even an ordinary user can trigger really big
allocations and thus force the whole node to dance.

Den
