From: brakmo <brakmo@fb.com>
To: netdev <netdev@vger.kernel.org>
Cc: Martin Lau <kafai@fb.com>, Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Kernel Team <Kernel-team@fb.com>
Subject: [PATCH v2 bpf-next 5/9] bpf: Add bpf helper bpf_tcp_check_probe_timer
Date: Fri, 22 Feb 2019 17:06:59 -0800
Message-ID: <20190223010703.678070-6-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>

This patch adds a new bpf helper BPF_FUNC_tcp_check_probe_timer:
"int bpf_tcp_check_probe_timer(struct bpf_tcp_sock *tp, u32 when_us)".
It is added to BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog, which currently
can be attached to both the ingress and egress paths.

To ensure it is only called from BPF_CGROUP_INET_EGRESS, the
attr->expected_attach_type must be specified as BPF_CGROUP_INET_EGRESS
during load time if the prog uses this new helper.
The newly added prog->enforce_expected_attach_type bit will also be set
if this new helper is used.  This bit exists for backward compatibility,
because prog->expected_attach_type is currently ignored for
BPF_PROG_TYPE_CGROUP_SKB.  During attach time,
prog->expected_attach_type is only enforced if the
prog->enforce_expected_attach_type bit is set, i.e. only if this new
helper is used by the prog.
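
The load-time and attach-time rules described above can be modeled in
plain userspace C. This is only a sketch: every model_* name is
hypothetical and is not a kernel symbol, and the attach-time check is
written from the description above rather than taken from kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BPF_CGROUP_INET_INGRESS / _EGRESS. */
enum model_attach_type {
	MODEL_CGROUP_INET_INGRESS,
	MODEL_CGROUP_INET_EGRESS,
};

/* Hypothetical stand-in for the two bpf_prog fields discussed above. */
struct model_prog {
	enum model_attach_type expected_attach_type;
	bool enforce_expected_attach_type;
};

/*
 * Load time: the helper is only granted when the prog was loaded with
 * an egress expected_attach_type. Granting it sets the enforce bit.
 */
static bool model_request_probe_timer_helper(struct model_prog *prog)
{
	if (prog->expected_attach_type != MODEL_CGROUP_INET_EGRESS)
		return false;

	prog->enforce_expected_attach_type = true;
	return true;
}

/*
 * Attach time: expected_attach_type is only compared against the
 * attach point when the enforce bit is set, so progs that never asked
 * for the helper keep their old behavior.
 */
static bool model_attach_allowed(const struct model_prog *prog,
				 enum model_attach_type target)
{
	if (!prog->enforce_expected_attach_type)
		return true;

	return prog->expected_attach_type == target;
}
```

In this model, a prog that never requests the helper attaches to either
path as before, while a prog that was granted the helper can only be
attached to egress.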

The function forces when_us to be at least TCP_TIMEOUT_MIN (currently
2 jiffies) and no more than TCP_RTO_MIN (currently 200ms).
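
As a rough illustration of the clamping, here is a userspace model. It
assumes HZ=1000 (HZ is a kernel build-time setting, so real values
differ per config), and model_usecs_to_jiffies() is a simplified,
hypothetical stand-in for the kernel's usecs_to_jiffies() that rounds
up to whole jiffies. None of the MODEL_*/model_* names exist in the
kernel.

```c
/* Assumed for illustration only; the real HZ depends on kernel config. */
#define MODEL_HZ		1000UL
#define MODEL_TCP_TIMEOUT_MIN	2UL		/* 2 jiffies */
#define MODEL_TCP_RTO_MIN	(MODEL_HZ / 5)	/* 200ms in jiffies */

/* Simplified stand-in for usecs_to_jiffies(): round up to a jiffy. */
static unsigned long model_usecs_to_jiffies(unsigned int us)
{
	unsigned long long us_per_jiffy = 1000000ULL / MODEL_HZ;

	return (unsigned long)((us + us_per_jiffy - 1) / us_per_jiffy);
}

/* Convert when_us to jiffies, then clamp to [TIMEOUT_MIN, RTO_MIN]. */
static unsigned long model_probe_when(unsigned int when_us)
{
	unsigned long when = model_usecs_to_jiffies(when_us);

	if (when < MODEL_TCP_TIMEOUT_MIN)
		when = MODEL_TCP_TIMEOUT_MIN;
	else if (when > MODEL_TCP_RTO_MIN)
		when = MODEL_TCP_RTO_MIN;

	return when;
}
```

Under these assumptions a request of 0us is raised to 2 jiffies, 20ms
passes through unchanged as 20 jiffies, and 1s is cut down to 200
jiffies.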

When using a bpf_prog to limit the egress bandwidth of a cgroup,
it can happen that we drop a packet of a connection that has no
packets out. In this case, the connection may not retry sending
the packet until the probe timer fires. Since the default value
of the probe timer is at least 200ms, this can introduce link
underutilization (i.e. the cgroup egress bandwidth being smaller
than the specified rate) and thus increased tail latency.
This helper function allows for setting a smaller probe timer.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/uapi/linux/bpf.h | 12 +++++++++++-
 net/core/filter.c        | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fc646f3eaf9b..5d0bed852800 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2372,6 +2372,15 @@ union bpf_attr {
  *		current value is ect (ECN capable). Works with IPv6 and IPv4.
  *	Return
  *		1 if set, 0 if not set.
+ *
+ * int bpf_tcp_check_probe_timer(struct bpf_tcp_sock *tp, u32 when_us)
+ *	Description
+ *		Checks that there are no packets out and there is no pending
+ *		timer. If both of these are true, it clamps when_us to the
+ *		range [TCP_TIMEOUT_MIN (2 jiffies), TCP_RTO_MIN (200ms)] and
+ *		sets the probe timer.
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2472,7 +2481,8 @@ union bpf_attr {
 	FN(sk_fullsock),		\
 	FN(tcp_sock),			\
 	FN(tcp_enter_cwr),		\
-	FN(skb_ecn_set_ce),
+	FN(skb_ecn_set_ce),		\
+	FN(tcp_check_probe_timer),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 955369c6ed30..7d7026768840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5456,6 +5456,31 @@ static const struct bpf_func_proto bpf_skb_ecn_set_ce_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
+
+BPF_CALL_2(bpf_tcp_check_probe_timer, struct tcp_sock *, tp, u32, when_us)
+{
+	struct sock *sk = (struct sock *) tp;
+	unsigned long when = usecs_to_jiffies(when_us);
+
+	if (!tp->packets_out && !inet_csk(sk)->icsk_pending) {
+		if (when < TCP_TIMEOUT_MIN)
+			when = TCP_TIMEOUT_MIN;
+		else if (when > TCP_RTO_MIN)
+			when = TCP_RTO_MIN;
+
+		tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
+				     when, TCP_RTO_MAX, NULL);
+	}
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_tcp_check_probe_timer_proto = {
+	.func		= bpf_tcp_check_probe_timer,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_TCP_SOCK,
+	.arg2_type	= ARG_ANYTHING,
+};
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -5624,6 +5649,13 @@ cg_skb_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 		}
 	case BPF_FUNC_skb_ecn_set_ce:
 		return &bpf_skb_ecn_set_ce_proto;
+	case BPF_FUNC_tcp_check_probe_timer:
+		if (prog->expected_attach_type == BPF_CGROUP_INET_EGRESS) {
+			prog->enforce_expected_attach_type = 1;
+			return &bpf_tcp_check_probe_timer_proto;
+		} else {
+			return NULL;
+		}
 #endif
 	default:
 		return sk_filter_func_proto(func_id, prog);
-- 
2.17.1

