From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willem de Bruijn <willemb@google.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com
Subject: Re: [PATCH] rps: selective flow shedding during softnet overflow
Date: Fri, 19 Apr 2013 10:58:54 -0700	[thread overview]
Message-ID: <1366394334.16391.36.camel@edumazet-glaptop> (raw)
In-Reply-To: <1366393612-16885-1-git-send-email-willemb@google.com>

On Fri, 2013-04-19 at 13:46 -0400, Willem de Bruijn wrote:
> A cpu executing the network receive path sheds packets when its input
> queue grows to netdev_max_backlog. A single high-rate flow (such as a
> spoofed source DoS) can exceed a single cpu's processing rate and will
> degrade throughput of other flows hashed onto the same cpu.
> 
> This patch adds a more fine-grained hashtable. If the netdev backlog
> is above a threshold, IRQ cpus track the ratio of total traffic of
> each flow (using 1024 buckets, configurable). The ratio is measured
> by counting the number of packets per flow over the last 256 packets
> from the source cpu. Any flow that occupies a large fraction of this
> window (set at 50%) will see its packets dropped while the backlog
> remains above the threshold.
> 
> Tested:
> Setup is a multi-threaded UDP echo server with network rx IRQ on cpu0,
> kernel receive (RPS) on cpu0 and application threads on cpus 2--7
> each handling 20k req/s. Throughput halves when hit with a 400 kpps
> antagonist storm. With this patch applied, antagonist overload is
> dropped and the server processes its complete load.
> 
> The patch is effective when kernel receive processing is the
> bottleneck. The above RPS scenario is an extreme case, but the same
> situation is reached with RFS and sufficient kernel processing
> (iptables, packet socket tap, ..).
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---

> +#ifdef CONFIG_NET_FLOW_LIMIT
> +#define FLOW_LIMIT_HISTORY	(1 << 8)	/* must be a power of 2 */
> +struct sd_flow_limit {
> +	u64			count;
> +	unsigned int		history_head;
> +	u16			history[FLOW_LIMIT_HISTORY];
> +	u8			buckets[];
> +};
> +
> +extern int netdev_flow_limit_table_len;
> +#endif /* CONFIG_NET_FLOW_LIMIT */
> +
>  /*
>   * Incoming packets are placed on per-cpu queues
>   */
> @@ -1808,6 +1820,10 @@ struct softnet_data {
>  	unsigned int		dropped;
>  	struct sk_buff_head	input_pkt_queue;
>  	struct napi_struct	backlog;
> +
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +	struct sd_flow_limit	*flow_limit;
> +#endif
>  };
>  
>  static inline void input_queue_head_incr(struct softnet_data *sd)
> diff --git a/net/Kconfig b/net/Kconfig
> index 2ddc904..ff66a4f 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -259,6 +259,16 @@ config BPF_JIT
>  	  packet sniffing (libpcap/tcpdump). Note : Admin should enable
>  	  this feature changing /proc/sys/net/core/bpf_jit_enable
>  
> +config NET_FLOW_LIMIT
> +	bool "Flow shedding under load"
> +	---help---
> +	  The network stack has to drop packets when a receive processing CPU's
> +	  backlog reaches netdev_max_backlog. If a few out of many active flows
> +	  generate the vast majority of load, drop their traffic earlier to
> +	  maintain capacity for the other flows. This feature provides servers
> +	  with many clients some protection against DoS by a single (spoofed)
> +	  flow that greatly exceeds average workload.
> +
>  menu "Network testing"
>  
>  config NET_PKTGEN
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3655ff9..67a4ae0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3054,6 +3054,47 @@ static int rps_ipi_queued(struct softnet_data *sd)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +int netdev_flow_limit_table_len __read_mostly = (1 << 12);
> +#endif
> +
> +static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
> +{
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +	struct sd_flow_limit *fl;
> +	struct softnet_data *sd;
> +	unsigned int old_flow, new_flow;
> +
> +	if (qlen < (netdev_max_backlog >> 1))
> +		return false;
> +
> +	sd = &per_cpu(softnet_data, smp_processor_id());
> +
> +	rcu_read_lock();
> +	fl = rcu_dereference(sd->flow_limit);
> +	if (fl) {
> +		new_flow = skb_get_rxhash(skb) &
> +			   (netdev_flow_limit_table_len - 1);

There is a race accessing netdev_flow_limit_table_len

(the admin might change the value, and we might do an out of bound
access)

This should be a field in fl, aka fl->mask, so that it's safe


> +		old_flow = fl->history[fl->history_head];
> +		fl->history[fl->history_head] = new_flow;
> +
> +		fl->history_head++;
> +		fl->history_head &= FLOW_LIMIT_HISTORY - 1;
> +
> +		if (likely(fl->buckets[old_flow]))
> +			fl->buckets[old_flow]--;
> +
> +		if (++fl->buckets[new_flow] > (FLOW_LIMIT_HISTORY >> 1)) {
> +			fl->count++;
> +			rcu_read_unlock();
> +			return true;
> +		}
> +	}
> +	rcu_read_unlock();
> +#endif
> +	return false;
> +}
> +

Very nice work by the way !

Thread overview: 41+ messages
2013-04-19 17:46 [PATCH] rps: selective flow shedding during softnet overflow Willem de Bruijn
2013-04-19 17:58 ` Eric Dumazet [this message]
2013-04-22 20:40   ` Willem de Bruijn
2013-04-22 20:46     ` [PATCH net-next v2] " Willem de Bruijn
2013-04-22 22:30       ` Eric Dumazet
2013-04-23 18:45         ` Willem de Bruijn
2013-04-23 18:46           ` [PATCH net-next v3] " Willem de Bruijn
2013-04-23 19:18             ` Eric Dumazet
2013-04-23 20:30               ` Willem de Bruijn
2013-04-23 20:31                 ` [PATCH net-next v4] " Willem de Bruijn
2013-04-23 21:23                   ` Stephen Hemminger
2013-04-23 21:37                     ` Willem de Bruijn
2013-04-23 21:37                     ` Eric Dumazet
2013-04-23 21:52                       ` Stephen Hemminger
2013-04-23 22:34                         ` David Miller
2013-04-24  0:09                         ` Eric Dumazet
2013-04-24  0:37                           ` [PATCH net-next v5] " Willem de Bruijn
2013-04-24  1:07                             ` Eric Dumazet
2013-04-25  8:20                             ` David Miller
2013-05-20 14:02                               ` [PATCH net-next v6] " Willem de Bruijn
2013-05-20 16:00                                 ` Eric Dumazet
2013-05-20 16:08                                   ` Willem de Bruijn
2013-05-20 20:48                                   ` David Miller
2013-04-24  1:25                           ` [PATCH net-next v4] " Jamal Hadi Salim
2013-04-24  1:32                             ` Eric Dumazet
2013-04-24  1:44                               ` Jamal Hadi Salim
2013-04-24  2:11                                 ` Eric Dumazet
2013-04-24 13:00                                   ` Jamal Hadi Salim
2013-04-24 14:41                                     ` Eric Dumazet
2013-04-23 22:33                     ` David Miller
2013-04-23 21:34                   ` Eric Dumazet
2013-04-23 22:41                   ` David Miller
2013-04-23 23:11                     ` Eric Dumazet
2013-04-23 23:15                       ` David Miller
2013-04-23 23:26                         ` Eric Dumazet
2013-04-24  0:03                         ` Stephen Hemminger
2013-04-24  0:00                     ` Willem de Bruijn
2013-04-23 20:46                 ` [PATCH net-next v3] " Eric Dumazet
2013-04-19 19:03 ` [PATCH] " Stephen Hemminger
2013-04-19 19:21   ` Eric Dumazet
2013-04-19 20:11   ` Willem de Bruijn
