From: Eric Dumazet
Subject: Re: [PATCH net-next] net_sched: more precise pkt_len computation
Date: Fri, 26 Oct 2012 13:11:54 +0200
Message-ID: <1351249914.6537.339.camel@edumazet-glaptop>
In-Reply-To: <1351243970.6537.321.camel@edumazet-glaptop>
References: <1350965711-4390-1-git-send-email-amwang@redhat.com>
 <1350982432.26308.8.camel@cr0>
 <58DEB62C-6237-435E-8300-8E390F101F58@unimore.it>
 <1351239007.6537.291.camel@edumazet-glaptop>
 <1351243970.6537.321.camel@edumazet-glaptop>
To: David Miller
Cc: netdev@vger.kernel.org, Stephen Hemminger, Patrick McHardy,
 Paolo Valente, Jamal Hadi Salim, Herbert Xu

On Fri, 2012-10-26 at 11:32 +0200, Eric Dumazet wrote:
> From: Eric Dumazet
>
> One long-standing problem with TSO/GSO/GRO packets is that skb->len
> doesn't represent a precise amount of bytes on the wire.
>
> Headers are only accounted for in the first segment.
> For TCP, that's typically 66 bytes missing per 1448-byte segment,
> an error of 4.5%.
>
> As a consequence:
>
> 1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits.
> 2) Device stats are slightly underestimated as well.
>
> Fix this by taking the headers into account in the
> qdisc_skb_cb(skb)->pkt_len computation.
>
> Packet schedulers should use pkt_len instead of skb->len for their
> bandwidth limitations, and TSO-enabled device drivers could use
> pkt_len if their statistics are not hardware-assisted, and if they
> don't scratch the first word of skb->cb[].
>
> Signed-off-by: Eric Dumazet
> Cc: Jamal Hadi Salim
> Cc: Stephen Hemminger
> Cc: Paolo Valente
> Cc: Herbert Xu
> Cc: Patrick McHardy
> ---
>  net/core/dev.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b4978e2..cf2b5f4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2442,7 +2442,6 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  	bool contended;
>  	int rc;
>
> -	qdisc_skb_cb(skb)->pkt_len = skb->len;
>  	qdisc_calculate_pkt_len(skb, q);
>  	/*
>  	 * Heuristic to force contended enqueues to serialize on a
> @@ -2579,6 +2578,16 @@ int dev_queue_xmit(struct sk_buff *skb)
>  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
>  #endif
>  	trace_net_dev_queue(skb);
> +	qdisc_skb_cb(skb)->pkt_len = skb->len;
> +	if (skb_is_gso(skb)) {
> +		unsigned int hdr_len = skb_transport_offset(skb);
> +
> +		if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))
> +			hdr_len += tcp_hdrlen(skb);
> +		else
> +			hdr_len += sizeof(struct udphdr);
> +		qdisc_skb_cb(skb)->pkt_len += (skb_shinfo(skb)->gso_segs - 1) * hdr_len;
> +	}
>  	if (q->enqueue) {
>  		rc = __dev_xmit_skb(skb, q, dev, txq);
>  		goto out;

Hmm, this doesn't quite work for GRO (ingress), as we call

	skb_reset_transport_header(skb);

in __netif_receive_skb() before handle_ing().

So skb_transport_offset(skb) is 14 here, instead of 14 + (IP header).

This skb_reset_transport_header(skb) call looks like defensive
programming, for the !NET_SKBUFF_DATA_USES_OFFSET case?
We could either:

1) remove this skb_reset_transport_header(skb) call, or

2) use the following helper instead:

#ifdef NET_SKBUFF_DATA_USES_OFFSET
static inline void skb_sanitize_transport_header(struct sk_buff *skb)
{
	skb->transport_header = max_t(sk_buff_data_t,
				      skb->data - skb->head,
				      skb->transport_header);
}
#else
static inline void skb_sanitize_transport_header(struct sk_buff *skb)
{
	skb->transport_header = max_t(sk_buff_data_t,
				      skb->data,
				      skb->transport_header);
}
#endif
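
To make the difference concrete, here is a userspace mock of the two
behaviors (a toy struct holding offsets only, with hypothetical values;
not kernel code): skb_reset_transport_header() unconditionally rewinds
transport_header to the current skb->data position, while the sanitize
variant only ever moves it forward, so a transport header already set
by GRO survives the call.

#include <stdio.h>

struct mock_skb {
	unsigned int data;		/* offset of skb->data from skb->head */
	unsigned int transport_header;	/* offset of the transport header */
};

/* what skb_reset_transport_header() does: unconditional rewind */
static void mock_reset_transport_header(struct mock_skb *skb)
{
	skb->transport_header = skb->data;
}

/* what the sanitize variant does: move forward only, like max_t() */
static void mock_sanitize_transport_header(struct mock_skb *skb)
{
	if (skb->transport_header < skb->data)
		skb->transport_header = skb->data;
}

int main(void)
{
	/* GRO already parsed up to L4: transport header at 34
	 * (MAC 14 + IP 20), current parse position at 14. */
	struct mock_skb a = { .data = 14, .transport_header = 34 };
	struct mock_skb b = a;

	mock_reset_transport_header(&a);	/* clobbers GRO's value: 34 -> 14 */
	mock_sanitize_transport_header(&b);	/* keeps it: still 34 */

	printf("reset:    transport_header = %u (GRO's work lost)\n",
	       a.transport_header);
	printf("sanitize: transport_header = %u (GRO's work kept)\n",
	       b.transport_header);
	return 0;
}

The printed 14 vs 34 mirrors the "14 instead of 14 + (IP header)"
observation above.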
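
For completeness, the arithmetic behind the ~4.5% figure in the
changelog, as a standalone sketch (hypothetical segment count; 66 =
ETH(14) + IP(20) + TCP with timestamps(32), and 1448 = 1500 - 20 - 32
assuming the usual 1500-byte MTU):

#include <stdio.h>

int main(void)
{
	unsigned int gso_segs = 10;	/* hypothetical GSO packet */
	unsigned int mss = 1448;	/* 1500 - IP(20) - TCP w/ timestamps(32) */
	unsigned int hdr_len = 66;	/* ETH(14) + IP(20) + TCP(32) */

	unsigned int skb_len = hdr_len + gso_segs * mss;	/* headers counted once */
	unsigned int wire_len = gso_segs * (hdr_len + mss);	/* what really hits the wire */
	unsigned int pkt_len = skb_len + (gso_segs - 1) * hdr_len; /* the patch's formula */

	printf("skb->len = %u, wire = %u, corrected pkt_len = %u\n",
	       skb_len, wire_len, pkt_len);
	printf("per-segment error without the fix: %.1f%%\n",
	       100.0 * hdr_len / mss);
	return 0;
}

The corrected pkt_len matches the wire length exactly, which is what
lets TBF/CBQ/HTB/NETEM enforce their configured limits.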