* [PATCH 0/3] mvneta: software TSO implementation
From: Ezequiel Garcia
Date: 2014-04-10 22:57 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Ezequiel Garcia

This patchset adds a software TSO implementation to our mvneta driver, which
is already in use on a number of platforms. When enabled, TSO shows both a
CPU usage reduction and a throughput improvement.

The TSO feature can be completely disabled at runtime using ethtool:

  $ ethtool -K eth0 tso {on,off}

When TSO is disabled, the only remaining overhead is the DMA buffers for the
TSO headers, allocated when the tx queues are initialized. Follow-up patches
can release/re-allocate these buffers upon TSO disabling/re-enabling.

In most of my tests I've used iperf, where the improvement is most striking,
using a Plat'home Openblocks AX/3 board as the iperf client (tx). Measuring
the CPU usage with vmstat shows a substantial drop when TSO is on
(~15% vs. ~25%). HTTP-based tests performed by Willy Tarreau have shown nice
performance improvements.

The first two patches are trivial cleanups, and the third patch does all the
hard work. The TSO implementation was originally prepared by Simon Guinot;
I did some style massage and other cleaning. Any comments about this are
much appreciated.

Ezequiel Garcia (3):
  net: mvneta: Factorize feature setting
  net: mvneta: Clean mvneta_tx() sk_buff handling
  net: mvneta: Introduce a software TSO implementation

 drivers/net/ethernet/marvell/mvneta.c | 226 ++++++++++++++++++++++++++++++++--
 1 file changed, 215 insertions(+), 11 deletions(-)

-- 
1.9.1
* [PATCH 1/3] net: mvneta: Factorize feature setting
From: Ezequiel Garcia
Date: 2014-04-10 22:58 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Ezequiel Garcia

In order to ease the addition of new features, let's factorize the
feature list.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index d04b1c3..b0db2c4 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2863,8 +2863,8 @@ static int mvneta_probe(struct platform_device *pdev)
 	netif_napi_add(dev, &pp->napi, mvneta_poll, pp->weight);
 
 	dev->features = NETIF_F_SG | NETIF_F_IP_CSUM;
-	dev->hw_features |= NETIF_F_SG | NETIF_F_IP_CSUM;
-	dev->vlan_features |= NETIF_F_SG | NETIF_F_IP_CSUM;
+	dev->hw_features |= dev->features;
+	dev->vlan_features |= dev->features;
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
 	err = register_netdev(dev);
-- 
1.9.1
* [PATCH 2/3] net: mvneta: Clean mvneta_tx() sk_buff handling
From: Ezequiel Garcia
Date: 2014-04-10 22:58 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Ezequiel Garcia

Rework mvneta_tx() so that the code that performs the final handling
before an sk_buff is transmitted is executed only if the number of
fragments processed is positive. This is preparation work to introduce
a software TSO routine.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b0db2c4..e5bd3ca 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1584,7 +1584,6 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
 	u16 txq_id = skb_get_queue_mapping(skb);
 	struct mvneta_tx_queue *txq = &pp->txqs[txq_id];
 	struct mvneta_tx_desc *tx_desc;
-	struct netdev_queue *nq;
 	int frags = 0;
 	u32 tx_cmd;
 
@@ -1592,7 +1591,6 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 
 	frags = skb_shinfo(skb)->nr_frags + 1;
-	nq = netdev_get_tx_queue(dev, txq_id);
 
 	/* Get a descriptor for the first part of the packet */
 	tx_desc = mvneta_txq_next_desc_get(txq);
@@ -1635,15 +1633,16 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	txq->count += frags;
-	mvneta_txq_pend_desc_add(pp, txq, frags);
-
-	if (txq->size - txq->count < MAX_SKB_FRAGS + 1)
-		netif_tx_stop_queue(nq);
-
 out:
 	if (frags > 0) {
 		struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
+		struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
+
+		txq->count += frags;
+		mvneta_txq_pend_desc_add(pp, txq, frags);
+
+		if (txq->size - txq->count < MAX_SKB_FRAGS + 1)
+			netif_tx_stop_queue(nq);
 
 		u64_stats_update_begin(&stats->syncp);
 		stats->tx_packets++;
-- 
1.9.1
* [PATCH 3/3] net: mvneta: Introduce a software TSO implementation
From: Ezequiel Garcia
Date: 2014-04-10 22:58 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Ezequiel Garcia, Simon Guinot

This commit implements software TSO, which reduces the CPU usage
significantly while keeping the outbound throughput at line rate.

Tested on a Plat'home Openblocks AX/3 board acting as iperf client (tx).
The tests show a substantial CPU usage drop, between 15%-25%. Other tests
performed by Willy Tarreau show performance improvements: Willy reported
that "[..] turning the TSO flag on immediately increases the HTTP request
rate from 1680 to 1820 per second (30 kB objects)".
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Simon Guinot <sguinot@lacie.com>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 207 +++++++++++++++++++++++++++++++++-
 1 file changed, 206 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e5bd3ca..cd6b998 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -244,6 +244,9 @@
 
 #define MVNETA_TX_MTU_MAX	0x3ffff
 
+/* TSO header size */
+#define TSO_HEADER_SIZE	128
+
 /* Max number of Rx descriptors */
 #define MVNETA_MAX_RXD 128
 
@@ -413,6 +416,12 @@ struct mvneta_tx_queue {
 
 	/* Index of the next TX DMA descriptor to process */
 	int next_desc_to_proc;
+
+	/* DMA buffers for TSO headers */
+	char *tso_hdrs;
+
+	/* DMA address of TSO headers */
+	dma_addr_t tso_hdrs_phys;
 };
 
 struct mvneta_rx_queue {
@@ -1519,6 +1528,181 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
 	return rx_done;
 }
 
+static inline void
+mvneta_tso_build_hdr(struct net_device *dev, struct mvneta_tx_queue *txq,
+		     struct sk_buff *skb, int hdr_len, int size,
+		     u32 tcp_seq, u16 ip_id, bool is_last)
+{
+	struct mvneta_port *pp = netdev_priv(dev);
+	struct mvneta_tx_desc *tx_desc;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+	char *mac;
+	int mac_hdr_len = skb_network_offset(skb);
+
+	mac = txq->tso_hdrs + txq->txq_put_index * TSO_HEADER_SIZE;
+	memcpy(mac, skb->data, hdr_len);
+
+	iph = (struct iphdr *)(mac + mac_hdr_len);
+	iph->id = htons(ip_id);
+	iph->tot_len = htons(size + hdr_len - mac_hdr_len);
+
+	tcph = (struct tcphdr *)(mac + skb_transport_offset(skb));
+	tcph->seq = htonl(tcp_seq);
+
+	if (!is_last) {
+		/* Clear all special flags for not last packet */
+		tcph->psh = 0;
+		tcph->fin = 0;
+		tcph->rst = 0;
+	}
+
+	txq->tx_skb[txq->txq_put_index] = NULL;
+	tx_desc = mvneta_txq_next_desc_get(txq);
+	tx_desc->data_size = hdr_len;
+	tx_desc->command = mvneta_skb_tx_csum(pp, skb);
+	tx_desc->command |= MVNETA_TXD_F_DESC;
+	tx_desc->buf_phys_addr = txq->tso_hdrs_phys +
+				 txq->txq_put_index * TSO_HEADER_SIZE;
+	mvneta_txq_inc_put(txq);
+}
+
+static inline int
+mvneta_tso_build_data(struct net_device *dev, struct mvneta_tx_queue *txq,
+		      struct sk_buff *skb, char *frag_ptr, int frag_size,
+		      int data_left, bool is_last)
+{
+	int size;
+	struct mvneta_tx_desc *tx_desc;
+
+	size = (frag_size < data_left) ? frag_size : data_left;
+
+	tx_desc = mvneta_txq_next_desc_get(txq);
+	tx_desc->data_size = size;
+	tx_desc->buf_phys_addr = dma_map_single(dev->dev.parent, frag_ptr,
+						size, DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(dev->dev.parent,
+		     tx_desc->buf_phys_addr))) {
+		mvneta_txq_desc_put(txq);
+		return 0;
+	}
+
+	tx_desc->command = 0;
+	txq->tx_skb[txq->txq_put_index] = NULL;
+
+	if (size == data_left) {
+		/* last descriptor in the TCP packet */
+		tx_desc->command = MVNETA_TXD_L_DESC;
+
+		/* last descriptor in SKB */
+		if (is_last)
+			txq->tx_skb[txq->txq_put_index] = skb;
+	}
+	mvneta_txq_inc_put(txq);
+	return size;
+}
+
+static int mvneta_tx_tso(struct sk_buff *skb, struct net_device *dev,
+			 struct mvneta_tx_queue *txq)
+{
+	int total_len, hdr_len, size, frag_size, data_left;
+	int desc_count;
+	u16 ip_id;
+	u32 tcp_seq;
+	skb_frag_t *frag;
+	int frag_idx = 0;
+	char *frag_ptr;
+	const struct tcphdr *th = tcp_hdr(skb);
+	struct mvneta_port *pp = netdev_priv(dev);
+	int i;
+
+	/* Calculate expected number of TX descriptors */
+	desc_count = skb_shinfo(skb)->gso_segs * 2 + skb_shinfo(skb)->nr_frags;
+	if ((txq->count + desc_count) >= txq->size)
+		return 0;
+
+	total_len = skb->len;
+	hdr_len = (skb_transport_offset(skb) + tcp_hdrlen(skb));
+
+	total_len -= hdr_len;
+	ip_id = ntohs(ip_hdr(skb)->id);
+	tcp_seq = ntohl(th->seq);
+
+	frag_size = skb_headlen(skb);
+	frag_ptr = skb->data;
+
+	if (frag_size < hdr_len)
+		return 0;
+
+	frag_size -= hdr_len;
+	frag_ptr += hdr_len;
+	if (frag_size == 0) {
+		frag = &skb_shinfo(skb)->frags[frag_idx];
+
+		/* Move to next segment */
+		frag_size = frag->size;
+		frag_ptr = page_address(frag->page.p) + frag->page_offset;
+		frag_idx++;
+	}
+	desc_count = 0;
+
+	while (total_len > 0) {
+		data_left = (skb_shinfo(skb)->gso_size < total_len) ?
+			skb_shinfo(skb)->gso_size : total_len;
+		desc_count++;
+		total_len -= data_left;
+
+		/* prepare packet headers: MAC + IP + TCP */
+		mvneta_tso_build_hdr(dev, txq, skb, hdr_len, data_left,
+				     tcp_seq, ip_id, total_len == 0);
+		ip_id++;
+
+		while (data_left > 0) {
+			desc_count++;
+
+			size = mvneta_tso_build_data(dev, txq, skb,
+						     frag_ptr, frag_size,
+						     data_left, total_len == 0);
+			if (size == 0)
+				goto err_release;
+
+			data_left -= size;
+			tcp_seq += size;
+
+			frag_size -= size;
+			frag_ptr += size;
+
+			if ((frag_size == 0) &&
+			    (frag_idx < skb_shinfo(skb)->nr_frags)) {
+				frag = &skb_shinfo(skb)->frags[frag_idx];
+
+				/* Move to next segment */
+				frag_size = frag->size;
+				frag_ptr = page_address(frag->page.p) +
+					   frag->page_offset;
+				frag_idx++;
+			}
+		}
+	}
+
+	return desc_count;
+
+err_release:
+	/* Release all used data descriptors; header descriptors must not
+	 * be DMA-unmapped.
+	 */
+	for (i = desc_count - 1; i >= 0; i--) {
+		struct mvneta_tx_desc *tx_desc = txq->descs + i;
+
+		if (!(tx_desc->command & MVNETA_TXD_F_DESC))
+			dma_unmap_single(pp->dev->dev.parent,
+					 tx_desc->buf_phys_addr,
+					 tx_desc->data_size,
+					 DMA_TO_DEVICE);
+		mvneta_txq_desc_put(txq);
+	}
+	return 0;
+}
+
 /* Handle tx fragmentation processing */
 static int mvneta_tx_frag_process(struct mvneta_port *pp, struct sk_buff *skb,
 				  struct mvneta_tx_queue *txq)
@@ -1590,6 +1774,11 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
 	if (!netif_running(dev))
 		goto out;
 
+	if (skb_is_gso(skb)) {
+		frags = mvneta_tx_tso(skb, dev, txq);
+		goto out;
+	}
+
 	frags = skb_shinfo(skb)->nr_frags + 1;
 
 	/* Get a descriptor for the first part of the packet */
@@ -2108,6 +2297,18 @@ static int mvneta_txq_init(struct mvneta_port *pp,
 				  txq->descs, txq->descs_phys);
 		return -ENOMEM;
 	}
+
+	/* Allocate DMA buffers for TSO MAC/IP/TCP headers */
+	txq->tso_hdrs = dma_alloc_coherent(pp->dev->dev.parent,
+					   txq->size * TSO_HEADER_SIZE,
+					   &txq->tso_hdrs_phys, GFP_KERNEL);
+	if (txq->tso_hdrs == NULL) {
+		kfree(txq->tx_skb);
+		dma_free_coherent(pp->dev->dev.parent,
+				  txq->size * MVNETA_DESC_ALIGNED_SIZE,
+				  txq->descs, txq->descs_phys);
+		return -ENOMEM;
+	}
 	mvneta_tx_done_pkts_coal_set(pp, txq, txq->done_pkts_coal);
 
 	return 0;
@@ -2119,6 +2320,10 @@ static void mvneta_txq_deinit(struct mvneta_port *pp,
 {
 	kfree(txq->tx_skb);
 
+	if (txq->tso_hdrs)
+		dma_free_coherent(pp->dev->dev.parent,
+				  txq->size * TSO_HEADER_SIZE,
+				  txq->tso_hdrs, txq->tso_hdrs_phys);
 	if (txq->descs)
 		dma_free_coherent(pp->dev->dev.parent,
 				  txq->size * MVNETA_DESC_ALIGNED_SIZE,
@@ -2861,7 +3066,7 @@ static int mvneta_probe(struct platform_device *pdev)
 
 	netif_napi_add(dev, &pp->napi, mvneta_poll, pp->weight);
 
-	dev->features = NETIF_F_SG | NETIF_F_IP_CSUM;
+	dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
 	dev->hw_features |= dev->features;
 	dev->vlan_features |= dev->features;
 	dev->priv_flags |= IFF_UNICAST_FLT;
-- 
1.9.1
* Re: [PATCH 3/3] net: mvneta: Introduce a software TSO implementation
From: Eric Dumazet
Date: 2014-04-10 23:16 UTC
To: Ezequiel Garcia
Cc: netdev, David S. Miller, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Simon Guinot

On Thu, 2014-04-10 at 19:58 -0300, Ezequiel Garcia wrote:
> +	/* Calculate expected number of TX descriptors */
> +	desc_count = skb_shinfo(skb)->gso_segs * 2 + skb_shinfo(skb)->nr_frags;
> +	if ((txq->count + desc_count) >= txq->size)
> +		return 0;
> +

Hmm.. have you tried to play with a small MSS?

I think you should set dev->gso_max_segs to a sensible value... see
commit 7e6d06f0de3f7 for details.
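[Editor's note] Eric's gso_max_segs suggestion can be sketched in plain user-space C. The helper names and the fragment-reserve value are illustrative assumptions, not code from mvneta or any other driver; they only reuse the patch's two-descriptors-per-segment estimate:

```c
#include <assert.h>

/* The patch's worst-case estimate: one header descriptor plus one data
 * descriptor per segment, plus one descriptor per page fragment.
 * (Hypothetical helper, mirroring the expression quoted above.) */
static int tso_desc_estimate(int gso_segs, int nr_frags)
{
	return gso_segs * 2 + nr_frags;
}

/* Working backward from the ring size to a segment cap that a driver
 * could assign to dev->gso_max_segs: reserve one spare slot plus room
 * for the page-fragment descriptors, then allow two descriptors per
 * remaining segment. */
static int tso_max_segs(int ring_size, int max_skb_frags)
{
	return (ring_size - 1 - max_skb_frags) / 2;
}
```

With the 532-descriptor ring mentioned later in the thread and an assumed 17 fragments, this would cap the segment count at 257 per skb.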
* Re: [PATCH 3/3] net: mvneta: Introduce a software TSO implementation
From: Ezequiel Garcia
Date: 2014-05-05 14:47 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, cmetcalf, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Simon Guinot

Hi all,

On 10 Apr 07:58 PM, Ezequiel Garcia wrote:
[..]
> +
> +	/* Calculate expected number of TX descriptors */
> +	desc_count = skb_shinfo(skb)->gso_segs * 2 + skb_shinfo(skb)->nr_frags;
> +	if ((txq->count + desc_count) >= txq->size)
> +		return 0;
> +

Is this calculation correct? Does it give the exact number of needed
descriptors, or is it an approximation?

Tilera's tilegx driver does a much stricter descriptor count (see
tso_count_edescs). This function loops through the skb_frag_t fragments
just as the later data egress does, hence strictly counting the needed
descriptors.

However, as it's a much heavier routine than the one shown above,
I'm wondering if we can get away without it.

Willy, any ideas here?
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
* Re: [PATCH 3/3] net: mvneta: Introduce a software TSO implementation
From: Willy Tarreau
Date: 2014-05-07 6:04 UTC
To: Ezequiel Garcia
Cc: netdev, David S. Miller, Eric Dumazet, cmetcalf, Thomas Petazzoni, Gregory Clement, Simon Guinot, Tawfik Bayouk, Lior Amsalem, Simon Guinot

Hi Ezequiel,

On Mon, May 05, 2014 at 11:47:02AM -0300, Ezequiel Garcia wrote:
> Is this calculation correct? Does it give the exact number of needed
> descriptors, or is it an approximation?
>
> Tilera's tilegx driver does a much stricter descriptor count (see
> tso_count_edescs). This function loops through the skb_frag_t fragments
> just as the later data egress does, hence strictly counting the needed
> descriptors.
>
> However, as it's a much heavier routine than the one shown above,
> I'm wondering if we can get away without it.
>
> Willy, any ideas here?

I think we don't absolutely need to have the exact value, provided we
don't fall into the case where we cannot send a single skb with an empty
queue. Here we have 532 tx descriptors max, which means that with an MSS
below 246 bytes, we may fall into this situation.

Eric pointed out commit 7e6d06f0de3f7, which applies a limit on the
number of segments for the sfc driver; I think you should take a look at
it. I haven't played with gso_max_segs yet, but there are clear
explanations in the commit message and in the comments that can serve as
a nice starting point.

Hoping this helps :-/
Willy
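[Editor's note] Willy's 246-byte figure can be sanity-checked with the same two-descriptors-per-segment estimate. This standalone sketch (hypothetical helpers, not driver code) does the arithmetic for a full 64 KiB GSO payload against a 532-entry ring:

```c
#include <assert.h>

/* Ceiling division: number of MSS-sized segments in a GSO payload. */
static int segs_for(int payload, int mss)
{
	return (payload + mss - 1) / mss;
}

/* True if the worst-case descriptor count (two per segment, fragments
 * ignored for simplicity) leaves room in the ring. */
static int fits_ring(int payload, int mss, int ring_size)
{
	return segs_for(payload, mss) * 2 < ring_size;
}
```

At a 246-byte MSS, a 64 KiB payload splits into 267 segments needing 534 descriptors, which no longer fits in 532; at 256 bytes it still fits (512 descriptors).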
* Re: [PATCH 3/3] net: mvneta: Introduce a software TSO implementation
From: Ben Hutchings
Date: 2014-05-21 2:11 UTC
To: Ezequiel Garcia
Cc: netdev, David S. Miller, Eric Dumazet, cmetcalf, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Simon Guinot

On Mon, 2014-05-05 at 11:47 -0300, Ezequiel Garcia wrote:
> > +	/* Calculate expected number of TX descriptors */
> > +	desc_count = skb_shinfo(skb)->gso_segs * 2 + skb_shinfo(skb)->nr_frags;
> > +	if ((txq->count + desc_count) >= txq->size)
> > +		return 0;
> > +
>
> Is this calculation correct? Does it give the exact number of needed
> descriptors, or is it an approximation?
>
> Tilera's tilegx driver does a much stricter descriptor count (see
> tso_count_edescs). This function loops through the skb_frag_t fragments
> just as the later data egress does, hence strictly counting the needed
> descriptors.
>
> However, as it's a much heavier routine than the one shown above,
> I'm wondering if we can get away without it.
>
> Willy, any ideas here?

There are (at least) three potential bugs you need to avoid:

1. You must not underestimate the number of descriptors required here,
   otherwise the ring may overflow. Hopefully that's obvious.

2. You must ensure that the worst-case number of descriptors does
   actually fit in the ring, and will probably need to set
   net_dev->gso_max_segs accordingly. Eric pointed that out already.

3. If you stop the queue because an skb doesn't fit, you should not wake
   it before there is enough space.

A simple way to do this is:

- Set a maximum number of segments to support (for sfc, I reckoned that
  100 was enough)
- Calculate the maximum number of descriptors that could be needed for
  that number of segments
- Stop the queue when the free space in the ring is less than that
  maximum
- Wake the queue when the free space reaches a threshold somewhere
  between that maximum and completely free

It may make more sense to work backward from the ring size to the
maximum number of segments.

Ben.

-- 
Ben Hutchings
Life would be so much easier if we could look at the source code.
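[Editor's note] Ben's stop/wake scheme above can be sketched in plain C. The struct, the names, and the midpoint wake threshold are illustrative assumptions rather than code from sfc or mvneta:

```c
#include <assert.h>
#include <stdbool.h>

struct txq_limits {
	int stop_thresh;	/* stop the queue below this much free space */
	int wake_thresh;	/* wake it only once free space exceeds this */
};

static struct txq_limits txq_limits_init(int ring_size, int max_segs,
					 int max_skb_frags)
{
	/* Worst case for one TSO skb under the two-descriptors-per-segment
	 * estimate, plus its page fragments. */
	int worst_case = max_segs * 2 + max_skb_frags;
	struct txq_limits l = {
		.stop_thresh = worst_case,
		/* Somewhere between the worst case and a completely free
		 * ring, per Ben's suggestion; the midpoint is arbitrary. */
		.wake_thresh = (worst_case + ring_size) / 2,
	};
	return l;
}

static bool txq_should_stop(const struct txq_limits *l, int free_descs)
{
	return free_descs < l->stop_thresh;
}

static bool txq_may_wake(const struct txq_limits *l, int free_descs)
{
	return free_descs >= l->wake_thresh;
}
```

The gap between the two thresholds provides hysteresis, so the queue is not woken and immediately stopped again on every completed descriptor.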
* Re: [PATCH 3/3] net: mvneta: Introduce a software TSO implementation
From: Ezequiel Garcia
Date: 2014-05-22 17:47 UTC
To: Ben Hutchings
Cc: netdev, David S. Miller, Eric Dumazet, cmetcalf, Thomas Petazzoni, Gregory Clement, Simon Guinot, Willy Tarreau, Tawfik Bayouk, Lior Amsalem, Simon Guinot

On 21 May 03:11 AM, Ben Hutchings wrote:
> There are (at least) three potential bugs you need to avoid:
>
> 1. You must not underestimate the number of descriptors required here,
>    otherwise the ring may overflow. Hopefully that's obvious.

Yes, fully understood.

> 2. You must ensure that the worst-case number of descriptors does
>    actually fit in the ring, and will probably need to set
>    net_dev->gso_max_segs accordingly. Eric pointed that out already.

Yeah, I still need to take a deep look at the commit pointed out by Eric.

> 3. If you stop the queue because an skb doesn't fit, you should not wake
>    it before there is enough space.
>
> A simple way to do this is:
>
> - Set a maximum number of segments to support (for sfc, I reckoned that
>   100 was enough)
> - Calculate the maximum number of descriptors that could be needed for
>   that number of segments
> - Stop the queue when the free space in the ring is less than that
>   maximum
> - Wake the queue when the free space reaches a threshold somewhere
>   between that maximum and completely free
>
> It may make more sense to work backward from the ring size to the
> maximum number of segments.

Hm, this seems like a good suggestion. I'm wondering if this can be
added to the generic code, although it is likely each driver's
responsibility.

BTW, have you seen the proposal for the generic TSO API?
http://permalink.gmane.org/gmane.linux.network/316852

Any comments about it would be much appreciated, as we'd really like to
see TSO implemented in our drivers.

Thanks,
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: David Miller
Date: 2014-04-11 0:51 UTC
To: ezequiel.garcia
Cc: netdev, eric.dumazet, thomas.petazzoni, gregory.clement, simon.guinot, w, tawfik, alior

This is not really driver-specific code.

Please create an appropriate abstraction, and put this code in a common
place so that any driver can do this.

Thanks.
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: Eric Dumazet
Date: 2014-04-11 1:02 UTC
To: David Miller
Cc: ezequiel.garcia, netdev, thomas.petazzoni, gregory.clement, simon.guinot, w, tawfik, alior

On Thu, 2014-04-10 at 20:51 -0400, David Miller wrote:
> This is not really driver-specific code.
>
> Please create an appropriate abstraction, and put this code in a common
> place so that any driver can do this.
>
> Thanks.

Note a prior implementation existed in
drivers/net/ethernet/tile/tilegx.c

Not sure how it can be properly factorized, but worth trying.
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: Willy Tarreau
Date: 2014-04-11 5:48 UTC
To: Eric Dumazet
Cc: David Miller, ezequiel.garcia, netdev, thomas.petazzoni, gregory.clement, simon.guinot, tawfik, alior

On Thu, Apr 10, 2014 at 06:02:30PM -0700, Eric Dumazet wrote:
> Note a prior implementation existed in
> drivers/net/ethernet/tile/tilegx.c
>
> Not sure how it can be properly factorized, but worth trying.

I also tried to find how to do this for mv643xx_eth a few years ago
based on other drivers (eg: tilegx) and failed to find anything common.
At this level, we're really playing with the NIC's descriptors, and many
of them have different capabilities (eg: HW checksums for all frames,
VLAN or none, etc...). I don't really see how this could be factorized,
and since many NICs do support TSO, I'm not convinced by the effort.

Willy
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: David Miller
Date: 2014-04-11 6:20 UTC
To: w
Cc: eric.dumazet, ezequiel.garcia, netdev, thomas.petazzoni, gregory.clement, simon.guinot, tawfik, alior

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 11 Apr 2014 07:48:47 +0200

> I also tried to find how to do this for mv643xx_eth a few years ago
> based on other drivers (eg: tilegx) and failed to find anything common.
> At this level, we're really playing with the NIC's descriptors, and many
> of them have different capabilities (eg: HW checksums for all frames,
> VLAN or none, etc...). I don't really see how this could be factorized,
> and since many NICs do support TSO, I'm not convinced by the effort.

I wanted to hook this into the NIU driver as well, which also lacks TSO
support.

The descriptor and checksum part is chip-specific, but the segmentation
of the buffer segments, calculating the number of necessary descriptors,
etc. absolutely is not.

Please work on finding a suitable abstraction; it's possible.
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: Willy Tarreau
Date: 2014-04-11 6:30 UTC
To: David Miller
Cc: eric.dumazet, ezequiel.garcia, netdev, thomas.petazzoni, gregory.clement, simon.guinot, tawfik, alior

On Fri, Apr 11, 2014 at 02:20:06AM -0400, David Miller wrote:
> The descriptor and checksum part is chip-specific, but the segmentation
> of the buffer segments, calculating the number of necessary descriptors,
> etc. absolutely is not.

Agreed.

> Please work on finding a suitable abstraction; it's possible.

Do you think, for example, that having an xmit_tso() function which
would iterate over all segments and call something like:

    ->build_hdr()
    ->build_data()
    ->unroll_all()

would be sufficient? I fear that relying on function pointers in the
inner loop can significantly offset the CPU savings, but it could be
attempted.

I think the fact that you want to do it for NIU is a good opportunity to
try to do it the right way in the most generic mode, but we'll surely
need your help on this!

Willy
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: David Miller
Date: 2014-04-11 16:58 UTC
To: w
Cc: eric.dumazet, ezequiel.garcia, netdev, thomas.petazzoni, gregory.clement, simon.guinot, tawfik, alior

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 11 Apr 2014 08:30:21 +0200

> Do you think, for example, that having an xmit_tso() function which
> would iterate over all segments and call something like:
>
>     ->build_hdr()
>     ->build_data()
>     ->unroll_all()
>
> would be sufficient?

Yes, something involving callbacks is probably going to be necessary.

> I think the fact that you want to do it for NIU is a good opportunity to
> try to do it the right way in the most generic mode, but we'll surely
> need your help on this!

I can help for sure :)
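[Editor's note] A purely illustrative sketch of the callback split Willy proposes; all names here are hypothetical and this is not any interface from the kernel. A generic segmentation loop calls into per-driver hooks, with a toy counting backend included only to exercise it:

```c
#include <assert.h>

/* Hypothetical ops table a generic software-TSO helper could call into;
 * each NIC driver would supply its own descriptor handling behind it. */
struct tso_ops {
	void (*build_hdr)(void *priv, int payload_len, int is_last);
	void (*build_data)(void *priv, int size, int is_last);
};

/* Generic loop: split payload_len into mss-sized segments, emitting one
 * header plus one data descriptor per segment.  Real code would also
 * walk page fragments and unroll on DMA-mapping failure. */
static int tso_xmit(const struct tso_ops *ops, void *priv,
		    int payload_len, int mss)
{
	int desc_count = 0;

	while (payload_len > 0) {
		int seg = payload_len < mss ? payload_len : mss;

		payload_len -= seg;
		ops->build_hdr(priv, seg, payload_len == 0);
		ops->build_data(priv, seg, payload_len == 0);
		desc_count += 2;
	}
	return desc_count;
}

/* Toy backend used purely to exercise the loop: counts emitted
 * header and data descriptors instead of touching hardware. */
struct counts { int hdrs; int datas; };

static void count_hdr(void *priv, int payload_len, int is_last)
{
	(void)payload_len; (void)is_last;
	((struct counts *)priv)->hdrs++;
}

static void count_data(void *priv, int size, int is_last)
{
	(void)size; (void)is_last;
	((struct counts *)priv)->datas++;
}
```

The function-pointer indirection in the inner loop is exactly the overhead Willy worries about; whether it is measurable would need benchmarking on the target hardware.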
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: Ezequiel Garcia
Date: 2014-04-11 19:08 UTC
To: Willy Tarreau
Cc: Eric Dumazet, David Miller, netdev, thomas.petazzoni, gregory.clement, simon.guinot, tawfik, alior

On Apr 11, Willy Tarreau wrote:
> I also tried to find how to do this for mv643xx_eth a few years ago
> [..]

Can you share *any* code about this attempt on mv643xx_eth? It would be
helpful to see other TSO cases, to have more examples in order to come
up with a proper abstraction.

-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
* Re: [PATCH 0/3] mvneta: software TSO implementation
From: Willy Tarreau
Date: 2014-04-11 19:43 UTC
To: Ezequiel Garcia
Cc: Eric Dumazet, David Miller, netdev, thomas.petazzoni, gregory.clement, simon.guinot, tawfik, alior

On Fri, Apr 11, 2014 at 04:08:27PM -0300, Ezequiel Garcia wrote:
> Can you share *any* code about this attempt on mv643xx_eth? It would be
> helpful to see other TSO cases, to have more examples in order to come
> up with a proper abstraction.

Unfortunately no. I remember that I picked the equivalent code from
mv_eth in the kirkwood's LSP and tried to adapt it to the mainline
driver, but unfortunately by then I didn't understand well enough what
the driver was doing, despite Eric explaining a few things about GSO and
segmentation to me. Too many things were obscure to me and I had to give
up. The only vague memories I have are that this NIC doesn't support Tx
csum on VLANs nor IPv6. That's about all, I'm afraid.

By the way, I could be wrong, but I also have some memories about this
in the myri10ge driver. I recall something like it using software TSO
for VLANs or IPv6 or a combination of both; I don't exactly remember.
It's also possible that it depends on the firmware version. Anyway, the
driver is clean and well commented; it's easier to read than average.

Hoping this helps,
Willy