From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Dumitrescu, Cristian"
Subject: Re: [PATCH 4/4] port: fix ethdev writer burst too big
Date: Thu, 31 Mar 2016 13:22:47 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912647974F2E@IRSMSX108.ger.corp.intel.com>
References: <1459198297-49854-1-git-send-email-rsanford@akamai.com>
 <1459198297-49854-5-git-send-email-rsanford@akamai.com>
In-Reply-To: <1459198297-49854-5-git-send-email-rsanford@akamai.com>
To: Robert Sanford , "dev@dpdk.org"
Cc: "Liang, Cunming"
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian
> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>
> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
> send bursts larger than tx_burst_sz to the underlying ethdev.
> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
> burst size, resulting in unnecessary enqueuing failures or ethdev
> writer retries.

Sending bursts larger than tx_burst_sz is actually intentional. The assumption is that NIC performance benefits from a larger burst size, so tx_burst_sz is used as a minimal burst size requirement, not as a maximal or fixed burst size requirement.
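To make the "minimal, not maximal" semantics concrete, here is a small self-contained sketch (not the actual DPDK code; `writer_model`, `flush`, and `tx_bulk_model` are hypothetical names) of the current writer's buffering policy: packets accumulate in the buffer, and once the count reaches tx_burst_sz the *entire* buffered run is flushed in one call, so the flushed burst can exceed tx_burst_sz.

```c
#include <assert.h>
#include <stdint.h>

#define BURST_SIZE_MAX 64

/* Hypothetical, simplified model of the ethdev writer's buffering policy.
 * tx_burst_sz is a minimum threshold, not a cap: the flush sends whatever
 * has accumulated, which may be more than tx_burst_sz packets. */
struct writer_model {
    uint32_t tx_buf[2 * BURST_SIZE_MAX]; /* stands in for struct rte_mbuf * */
    uint32_t tx_buf_count;
    uint32_t tx_burst_sz;
    uint32_t last_flush_sz; /* size of the most recent flush, for observation */
};

static void flush(struct writer_model *w)
{
    /* The whole buffer goes to the PMD in one shot. */
    w->last_flush_sz = w->tx_buf_count;
    w->tx_buf_count = 0;
}

/* Enqueue n packets, then apply the "flush after the loop" policy:
 * one threshold test per bulk call, not per packet. */
static void tx_bulk_model(struct writer_model *w, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        w->tx_buf[w->tx_buf_count++] = i;
    if (w->tx_buf_count >= w->tx_burst_sz)
        flush(w);
}
```

With tx_burst_sz set to 32 and a 48-packet bulk, this model flushes all 48 packets at once, which is exactly the behavior the patch description calls a bug and this reply calls intentional.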
I agree with you that a while ago the vector version of the IXGBE driver used to work the way you describe, but I don't think this is the case anymore. As an example, if the TX burst size is set to 32 and 48 packets are transmitted, then the PMD will TX all 48 packets (internally it can work in batches of 4, 8, 32, etc., which should not matter) rather than TXing just 32 packets out of 48 and leaving the user to either discard or retry with the remaining 16 packets. I am CC-ing Steve Liang to confirm this.

Is there any PMD that people can name that currently behaves the opposite way, i.e. given a burst of 48 pkts for TX, accepts 32 pkts and discards the other 16?

>
> We propose to fix this by moving the tx buffer flushing logic from
> *after* the loop that puts all packets into the tx buffer, to *inside*
> the loop, testing for a full burst when adding each packet.
>

The issue I have with this approach is the introduction of a branch that has to be tested on each iteration of the loop, rather than once for the entire loop.

The code branch where you add this is actually the slow(er) code path (where the local variable expr != 0), which is used for non-contiguous bursts or bursts smaller than tx_burst_sz. Is there a particular reason you are only interested in enabling this strategy (of using tx_burst_sz as a fixed burst size requirement) on this code path? The reason I am asking is that the other, fast(er) code path (where expr == 0) also uses tx_burst_sz as a minimal requirement, and therefore it can also send bursts bigger than tx_burst_sz.
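For readers following along, here is a sketch of the fast-path/slow-path selection as I read it in rte_port_ethdev_writer_tx_bulk (simplified; `is_fast_path` is my name, not DPDK's). bsz_mask has a single bit set at position tx_burst_sz - 1, and expr == 0 means pkts_mask is a contiguous run of low bits that includes that position, i.e. a contiguous burst of at least tx_burst_sz packets:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the expr test that picks the fast path. expr == 0 requires:
 *   - pkts_mask & (pkts_mask + 1) == 0: pkts_mask is of the form 2^n - 1,
 *     i.e. a contiguous run of bits starting at bit 0;
 *   - bit (tx_burst_sz - 1) is set: the run covers at least tx_burst_sz
 *     packets (it may cover more, so the fast path can TX bursts bigger
 *     than tx_burst_sz).
 * Anything else (sparse mask, or too few packets) takes the slower,
 * per-packet buffering path. */
static int is_fast_path(uint64_t pkts_mask, uint32_t tx_burst_sz)
{
    uint64_t bsz_mask = 1ULL << (tx_burst_sz - 1);
    uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
            ((pkts_mask & bsz_mask) ^ bsz_mask);

    return expr == 0;
}
```

Note the second test case below: a contiguous 48-packet mask with tx_burst_sz = 32 still takes the fast path, which is the point made above about the fast path also sending bursts bigger than tx_burst_sz.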
> Signed-off-by: Robert Sanford
> ---
>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_port/rte_port_ethdev.c
> b/lib/librte_port/rte_port_ethdev.c
> index 3fb4947..1283338 100644
> --- a/lib/librte_port/rte_port_ethdev.c
> +++ b/lib/librte_port/rte_port_ethdev.c
> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
> *port,
>  struct rte_port_ethdev_writer {
>  	struct rte_port_out_stats stats;
>
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>  			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
>
> -	p->tx_buf_count = tx_buf_count;
> -	if (tx_buf_count >= p->tx_burst_sz)
> -		send_burst(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst(p);
> +		}
>  	}

One observation here: if we go with this proposal (which I have an issue with, due to executing the branch per loop iteration rather than once per entire loop), it also eliminates the buffer overflow issue flagged by you in the other email :), so there is no need to e.g. double the size of the port's internal buffer (tx_buf).
>
> 	return 0;
> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
> *port,
>  struct rte_port_ethdev_writer_nodrop {
>  	struct rte_port_out_stats stats;
>
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
> *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>  			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
>
> -	p->tx_buf_count = tx_buf_count;
> -	if (tx_buf_count >= p->tx_burst_sz)
> -		send_burst_nodrop(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst_nodrop(p);
> +		}
> 	}
>
> 	return 0;
> --
> 1.7.1
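To illustrate the trade-off the patch makes, here is a hypothetical model (my own names, not the DPDK code) of the proposed in-loop flush. The threshold branch now runs once per packet, but the buffer count can never exceed tx_burst_sz, which is why the patch can also shrink tx_buf from 2 * RTE_PORT_IN_BURST_SIZE_MAX to RTE_PORT_IN_BURST_SIZE_MAX:

```c
#include <assert.h>
#include <stdint.h>

#define BURST_SZ 32

/* Hypothetical model of the patch's proposal: test the flush threshold
 * inside the enqueue loop instead of after it. */
struct inloop_writer {
    uint32_t tx_buf[BURST_SZ]; /* buffer no longer needs 2x headroom */
    uint32_t tx_buf_count;
    uint32_t flushes;          /* number of flush calls, for observation */
    uint32_t max_flush_sz;     /* largest single flush seen */
};

static void tx_bulk_inloop(struct inloop_writer *w, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        w->tx_buf[w->tx_buf_count++] = i;
        /* branch executed per packet, not once per bulk call */
        if (w->tx_buf_count >= BURST_SZ) {
            if (w->tx_buf_count > w->max_flush_sz)
                w->max_flush_sz = w->tx_buf_count;
            w->tx_buf_count = 0;
            w->flushes++;
        }
    }
}
```

A 48-packet bulk now produces one flush of exactly 32 packets with 16 left buffered, instead of one flush of 48: the overflow risk is gone, but so is the larger-than-tx_burst_sz burst the reply above argues is beneficial.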