From mboxrd@z Thu Jan 1 00:00:00 1970 From: Matan Azrad Subject: Re: [PATCH v2 4/7] net/mlx4: merge Tx path functions Date: Thu, 26 Oct 2017 12:30:54 +0000 Message-ID: References: <1508752838-30408-1-git-send-email-ophirmu@mellanox.com> <1508768520-4810-1-git-send-email-ophirmu@mellanox.com> <1508768520-4810-5-git-send-email-ophirmu@mellanox.com> <20171024135149.fyg4nzcbygo2amtz@laranjeiro-vm> <20171025075006.znxl7mezy4pfyzsj@laranjeiro-vm> <20171026121219.ke3dz7hv4a5zfpih@laranjeiro-vm> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Cc: Ophir Munk , Adrien Mazarguil , "dev@dpdk.org" , Thomas Monjalon , Olga Shern , Mordechay Haimovsky To: =?iso-8859-1?Q?N=E9lio_Laranjeiro?= Return-path: Received: from EUR02-HE1-obe.outbound.protection.outlook.com (mail-eopbgr10074.outbound.protection.outlook.com [40.107.1.74]) by dpdk.org (Postfix) with ESMTP id F2A021BAE2 for ; Thu, 26 Oct 2017 14:31:02 +0200 (CEST) In-Reply-To: <20171026121219.ke3dz7hv4a5zfpih@laranjeiro-vm> Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Nelio Please see my comments below (3). > -----Original Message----- > From: N=E9lio Laranjeiro [mailto:nelio.laranjeiro@6wind.com] > Sent: Thursday, October 26, 2017 3:12 PM > To: Matan Azrad > Cc: Ophir Munk ; Adrien Mazarguil > ; dev@dpdk.org; Thomas Monjalon > ; Olga Shern ; Mordechay > Haimovsky > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions >=20 > On Thu, Oct 26, 2017 at 10:31:06AM +0000, Matan Azrad wrote: > > Hi Nelio > > > > I think the memory barrier discussion is not relevant for this patch > > (if it will be relevant I will create new one). > > Please see my comments inline. >=20 > It was not my single comment. There is also useless code like having nul= l > segments in the packets which is not allowed on DPDK. Sorry, but I can't find comments in the previous mails. Moreover this comment(first time I see it) is not relevant to this patch a= nd asking something else. All what this patch does is to merge 2 functions to prevent double asking about WQ remain space... Remove memory\compiler barriers or dealing with null segments are not in th= e scope here.=20 >=20 > > Regarding this specific patch, I didn't see any comment from you, Are > > you agree with it? > > > > > -----Original Message----- > > > From: N=E9lio Laranjeiro [mailto:nelio.laranjeiro@6wind.com] > > > Sent: Wednesday, October 25, 2017 10:50 AM > > > To: Ophir Munk > > > Cc: Adrien Mazarguil ; dev@dpdk.org; > > > Thomas Monjalon ; Olga Shern > > > ; Matan Azrad > > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path > > > functions > > > > > > On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote: > > > > Hi, > > > > > > > > On Tuesday, October 24, 2017 4:52 PM, N=E9lio Laranjeiro wrote: > > > > > > > > > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote: > > > > > > From: Matan Azrad > > > > > > > > > > > > Merge tx_burst and mlx4_post_send functions to prevent double > > > > > > asking about WQ remain space. > > > > > > > > > > > > This should improve performance. > > > > > > > > > > > > Signed-off-by: Matan Azrad > > > > > > --- > > > > > > drivers/net/mlx4/mlx4_rxtx.c | 353 > > > > > > +++++++++++++++++++++---------------------- > > > > > > 1 file changed, 170 insertions(+), 183 deletions(-) > > > > > > > > > > What are the real expectation you have on the remaining patches > > > > > of the series? > > > > > > > > > > According to the comment of this commit log "This should improve > > > > > performance" there are too many barriers at each packet/segment > > > > > level to improve something. > > > > > > > > > > The point is, mlx4_burst_tx() should write all the WQE without > > > > > any barrier as it is processing a burst of packets (whereas > > > > > Verbs functions which may only process a single packet). > > > > > > > > > The lonely barrier which should be present is the one to ensure > > > > > that all the host memory is flushed before triggering the Tx door= bell. > > > > > > > > > > > > > There is a known ConnectX-3 HW limitation: the first 4 bytes of > > > > every TXWBB (64 bytes chunks) should be written in a reversed > > > > order (from last TXWBB to first TXWBB). > > > > > > This means the first WQE filled by the burst function is the doorbell= . > > > In such situation, the first four bytes of it can be written before > > > leaving the burst function and after a write memory barrier. > > > > > > Until this first WQE is not complete, the NIC won't start processing > > > the packets. Memory barriers per packets becomes useless. > > > > I think this is not true, Since mlx4 HW can prefetch advanced TXbbs if > > their first 4 bytes are valid in spite of the first WQE is still not va= lid (please > read the spec). >=20 > A compiler barrier is enough on x86 to forbid the CPU to re-order the > instructions, on arm you need a memory barrier, there is a macro in DPDK = for > that, rte_io_wmb(). >=20 We are also using compiler barrier here. > Before triggering the doorbell you must flush the case, this is the only = place > where the rte_wmb() should be used. >=20 We are also using memory barrier only for this reason. > > > It gives something like: > > > > > > uint32_t tx_bb_db =3D 0; > > > void *first_wqe =3D NULL; > > > > > > /* > > > * Prepare all Packets by writing the WQEs without the 4 first bytes= of > > > * the first WQE. > > > */ > > > for () { > > > if (!wqe) { > > > first_wqe =3D wqe; > > > tx_bb_db =3D foo; > > > } > > > } > > > /* Leaving. */ > > > rte_wmb(); > > > *(uin32_t*)wqe =3D tx_bb_db; > > > return n; > > > > > > > I will take care to check if we can do 2 loops: > > Write all last 60B per TXbb. > > Memory barrier. > > Write all first 4B per TXbbs. > > > > > > The last 60 bytes of any TXWBB can be written in any order (before > > > > writing the first 4 bytes). > > > > Is your last statement (using lonely barrier) is in accordance > > > > with this limitation? Please explain. > > > > > > > > > There is also too many cases handled which are useless in bursts > > > situation, > > > > > this function needs to be re-written to its minimal use case i.e. > > > processing a > > > > > valid burst of packets/segments and triggering at the end of the > > > > > burst the > > > Tx > > > > > doorbell. > > > > > > > > > > > Regards, > > > > > > -- > > > N=E9lio Laranjeiro > > > 6WIND >=20 > Regards, >=20 > -- > N=E9lio Laranjeiro > 6WIND