From mboxrd@z Thu Jan  1 00:00:00 1970
From: Shahaf Shuler
Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
Date: Tue, 19 Mar 2019 19:42:21 +0000
References: <1552913893-43407-1-git-send-email-dekelp@mellanox.com>
 <001d01d4de03$378f18a0$a6ad49e0$@linux.vnet.ibm.com>
 <1789153.zrlSK8XYcq@xps>
In-Reply-To: <1789153.zrlSK8XYcq@xps>
To: Thomas Monjalon, Dekel Peled, Chao Zhu
Cc: Yongseok Koh, dev@dpdk.org, Ori Kam, stable@dpdk.org
List-Id: DPDK patches and discussions
Sender: "dev"

Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Guys, please let's avoid top-post.
>
> You are both not replying to each other:
>
> 1/ Dekel mentioned the IBM doc, but Chao did not argue about the lack of
> IO protection with lwsync.
> We assume that rte_mb should protect any access, including IO.
>
> 2/ Chao asked about the semantics of the barrier used in the mlx5 code,
> but Dekel did not reply about the semantics: are we protecting IO or
> general memory access?

In the mlx5 code we want to sync between two different writes:
1. a write to system memory (RAM)
2. a write to MMIO memory (device)

We need #1 to be visible in host memory before #2 is committed to the NIC.

We want a single type of barrier which translates to the correct assembly
based on the system arch, and in addition we want it to be as lightweight
as possible.

So far, when not running on POWER, we used rte_wmb for that. On x86 and
ARM systems it provided the needed guarantees.

It is also mentioned in the barrier doxygen on the ARM arch:
" Write memory barrier.
Guarantees that the STORE operations generated before the barrier
occur before the STORE operations generated after. "

It is not restricted to stores to system memory only.

POWER, however, behaves somewhat differently, and in fact rte_mb is
required there. It obviously misses the point of these barriers if we
need to use a different barrier depending on the system arch.

We need to align the definitions of the different barriers in DPDK:
1. We need clear documentation of each one. This should be global, not
part of the specific implementation on each arch.
2. We should either modify the PPC rte_wmb to match the ARM and x86 ones,
or define a new type of barrier which syncs both I/O and stores to
system memory.

>
> 19/03/2019 11:05, Dekel Peled:
> > Hi,
> >
> > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > My patch proposes to define rte_wmb() and rte_rmb() as asm sync, like
> > rte_mb(), since using lwsync is incorrect for them.
> >
> > Regards,
> > Dekel
> >
> > > -----Original Message-----
> > > From: Chao Zhu
> > > Sent: Tuesday, March 19, 2019 5:24 AM
> > > To: Dekel Peled
> > > Cc: Yongseok Koh; Shahaf Shuler; dev@dpdk.org; Ori Kam;
> > > Thomas Monjalon; stable@dpdk.org
> > > Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > >
> > > Dekel,
> > >
> > > To control the memory order for device memory, I think you should use
> > > rte_io_mb() instead of rte_mb(). This will generate the correct result.
> > > rte_wmb() is used for system memory.
> > >
> > > > -----Original Message-----
> > > > From: Dekel Peled
> > > > Sent: Monday, March 18, 2019 8:58 PM
> > > > To: chaozhu@linux.vnet.ibm.com
> > > > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > > > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > > > stable@dpdk.org
> > > > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > > >
> > > > From the previous patch description: "to improve performance on PPC64,
> > > > use light weight sync instruction instead of sync instruction."
> > > >
> > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > This form is used to control ordering for storage accesses to system
> > > > memory only. It does not create a memory barrier for accesses to device
> > > > memory."
> > > >
> > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > rte_rmb() will provide a correct memory barrier to ensure the order of
> > > > accesses to system memory and device memory.
> > > >
> > > > [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
> > > >
> > > > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Dekel Peled
> > > > ---
> > > >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
> > > >  1 file changed, 8 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > index ce38350..797381c 100644
> > > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > @@ -63,11 +63,7 @@
> > > >   * Guarantees that the STORE operations generated before the barrier
> > > >   * occur before the STORE operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define rte_wmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define rte_wmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  /**
> > > >   * Read memory barrier.
> > > >   *
> > > >   * Guarantees that the LOAD operations generated before the barrier
> > > >   * occur before the LOAD operations generated after.
> > > >   */
> > > > @@ -75,11 +71,7 @@
> > > > -#ifdef RTE_ARCH_64
> > > > -#define rte_rmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define rte_rmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  #define rte_smp_mb() rte_mb()