From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Phil Yang (Arm Technology China)" <Phil.Yang@arm.com>
Subject: Re: [PATCH v2 2/3] kni: fix kni fifo synchronization
Date: Fri, 21 Sep 2018 09:00:31 +0000
Message-ID: <DB7PR08MB33854B2E37011D9E04FFB7CFE9120@DB7PR08MB3385.eurprd08.prod.outlook.com>
References: <1537363820-3827-1-git-send-email-phil.yang@arm.com>
 <1537364560-4124-1-git-send-email-phil.yang@arm.com>
 <1537364560-4124-2-git-send-email-phil.yang@arm.com>
 <20180920082846.GB19425@jerin>
 <AM6PR08MB36723E16AEC4D338242690E598130@AM6PR08MB3672.eurprd08.prod.outlook.com>
 <20180920153700.GA9459@jerin>
 <AM6PR08MB36720F445FF19B928E144E3F98120@AM6PR08MB3672.eurprd08.prod.outlook.com>
 <20180921055529.GA15861@jerin>
 <AM6PR08MB3672606A5866D22C3C727AA498120@AM6PR08MB3672.eurprd08.prod.outlook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
 "kkokkilagadda@caviumnetworks.com" <kkokkilagadda@caviumnetworks.com>, "Gavin
 Hu (Arm Technology China)" <Gavin.Hu@arm.com>, "ferruh.yigit@intel.com"
 <ferruh.yigit@intel.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Jerin Jacob
 <jerin.jacob@caviumnetworks.com>
Return-path: <dev-bounces@dpdk.org>
Received: from EUR03-DB5-obe.outbound.protection.outlook.com
 (mail-eopbgr40085.outbound.protection.outlook.com [40.107.4.85])
 by dpdk.org (Postfix) with ESMTP id 4195B1041
 for <dev@dpdk.org>; Fri, 21 Sep 2018 11:00:33 +0200 (CEST)
In-Reply-To: <AM6PR08MB3672606A5866D22C3C727AA498120@AM6PR08MB3672.eurprd08.prod.outlook.com>
Content-Language: en-US
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

+ Ola Liljedahl <Ola.Liljedahl@arm.com>

Thanks,
Phil Yang

> -----Original Message-----
> From: Honnappa Nagarahalli
> Sent: Friday, September 21, 2018 2:37 PM
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>; dev@dpdk.org; nd
> <nd@arm.com>; kkokkilagadda@caviumnetworks.com; Gavin Hu (Arm
> Technology China) <Gavin.Hu@arm.com>; ferruh.yigit@intel.com
> Subject: RE: [PATCH v2 2/3] kni: fix kni fifo synchronization
>
> > > > > > >
> > > > > > > @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo,
> > > > > > > void **data, unsigned num)  static inline uint32_t
> > > > > > > kni_fifo_count(struct rte_kni_fifo *fifo)  {
> > > > > > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > > > > > +       unsigned fifo_write =3D __atomic_load_n(&fifo->write,
> > > > > > > +                                                 __ATOMIC_AC=
QUIRE);
> > > > > > > +       unsigned fifo_read =3D __atomic_load_n(&fifo->read,
> > > > > > > +
> > > > > > > +__ATOMIC_ACQUIRE);
> > > > > >
> > > > > > > Isn't it too heavy to have two __ATOMIC_ACQUIREs? A simple
> > > > > > > rte_smp_rmb() would be enough here, right?
> > > > > > > Or do we need __ATOMIC_ACQUIRE for the fifo_write case?
> > > > > >
> > > > > We also had some amount of debate internally on this:
> > > > > > 1) We do not want to use rte_smp_rmb() as we want to keep the
> > > > > > memory models separated (for example, while using C11, use C11
> > > > > > everywhere). It is also not sufficient, please see 3) below.
> > > >
> > > > > But there is nothing technically wrong with using rte_smp_rmb()
> > > > > here, in terms of functionality and the code generated by the
> > > > > compiler.
> > >
> > > > rte_smp_rmb() generates 'DMB ISHLD'. This works fine, but it is not
> > > > optimal. 'LDAR' is a better option, which is generated when C11
> > > > atomics are used.
> >
> > > Yes. But which one is optimal: 1 x DMB ISHLD or 2 x LDAR?
>
> Good point. I am not sure which one is optimal, it needs to be measured. 'DMB
> ISHLD' orders 'all' earlier loads against 'all' later loads and stores. 'LDAR' orders
> the 'specific' load with 'all' later loads and stores.
>
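
To make the two options in this exchange concrete, here is a minimal sketch, not the patch itself. `struct fifo_sketch` and both function names are hypothetical stand-ins for `struct rte_kni_fifo` and `kni_fifo_count()`; the `__atomic` builtins are the GCC/Clang ones. On AArch64, GCC typically lowers the acquire loads to LDAR and the acquire fence to DMB ISHLD, but that is an assumption worth confirming in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_kni_fifo (only the fields used here). */
struct fifo_sketch {
	unsigned write; /* next position to be written */
	unsigned read;  /* next position to be read */
	unsigned len;   /* ring size, assumed to be a power of two */
};

/* Option A: one load-acquire per index (the "2 x LDAR" case). */
static inline uint32_t
count_two_acquires(struct fifo_sketch *f)
{
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);

	assert(f->len != 0 && (f->len & (f->len - 1)) == 0);
	return (f->len + w - r) & (f->len - 1);
}

/*
 * Option B: relaxed loads followed by a single acquire fence (the
 * "1 x barrier" case). The fence orders both loads before later loads
 * and stores, but does not order the two index loads against each other.
 */
static inline uint32_t
count_one_fence(struct fifo_sketch *f)
{
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_RELAXED);
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_RELAXED);

	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	assert(f->len != 0 && (f->len & (f->len - 1)) == 0);
	return (f->len + w - r) & (f->len - 1);
}
```

Single-threaded, both variants return the same count; they differ only in the ordering they guarantee and in cost, which is the part that would need to be measured.
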
> >
> > >
> > > >
> > > > > > 2) This API can get called from the writer or the reader, so both
> > > > > > loads have to be __ATOMIC_ACQUIRE.
> > > > > > 3) The other option is to use __ATOMIC_RELAXED. That would allow
> > > > > > any loads/stores around this API to get reordered, especially
> > > > > > since this is an inline function. This would put the burden on
> > > > > > the application to manage the ordering depending on its usage.
> > > > > > It would also require the application to understand the
> > > > > > implementation of this API.
> > > >
> > > > > __ATOMIC_RELAXED may be fine too for the _count() case, as it may
> > > > > not be very important to get the exact count at that exact moment;
> > > > > the application can retry.
> > > > >
> > > > > I am in favor of a performance-effective implementation.
> > >
> > > > The requirement on the correctness of the count depends on the usage
> > > > of this function. I see the following usage:
> > >
> > > In the file kni_net.c, function: kni_net_tx:
> > >
> > >        if (kni_fifo_free_count(kni->tx_q) =3D=3D 0 ||
> > >                         kni_fifo_count(kni->alloc_q) =3D=3D 0) {
> > >                 /**
> > >                  * If no free entry in tx_q or no entry in alloc_q,
> > >                  * drops skb and goes out.
> > >                  */
> > >                 goto drop;
> > >         }
> > >
> > > There is no retry here, the packet is dropped.
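
As a worked example of the arithmetic behind that check, here is a small sketch. The helper names are hypothetical, and the free-count formula assumes the FIFO keeps one slot unused so that full and empty can be told apart; that assumption should be checked against kni_fifo.h.

```c
#include <assert.h>
#include <stdint.h>

/* Entries currently queued; len is assumed to be a power of two. */
static inline uint32_t
sketch_count(uint32_t len, uint32_t write, uint32_t read)
{
	assert(len != 0 && (len & (len - 1)) == 0);
	return (len + write - read) & (len - 1);
}

/* Free slots, assuming one slot is always left unused. */
static inline uint32_t
sketch_free_count(uint32_t len, uint32_t write, uint32_t read)
{
	assert(len != 0 && (len & (len - 1)) == 0);
	return (read - write - 1) & (len - 1);
}
```

With len = 8, write = 2 and read = 6, the count is (8 + 2 - 6) & 7 = 4 and the free count is (6 - 2 - 1) & 7 = 3. A stale index could make either value look like 0, and kni_net_tx would then drop the packet without retrying.
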
> >
> > > OK. Then pick an implementation which is optimal for this case.
> > > I think rte_smp_rmb() makes sense here then, as
> > > a) there is no #ifdef clutter
> > > b) it is optimal compared to 2 x LDAR
> >
> As I understand, one of the principles of using the C11 model is to match the
> store releases and load acquires. IMO, combining the C11 memory model with
> barrier-based functions makes the code unreadable.
> I realized rte_smp_rmb() is required for x86 as well to prevent compiler
> reordering. We can add that in the non-C11 case. This way, we will have clean
> code for both the options (similar to rte_ring).
> So, if 'RTE_USE_C11_MEM_MODEL' is set to 'n', then 'rte_smp_rmb' would
> be used.
>
> We can look at handling the #ifdef clutter based on Ferruh's feedback.
>
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > Other than that, I prefer to avoid ifdef clutter by
> > > > > > > introducing two separate files, just like the ring C11
> > > > > > > implementation.
> > > > > > >
> > > > > > > I don't have a strong opinion on this part; I will let the KNI
> > > > > > > maintainer decide how to accommodate this change.
> > > > >
> > > > > > I prefer to change this as well, I am open to suggestions.
> > > > > > Introducing two separate files would be too much for this
> > > > > > library. A better way would be to have something similar to
> > > > > > 'smp_store_release' provided by the kernel, i.e. create #defines
> > > > > > for loads/stores. Hide the clutter behind the #defines.
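
A rough sketch of what hiding the clutter behind #defines could look like, under stated assumptions: `SKETCH_USE_C11_MEM_MODEL`, `fifo_load_acquire`, `fifo_store_release`, `sketch_barrier` and `struct fifo_sketch` are all hypothetical names, not existing DPDK or kernel macros. The barrier branch uses `__sync_synchronize()` (a full barrier, stronger than needed) only as a stand-in so the example builds without DPDK; real code would presumably use rte_smp_rmb()/rte_smp_wmb() there. The macros rely on GNU statement expressions and `__typeof__`.

```c
#include <assert.h>
#include <stdint.h>

#ifdef SKETCH_USE_C11_MEM_MODEL
/* C11 flavour: ordering is attached to the access itself. */
#define fifo_load_acquire(p)     __atomic_load_n((p), __ATOMIC_ACQUIRE)
#define fifo_store_release(p, v) __atomic_store_n((p), (v), __ATOMIC_RELEASE)
#else
/* Barrier flavour: plain access plus an explicit barrier (stand-in). */
#define sketch_barrier() __sync_synchronize()
#define fifo_load_acquire(p) \
	({ __typeof__(*(p)) v_ = *(volatile __typeof__(*(p)) *)(p); \
	   sketch_barrier(); v_; })
#define fifo_store_release(p, v) \
	do { sketch_barrier(); \
	     *(volatile __typeof__(*(p)) *)(p) = (v); } while (0)
#endif

/* Hypothetical stand-in for struct rte_kni_fifo (only the fields used here). */
struct fifo_sketch {
	unsigned write;
	unsigned read;
	unsigned len; /* assumed to be a power of two */
};

/* The caller is free of #ifdefs; the memory model is chosen in one place. */
static inline uint32_t
sketch_fifo_count(struct fifo_sketch *f)
{
	unsigned w = fifo_load_acquire(&f->write);
	unsigned r = fifo_load_acquire(&f->read);

	assert(f->len != 0 && (f->len & (f->len - 1)) == 0);
	return (f->len + w - r) & (f->len - 1);
}
```

The design point is only that the #ifdef appears once, next to the macro definitions, instead of in every FIFO function.
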
> > > >
> > > > > No strong opinion on this, leaving it to the KNI maintainer.
> > > Will wait on this before re-spinning the patch
> > >
> > > >
> > > > > This patch needs to be split in two:
> > > > > a) fixes for the non-C11 implementation (i.e. the new addition of
> > > > > rte_smp_wmb())
> > > > > b) add support for the C11 implementation.
> > > Agree
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > +       return (fifo->len + fifo_write - fifo_read) &
> > > > > > > +(fifo->len - 1); #else
> > > > > > >         return (fifo->len + fifo->write - fifo->read) &
> > > > > > > (fifo->len
> > > > > > > - 1);
> Requires rte_smp_rmb() for x86 to prevent compiler reordering.
>=20
> > > > > > > +#endif
> > > > > > >  }
> > > > > > > --
> > > > > > > 2.7.4
> > > > > > >