From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 16 Jun 2021 14:37:36 +0100
From: Bruce Richardson
To: "Zhang, Qi Z"
Cc: Honnappa Nagarahalli, Joyce Kong, "Xing, Beilei", Ruifeng Wang,
 "dev@dpdk.org", nd
References: <20210604073405.14880-1-joyce.kong@arm.com>
 <561469a10f13450bae9e857f186b0123@intel.com>
 <2cb94e2e1bf74840acaadc389b4745f5@intel.com>
 <12226b6e56ad4c11845242031c9505d9@intel.com>
In-Reply-To: <12226b6e56ad4c11845242031c9505d9@intel.com>
Subject: Re: [dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW
 scanning func
List-Id: DPDK patches and discussions

On Wed, Jun 16, 2021 at 01:29:24PM +0000, Zhang, Qi Z wrote:
> Hi
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli
> > Sent: Tuesday, June 8, 2021 5:36 AM
> > To: Zhang, Qi Z; Joyce Kong; Xing, Beilei; Ruifeng Wang
> > Cc: dev@dpdk.org; nd; Honnappa Nagarahalli; nd
> > Subject: RE: [PATCH v1] net/i40e: remove the SMP barrier in HW scanning
> > func
> >
> > > > > > Add the logic to determine how many DD bits have been set for
> > > > > > contiguous packets, for removing the SMP barrier while reading
> > > > > > descs.
> > > > >
> > > > > I didn't understand this.
> > > > > The current logic already guarantees that the DD bits read out are
> > > > > from contiguous packets, as it reads the Rx descriptors in reverse
> > > > > order from the ring.
> > > >
> > > > Qi, the comments in the code mention that there is a race condition
> > > > if the descriptors are not read in reverse order, but they do not
> > > > mention what the race condition is or how it can occur. I would
> > > > appreciate it if you could explain that.
> > >
> > > The race condition happens between the NIC and the CPU. If the DD bits
> > > are written and read in the same order, there might be a hole in what
> > > the CPU sees (e.g. 1011). With the reverse read order we make sure
> > > there is no "1" after the first "0". As the read addresses are declared
> > > volatile, the compiler will not reorder the reads.
> >
> > My understanding is that
> >
> > 1) the NIC will write an entire cache line of descriptors to memory
> > "atomically" (i.e. the entire cache line becomes visible to the CPU at
> > once) if there are enough descriptors ready to fill one cache line.
> > 2) But, if there are not enough descriptors ready (because, for example,
> > there is not enough traffic), then it might write partial cache lines.
>
> Yes, for example a cache line contains 4 x 16-byte descriptors, and it is
> possible to see 1 1 1 0 for the DD bits at some moment.
>
> > Please correct me if I am wrong.
> >
> > For #1, I do not think it matters whether we read the descriptors in
> > reverse order or not, as the cache line is written atomically.
>
> I think the cases below may happen if we don't read in reverse order.
>
> 1. The CPU reads the first cache line as 1 1 1 0 in a loop.
> 2. New packets arrive; the NIC sets the last DD bit of the first cache
>    line and writes a new cache line with 1 1 1 1.
> 3. The CPU continues with the new cache line, 1 1 1 1, in the same loop,
>    but the last 1 of the first cache line has been missed, so it finally
>    sees 1 1 1 0 1 1 1 1.
>

The one-sentence answer here is: when two entities are moving along a line
in the same direction - like two runners in a race - they can pass each
other multiple times as each one speeds up or slows down at any point in
time, whereas if they are moving in opposite directions there will only
ever be one cross-over point no matter how the speed of each changes. In
the case of the NIC and the software, this means there will always be a
single, clear cross-over point from DD set to not-set.
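To put the cross-over point in code form, here is a rough sketch of the
reverse-order scan being described (hypothetical names and a simplified
descriptor layout with DD assumed to be bit 0 of the status qword; this is
not the actual i40e code):

#include <stdint.h>

#define GROUP_SIZE 8

struct rx_desc {
	volatile uint64_t qword1;	/* status qword; DD assumed in bit 0 */
};

/* Return the number of completed descriptors at the start of the group. */
static uint16_t
find_cross_over(const struct rx_desc *group)
{
	/*
	 * The NIC sets DD bits in ascending ring order while the CPU reads
	 * them in descending order, so the two only "meet" once: the highest
	 * index with DD set is the cross-over point, and every descriptor
	 * below it was completed even earlier, so no hole such as 1 0 1 1
	 * can appear in the count.
	 */
	for (int i = GROUP_SIZE - 1; i >= 0; i--) {
		if (group[i].qword1 & 1ULL)
			return (uint16_t)(i + 1);
	}
	return 0;
}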
>
> > For #1, if we read in reverse order, does it make sense to not check the
> > DD bits of descriptors that are earlier in the order once we encounter a
> > descriptor that has its DD bit set? This is because the NIC updates the
> > descriptors in order.
>
> I think the answer is yes: when we meet the first set DD bit, we should be
> able to calculate the exact number from the index, but I am not sure how
> much performance gain that would give.
>

The other factors here are:

1. The driver does not do a straight read of all 32 DD bits in one go;
   rather, it reads 8 at a time and aborts at the end of a set of 8 if not
   all are valid (see the sketch below).
2. For any that are set, we have to read the descriptor anyway to get the
   packet data out of it, so in the shortcut case of the last descriptor
   being set, we still have to read the other 7 anyway, and the DD bits
   come for free as part of that.
3. Blindly reading 8 at a time reduces the branching to just a single
   decision point at the end of each set of 8, reducing possible branch
   mispredicts.
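For completeness, a rough sketch of the 8-at-a-time scan pattern described
in points 1-3 above (again with hypothetical names and a simplified
descriptor layout, not the actual i40e HW scanning code): read all 8
statuses unconditionally, count the DD bits set contiguously from the start
of the group, and branch only once per group of 8.

#include <stdint.h>

#define LOOK_AHEAD	8
#define SCAN_BURST	32

struct rx_desc {
	volatile uint64_t qword1;	/* status qword; DD assumed in bit 0 */
};

static uint16_t
scan_hw_ring(const struct rx_desc *rxdp)
{
	uint16_t nb_rx = 0;
	int s[LOOK_AHEAD];
	int i, j, nb_dd;

	for (i = 0; i < SCAN_BURST; i += LOOK_AHEAD, rxdp += LOOK_AHEAD) {
		/* Read the 8 statuses in reverse order, no per-descriptor branch. */
		for (j = LOOK_AHEAD - 1; j >= 0; j--)
			s[j] = (int)(rxdp[j].qword1 & 1ULL);

		/* Count the DD bits set contiguously from the start of the group. */
		for (nb_dd = 0; nb_dd < LOOK_AHEAD && s[nb_dd]; nb_dd++)
			;

		nb_rx += nb_dd;

		/* ... the nb_dd completed descriptors would be turned into mbufs here ... */

		/* Single decision point at the end of each set of 8. */
		if (nb_dd != LOOK_AHEAD)
			break;
	}
	return nb_rx;
}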