From: Mattias Rönnblom
Subject: Re: [PATCH 1/2] ring: synchronize the load and store of the tail
Date: Tue, 6 Nov 2018 12:03:25 +0100
References: <1537172244-64874-2-git-send-email-gavin.hu@arm.com> <1874944.OrACW1nkDZ@xps> <20181027150024.GA2294@jerin> <17713879.gC9jYcxDUo@xps>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "Gavin Hu (Arm Technology China)", "dev@dpdk.org", "stable@dpdk.org", Ola Liljedahl, "olivier.matz@6wind.com", "chaozhu@linux.vnet.ibm.com", "bruce.richardson@intel.com", "konstantin.ananyev@intel.com", nd
To: Honnappa Nagarahalli, Thomas Monjalon, Jerin Jacob
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 2018-11-05 22:51, Honnappa Nagarahalli wrote:
>> I've also run an out-of-tree DSW throughput benchmark, and I've found that
>> going from Non-C11 to C11 gives a 4% slowdown. After this patch, the
>> slowdown is only 2,8%.
> This is interesting. The general understanding seems to be that C11 atomics should not add any additional instructions on x86. But, we still see some drop in performance. Is this attributed to compiler not being allowed to re-order?
>

I was lazy enough not to disassemble, so I don't know. I would suggest non-C11 mode stays as the default on x86_64.
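
For anyone who wants to do the disassembly, a minimal sketch of the two styles of tail access being compared might look like the following. This is not the actual rte_ring code: `struct demo_ring` and the `demo_*` helpers are hypothetical names made up for illustration, and the compiler barrier uses GCC/Clang inline-asm syntax. On x86_64 both loads are generally expected to compile to a plain `mov`, so any difference would have to come from how much freedom the compiler has to reorder the surrounding code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the ring's tail index (not struct rte_ring). */
struct demo_ring {
	volatile uint32_t tail;
};

/* Non-C11 style: volatile read followed by a compiler-only barrier.
 * The empty asm emits no instruction; it only stops the compiler from
 * moving memory accesses across this point. */
static inline uint32_t
demo_tail_load_legacy(struct demo_ring *r)
{
	uint32_t t = r->tail;

	__asm__ volatile("" ::: "memory");
	return t;
}

/* C11 style: load-acquire through the GCC/Clang __atomic builtins. */
static inline uint32_t
demo_tail_load_c11(struct demo_ring *r)
{
	return __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
}

/* C11 style: store-release of a new tail value. */
static inline void
demo_tail_store_c11(struct demo_ring *r, uint32_t v)
{
	__atomic_store_n(&r->tail, v, __ATOMIC_RELEASE);
}
```

Compiling this with `-O2 -S` and comparing the output of the two load helpers, both standalone and inlined into a larger loop, would be one way to check whether the slowdown comes from extra instructions or from restricted reordering.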