Date: Mon, 5 Oct 2015 21:22:44 +0100
Subject: Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal
From: Linus Torvalds
To: Dave Hansen, Peter Anvin
Cc: "Theodore Ts'o", Andrew Morton, "linux-ext4@vger.kernel.org", Linux Kernel Mailing List
In-Reply-To: <5612A3F3.2040609@linux.intel.com>
References: <20151005152236.GA8140@thunk.org> <5612A3F3.2040609@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 5, 2015 at 5:23 PM, Dave Hansen wrote:
>
> One thing I've been noticing on Skylake is that barriers (implicit and
> explicit) are showing up more in profiles.

Ahh, you're on Skylake?

It's entirely possible that the issue is that the whole "stac/mov/clac"
is much more expensive because Skylake actually ends up supporting
those AC instructions.

That would make sense. We could probably do them outside the loop,
rather than tightly around the actual move instructions.

Peter (hpa), is there some sane interface to try to do that?

> What we're seeing here
> probably isn't actually stac/clac overhead, but the cost of finishing
> some other operations that are outstanding before we can proceed through
> here.

I suspect it actually _is_ stac/clac overhead.

It might well be that clac/stac ends up serializing loads in some way.
Last I heard, they were reasonably cheap but certainly not free - and
when we're talking about something that just loops over bringing the
line into cache, it might be relatively expensive.

How did you do the profile? Use "-e cycles:pp" to get the precise
profile information, which should actually attribute the cost to the
instruction that really causes it.

             Linus
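
For illustration only, here is a minimal user-space sketch of the "do them outside the loop" idea from the message above. Real stac/clac cannot be executed from user space, so ac_open() and ac_close() are hypothetical stand-ins that only count how often the AC flag would be toggled; they are not a kernel interface, and the copy routines are invented for this example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for stac/clac: count toggles instead of
 * touching the AC flag, so the sketch runs in user space. */
static unsigned long ac_toggles;

static void ac_open(void)  { ac_toggles++; }   /* would be "stac" */
static void ac_close(void) { ac_toggles++; }   /* would be "clac" */

/* Pattern the message suspects is expensive: stac/mov/clac wrapped
 * tightly around every single move. */
static void copy_toggle_per_word(unsigned long *dst,
                                 const unsigned long *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ac_open();
        dst[i] = src[i];
        ac_close();
    }
}

/* Proposed alternative: open the window once, do all the moves,
 * then close it once. */
static void copy_toggle_once(unsigned long *dst,
                             const unsigned long *src, size_t n)
{
    ac_open();
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    ac_close();
}
```

For n words the first variant toggles 2*n times and the second toggles twice, whatever n is. The price of hoisting is that the window in which the protection is switched off covers the whole loop rather than a single move, so the loop body has to stay small and must not call out to anything unexpected.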