Subject: Re: [PATCH v3] Linux VM workaround for Knights Landing A/D leak
To: Lukasz Anaczkowski , hpa@zytor.com, mingo@redhat.com, tglx@linutronix.de, ak@linux.intel.com, kirill.shutemov@linux.intel.com, mhocko@suse.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <1465923672-14232-1-git-send-email-lukasz.anaczkowski@intel.com>
 <1466090042-30908-1-git-send-email-lukasz.anaczkowski@intel.com>
Cc: harish.srinivasappa@intel.com, lukasz.odzioba@intel.com, grzegorz.andrejczuk@intel.com, lukasz.daniluk@intel.com
From: Dave Hansen
Message-ID: <57630ADE.3040900@linux.intel.com>
Date: Thu, 16 Jun 2016 13:23:58 -0700
In-Reply-To: <1466090042-30908-1-git-send-email-lukasz.anaczkowski@intel.com>

On 06/16/2016 08:14 AM, Lukasz Anaczkowski wrote:
> For reclaim this brings the performance back to before Mel's
> flushing changes, but for unmap it disables batching.

This turns out to be pretty catastrophic for unmap. In a workload that uses, say, 200 hardware threads and allocates and frees a few MB/sec, this ends up costing hundreds of thousands of extra received IPIs. 10MB =~ 2500 ptes, and with 200 threads, that's 250,000 IPIs received just to free 10MB of memory.

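The estimate above can be reproduced with a rough back-of-the-envelope model. This is only a sketch under stated assumptions: 4 KiB base pages, one TLB flush per PTE when batching is disabled, and each flush delivering one IPI to every other hardware thread. The function names and the batch size are hypothetical and not taken from the patch. With these inputs a straight multiplication gives roughly 500,000 received IPIs for 10MB, the same "hundreds of thousands" order of magnitude as the figure quoted above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def ptes_for(bytes_freed, page_size=PAGE_SIZE):
    """Number of PTEs covering the freed range (rounded up)."""
    return -(-bytes_freed // page_size)


def ipis_received_unbatched(bytes_freed, hw_threads, page_size=PAGE_SIZE):
    """Assumed model: one flush per PTE, and each flush interrupts
    every hardware thread except the one issuing it."""
    return ptes_for(bytes_freed, page_size) * (hw_threads - 1)


def ipis_received_batched(bytes_freed, hw_threads, batch_ptes,
                          page_size=PAGE_SIZE):
    """Assumed model: one flush per batch of `batch_ptes` PTEs
    (hypothetical batch size, for comparison only)."""
    ptes = ptes_for(bytes_freed, page_size)
    flushes = -(-ptes // batch_ptes)
    return flushes * (hw_threads - 1)


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    print("ptes:", ptes_for(ten_mb))
    print("unbatched IPIs received:", ipis_received_unbatched(ten_mb, 200))
    print("batched IPIs received:",
          ipis_received_batched(ten_mb, 200, batch_ptes=2560))
```

The point of the comparison is that the cost scales with (PTEs x threads) once batching is off, whereas a single batched flush over the same range would cost on the order of one IPI per remote thread.
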
The initial testing we did on this used a *bunch* of threads all doing alloc/free. But that is bottlenecked on other things, like mmap_sem being held for write. The scenario that we really needed to test here was lots of threads doing processing and 1 thread doing alloc/free.