From: Dan Williams
Date: Tue, 18 Dec 2018 11:07:55 -0800
Subject: Re: [PATCH v5 3/5] mm: Shuffle initial free memory to improve memory-side-cache utilization
To: Mike Rapoport
Cc: Andrew Morton, Michal Hocko, Kees Cook, Dave Hansen, Peter Zijlstra, Linux MM, X86 ML, Linux Kernel Mailing List
List-ID: linux-kernel@vger.kernel.org

On Tue, Dec 18, 2018 at 1:11 AM Mike Rapoport wrote:
>
> On Mon, Dec 17, 2018 at 11:56:36AM -0800, Dan Williams wrote:
> > On Sun, Dec 16, 2018 at 4:43 AM Mike Rapoport wrote:
> > >
> > > On Fri, Dec 14, 2018 at 05:48:46PM -0800, Dan Williams wrote:
> > > > Randomization of the page allocator improves the average utilization of
> > > > a direct-mapped memory-side-cache. Memory side caching is a platform
> > > > capability that Linux has been previously exposed to in HPC
> > > > (high-performance computing) environments on specialty platforms. In
> > > > that instance it was a smaller pool of high-bandwidth-memory relative to
> > > > higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
> > > > be found on general purpose server platforms where DRAM is a cache in
> > > > front of higher latency persistent memory [1].
> > [..]
> > > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > > index 185bfd4e87bb..fd617928ccc1 100644
> > > > --- a/mm/memblock.c
> > > > +++ b/mm/memblock.c
> > > > @@ -834,8 +834,16 @@ int __init_memblock memblock_set_sidecache(phys_addr_t base, phys_addr_t size,
> > > >  		return ret;
> > > >
> > > >  	for (i = start_rgn; i < end_rgn; i++) {
> > > > -		type->regions[i].cache_size = cache_size;
> > > > -		type->regions[i].direct_mapped = direct_mapped;
> > > > +		struct memblock_region *r = &type->regions[i];
> > > > +
> > > > +		r->cache_size = cache_size;
> > > > +		r->direct_mapped = direct_mapped;
> > >
> > > I think this change can be merged into the previous patch
> >
> > Ok, will do.
> >
> > > > +		/*
> > > > +		 * Enable randomization for amortizing direct-mapped
> > > > +		 * memory-side-cache conflicts.
> > > > +		 */
> > > > +		if (r->size > r->cache_size && r->direct_mapped)
> > > > +			page_alloc_shuffle_enable();
> > >
> > > It seems that this is the only use for ->direct_mapped in the memblock
> > > code. Wouldn't cache_size != 0 suffice? I.e., in the code that sets the
> > > memblock region attributes, the cache_size can be set to 0 for the non
> > > direct mapped caches, isn't it?
> > >
> >
> > The HMAT specification allows for other cache-topologies, so it's not
> > sufficient to just look for non-zero size when a platform implements a
> > set-associative cache. The expectation is that a set-associative cache
> > would not need the kernel to perform memory randomization to improve
> > the cache utilization.
> >
> > The check for memory size > cache-size is a sanity check for a
> > platform BIOS or system configuration that mis-reports or mis-sizes
> > the cache.
>
> Apparently I didn't explain my point well.
>
> The acpi_numa_memory_affinity_init() already knows whether the cache is
> direct mapped or a set-associative. It can just skip calling
> memblock_set_sidecache() for the set-associative case.
>
> Another thing I've noticed only now, is that memory randomization is
> enabled if there is at least one memory region with a direct mapped side
> cache attached, and once the randomization is on, the cache size and the
> mapping mode do not matter. So, I think it's not necessary to store them in
> the memory region at all.

Fair enough. I was anticipating the case when non-ACPI systems gain
this capability, but you're right, no need to design that now.

The size sanity check has some small value, but given there is an
override and broken platform firmware would need to be fixed, I don't
think we lose much by getting rid of it.

Will re-flow without memblock integration.