From: Dan Williams
Date: Thu, 11 Oct 2018 11:03:07 -0700
Subject: Re: [PATCH v2 0/3] Randomize free memory
To: Michal Hocko
Cc: Andrew Morton, Dave Hansen, Kees Cook, Linux MM, Linux Kernel Mailing List

On Thu, Oct 11, 2018 at 4:56 AM Michal Hocko wrote:
>
> On Wed 10-10-18 17:13:14, Dan Williams wrote:
> [...]
> > On Wed, Oct 10, 2018 at 1:48 AM Michal Hocko wrote:
> >
> > ...and now that I've made that argument I think I've come around to
> > your point about the shuffle_page_order parameter. The only entity
> > that might have a better clue about "safer" shuffle orders than
> > MAX_ORDER is the distribution provider.
>
> And how is somebody providing a kernel for a large variety of workloads
> supposed to know?

True, this would be a much easier discussion with a wider / deeper data set.

> [...]
>
> > Note, you can also think about this in pure architecture terms.
> > I.e. for a direct-mapped cache anywhere in a system you can have
> > a near-zero cache conflict rate on a first run of a workload and a high
> > conflict rate on a second run based on how lucky you are with memory
> > allocation placement relative to the first run. Randomization keeps
> > you out of such performance troughs and provides more reliable average
> > performance.
>
> I am not disagreeing here. That reliable average might be worse than
> what you get with the non-randomized case. And that might be a fair
> deal for some workloads. You are, however, providing a functionality
> which is enabled by default without any actual numbers (well, except for
> _a_java_ workload that seems to benefit), so you should really do your
> homework, stop handwaving, and give us some numbers and/or convincing
> arguments please.

The latest version of the patches no longer enables shuffling by
default. I'm giving you the data I can give with respect to
pre-production hardware.

> > With the numa emulation patch I referenced, an
> > administrator could constrain a workload to run in a cache-sized
> > subset of the available memory if they really know what they are doing
> > and need firmer guarantees.
>
> Then mention how and what you can achieve by that in the changelog.

The numa_emulation aspect is orthogonal to the randomization
implementation. It does not belong in the randomization changelog.
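The placement-luck argument for a direct-mapped cache can be sketched with a toy simulation (sizes and names here are purely illustrative, not taken from the patch set): a contiguous allocation can land every page in a distinct cache set, a strided one can alias every page into the same set, and a shuffled free list averages out the luck.

```python
import random
from collections import Counter

NUM_SETS = 1024        # toy direct-mapped cache: one line per set
WORKLOAD_PAGES = 512   # working set covers half of the sets

def conflict_fraction(frames):
    """Fraction of pages whose cache set is shared with another page."""
    counts = Counter(f % NUM_SETS for f in frames)
    return sum(1 for f in frames if counts[f % NUM_SETS] > 1) / len(frames)

# Lucky run: a contiguous allocation, every page lands in a distinct set.
lucky = list(range(WORKLOAD_PAGES))
# Unlucky run: pages come back at a stride equal to the set count, so
# every page aliases into the same set.
unlucky = [i * NUM_SETS for i in range(WORKLOAD_PAGES)]
# Shuffled free pages: placement luck averages out run to run.
random.seed(0)
shuffled = random.sample(range(64 * NUM_SETS), WORKLOAD_PAGES)

print(conflict_fraction(lucky))    # 0.0
print(conflict_fraction(unlucky))  # 1.0
print(conflict_fraction(shuffled)) # between the two extremes
```

The point of the toy model is only that the randomized case has a predictable average, where the deterministic cases swing between best and worst depending on where the allocator happens to place the working set.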
> > The risk if Linux does not have this capability is unstable hacks like
> > zonesort and rebooting, as referenced in that KNL article, which are
> > not suitable for a general purpose kernel / platform.
>
> We could have lived without those for quite some time so this doesn't
> seem to be anything super urgent to push through without a proper
> justification.

We lived without them previously because memory-side caches were
limited to niche hardware; now they are moving into general purpose
server platforms and the urgency / impact goes up accordingly.

> > > > > Many years back while at a university I was playing
> > > > > with page coloring as a method to reach more stable performance
> > > > > results due to reduced cache conflicts. It was not always a performance
> > > > > gain but it definitely allowed for more stable, run-to-run comparable
> > > > > results. I can imagine that randomization might lead to a similar effect,
> > > > > although I am not sure how much, and it would be more interesting to hear
> > > > > about that effect.
> > > >
> > > > Cache coloring is effective up until your workload no longer fits in
> > > > that color.
> > >
> > > Yes, that was my observation back then, more or less. But even when you
> > > do not fit into the cache, a color-aware strategy (I was playing with bin
> > > hopping as well) produced more deterministic/stable results. But that
> > > is just a side note as it doesn't directly relate to your change.
> > >
> > > > Randomization helps to attenuate the cache conflict rate
> > > > when that happens.
> > >
> > > I can imagine that. Do we have any numbers to actually back that claim
> > > though?
> >
> > Yes, a 2.5X cache conflict rate reduction, in the change log.
>
> Which is a single benchmark result which is not even described in enough
> detail to be able to reproduce that measurement. I am sorry for nagging
> here but I would expect something less obscure.

No need to apologize.
> How does this behave for
> the usual cache-sensitive workloads that we test? I myself am not
> a benchmark person but I am pretty sure there are people who can help
> you find proper ones to run and evaluate.

I wouldn't pick benchmarks that are cpu-cache sensitive since those are
a small number of MBs in size; a memory-side cache is on the order of
10s of GBs.

> > > > For workloads that may fit in the cache, and/or
> > > > environments that need more explicit cache control, we have the recent
> > > > changes to numa_emulation [1] to arrange for cache-sized numa nodes.
> > >
> > > Could you point me to some more documentation? My google-fu is failing
> > > me and "5.2.27.5 Memory Side Cache Information Structure" doesn't point
> > > to anything official (except for your patch referencing it).
> >
> > http://www.uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf
>
> Thanks!
>
> [...]
>
> > > With all that being said, I think the overall idea makes sense but you
> > > should try much harder to explain _why_ we need it and back your
> > > justification with actual _data_ before I would consider my ack.
> >
> > I don't have a known CVE, I only have the ack of people more
> > knowledgeable about security than myself, like Kees, to say in effect,
> > "yes, this complicates attacks". If you won't take Kees' word for it,
> > I'm not sure what other justification I can present on the security
> > aspect.
>
> In general (nothing against Kees here, of course), I prefer a stronger
> justification than "somebody said it will make attacks harder". At least
> my concern about fragmented memory, which is not really hard to achieve
> at all, should be reasonably clarified. I am fully aware there is no
> absolute measure here, but making something harder under ideal conditions
> doesn't really help against common attack strategies which can prepare the
> system into an actual state to exploit allocation predictability.
> I am
> no expert here, but if an attacker can deduce the allocation pattern then
> fragmenting the memory is one easy step to overcome what people would
> consider a security measure.
>
> So color me unconvinced for now.

Another way to attack heap randomization without fragmentation is to
just perform heap spraying and hope that lands the data the attacker
needs in the right place. I still think that allocation entropy > 0 is
a positive benefit, but I don't know how to determine the curve of
security benefit relative to shuffle order.

> > 2.5X cache conflict reduction on a Java benchmark workload that
> > exceeds the cache size by multiple factors is the data I can provide
> > today. Post-launch it becomes easier to share more precise data, but
> > that's post 4.20. The hope of course is to have this capability
> > available in an upstream released kernel in advance of wider hardware
> > availability.
>
> I will not comment on timing, but in general any performance-related
> changes should come with numbers for a wider variety of workloads.

That's fair.

> In any case, I believe the change itself is not controversial as long as it
> is opt-in (potentially autotuned based on specific HW)

Do you mean disable shuffling on systems that don't have a
memory-side cache unless / until we can devise a security-benefit
curve relative to shuffle order? The former I can do; the latter, I'm
at a loss.

> with a reasonable
> API. And no, I do not consider $RANDOM_ORDER a good interface.

I think the current v4 proposal of a compile-time setting is
reasonable once we have consensus / guidance on the default
shuffle order.
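For reference, the trade-off behind a shuffle-order knob can be shown with a toy model (illustrative only, not the kernel implementation): shuffling at order N randomizes the placement of contiguous 2^N-page blocks, so a larger order preserves more physical contiguity at the cost of placement entropy.

```python
import random

def shuffle_at_order(frames, order, rng=None):
    """Toy model of order-granularity free-list shuffling: split the
    page frames into contiguous 2^order blocks, then Fisher-Yates
    shuffle the blocks. Pages inside a block stay physically adjacent;
    only block placement is randomized."""
    rng = rng or random.Random(0)
    block = 1 << order
    blocks = [frames[i:i + block] for i in range(0, len(frames), block)]
    # Fisher-Yates: swap each block with a randomly chosen earlier one.
    for i in range(len(blocks) - 1, 0, -1):
        j = rng.randint(0, i)
        blocks[i], blocks[j] = blocks[j], blocks[i]
    return [f for b in blocks for f in b]

pages = list(range(16))
print(shuffle_at_order(pages, order=2))  # 4-page runs, randomly ordered
```

At order 0 every page moves independently (maximum entropy, maximum fragmentation of contiguous ranges); at MAX_ORDER only buddy-sized blocks move, which is the conservative default the thread is debating.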