From: David Hildenbrand
Organization: Red Hat
To: Nick Kossifidis, jejb@linux.ibm.com
Cc: Andrew Morton, Mike Rapoport, Alexander Viro, Andy Lutomirski,
    Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter,
    Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
    Ingo Molnar, "Kirill A. Shutemov", Matthew Wilcox, Matthew Garrett,
    Mark Rutland, Michal Hocko, Mike Rapoport, Michael Kerrisk,
    Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, "Rafael J. Wysocki",
    Rick Edgecombe, Roman Gushchin, Shakeel Butt, Shuah Khan,
    Thomas Gleixner, Tycho Andersen, Will Deacon,
    linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
    linux-riscv@lists.infradead.org, x86@kernel.org
Subject: Re: [PATCH v18 0/9] mm: introduce memfd_secret system call to
    create "secret" memory areas
Date: Fri, 7 May 2021 09:35:45 +0200
Message-ID: <5232c8a7-8a05-9d0f-69ff-3dba2b04e784@redhat.com>
In-Reply-To: <77fe28bd940b2c1afd69d65b6d349352@mailhost.ics.forth.gr>
References: <20210303162209.8609-1-rppt@kernel.org>
    <20210505120806.abfd4ee657ccabf2f221a0eb@linux-foundation.org>
    <996dbc29-e79c-9c31-1e47-cbf20db2937d@redhat.com>
    <8eb933f921c9dfe4c9b1b304e8f8fa4fbc249d84.camel@linux.ibm.com>
    <77fe28bd940b2c1afd69d65b6d349352@mailhost.ics.forth.gr>
X-Mailing-List: linux-api@vger.kernel.org

On 07.05.21 01:16, Nick Kossifidis wrote:
> On 2021-05-06 20:05, James Bottomley wrote:
>> On Thu, 2021-05-06 at 18:45 +0200, David Hildenbrand wrote:
>>>
>>> Also, there is a way to still read that memory when root by
>>>
>>> 1. Having kdump active (which would often be the case, but maybe not
>>>    to dump user pages)
>>> 2. Triggering a kernel crash (easy via proc as root)
>>> 3. Waiting for the reboot after kdump created the dump and then
>>>    reading the content from disk.
>>
>> Anything that can leave physical memory intact but boot to a kernel
>> where the missing direct map entry is restored could theoretically
>> extract the secret.
>> However, it's not exactly going to be a stealthy extraction ...
>>
>>> Or, as an attacker, load a custom kexec() kernel and read memory
>>> from the new environment. Of course, the latter two are advanced
>>> mechanisms, but they are possible when root. We might be able to
>>> mitigate, for example, by zeroing out secretmem pages before booting
>>> into the kexec kernel, if we care :)
>>
>> I think we could handle it by marking the region, yes, and a zero on
>> shutdown might be useful ... it would prevent all warm reboot type
>> attacks.
>>
>
> I had similar concerns about recovering secrets with kdump, and
> considered cleaning up keyrings before jumping to the new kernel. The
> problem is we can't provide guarantees in that case, once the kernel has
> crashed and we are on our way to run crashkernel, we can't be sure we
> can reliably zero-out anything, the more code we add to that path the

Well, I think it depends. Assume we do the following:

1) Zero out any secretmem pages when handing them back to the buddy
   (alternative: init_on_free=1) -- if not already done, I didn't check
   the code.

2) On kdump, zero out all allocated secretmem. It'd be easier if we
   just allocated from a fixed physical memory area; otherwise we have
   to walk process page tables or use a PFN walker. And zeroing out
   secretmem pages without a direct mapping is a different challenge.

Now, during 2) it can happen that

a) We crash in our clearing code (e.g., something is seriously messed
   up) and fail to start the kdump kernel. That's actually good: instead
   of leaking data, we fail hard.

b) We don't find all secretmem pages, for example, because process page
   tables are messed up or something messed up our memmap (if we'd use
   that to identify secretmem pages via a PFN walker somehow).

But for the simple cases (e.g., malicious root tries to crash the kernel
via /proc/sysrq-trigger) both a) and b) wouldn't apply.
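To make 1) and 2) a bit more concrete, here is a toy user-space model, purely for illustration. It is not kernel code, and Page, is_secretmem, free_page() and clear_secretmem_before_kdump() are made-up names; the flag just stands in for whatever marker a PFN walker would use to identify secretmem pages.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_SIZE = 4096  # bytes per page in this toy model


@dataclass
class Page:
    """One page frame in the toy model (hypothetical, not struct page)."""
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))
    is_secretmem: bool = False  # assumed marker a PFN walker could test


def free_page(page: Page) -> None:
    """Step 1): zero a secretmem page before handing it back to the allocator."""
    if page.is_secretmem:
        page.data[:] = bytes(PAGE_SIZE)
        page.is_secretmem = False


def clear_secretmem_before_kdump(memmap: List[Page]) -> int:
    """Step 2): walk all page frames and zero every page still marked secretmem.

    Returns the number of pages cleared. The walk relies on the marker being
    intact, which is exactly what case b) says may not hold after a crash.
    """
    cleared = 0
    for page in memmap:
        if page.is_secretmem:
            page.data[:] = bytes(PAGE_SIZE)
            cleared += 1
    return cleared


# Example: one secretmem page among four frames.
memmap = [Page() for _ in range(4)]
memmap[2].is_secretmem = True
memmap[2].data[:6] = b"secret"
clear_secretmem_before_kdump(memmap)
```

In this model, a page whose marker got lost (case b) is simply skipped by the walk and keeps its content, which is the failure mode to worry about. Case a) would correspond to the walk itself blowing up before the kdump kernel is started.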

Obviously, if an admin wanted to mitigate right now, they would want to
disable kdump completely, meaning any attempt to load a crashkernel
would fail and kdump could not be enabled again for that kernel (not
even via a cmdline that an attacker could modify to reboot into a
system with a crashkernel configured). Disabling kdump in the kernel
when secretmem pages are allocated is one approach, although
sub-optimal.

> more risky it gets. However during reboot/normal kexec() we should do
> some cleanup, it makes sense and secretmem can indeed be useful in that
> case. Regarding loading custom kexec() kernels, we mitigate this with
> the kexec file-based API where we can verify the signature of the loaded
> kimage (assuming the system runs a kernel provided by a trusted 3rd
> party and we've maintained a chain of trust since booting).

For example, in VMs (like QEMU), we often don't clear physical memory
during a reboot. So if an attacker manages to load a kernel that can be
tricked into reading arbitrary physical memory areas, we can leak
secretmem data, I think.

And there might be ways to achieve that just using the cmdline, not
necessarily loading a different kernel. For example, if you limit the
kernel footprint ("mem=256M") and disable the strict iomem checks
("iomem=relaxed"), you can just extract that memory via /dev/mem, if I
am not wrong.

So as an attacker, modify the (grub) cmdline to "mem=256M
iomem=relaxed", reboot, and read all memory via /dev/mem. Or load a
signed kexec kernel with that cmdline and boot into it.

Interesting problem :)

-- 
Thanks,

David / dhildenb