From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <942e0dd6-e426-06f6-7b6c-0e80d23c27e6@redhat.com>
Date: Fri, 19 Nov 2021 14:51:11 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.2.0
Subject: Re: [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST
Content-Language: en-US
To: Chao Peng, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins,
 Jeff Layton, "J . Bruce Fields", Andrew Morton, Yu Zhang,
 "Kirill A . Shutemov", luto@kernel.org, john.ji@intel.com,
 susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com
References: <20211119134739.20218-1-chao.p.peng@linux.intel.com>
 <20211119134739.20218-2-chao.p.peng@linux.intel.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <20211119134739.20218-2-chao.p.peng@linux.intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

On 19.11.21 14:47, Chao Peng wrote:
> From: "Kirill A. Shutemov"
>
> The new seal type provides semantics required for KVM guest private
> memory support. A file descriptor with the seal set is going to be used
> as source of guest memory in confidential computing environments such as
> Intel TDX and AMD SEV.
>
> F_SEAL_GUEST can only be set on empty memfd. After the seal is set
> userspace cannot read, write or mmap the memfd.
>
> Userspace is in charge of guest memory lifecycle: it can allocate the
> memory with falloc or punch hole to free memory from the guest.
>
> The file descriptor passed down to KVM as guest memory backend. KVM
> register itself as the owner of the memfd via memfd_register_guest().
>
> KVM provides callback that needed to be called on fallocate and punch
> hole.
>
> memfd_register_guest() returns callbacks that need be used for
> requesting a new page from memfd.
>

Repeating the feedback I already shared in a private mail thread:

As long as page migration / swapping is not supported, these pages
behave like any longterm pinned pages (e.g., VFIO) or secretmem pages.

1. These pages are not MOVABLE. They must not end up on ZONE_MOVABLE or
MIGRATE_CMA.

That should be easy to handle, you have to adjust the gfp_mask to
	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
just as mm/secretmem.c:secretmem_file_create() does.

2. 
These pages behave like mlocked pages and should be accounted as such.

This is probably where the accounting "fun" starts, but maybe it's
easier than I think to handle.

See mm/secretmem.c:secretmem_mmap(), where we account the pages as
VM_LOCKED and will consequently check per-process mlock limits. As we
don't mmap(), the same approach cannot be reused.

See drivers/vfio/vfio_iommu_type1.c:vfio_pin_map_dma() and
vfio_pin_pages_remote() on how to manually account via mm->locked_vm.

But it's a bit hairy because these pages are not actually mapped into
the page tables of the MM, so it might need some thought. Similarly,
these pages actually behave like "pinned" (as in mm->pinned_vm), but we
just don't increase the refcount AFAIR. Again, accounting really is a
bit hairy ...

-- 
Thanks,

David / dhildenb
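
To make point 2 a bit more concrete, here is a rough userspace model of
what "manually account via mm->locked_vm" could look like. It is only a
sketch under assumptions: struct mm_model, model_charge_locked() and
model_uncharge_locked() are hypothetical names made up for illustration
and are not kernel APIs. limit_pages stands in for the per-process
memlock limit expressed in pages, and can_bypass stands in for a
privileged caller that is allowed to exceed it. Real kernel code would
also need proper locking around the counter, which is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one mm_struct counter discussed above. */
struct mm_model {
	unsigned long locked_vm;	/* pages currently accounted as locked */
};

/*
 * Charge 'pages' against the memlock-style limit.
 * Returns 0 on success and -ENOMEM if the charge would exceed
 * 'limit_pages' and the caller may not bypass the limit.
 * On failure the counter is left unchanged.
 */
int model_charge_locked(struct mm_model *mm, unsigned long pages,
			unsigned long limit_pages, bool can_bypass)
{
	assert(mm != NULL);

	if (!can_bypass) {
		/* Written to avoid overflow in locked_vm + pages. */
		if (mm->locked_vm > limit_pages ||
		    pages > limit_pages - mm->locked_vm)
			return -ENOMEM;
	}

	mm->locked_vm += pages;
	return 0;
}

/*
 * Undo a previous charge, e.g. when a hole is punched and the pages
 * are freed. Returns -EINVAL if more pages are uncharged than were
 * ever charged, which would indicate an accounting bug.
 */
int model_uncharge_locked(struct mm_model *mm, unsigned long pages)
{
	assert(mm != NULL);

	if (pages > mm->locked_vm)
		return -EINVAL;

	mm->locked_vm -= pages;
	return 0;
}
```

In this model, the charge would happen when memory is allocated for the
guest (the fallocate path) and the uncharge when it is freed again (the
punch-hole path). Which mm to charge, given that nothing is mapped into
its page tables, is exactly the open question raised above and is not
answered by this sketch.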