From: Atish Kumar Patra
Date: Sat, 22 Apr 2023 00:20:08 +0530
Subject: Re: [RFC 10/48] RISC-V: KVM: Implement static memory region measurement
References: <20230419221716.3603068-1-atishp@rivosinc.com> <20230419221716.3603068-11-atishp@rivosinc.com>
To: Sean Christopherson
Cc: linux-kernel@vger.kernel.org, Alexandre Ghiti, Andrew Jones, Andrew Morton,
    Anup Patel, Atish Patra, Björn Töpel, Suzuki K Poulose, Will Deacon,
    Marc Zyngier, linux-coco@lists.linux.dev, Dylan Reid, abrestic@rivosinc.com,
    Samuel Ortiz, Christoph Hellwig, Conor Dooley, Greg Kroah-Hartman, Guo Ren,
    Heiko Stuebner, Jiri Slaby, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
    linux-mm@kvack.org,
    linux-riscv@lists.infradead.org, Mayuresh Chitale, Palmer Dabbelt,
    Paolo Bonzini, Paul Walmsley, Rajnesh Kanwal, Uladzislau Rezki

On Thu, Apr 20, 2023 at 8:47 PM Sean Christopherson wrote:
>
> On Wed, Apr 19, 2023, Atish Patra wrote:
> > +int kvm_riscv_cove_vm_measure_pages(struct kvm *kvm, struct kvm_riscv_cove_measure_region *mr)
> > +{
> > +       struct kvm_cove_tvm_context *tvmc = kvm->arch.tvmc;
> > +       int rc = 0, idx, num_pages;
> > +       struct kvm_riscv_cove_mem_region *conf;
> > +       struct page *pinned_page, *conf_page;
> > +       struct kvm_riscv_cove_page *cpage;
> > +
> > +       if (!tvmc)
> > +               return -EFAULT;
> > +
> > +       if (tvmc->finalized_done) {
> > +               kvm_err("measured_mr pages can not be added after finalize\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       num_pages = bytes_to_pages(mr->size);
> > +       conf = &tvmc->confidential_region;
> > +
> > +       if (!IS_ALIGNED(mr->userspace_addr, PAGE_SIZE) ||
> > +           !IS_ALIGNED(mr->gpa, PAGE_SIZE) || !mr->size ||
> > +           !cove_is_within_region(conf->gpa, conf->npages << PAGE_SHIFT, mr->gpa, mr->size))
> > +               return -EINVAL;
> > +
> > +       idx = srcu_read_lock(&kvm->srcu);
> > +
> > +       /*TODO: Iterate one page at a time as pinning multiple pages fail with unmapped panic
> > +        * with a virtual address range belonging to vmalloc region for some reason.
>
> I've no idea what code you had, but I suspect the fact that vmalloc'd memory isn't
> guaranteed to be physically contiguous is relevant to the panic.
>

Ahh. That makes sense. Thanks.

> > +        */
> > +       while (num_pages) {
> > +               if (signal_pending(current)) {
> > +                       rc = -ERESTARTSYS;
> > +                       break;
> > +               }
> > +
> > +               if (need_resched())
> > +                       cond_resched();
> > +
> > +               rc = get_user_pages_fast(mr->userspace_addr, 1, 0, &pinned_page);
> > +               if (rc < 0) {
> > +                       kvm_err("Pinning the userpsace addr %lx failed\n", mr->userspace_addr);
> > +                       break;
> > +               }
> > +
> > +               /* Enough pages are not available to be pinned */
> > +               if (rc != 1) {
> > +                       rc = -ENOMEM;
> > +                       break;
> > +               }
> > +               conf_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > +               if (!conf_page) {
> > +                       rc = -ENOMEM;
> > +                       break;
> > +               }
> > +
> > +               rc = cove_convert_pages(page_to_phys(conf_page), 1, true);
> > +               if (rc)
> > +                       break;
> > +
> > +               /*TODO: Support other pages sizes */
> > +               rc = sbi_covh_add_measured_pages(tvmc->tvm_guest_id, page_to_phys(pinned_page),
> > +                                                page_to_phys(conf_page), SBI_COVE_PAGE_4K,
> > +                                                1, mr->gpa);
> > +               if (rc)
> > +                       break;
> > +
> > +               /* Unpin the page now */
> > +               put_page(pinned_page);
> > +
> > +               cpage = kmalloc(sizeof(*cpage), GFP_KERNEL_ACCOUNT);
> > +               if (!cpage) {
> > +                       rc = -ENOMEM;
> > +                       break;
> > +               }
> > +
> > +               cpage->page = conf_page;
> > +               cpage->npages = 1;
> > +               cpage->gpa = mr->gpa;
> > +               cpage->hva = mr->userspace_addr;
>
> Snapshotting the userspace address for the _source_ page can't possibly be useful.
>

Yeah. Currently, the hva in the kvm_riscv_cove_page is not used anywhere
in the code. We can remove it.
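
For clarity, the tracking structure would then carry only the fields that
are actually used. A quick sketch of what I have in mind (field types here
are guesses based on how the fields are used in this patch, not the actual
definition from the series):

struct kvm_riscv_cove_page {
        struct list_head link;          /* entry on tvmc->measured_pages */
        struct page *page;              /* page donated to the TSM */
        unsigned long npages;           /* always 1 until larger page sizes are supported */
        unsigned long gpa;              /* guest physical address backed by this page */
        bool is_mapped;                 /* page has been added/measured into the TVM */
        /* hva dropped: the source userspace address is not needed after measurement */
};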
> > +               cpage->is_mapped = true;
> > +               INIT_LIST_HEAD(&cpage->link);
> > +               list_add(&cpage->link, &tvmc->measured_pages);
> > +
> > +               mr->userspace_addr += PAGE_SIZE;
> > +               mr->gpa += PAGE_SIZE;
> > +               num_pages--;
> > +               conf_page = NULL;
> > +
> > +               continue;
> > +       }
> > +       srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +       if (rc < 0) {
> > +               /* We don't to need unpin pages as it is allocated by the hypervisor itself */
>
> This comment makes no sense.  The above code is doing all of the allocation and
> pinning, which strongly suggests that KVM is the hypervisor.  But this comment
> implies that KVM is not the hypervisor.
>

I mean to say here the conf_page is allocated in the kernel using
alloc_page, so no pinning/unpinning is required. The comment is misleading
and confusing at best. I will remove it.

> And "pinned_page" is unpinned in the loop after the page is added+measured,
> which looks to be the same model as TDX where "pinned_page" is the source and
> "conf_page" is gifted to the hypervisor.  But on failure, e.g. when allocating
> "conf_page", that reference is not put.
>

Thanks. Will fix it.

> > +               cove_delete_page_list(kvm, &tvmc->measured_pages, false);
> > +               /* Free the last allocated page for which conversion/measurement failed */
> > +               kfree(conf_page);
>
> Assuming my guesses about how the architecture works are correct, this is broken
> if sbi_covh_add_measured_pages() fails. The page has already been gifted to the
> TSM by cove_convert_pages(), but there is no call to sbi_covh_tsm_reclaim_pages(),
> which I'm guessing is necessary to transition the page back to a state where it can
> be safely used by the host.

Yeah. The last conf_page should be reclaimed as well if
sbi_covh_add_measured_pages() fails at any point in the loop. All other
allocated ones would be reclaimed as part of cove_delete_page_list().
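
To make the intended fix concrete, here is a rough, untested sketch of how
I am thinking of restructuring the failure paths inside the loop. It only
illustrates the idea; in particular it assumes sbi_covh_tsm_reclaim_pages()
takes a physical address plus a page count and may be called for a page
that was converted but whose measured-page addition failed:

        while (num_pages) {
                ...
                rc = get_user_pages_fast(mr->userspace_addr, 1, 0, &pinned_page);
                if (rc < 0)
                        break;
                /* Enough pages are not available to be pinned */
                if (rc != 1) {
                        rc = -ENOMEM;
                        break;
                }

                conf_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
                if (!conf_page) {
                        rc = -ENOMEM;
                        goto err_unpin;
                }

                rc = cove_convert_pages(page_to_phys(conf_page), 1, true);
                if (rc)
                        goto err_free_conf;

                rc = sbi_covh_add_measured_pages(tvmc->tvm_guest_id,
                                                 page_to_phys(pinned_page),
                                                 page_to_phys(conf_page),
                                                 SBI_COVE_PAGE_4K, 1, mr->gpa);
                if (rc)
                        goto err_reclaim;

                /* Unpin the source page now that it has been measured */
                put_page(pinned_page);
                ...
                continue;

err_reclaim:
                /* Assumed call: hand the converted page back to the host */
                sbi_covh_tsm_reclaim_pages(page_to_phys(conf_page), 1);
err_free_conf:
                /* conf_page came from alloc_page(), so free it with __free_page() */
                __free_page(conf_page);
err_unpin:
                /* Drop the reference taken by get_user_pages_fast() on any failure */
                put_page(pinned_page);
                break;
        }

With something along these lines, the kfree(conf_page) in the outer error
path goes away entirely, and cove_delete_page_list() only has to deal with
the pages that were fully converted, measured, and added to measured_pages.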