From: Julien Grall
Date: Fri, 8 Oct 2021 09:30:25 +0200
Subject: Re: [PING] Re: [PATCH] xen/arm: optee: Allocate anonymous domheap pages
To: Stefano Stabellini, Volodymyr Babchuk
Cc: Oleksandr, xen-devel@lists.xenproject.org, Oleksandr Tyshchenko, Julien Grall
List-Id: Xen developer discussion

Hi Stefano,

On 07/10/2021 23:14, Stefano Stabellini wrote:
> On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
>> Hi Stefano,
>>
>> Stefano Stabellini writes:
>>
>>> On Wed, 6 Oct 2021, Oleksandr wrote:
>>>> Hello all
>>>>
>>>> Gentle reminder.
>>>
>>> Many thanks for the ping, this patch fell off my radar.
>>>
>>>> On 23.09.21 23:57, Volodymyr Babchuk wrote:
>>>>> Hi Stefano,
>>>>>
>>>>> Stefano Stabellini writes:
>>>>>
>>>>>> On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>>>>>>> From: Oleksandr Tyshchenko
>>>>>>>
>>>>>>> Allocate anonymous domheap pages as there is no strict need to
>>>>>>> account them to a particular domain.
>>>>>>>
>>>>>>> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
>>>>>>> domU and dom0 can allocate" the dom0 cannot allocate memory outside
>>>>>>> of the pre-allocated region. This means that if we try to allocate a
>>>>>>> non-anonymous page to be accounted to dom0, we will get an
>>>>>>> over-allocation issue when assigning that page to the domain.
>>>>>>> The anonymous page, in turn, is not assigned to any domain.
>>>>>>>
>>>>>>> CC: Julien Grall
>>>>>>> Signed-off-by: Oleksandr Tyshchenko
>>>>>>> Acked-by: Volodymyr Babchuk
>>>>>> Only one question, which is more architectural: given that these pages
>>>>>> are "unlimited", could the guest exploit the interface somehow to force
>>>>>> Xen to allocate a very high number of anonymous pages?
>>>>>>
>>>>>> E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force
>>>>>> Xen to exhaust all memory pages?
>>>>> Generally, the OP-TEE mediator tracks all allocated resources and
>>>>> imposes limits on them.
>>>>>
>>>>> The OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is
>>>>> issued not by the domain, but by OP-TEE itself. As OP-TEE is a more
>>>>> trusted piece of the system, we allow it to request as many buffers as
>>>>> it wants. Also, we know that OP-TEE asks for only one such buffer per
>>>>> standard call, and the number of simultaneous calls is limited by the
>>>>> number of OP-TEE threads, which is quite low: typically only two.
>>>
>>> So let me repeat it differently to see if I understood correctly:
>>>
>>> - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
>>> - OP-TEE is trusted and only calls it twice anyway
>>
>> Correct.
>>
>>> I am OK with this argument, but do we have a check to make sure a domU
>>> cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
>>
>> A domU can't issue any RPC, because all RPCs are issued from the OP-TEE
>> side. This is the nature of RPC - OP-TEE requests some service from the
>> Normal World. But of course, the Normal World can perform certain
>> actions that will make OP-TEE issue an RPC. I discuss this in depth
>> below.
>>
>>> Looking at the patch, there are two other places, in addition to
>>> OPTEE_SMC_RPC_FUNC_ALLOC, where anonymous memory pages can be
>>> allocated:
>>>
>>> 1) copy_std_request
>>> 2) translate_noncontig
>>>
>>> We need to prove that neither 1) nor 2) can result in a domU exhausting
>>> Xen memory.
>>>
>>> In the case of 1), it looks like the memory is freed before returning to
>>> the DomU, right? If so, it should be no problem?
>>
>> Yes, the mediator makes a shadow copy of every request buffer to hide
>> the translated addresses from the guest. The number of requests is
>> limited by the number of OP-TEE threads.
>>
>>> In the case of 2), it looks like the memory could outlive the call where
>>> it is allocated. Is there any kind of protection against issuing
>>> something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
>>> itself that would refuse the attempt? Thus, the idea is that
>>> do_call_with_arg will return an error and we'll just free the memory
>>> there?
>>
>> Well, translate_noncontig() calls allocate_optee_shm_buf(), which counts
>> all allocated buffers, so you can't call it more than
>> MAX_SHM_BUFFER_COUNT times without de-allocating previous memory. But,
>> thanks to your question, I have found a bug there: memory is not freed
>> if allocate_optee_shm_buf() fails. I'll prepare a patch later today.
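[For illustration, the leak just described presumably has roughly the
shape sketched below. This is an editorial guess based only on the
function names mentioned in this thread, not Volodymyr's actual
follow-up patch; the parameters of allocate_optee_shm_buf() are
invented.]

    /* Inside translate_noncontig(), simplified and partly guessed: */
    xen_pages = alloc_domheap_pages(NULL, order, 0);
    if ( !xen_pages )
        return -ENOMEM;

    /* ... fill in the page list to be handed to OP-TEE ... */

    optee_shm_buf = allocate_optee_shm_buf(ctx /* , ... */);
    if ( IS_ERR(optee_shm_buf) )
    {
        /* The missing error path: release the pages allocated above. */
        free_domheap_pages(xen_pages, order);
        return PTR_ERR(optee_shm_buf);
    }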
>>
>>> I cannot see a check for errors returned by do_call_with_arg and memory
>>> freeing done because of that. Sorry, I am not super familiar with the
>>> code, I am just trying to make sure we are not offering DomUs an easy
>>> way to crash the system.
>>
>> I tried to eliminate all possibilities for a guest to crash the
>> system. Of course, this does not mean that there are none left.
>>
>> And yes, the code is a bit hard to understand, because calls to OP-TEE
>> are stateful and you need to account for that state. From the NW and SW
>> sides this looks quite fine, because the state is handled naturally. But
>> the mediator sits in the middle, so its implementation is a bit messy.
>>
>> I'll try to explain what is going on, so it will be easier to understand
>> the logic in the mediator.
>>
>> There are two types of OP-TEE calls: fast calls and standard calls. A
>> fast call is simple: issue the SMC and get the result. It does not
>> allocate a thread context in OP-TEE and is non-preemptive. So yes, it
>> should be fast. It is used for simple things like "get OP-TEE version"
>> or "exchange capabilities". Fast calls are easy to handle in the
>> mediator: just forward the call, check the result, and return it to the
>> guest.
>>
>> Standard calls are stateful. OP-TEE allocates a thread for each call.
>> Such a call can be preempted either by an IRQ or by an RPC. For
>> consistency, an IRQ return is also considered a special type of RPC. So,
>> in general, one standard call can consist of a series of SMCs:
>>
>> --> SMC with request
>> <-- RPC return (like IRQ)
>> --> SMC "resume call"
>> <-- RPC return (like "read disk")
>> --> SMC "resume call"
>> <-- RPC return (like "send network packet")
>> --> SMC "resume call"
>> ...
>> <-- Final return
>>
>> There are many types of RPCs: "handle IRQ", additional shared buffer
>> allocation/de-allocation, RPMB access, disk access, network access,
>> synchronization primitives (when an OP-TEE thread gets blocked on a
>> mutex), etc.
>>
>> Two more things make all this worse: the Normal World can register a
>> shared buffer with OP-TEE, and such a buffer can live indefinitely.
>> Also, the Normal World decides when to resume a call. For example, the
>> calling process can be preempted and then resumed seconds later. A
>> misbehaving guest can decide not to resume the call at all.
>>
>> As I said, I tried to take all these things into account. There are
>> basically 3 types of objects that can lead to memory allocation on the
>> Xen side:
>>
>> 1. Standard call context. Besides memory space for struct optee_std_call
>> itself, it allocates a page for a shadow buffer, where argument
>> addresses are translated by Xen. The number of these objects is limited
>> by the number of OP-TEE threads:
>>
>>     count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
>>     if ( count == max_optee_threads )
>>         return ERR_PTR(-ENOSPC);
>>
>> 2. Shared buffer. This is a buffer shared by the guest with OP-TEE. It
>> can be either a temporary buffer, which is shared for the duration of
>> one standard call, or a registered shared buffer, which remains active
>> until it is de-registered. This is where translate_noncontig() comes
>> into play.
>> The number of these buffers is limited in allocate_optee_shm_buf():
>>
>>     count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
>>                               MAX_SHM_BUFFER_COUNT);
>>     if ( count == MAX_SHM_BUFFER_COUNT )
>>         return ERR_PTR(-ENOMEM);
>>
>> 3. Shared RPC buffer. This is a very special kind of buffer. Basically,
>> OP-TEE needs some shared memory to provide the RPC call parameters, so
>> it requests a buffer from the Normal World. There is no hard limit on
>> this from the mediator side because, as I said earlier, OP-TEE itself
>> limits the number of these buffers. There is no case in which more than
>> one such buffer is allocated per OP-TEE thread. This type of buffer is
>> used only to process the RPC requests themselves. OP-TEE can request
>> more buffers via RPC, but they will fall under item 2: the NW uses a
>> separate request to register a buffer and then returns its handle in the
>> preempted call.
>>
>> Apart from those two limits, there is a limit on the total number of
>> pages shared between a guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
>> limit is for the case when a guest tries to allocate a few really BIG
>> buffers.
>>
>>> It looks like they could be called from one of the OP-TEE operations
>>> that a domU could request? Is there a limit for them?
>>
>> Yes, there are limits, as I described above.
>>
>> Also, bear in mind that the resources available to OP-TEE are themselves
>> quite limited. So, in case of some breach in the mediator, OP-TEE will
>> give up first. This of course is not an excuse to have bugs in the
>> mediator...
>
> OK, thanks for the explanation. The reason for my questions is that if
> the allocations are using the memory of the DomU, then at worst the DomU
> can run out of memory.

Not really. The worst outcome is still a DoS of the host, because we
don't pre-allocate the memory or even check that the total allocation
will not exhaust memory. The only difference is that, I would argue,
this would then be a misconfiguration of the system.

> But if the allocations are using anonymous memory, then
> the whole platform might run out of memory. We have issued XSAs for
> things like that in the past.
>
> This is why I am worried about this patch: if we apply it we really
> become reliant on these limits being implemented correctly. A bug can
> have much more severe consequences.

This is not a problem specific to OP-TEE. Any anonymous allocation
(xmalloc, ...) done in Xen on behalf of the guest has, in theory, the
same problem (see more below).

> As you are the maintainer for this code, and this code is not security
> supported, I'll leave it up to you (also see the other email about
> moving optee to "supported, not security supported").
>
> However, maybe a different solution would be to increase max_pages for a
> domain when optee is enabled? Maybe just by a few pages (as many as
> needed by the optee mediator)? Because if we did that, we wouldn't risk
> exposing DoS attack vectors for every bug in the mediator limits checks.

I think we need to differentiate between two sorts of allocation:
 1) memory used by Xen on behalf of the guest
 2) memory used by the guest itself

d->max_pages is only meant to refer to the latter (in fact, a guest can
balloon memory up to d->max_pages). In this case, we are discussing the
former, and therefore I think it should be accounted differently, as the
memory is not exposed to the guest.

Today, Xen doesn't have this facility. I know this has been discussed a
few times in the past, but AFAIK a patch series never materialized for
it.
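[For illustration, such an accounting facility might look roughly like
the sketch below. Every name in it - the xen_side_pages counter, the
budget constant, and both helpers - is hypothetical: nothing like this
exists in the tree, which is precisely the point being made here.]

    /*
     * Hypothetical per-domain budget for pages Xen allocates on behalf
     * of a domain, mirroring the atomic_add_unless() pattern quoted
     * earlier in this thread.
     */
    #define XEN_SIDE_ALLOC_MAX_PG 64    /* made-up budget */

    static struct page_info *xen_side_alloc_page(struct domain *d)
    {
        /* Refuse the allocation once the domain's budget is exhausted. */
        if ( atomic_add_unless(&d->xen_side_pages, 1,
                               XEN_SIDE_ALLOC_MAX_PG) ==
             XEN_SIDE_ALLOC_MAX_PG )
            return NULL;

        return alloc_domheap_pages(NULL, 0, 0);
    }

    static void xen_side_free_page(struct domain *d, struct page_info *pg)
    {
        free_domheap_pages(pg, 0);
        atomic_dec(&d->xen_side_pages);    /* give the budget back */
    }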
However, to me, this sounds more like hardening work for the whole of
Xen rather than for OP-TEE itself. So I think the patch provided by
Oleksandr is probably the best way to go for this release.

> The below adds a 10-page slack.
>
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 3964a8a5cd..a3105f1a9a 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
>          return cur_mediator->ops->handle_call(regs);
>  }
>
> +#define TEE_SLACK (10)
>  int tee_domain_init(struct domain *d, uint16_t tee_type)
>  {
> +    int ret;
> +
>      if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
>          return 0;
>
> @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
>      if ( cur_mediator->tee_type != tee_type )
>          return -EINVAL;
>
> -    return cur_mediator->ops->domain_init(d);
> +    ret = cur_mediator->ops->domain_init(d);
> +    if ( ret < 0 )
> +        return ret;
> +
> +    /*
> +     * Increase maxmem for domains with TEE, the extra pages are used by
> +     * the mediator
> +     */
> +    d->max_pages += TEE_SLACK;

Regardless of what I wrote above, this change would be incorrect,
because TEE is initialized when the domain is created, whereas
d->max_pages is only set afterwards via DOMCTL_max_mem, so the value
would get overridden.

In any case, I don't think the OP-TEE code should modify d->max_pages.
Instead, this should be accounted for by the toolstack (or by
domain_build for dom0/domUs created by Xen).

Cheers,

-- 
Julien Grall
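[As a footnote to the limit discussion above: the atomic_add_unless()
pattern the mediator relies on can be demonstrated in isolation. The
standalone C11 program below emulates it for illustration only; Xen's
real atomic_add_unless() is a per-architecture primitive with a
different implementation.]

    #include <stdatomic.h>
    #include <stdio.h>

    #define MAX_SHM_BUFFER_COUNT 4    /* stand-in for the mediator's limit */

    static atomic_int buf_count;

    /* Add 1 to *v unless it already equals 'limit'; return the old value. */
    static int add_unless(atomic_int *v, int limit)
    {
        int old = atomic_load(v);

        while ( old != limit &&
                !atomic_compare_exchange_weak(v, &old, old + 1) )
            ;    /* 'old' was refreshed by the failed CAS; retry */

        return old;
    }

    int main(void)
    {
        for ( int i = 0; i < 6; i++ )
        {
            int count = add_unless(&buf_count, MAX_SHM_BUFFER_COUNT);

            if ( count == MAX_SHM_BUFFER_COUNT )
                printf("allocation %d refused: limit reached\n", i);
            else
                printf("allocation %d allowed, count is now %d\n", i, count + 1);
        }

        return 0;
    }

The first four allocations succeed and the remaining two are refused,
which matches the behaviour of the two checks quoted from the mediator.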