Hi Paul, Julien,

Volodymyr hasn't come back with an update to this patch, but I think it
is good enough as-is as a bug fix and I would rather have it in its
current form in 4.14 than not having it at all leaving the bug unfixed.

I think Julien agrees.


Paul, are you OK with this?


On Mon, 11 May 2020, Stefano Stabellini wrote:
> On Mon, 11 May 2020, Andrew Cooper wrote:
> > On 11/05/2020 10:34, Julien Grall wrote:
> > > Hi Volodymyr,
> > >
> > > On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> > >> Normal World can share buffer with OP-TEE for two reasons:
> > >> 1. Some client application wants to exchange data with TA
> > >> 2. OP-TEE asks for shared buffer for internal needs
> > >>
> > >> The second case was handle more strictly than necessary:
> > >>
> > >> 1. In RPC request OP-TEE asks for buffer
> > >> 2. NW allocates buffer and provides it via RPC response
> > >> 3. Xen pins pages and translates data
> > >> 4. Xen provides buffer to OP-TEE
> > >> 5. OP-TEE uses it
> > >> 6. OP-TEE sends request to free the buffer
> > >> 7. NW frees the buffer and sends the RPC response
> > >> 8. Xen unpins pages and forgets about the buffer
> > >>
> > >> The problem is that Xen should forget about buffer in between stages 6
> > >> and 7. I.e. the right flow should be like this:
> > >>
> > >> 6. OP-TEE sends request to free the buffer
> > >> 7. Xen unpins pages and forgets about the buffer
> > >> 8. NW frees the buffer and sends the RPC response
> > >>
> > >> This is because OP-TEE internally frees the buffer before sending the
> > >> "free SHM buffer" request. So we have no reason to hold reference for
> > >> this buffer anymore. Moreover, in multiprocessor systems NW have time
> > >> to reuse buffer cookie for another buffer. Xen complained about this
> > >> and denied the new buffer registration. I have seen this issue while
> > >> running tests on iMX SoC.
> > >>
> > >> So, this patch basically corrects that behavior by freeing the buffer
> > >> earlier, when handling RPC return from OP-TEE.
> > >>
> > >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > >> ---
> > >>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
> > >>   1 file changed, 20 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> > >> index 6a035355db..af19fc31f8 100644
> > >> --- a/xen/arch/arm/tee/optee.c
> > >> +++ b/xen/arch/arm/tee/optee.c
> > >> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
> > >> optee_domain *ctx,
> > >>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
> > >>               call->rpc_buffer_type =
> > >> shm_rpc->xen_arg->params[0].u.value.a;
> > >>   +        /*
> > >> +         * OP-TEE signals that it frees the buffer that it requested
> > >> +         * before. This is the right for us to do the same.
> > >> +         */
> > >> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> > >> +        {
> > >> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> > >> +
> > >> +            free_optee_shm_buf(ctx, cookie);
> > >> +
> > >> +            /*
> > >> +             * This should never happen. We have a bug either in the
> > >> +             * OP-TEE or in the mediator.
> > >> +             */
> > >> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
> > >> cookie )
> > >> +                gprintk(XENLOG_ERR,
> > >> +                        "Saved RPC cookie does not corresponds to
> > >> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
> > >
> > > s/corresponds/correspond/
> > >
> > >> +                        call->rpc_data_cookie, cookie);
> > >
> > > IIUC, if you free the wrong SHM buffer then your guest is likely to be
> > > running incorrectly afterwards. So shouldn't we crash the guest to
> > > avoid further issue?
> > 
> > No - crashing the guest prohibits testing of the interface, and/or the
> > guest realising it screwed up and dumping enough state to usefully debug
> > what is going on.
> > 
> > Furthermore, if userspace could trigger this path, we'd have to issue an
> > XSA.
> > 
> > Crashing the guest is almost never the right thing to do, and definitely
> > not appropriate for a bad parameter.
> 
> Maybe we want to close the OPTEE interface for the guest, instead of
> crashing the whole VM. I.e. freeing the OPTEE context for the domain
> (d->arch.tee)?
> 
> But I think the patch is good as it is honestly.