Hi Paul, Julien, Volodymyr hasn't come back with an update to this patch, but I think it is good enough as-is as a bug fix and I would rather have it in its current form in 4.14 than not having it at all leaving the bug unfixed. I think Julien agrees. Paul, are you OK with this? On Mon, 11 May 2020, Stefano Stabellini wrote: > On Mon, 11 May 2020, Andrew Cooper wrote: > > On 11/05/2020 10:34, Julien Grall wrote: > > > Hi Volodymyr, > > > > > > On 06/05/2020 02:44, Volodymyr Babchuk wrote: > > >> Normal World can share buffer with OP-TEE for two reasons: > > >> 1. Some client application wants to exchange data with TA > > >> 2. OP-TEE asks for shared buffer for internal needs > > >> > > >> The second case was handle more strictly than necessary: > > >> > > >> 1. In RPC request OP-TEE asks for buffer > > >> 2. NW allocates buffer and provides it via RPC response > > >> 3. Xen pins pages and translates data > > >> 4. Xen provides buffer to OP-TEE > > >> 5. OP-TEE uses it > > >> 6. OP-TEE sends request to free the buffer > > >> 7. NW frees the buffer and sends the RPC response > > >> 8. Xen unpins pages and forgets about the buffer > > >> > > >> The problem is that Xen should forget about buffer in between stages 6 > > >> and 7. I.e. the right flow should be like this: > > >> > > >> 6. OP-TEE sends request to free the buffer > > >> 7. Xen unpins pages and forgets about the buffer > > >> 8. NW frees the buffer and sends the RPC response > > >> > > >> This is because OP-TEE internally frees the buffer before sending the > > >> "free SHM buffer" request. So we have no reason to hold reference for > > >> this buffer anymore. Moreover, in multiprocessor systems NW have time > > >> to reuse buffer cookie for another buffer. Xen complained about this > > >> and denied the new buffer registration. I have seen this issue while > > >> running tests on iMX SoC. > > >> > > >> So, this patch basically corrects that behavior by freeing the buffer > > >> earlier, when handling RPC return from OP-TEE. > > >> > > >> Signed-off-by: Volodymyr Babchuk > > >> --- > > >>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++---- > > >>   1 file changed, 20 insertions(+), 4 deletions(-) > > >> > > >> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c > > >> index 6a035355db..af19fc31f8 100644 > > >> --- a/xen/arch/arm/tee/optee.c > > >> +++ b/xen/arch/arm/tee/optee.c > > >> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct > > >> optee_domain *ctx, > > >>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC ) > > >>               call->rpc_buffer_type = > > >> shm_rpc->xen_arg->params[0].u.value.a; > > >>   +        /* > > >> +         * OP-TEE signals that it frees the buffer that it requested > > >> +         * before. This is the right for us to do the same. > > >> +         */ > > >> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE ) > > >> +        { > > >> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b; > > >> + > > >> +            free_optee_shm_buf(ctx, cookie); > > >> + > > >> +            /* > > >> +             * This should never happen. We have a bug either in the > > >> +             * OP-TEE or in the mediator. > > >> +             */ > > >> +            if ( call->rpc_data_cookie && call->rpc_data_cookie != > > >> cookie ) > > >> +                gprintk(XENLOG_ERR, > > >> +                        "Saved RPC cookie does not corresponds to > > >> OP-TEE's (%"PRIx64" != %"PRIx64")\n", > > > > > > s/corresponds/correspond/ > > > > > >> +                        call->rpc_data_cookie, cookie); > > > > > > IIUC, if you free the wrong SHM buffer then your guest is likely to be > > > running incorrectly afterwards. So shouldn't we crash the guest to > > > avoid further issue? > > > > No - crashing the guest prohibits testing of the interface, and/or the > > guest realising it screwed up and dumping enough state to usefully debug > > what is going on. > > > > Furthermore, if userspace could trigger this path, we'd have to issue an > > XSA. > > > > Crashing the guest is almost never the right thing to do, and definitely > > not appropriate for a bad parameter. > > Maybe we want to close the OPTEE interface for the guest, instead of > crashing the whole VM. I.e. freeing the OPTEE context for the domain > (d->arch.tee)? > > But I think the patch is good as it is honestly.