From: Marek Olšák
Date: Thu, 10 Jun 2021 11:59:04 -0400
Subject: Re: [Mesa-dev] Linux Graphics Next: Userspace submission update
To: Daniel Vetter
Cc: Christian König, Michel Dänzer, dri-devel, Jason Ekstrand, ML Mesa-dev
List-Id: Direct Rendering Infrastructure - Development

Hi Daniel,

We just talked about this whole topic internally and came to the conclusion that the hardware needs to understand sync object handles and have high-level wait and signal operations in the command stream. Sync objects will be backed by memory, but they won't be readable or writable by processes directly. The hardware will log all accesses to sync objects and will send the log to the kernel periodically. The kernel will identify malicious behavior.

Example of a hardware command stream:
...
ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence number is assigned by the kernel
Draw();
ImplicitSyncSignalWhenDone(syncObjHandle);
...
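
To make the intended semantics concrete, here is a minimal Python sketch of the command stream above. It is only a toy model under stated assumptions: the sync object is assumed to be a monotonically increasing counter, a signal is assumed to advance it by one, and the names CommandProcessor, SyncObject, and drain_log are hypothetical and do not describe any real hardware or kernel interface.

```python
from dataclasses import dataclass, field


@dataclass
class SyncObject:
    """Hypothetical sync object: a counter that only the 'hardware' touches."""
    value: int = 0
    access_log: list = field(default_factory=list)


class CommandProcessor:
    """Toy stand-in for hardware that resolves handles and logs each access."""

    def __init__(self):
        self._objects = {}     # handle -> SyncObject, never exposed to processes
        self._next_handle = 1

    def create_sync_object(self):
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = SyncObject()
        return handle

    def wait(self, handle, sequence_number, process):
        """Models ImplicitSyncWait: True once the counter reached the number."""
        obj = self._objects[handle]
        obj.access_log.append((process, "wait", sequence_number))
        return obj.value >= sequence_number

    def signal_when_done(self, handle, process):
        """Models ImplicitSyncSignalWhenDone (assumption: counter += 1)."""
        obj = self._objects[handle]
        obj.value += 1
        obj.access_log.append((process, "signal", obj.value))
        return obj.value

    def drain_log(self, handle):
        """Returns and clears the log the kernel would audit periodically."""
        obj = self._objects[handle]
        log, obj.access_log = obj.access_log, []
        return log


hw = CommandProcessor()
handle = hw.create_sync_object()
blocked = hw.wait(handle, 1, "consumer")      # nothing signaled yet
hw.signal_when_done(handle, "producer")       # producer finished its Draw()
ready = hw.wait(handle, 1, "consumer")        # sequence number 1 is reached
audit = hw.drain_log(handle)
```

In this model processes only ever hold the handle; the counter and the access log stay inside the stand-in for the hardware, which mirrors the idea that the kernel audits the log instead of trusting userspace writes.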

I'm afraid we have no other choice because of the TLB invalidation overhead.

Marek


On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
> > On 09.06.21 at 15:19, Daniel Vetter wrote:
> > > [SNIP]
> > > > Yeah, we call this the lightweight and the heavyweight TLB flush.
> > > >
> > > > The lightweight can be used when you are sure that you don't have any of the
> > > > PTEs currently in flight in the 3D/DMA engine and you just need to
> > > > invalidate the TLB.
> > > >
> > > > The heavyweight must be used when you need to invalidate the TLB *AND* make
> > > > sure that no concurrent operation moves new stuff into the TLB.
> > > >
> > > > The problem is for this use case we have to use the heavyweight one.
> > > Just for my own curiosity: So the lightweight flush is only for in-between
> > > CS when you know access is idle? Or does that also not work if userspace
> > > has a CS on a DMA engine going at the same time because the TLBs aren't
> > > isolated enough between engines?
> >
> > More or less correct, yes.
> >
> > The problem is a lightweight flush only invalidates the TLB, but doesn't
> > take care of entries which have been handed out to the different engines.
> >
> > In other words what can happen is the following:
> >
> > 1. Shader asks TLB to resolve address X.
> > 2. TLB looks into its cache and can't find address X so it asks the walker
> > to resolve.
> > 3. Walker comes back with result for address X and TLB puts that into its
> > cache and gives it to Shader.
> > 4. Shader starts doing some operation using result for address X.
> > 5. You send lightweight TLB invalidate and TLB throws away cached values for
> > address X.
> > 6. Shader happily still uses whatever the TLB gave to it in step 3 to
> > access address X.
> >
> > Think of it as the shader having its own 1-entry L0 TLB cache, which is not
> > affected by the lightweight flush.
> >
> > The heavyweight flush on the other hand sends out a broadcast signal to
> > everybody and only comes back when we are sure that an address is not in use
> > any more.

> Ah, makes sense. On Intel the shaders only operate in VA; everything goes
> around as explicit async messages to IO blocks. So we don't have this; the
> only difference in TLB flushes is between a TLB flush in the IB and an MMIO
> one, which is independent of anything currently being executed on an
> engine.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
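
The six-step sequence in the quoted message can be replayed in a small Python sketch. This is a toy model, not a description of any real GPU: the Tlb and Shader classes, the page names, and the run_scenario helper are all hypothetical. In particular, the heavyweight flush is modeled by simply dropping the translation each engine holds, whereas the quoted description says the real flush broadcasts and only returns once the address is no longer in use.

```python
class Tlb:
    """Toy TLB: a cache in front of a page table, shared by several engines."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual address -> physical page
        self.cache = {}
        self.engines = []

    def resolve(self, va):
        if va not in self.cache:                    # step 2: miss, ask the walker
            self.cache[va] = self.page_table[va]    # step 3: fill the cache
        return self.cache[va]

    def lightweight_flush(self):
        """Only throws away the TLB's own cached entries (step 5)."""
        self.cache.clear()

    def heavyweight_flush(self):
        """Also makes sure no engine keeps using an old translation.

        Assumption: modeled as dropping the held entry instead of waiting.
        """
        self.cache.clear()
        for engine in self.engines:
            engine.held = None


class Shader:
    """Engine that keeps the last translation, like a 1-entry L0 TLB."""

    def __init__(self, tlb):
        self.tlb = tlb
        self.held = None               # (va, physical page) or None
        tlb.engines.append(self)

    def begin_access(self, va):        # steps 1 to 4
        self.held = (va, self.tlb.resolve(va))

    def continue_access(self, va):     # step 6
        if self.held is None or self.held[0] != va:
            self.begin_access(va)
        return self.held[1]


def run_scenario(flush_name):
    """Remap address X while a shader is using it, then flush and continue."""
    page_table = {"X": "old_page"}
    tlb = Tlb(page_table)
    shader = Shader(tlb)
    shader.begin_access("X")
    page_table["X"] = "new_page"       # the kernel changes the mapping
    getattr(tlb, flush_name)()
    return shader.continue_access("X")


stale = run_scenario("lightweight_flush")   # shader still sees "old_page"
fresh = run_scenario("heavyweight_flush")   # shader re-resolves to "new_page"
```

The lightweight case returns the stale page because the flush never reaches the translation already handed to the shader, which is the hazard described in step 6.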