From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <891017f7-e276-66a1-dd9b-cbebc8f8a00d@intel.com>
Date: Fri, 10 Jun 2022 09:53:24 +0300
From: Lionel Landwerlin
To: Niranjana Vishwanathapura
Cc: Intel GFX, Maling list - DRI developers, Thomas Hellstrom, Chris Wilson, Jason Ekstrand, Daniel Vetter, Christian König
Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
In-Reply-To: <20220609193150.GG4461@nvishwa1-DESK>
References: <20220603235148.GU4461@nvishwa1-DESK> <20220607181810.GV4461@nvishwa1-DESK> <20220607213209.GY4461@nvishwa1-DESK> <4be022cc-518e-49e1-96bd-b9720a313401@linux.intel.com> <20220608214431.GD4461@nvishwa1-DESK> <54fb6c28-7954-123e-edd6-ba6c15b6d36e@intel.com> <20220609193150.GG4461@nvishwa1-DESK>

On 09/06/2022 22:31, Niranjana Vishwanathapura wrote: > On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote: >>   On 09/06/2022 00:55, Jason Ekstrand wrote: >> >>     On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura >>     wrote: >> >>       On Wed, 
Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote: >>       > >>       > >>       >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote: >>       >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana >> Vishwanathapura >>       wrote: >>       >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote: >>       >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura >>       >>>> wrote: >>       >>>> >>       >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin >>       wrote: >>       >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote: >>       >>>>   > >>       >>>>   >     On Thu, Jun 2, 2022 at 3:11 PM Niranjana >> Vishwanathapura >>       >>>>   > wrote: >>       >>>>   > >>       >>>>   >       On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew >>       >>>>Brost wrote: >>       >>>>   >       >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel >>       Landwerlin >>       >>>>   wrote: >>       >>>>   >       >> On 17/05/2022 21:32, Niranjana Vishwanathapura >>       wrote: >>       >>>>   >       >> > +VM_BIND/UNBIND ioctl will immediately start >>       >>>>   binding/unbinding >>       >>>>   >       the mapping in an >>       >>>>   >       >> > +async worker. The binding and unbinding will >>       >>>>work like a >>       >>>>   special >>       >>>>   >       GPU engine. >>       >>>>   >       >> > +The binding and unbinding operations are >>       serialized and >>       >>>>   will >>       >>>>   >       wait on specified >>       >>>>   >       >> > +input fences before the operation and will >> signal >>       the >>       >>>>   output >>       >>>>   >       fences upon the >>       >>>>   >       >> > +completion of the operation. Due to >>       serialization, >>       >>>>   completion of >>       >>>>   >       an operation >>       >>>>   >       >> > +will also indicate that all previous >> operations >>       >>>>are also >>       >>>>   >       complete. >>       >>>>   >       >> >>       >>>>   >       >> I guess we should avoid saying "will immediately >>       start >>       >>>>   >       binding/unbinding" if >>       >>>>   >       >> there are fences involved. >>       >>>>   >       >> >>       >>>>   >       >> And the fact that it's happening in an async >>       >>>>worker seem to >>       >>>>   imply >>       >>>>   >       it's not >>       >>>>   >       >> immediate. >>       >>>>   >       >> >>       >>>>   > >>       >>>>   >       Ok, will fix. >>       >>>>   >       This was added because in earlier design binding >> was >>       deferred >>       >>>>   until >>       >>>>   >       next execbuff. >>       >>>>   >       But now it is non-deferred (immediate in that >> sense). >>       >>>>But yah, >>       >>>>   this is >>       >>>>   >       confusing >>       >>>>   >       and will fix it. >>       >>>>   > >>       >>>>   >       >> >>       >>>>   >       >> I have a question on the behavior of the bind >>       >>>>operation when >>       >>>>   no >>       >>>>   >       input fence >>       >>>>   >       >> is provided. Let say I do : >>       >>>>   >       >> >>       >>>>   >       >> VM_BIND (out_fence=fence1) >>       >>>>   >       >> >>       >>>>   >       >> VM_BIND (out_fence=fence2) >>       >>>>   >       >> >>       >>>>   >       >> VM_BIND (out_fence=fence3) >>       >>>>   >       >> >>       >>>>   >       >> >>       >>>>   >       >> In what order are the fences going to be >> signaled? >>       >>>>   >       >> >>       >>>>   >       >> In the order of VM_BIND ioctls? 
Or out of order? >>       >>>>   >       >> >>       >>>>   >       >> Because you wrote "serialized I assume it's : in >>       order >>       >>>>   >       >> >>       >>>>   > >>       >>>>   >       Yes, in the order of VM_BIND/UNBIND ioctls. Note >> that >>       >>>>bind and >>       >>>>   unbind >>       >>>>   >       will use >>       >>>>   >       the same queue and hence are ordered. >>       >>>>   > >>       >>>>   >       >> >>       >>>>   >       >> One thing I didn't realize is that because we >> only >>       get one >>       >>>>   >       "VM_BIND" engine, >>       >>>>   >       >> there is a disconnect from the Vulkan >> specification. >>       >>>>   >       >> >>       >>>>   >       >> In Vulkan VM_BIND operations are serialized but >>       >>>>per engine. >>       >>>>   >       >> >>       >>>>   >       >> So you could have something like this : >>       >>>>   >       >> >>       >>>>   >       >> VM_BIND (engine=rcs0, in_fence=fence1, >>       out_fence=fence2) >>       >>>>   >       >> >>       >>>>   >       >> VM_BIND (engine=ccs0, in_fence=fence3, >>       out_fence=fence4) >>       >>>>   >       >> >>       >>>>   >       >> >>       >>>>   >       >> fence1 is not signaled >>       >>>>   >       >> >>       >>>>   >       >> fence3 is signaled >>       >>>>   >       >> >>       >>>>   >       >> So the second VM_BIND will proceed before the >>       >>>>first VM_BIND. >>       >>>>   >       >> >>       >>>>   >       >> >>       >>>>   >       >> I guess we can deal with that scenario in >>       >>>>userspace by doing >>       >>>>   the >>       >>>>   >       wait >>       >>>>   >       >> ourselves in one thread per engines. >>       >>>>   >       >> >>       >>>>   >       >> But then it makes the VM_BIND input fences >> useless. >>       >>>>   >       >> >>       >>>>   >       >> >>       >>>>   >       >> Daniel : what do you think? Should be rework >> this or >>       just >>       >>>>   deal with >>       >>>>   >       wait >>       >>>>   >       >> fences in userspace? >>       >>>>   >       >> >>       >>>>   >       > >>       >>>>   >       >My opinion is rework this but make the ordering >> via >>       >>>>an engine >>       >>>>   param >>       >>>>   >       optional. >>       >>>>   >       > >>       >>>>   >       >e.g. A VM can be configured so all binds are >> ordered >>       >>>>within the >>       >>>>   VM >>       >>>>   >       > >>       >>>>   >       >e.g. A VM can be configured so all binds accept an >>       engine >>       >>>>   argument >>       >>>>   >       (in >>       >>>>   >       >the case of the i915 likely this is a gem context >>       >>>>handle) and >>       >>>>   binds >>       >>>>   >       >ordered with respect to that engine. >>       >>>>   >       > >>       >>>>   >       >This gives UMDs options as the later likely >> consumes >>       >>>>more KMD >>       >>>>   >       resources >>       >>>>   >       >so if a different UMD can live with binds being >>       >>>>ordered within >>       >>>>   the VM >>       >>>>   >       >they can use a mode consuming less resources. >>       >>>>   >       > >>       >>>>   > >>       >>>>   >       I think we need to be careful here if we are >> looking >>       for some >>       >>>>   out of >>       >>>>   >       (submission) order completion of vm_bind/unbind. 
>>       >>>>   >       In-order completion means, in a batch of binds and >>       >>>>unbinds to be >>       >>>>   >       completed in-order, user only needs to specify >>       >>>>in-fence for the >>       >>>>   >       first bind/unbind call and the our-fence for the >> last >>       >>>>   bind/unbind >>       >>>>   >       call. Also, the VA released by an unbind call >> can be >>       >>>>re-used by >>       >>>>   >       any subsequent bind call in that in-order batch. >>       >>>>   > >>       >>>>   >       These things will break if binding/unbinding >> were to >>       >>>>be allowed >>       >>>>   to >>       >>>>   >       go out of order (of submission) and user need to be >>       extra >>       >>>>   careful >>       >>>>   >       not to run into pre-mature triggereing of >> out-fence and >>       bind >>       >>>>   failing >>       >>>>   >       as VA is still in use etc. >>       >>>>   > >>       >>>>   >       Also, VM_BIND binds the provided mapping on the >>       specified >>       >>>>   address >>       >>>>   >       space >>       >>>>   >       (VM). So, the uapi is not engine/context specific. >>       >>>>   > >>       >>>>   >       We can however add a 'queue' to the uapi which >> can be >>       >>>>one from >>       >>>>   the >>       >>>>   >       pre-defined queues, >>       >>>>   >       I915_VM_BIND_QUEUE_0 >>       >>>>   >       I915_VM_BIND_QUEUE_1 >>       >>>>   >       ... >>       >>>>   >       I915_VM_BIND_QUEUE_(N-1) >>       >>>>   > >>       >>>>   >       KMD will spawn an async work queue for each >> queue which >>       will >>       >>>>   only >>       >>>>   >       bind the mappings on that queue in the order of >>       submission. >>       >>>>   >       User can assign the queue to per engine or anything >>       >>>>like that. >>       >>>>   > >>       >>>>   >       But again here, user need to be careful and not >>       >>>>deadlock these >>       >>>>   >       queues with circular dependency of fences. >>       >>>>   > >>       >>>>   >       I prefer adding this later an as extension based on >>       >>>>whether it >>       >>>>   >       is really helping with the implementation. >>       >>>>   > >>       >>>>   >     I can tell you right now that having everything on a >>       single >>       >>>>   in-order >>       >>>>   >     queue will not get us the perf we want.  What vulkan >>       >>>>really wants >>       >>>>   is one >>       >>>>   >     of two things: >>       >>>>   >      1. No implicit ordering of VM_BIND ops.  They just >>       happen in >>       >>>>   whatever >>       >>>>   >     their dependencies are resolved and we ensure >> ordering >>       >>>>ourselves >>       >>>>   by >>       >>>>   >     having a syncobj in the VkQueue. >>       >>>>   >      2. The ability to create multiple VM_BIND >> queues.  We >>       need at >>       >>>>   least 2 >>       >>>>   >     but I don't see why there needs to be a limit besides >>       >>>>the limits >>       >>>>   the >>       >>>>   >     i915 API already has on the number of engines.  >> Vulkan >>       could >>       >>>>   expose >>       >>>>   >     multiple sparse binding queues to the client if >> it's not >>       >>>>   arbitrarily >>       >>>>   >     limited. >>       >>>> >>       >>>>   Thanks Jason, Lionel. >>       >>>> >>       >>>>   Jason, what are you referring to when you say "limits >> the i915 >>       API >>       >>>>   already >>       >>>>   has on the number of engines"? 
I am not sure if there is >> such >>       an uapi >>       >>>>   today. >>       >>>> >>       >>>> There's a limit of something like 64 total engines today >> based on >>       the >>       >>>> number of bits we can cram into the exec flags in >> execbuffer2.  I >>       think >>       >>>> someone had an extended version that allowed more but I >> ripped it >>       out >>       >>>> because no one was using it.  Of course, execbuffer3 might >> not >>       >>>>have that >>       >>>> problem at all. >>       >>>> >>       >>> >>       >>>Thanks Jason. >>       >>>Ok, I am not sure which exec flag is that, but yah, execbuffer3 >>       probably >>       >>>will not have this limiation. So, we need to define a >>       VM_BIND_MAX_QUEUE >>       >>>and somehow export it to user (I am thinking of embedding it in >>       >>>I915_PARAM_HAS_VM_BIND. bits[0]->HAS_VM_BIND, bits[1-3]->'n' >>       meaning 2^n >>       >>>queues. >>       >> >>       >>Ah, I think you are waking about I915_EXEC_RING_MASK (0x3f) >> which >>       execbuf3 >> >>     Yup!  That's exactly the limit I was talking about. >> >>       >>will also have. So, we can simply define in vm_bind/unbind >>       structures, >>       >> >>       >>#define I915_VM_BIND_MAX_QUEUE   64 >>       >>        __u32 queue; >>       >> >>       >>I think that will keep things simple. >>       > >>       >Hmmm? What does execbuf2 limit has to do with how many engines >>       >hardware can have? I suggest not to do that. >>       > >>       >Change with added this: >>       > >>       >       if (set.num_engines > I915_EXEC_RING_MASK + 1) >>       >               return -EINVAL; >>       > >>       >To context creation needs to be undone and so let users create >> engine >>       >maps with all hardware engines, and let execbuf3 access them all. >>       > >> >>       Earlier plan was to carry I915_EXEC_RING_MAP (0x3f) to >> execbuff3 also. >>       Hence, I was using the same limit for VM_BIND queues (64, or 65 >> if we >>       make it N+1). >>       But, as discussed in other thread of this RFC series, we are >> planning >>       to drop this I915_EXEC_RING_MAP in execbuff3. So, there won't be >>       any uapi that limits the number of engines (and hence the vm_bind >>       queues >>       need to be supported). >> >>       If we leave the number of vm_bind queues to be arbitrarily large >>       (__u32 queue_idx) then, we need to have a hashmap for queue (a wq, >>       work_item and a linked list) lookup from the user specified queue >>       index. >>       Other option is to just put some hard limit (say 64 or 65) and use >>       an array of queues in VM (each created upon first use). I >> prefer this. >> >>     I don't get why a VM_BIND queue is any different from any other >> queue or >>     userspace-visible kernel object.  But I'll leave those details up to >>     danvet or whoever else might be reviewing the implementation. >>     --Jason >> >>   I kind of agree here. Wouldn't be simpler to have the bind queue >> created >>   like the others when we build the engine map? >> >>   For userspace it's then just matter of selecting the right queue ID >> when >>   submitting. >> >>   If there is ever a possibility to have this work on the GPU, it >> would be >>   all ready. >> > > I did sync offline with Matt Brost on this. > We can add a VM_BIND engine class and let user create VM_BIND engines > (queues). > The problem is, in i915 engine creating interface is bound to > gem_context. 
> So, in vm_bind ioctl, we would need both context_id and queue_idx for > proper > lookup of the user created engine. This is a bit awkward as vm_bind is an > interface to VM (address space) and has nothing to do with gem_context. A gem_context has a single vm object right? Set through I915_CONTEXT_PARAM_VM at creation or given a default one if not. So it's just like picking up the vm the way it's done at execbuffer time right now: eb->context->vm > Another problem is, if two VMs are binding with the same defined engine, > binding on VM1 can get unnecessarily blocked by binding on VM2 (which > may be > waiting on its in_fence). Maybe I'm missing something, but how can you have 2 vm objects with a single gem_context right now? > > So, my preference here is to just add a 'u32 queue' index in > vm_bind/unbind > ioctl, and the queues are per VM. > > Niranjana > >>   Thanks, >> >>   -Lionel >> >> >>       Niranjana >> >>       >Regards, >>       > >>       >Tvrtko >>       > >>       >> >>       >>Niranjana >>       >> >>       >>> >>       >>>>   I am trying to see how many queues we need and don't >> want it to >>       be >>       >>>>   arbitrarily >>       >>>>   large and unduly blow up memory usage and complexity in >> i915 >>       driver. >>       >>>> >>       >>>> I expect a Vulkan driver to use at most 2 in the vast >> majority >>       >>>>of cases. I >>       >>>> could imagine a client wanting to create more than 1 sparse >>       >>>>queue in which >>       >>>> case, it'll be N+1 but that's unlikely. As far as complexity >>       >>>>goes, once >>       >>>> you allow two, I don't think the complexity is going up by >>       >>>>allowing N.  As >>       >>>> for memory usage, creating more queues means more memory.  >> That's >>       a >>       >>>> trade-off that userspace can make. Again, the expected number >>       >>>>here is 1 >>       >>>> or 2 in the vast majority of cases so I don't think you >> need to >>       worry. >>       >>> >>       >>>Ok, will start with n=3 meaning 8 queues. >>       >>>That would require us to create 8 workqueues. >>       >>>We can change 'n' later if required. >>       >>> >>       >>>Niranjana >>       >>> >>       >>>> >>       >>>>   >     Why?  Because Vulkan has two basic kinds of bind >>       >>>>operations and we >>       >>>>   don't >>       >>>>   >     want any dependencies between them: >>       >>>>   >      1. Immediate.  These happen right after BO >> creation or >>       >>>>maybe as >>       >>>>   part of >>       >>>>   >     vkBindImageMemory() or VkBindBufferMemory().  These >>       >>>>don't happen >>       >>>>   on a >>       >>>>   >     queue and we don't want them serialized with >> anything.       To >>       >>>>   synchronize >>       >>>>   >     with submit, we'll have a syncobj in the VkDevice >> which >>       is >>       >>>>   signaled by >>       >>>>   >     all immediate bind operations and make submits >> wait on >>       it. >>       >>>>   >      2. Queued (sparse): These happen on a VkQueue >> which may >>       be the >>       >>>>   same as >>       >>>>   >     a render/compute queue or may be its own queue.  >> It's up >>       to us >>       >>>>   what we >>       >>>>   >     want to advertise.  From the Vulkan API PoV, this >> is like >>       any >>       >>>>   other >>       >>>>   >     queue.  Operations on it wait on and signal >> semaphores.       
If we >>       >>>>   have a >>       >>>>   >     VM_BIND engine, we'd provide syncobjs to wait and >>       >>>>signal just like >>       >>>>   we do >>       >>>>   >     in execbuf(). >>       >>>>   >     The important thing is that we don't want one type of >>       >>>>operation to >>       >>>>   block >>       >>>>   >     on the other.  If immediate binds are blocking on >> sparse >>       binds, >>       >>>>   it's >>       >>>>   >     going to cause over-synchronization issues. >>       >>>>   >     In terms of the internal implementation, I know that >>       >>>>there's going >>       >>>>   to be >>       >>>>   >     a lock on the VM and that we can't actually do these >>       things in >>       >>>>   >     parallel.  That's fine.  Once the dma_fences have >>       signaled and >>       >>>>   we're >>       >>>> >>       >>>>   Thats correct. It is like a single VM_BIND engine with >>       >>>>multiple queues >>       >>>>   feeding to it. >>       >>>> >>       >>>> Right.  As long as the queues themselves are independent and >>       >>>>can block on >>       >>>> dma_fences without holding up other queues, I think we're >> fine. >>       >>>> >>       >>>>   >     unblocked to do the bind operation, I don't care if >>       >>>>there's a bit >>       >>>>   of >>       >>>>   >     synchronization due to locking.  That's expected.  >> What >>       >>>>we can't >>       >>>>   afford >>       >>>>   >     to have is an immediate bind operation suddenly >> blocking >>       on a >>       >>>>   sparse >>       >>>>   >     operation which is blocked on a compute job that's >> going >>       to run >>       >>>>   for >>       >>>>   >     another 5ms. >>       >>>> >>       >>>>   As the VM_BIND queue is per VM, VM_BIND on one VM >> doesn't block >>       the >>       >>>>   VM_BIND >>       >>>>   on other VMs. I am not sure about usecases here, but just >>       wanted to >>       >>>>   clarify. >>       >>>> >>       >>>> Yes, that's what I would expect. >>       >>>> --Jason >>       >>>> >>       >>>>   Niranjana >>       >>>> >>       >>>>   >     For reference, Windows solves this by allowing >>       arbitrarily many >>       >>>>   paging >>       >>>>   >     queues (what they call a VM_BIND engine/queue).  That >>       >>>>design works >>       >>>>   >     pretty well and solves the problems in question. >>       >>>>Again, we could >>       >>>>   just >>       >>>>   >     make everything out-of-order and require using >> syncobjs >>       >>>>to order >>       >>>>   things >>       >>>>   >     as userspace wants. That'd be fine too. >>       >>>>   >     One more note while I'm here: danvet said >> something on >>       >>>>IRC about >>       >>>>   VM_BIND >>       >>>>   >     queues waiting for syncobjs to materialize.  We don't >>       really >>       >>>>   want/need >>       >>>>   >     this.  We already have all the machinery in >> userspace to >>       handle >>       >>>>   >     wait-before-signal and waiting for syncobj fences to >>       >>>>materialize >>       >>>>   and >>       >>>>   >     that machinery is on by default.  It would actually >>       >>>>take MORE work >>       >>>>   in >>       >>>>   >     Mesa to turn it off and take advantage of the kernel >>       >>>>being able to >>       >>>>   wait >>       >>>>   >     for syncobjs to materialize. Also, getting that >> right is >>       >>>>   ridiculously >>       >>>>   >     hard and I really don't want to get it wrong in >> kernel >>       >>>>space.     
When we >>       >>>>   >     do memory fences, wait-before-signal will be a >> thing.  We >>       don't >>       >>>>   need to >>       >>>>   >     try and make it a thing for syncobj. >>       >>>>   >     --Jason >>       >>>>   > >>       >>>>   >   Thanks Jason, >>       >>>>   > >>       >>>>   >   I missed the bit in the Vulkan spec that we're >> allowed to >>       have a >>       >>>>   sparse >>       >>>>   >   queue that does not implement either graphics or >> compute >>       >>>>operations >>       >>>>   : >>       >>>>   > >>       >>>>   >     "While some implementations may include >>       >>>>   VK_QUEUE_SPARSE_BINDING_BIT >>       >>>>   >     support in queue families that also include >>       >>>>   > >>       >>>>   >      graphics and compute support, other >> implementations may >>       only >>       >>>>   expose a >>       >>>>   > VK_QUEUE_SPARSE_BINDING_BIT-only queue >>       >>>>   > >>       >>>>   >      family." >>       >>>>   > >>       >>>>   >   So it can all be all a vm_bind engine that just does >>       bind/unbind >>       >>>>   >   operations. >>       >>>>   > >>       >>>>   >   But yes we need another engine for the >> immediate/non-sparse >>       >>>>   operations. >>       >>>>   > >>       >>>>   >   -Lionel >>       >>>>   > >>       >>>>   >         > >>       >>>>   >       Daniel, any thoughts? >>       >>>>   > >>       >>>>   >       Niranjana >>       >>>>   > >>       >>>>   >       >Matt >>       >>>>   >       > >>       >>>>   >       >> >>       >>>>   >       >> Sorry I noticed this late. >>       >>>>   >       >> >>       >>>>   >       >> >>       >>>>   >       >> -Lionel >>       >>>>   >       >> >>       >>>>   >       >>
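To make the per-VM bind queue idea discussed above easier to picture, here is a rough sketch of what a vm_bind ioctl argument carrying a 'u32 queue' index could look like. This is purely illustrative and only reuses names floated in this thread (I915_VM_BIND_MAX_QUEUE, a __u32 queue field, in/out fences); the struct name, the exact fields and their layout are assumptions, not the final i915 uapi.

/*
 * Hypothetical sketch only: illustrates the "per-VM bind queue index"
 * idea from this thread. Names and layout are assumptions, not the
 * final i915 uapi.
 */
#include <linux/types.h>

#define I915_VM_BIND_MAX_QUEUE 64	/* hard limit floated above (or 65 if N+1) */

struct drm_i915_gem_vm_bind {
	__u32 vm_id;		/* VM (address space) to bind into */
	__u32 queue;		/* per-VM bind queue, < I915_VM_BIND_MAX_QUEUE */
	__u32 handle;		/* GEM object handle backing the mapping */
	__u32 flags;
	__u64 start;		/* GPU virtual address of the mapping */
	__u64 offset;		/* offset into the GEM object */
	__u64 length;		/* length of the mapping */
	__u64 in_fence;		/* optional syncobj to wait on before binding */
	__u64 out_fence;	/* syncobj signaled when the bind completes */
	__u64 extensions;	/* chain of extension structs, 0 if none */
};

With something like this, binds submitted on the same queue of the same VM would complete in submission order, while binds on different queues (or different VMs) would be ordered only by their in/out fences, which matches the Vulkan sparse-binding behaviour discussed above.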