Subject: Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
From: Tvrtko Ursulin
Date: Wed, 8 Jun 2022 08:34:36 +0100
To: Niranjana Vishwanathapura
Cc: "Zanoni, Paulo R", intel-gfx@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org, "Hellstrom, Thomas", "Wilson, Chris P",
 "Vetter, Daniel", christian.koenig@amd.com
In-Reply-To: <20220607212552.GX4461@nvishwa1-DESK>

On 07/06/2022 22:25, Niranjana Vishwanathapura wrote:
> On Tue, Jun 07, 2022 at 11:42:08AM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/06/2022 07:53, Niranjana Vishwanathapura wrote:
>>> On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura
>>> wrote:
>>>> On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:
>>>>> On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote:
>>>>>>
>>>>>> On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote:
>>>>>>>
>>>>>>> On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>>>>>>>> On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>>>>>>>>> VM_BIND and related uapi definitions
>>>>>>>>>
>>>>>>>>> v2: Ensure proper kernel-doc formatting with cross references.
>>>>>>>>>     Also add new uapi and documentation as per review comments
>>>>>>>>>     from Daniel.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Niranjana Vishwanathapura
>>>>>>>>> ---
>>>>>>>>>  Documentation/gpu/rfc/i915_vm_bind.h | 399 +++++++++++++++++++++++++++
>>>>>>>>>  1 file changed, 399 insertions(+)
>>>>>>>>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>>
>>>>>>>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 000000000000..589c0a009107
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>> @@ -0,0 +1,399 @@
>>>>>>>>> +/* SPDX-License-Identifier: MIT */
>>>>>>>>> +/*
>>>>>>>>> + * Copyright © 2022 Intel Corporation
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * DOC: I915_PARAM_HAS_VM_BIND
>>>>>>>>> + *
>>>>>>>>> + * VM_BIND feature availability.
>>>>>>>>> + * See typedef drm_i915_getparam_t param.
>>>>>>>>> + */
>>>>>>>>> +#define I915_PARAM_HAS_VM_BIND               57
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>>>>>>>> + *
>>>>>>>>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>>>>>>>> + * See struct drm_i915_gem_vm_control flags.
>>>>>>>>> + *
>>>>>>>>> + * A VM in VM_BIND mode will not support the older execbuf mode of binding.
>>>>>>>>> + * In VM_BIND mode, the execbuf ioctl will not accept any execlist (i.e.,
>>>>>>>>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>>>>>>>>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>>>>>>>>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>>>>>>>>> + * The DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be
>>>>>>>>> + * provided to pass in the batch buffer addresses.
>>>>>>>>> + *
>>>>>>>>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>>>>>>>>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0
>>>>>>>>> + * (not used) in VM_BIND mode. The I915_EXEC_USE_EXTENSIONS flag must
>>>>>>>>> + * always be set (see struct drm_i915_gem_execbuffer_ext_batch_addresses).
>>>>>>>>> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields
>>>>>>>>> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0.
>>>>>>>>> + */
>>>>>>>>
>>>>>>>> From that description, it seems we have:
>>>>>>>>
>>>>>>>> struct drm_i915_gem_execbuffer2 {
>>>>>>>>         __u64 buffers_ptr;              -> must be 0 (new)
>>>>>>>>         __u32 buffer_count;             -> must be 0 (new)
>>>>>>>>         __u32 batch_start_offset;       -> must be 0 (new)
>>>>>>>>         __u32 batch_len;                -> must be 0 (new)
>>>>>>>>         __u32 DR1;                      -> must be 0 (old)
>>>>>>>>         __u32 DR4;                      -> must be 0 (old)
>>>>>>>>         __u32 num_cliprects; (fences)   -> must be 0 since using extensions
>>>>>>>>         __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer!
>>>>>>>>         __u64 flags;                    -> some flags must be 0 (new)
>>>>>>>>         __u64 rsvd1; (context info)     -> repurposed field (old)
>>>>>>>>         __u64 rsvd2;                    -> unused
>>>>>>>> };
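As an aside, the rules quoted above would boil down to the kernel
rejecting roughly the following in VM_BIND mode. This is only a sketch
of what the kernel-doc implies (the helper name is invented here), not
code from the series:

#include <linux/errno.h>
#include <drm/i915_drm.h>

/* Invented helper: what a VM_BIND mode VM would have to reject in execbuf2. */
static int eb2_check_vm_bind_mode(const struct drm_i915_gem_execbuffer2 *args)
{
	/* No execlist; objects are bound up front via VM_BIND. */
	if (args->buffers_ptr || args->buffer_count)
		return -EINVAL;

	/* The batch arrives via the BATCH_ADDRESSES extension instead. */
	if (args->batch_start_offset || args->batch_len)
		return -EINVAL;

	/* These flags only make sense together with an execlist. */
	if (args->flags & (I915_EXEC_NO_RELOC | I915_EXEC_HANDLE_LUT |
			   I915_EXEC_BATCH_FIRST))
		return -EINVAL;

	/* Extensions are mandatory, as the batch address comes in one. */
	if (!(args->flags & I915_EXEC_USE_EXTENSIONS))
		return -EINVAL;

	return 0;
}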
>>>>>>>>
>>>>>>>> Based on that, why can't we just get drm_i915_gem_execbuffer3
>>>>>>>> instead of adding even more complexity to an already abused
>>>>>>>> interface? While the Vulkan-like extension thing is really nice,
>>>>>>>> I don't think what we're doing here is extending the ioctl usage,
>>>>>>>> we're completely changing how the base struct should be
>>>>>>>> interpreted based on how the VM was created (which is an entirely
>>>>>>>> different ioctl).
>>>>>>>>
>>>>>>>> From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2
>>>>>>>> is already at -6 without these changes. I think after vm_bind
>>>>>>>> we'll need to create a -11 entry just to deal with this ioctl.
>>>>>>>>
>>>>>>>
>>>>>>> The only change here is removing the execlist support for VM_BIND
>>>>>>> mode (other than natural extensions).
>>>>>>> Adding a new execbuffer3 was considered, but I think we need to be
>>>>>>> careful with that, as it goes beyond the VM_BIND support, including
>>>>>>> any future requirements (as we don't want an execbuffer4 after
>>>>>>> VM_BIND).
>>>>>>
>>>>>> Why not? It's not like adding extensions here is really that
>>>>>> different than adding new ioctls.
>>>>>>
>>>>>> I definitely think this deserves an execbuffer3 without even
>>>>>> considering future requirements. Just to burn down the old
>>>>>> requirements and pointless fields.
>>>>>>
>>>>>> Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave
>>>>>> the older sw on execbuf2 forever.
>>>>>
>>>>> I guess another point in favour of execbuf3 would be that it's less
>>>>> midlayer. If we share the entry point then there's quite a few vfuncs
>>>>> needed to cleanly split out the vm_bind paths from the legacy
>>>>> reloc/softpin paths.
>>>>>
>>>>> If we invert this and do execbuf3, then there's the existing ioctl
>>>>> vfunc, and then we share code (where it even makes sense; probably
>>>>> request setup/submit needs to be shared, anything else is probably
>>>>> cleaner to just copypaste) with the usual helper approach.
>>>>>
>>>>> Also that would guarantee that really none of the old concepts like
>>>>> i915_active on the vma or vma open counts and all that stuff leaks
>>>>> into the new vm_bind execbuf.
>>>>>
>>>>> Finally, I also think that copypasting would make backporting easier,
>>>>> or at least more flexible, since it should make it easier to have the
>>>>> upstream vm_bind co-exist with all the other things we have, without
>>>>> the huge amounts of conflicts (or at least with far fewer) that
>>>>> pushing a pile of vfuncs into the existing code would cause.
>>>>>
>>>>> So maybe we should do this?
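As an aside, the "two thin entry points sharing only the submit helper"
shape being described would look roughly like the sketch below. This is
purely an illustration of the structure; every name in it is invented
and it is not i915 code:

/* Two independent ioctl paths which only share request setup/submit,
 * instead of one entry point branching through vfuncs.
 */
struct eb_common { int placeholder; };	/* stand-in for shared state */

static int eb_request_setup_submit(struct eb_common *eb)
{
	return 0;	/* the shared part: request setup and submission */
}

static int execbuffer2_path(struct eb_common *eb)
{
	/* legacy-only work: relocations, execlist lookup, softpin */
	return eb_request_setup_submit(eb);
}

static int execbuffer3_path(struct eb_common *eb)
{
	/* vm_bind-only work: batch GPU VAs, no relocs, no execlist */
	return eb_request_setup_submit(eb);
}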
>>>>
>>>> Thanks Dave, Daniel.
>>>> There are a few things that will be common between execbuf2 and
>>>> execbuf3, like request setup/submit (as you said), fence handling
>>>> (timeline fences, fence array, composite fences), engine selection,
>>>> etc. Also, many of the 'flags' will be there in execbuf3 as well
>>>> (but the bit positions will differ).
>>>> But I guess these should be fine, as the suggestion here is to
>>>> copy-paste the execbuf code and have shared code where possible.
>>>> Besides, we can stop supporting some older features in execbuf3
>>>> (like the fence array, in favor of the newer timeline fences), which
>>>> will further reduce the common code.
>>>>
>>>> Ok, I will update this series by adding execbuf3 and send it out soon.
>>>>
>>>
>>> Does this sound reasonable?
>>>
>>> struct drm_i915_gem_execbuffer3 {
>>>        __u32 ctx_id;		/* previously execbuffer2.rsvd1 */
>>>
>>>        __u32 batch_count;
>>>        __u64 batch_addr_ptr;	/* Pointer to an array of batch GPU
>>> virtual addresses */
>>
>> Casual stumble upon..
>>
>> Alternatively you could embed N pointers to make life a bit easier for
>> both the userspace and kernel side. Yes, but then there's the "N batch
>> buffers should be enough for everyone" problem.. :)
>>
>
> Thanks Tvrtko,
> Yes, hence the batch_addr_ptr.

Right, but then userspace has to allocate a separate buffer and the
kernel has to access it separately from a single copy_from_user. Pros
and cons of "this many batches should be enough for everyone" versus
the extra operations.

Hmm.. for the common case of one batch, you could define the uapi to
say that if batch_count is one then the pointer is the GPU VA of the
batch itself, not a pointer to a userspace array of GPU VAs?
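On the kernel side that could look something like the sketch below. The
helper name is invented and the struct is just the proposal quoted
above, so treat it only as an illustration of the idea:

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct drm_i915_gem_execbuffer3 {	/* fields per the proposal above */
	__u32 ctx_id;
	__u32 batch_count;
	__u64 batch_addr_ptr;
	__u64 flags;
	__u64 in_out_fence;
	__u64 extensions;
};

/* Invented helper: resolve batch GPU VAs with a single-batch fast path. */
static int eb3_get_batch_addresses(const struct drm_i915_gem_execbuffer3 *args,
				   u64 *batches, unsigned int max)
{
	if (!args->batch_count || args->batch_count > max)
		return -EINVAL;

	/* Common case: one batch, the field holds the GPU VA itself. */
	if (args->batch_count == 1) {
		batches[0] = args->batch_addr_ptr;
		return 0;
	}

	/* Otherwise it is a user pointer to an array of GPU VAs. */
	if (copy_from_user(batches, u64_to_user_ptr(args->batch_addr_ptr),
			   sizeof(u64) * args->batch_count))
		return -EFAULT;

	return 0;
}

That would keep the common case down to the one copy the ioctl already
does for the args struct, while retaining the array form for multiple
batches.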
Regards,

Tvrtko

>>>        __u64 flags;
>>> #define I915_EXEC3_RING_MASK              (0x3f)
>>> #define I915_EXEC3_DEFAULT                (0<<0)
>>> #define I915_EXEC3_RENDER                 (1<<0)
>>> #define I915_EXEC3_BSD                    (2<<0)
>>> #define I915_EXEC3_BLT                    (3<<0)
>>> #define I915_EXEC3_VEBOX                  (4<<0)
>>>
>>> #define I915_EXEC3_SECURE               (1<<6)
>>> #define I915_EXEC3_IS_PINNED            (1<<7)
>>>
>>> #define I915_EXEC3_BSD_SHIFT     (8)
>>> #define I915_EXEC3_BSD_MASK      (3 << I915_EXEC3_BSD_SHIFT)
>>> #define I915_EXEC3_BSD_DEFAULT   (0 << I915_EXEC3_BSD_SHIFT)
>>> #define I915_EXEC3_BSD_RING1     (1 << I915_EXEC3_BSD_SHIFT)
>>> #define I915_EXEC3_BSD_RING2     (2 << I915_EXEC3_BSD_SHIFT)
>>
>> I'd suggest legacy engine selection is unwanted, especially not with
>> the convoluted BSD1/2 flags. Can we just require a context with an
>> engine map and an index? Or, if the default context has to be
>> supported, then I'd suggest ...class_instance for that mode.
>>
>
> Ok, I will be happy to remove it and only support contexts with an
> engine map, if the UMDs agree on that.
>
>>> #define I915_EXEC3_FENCE_IN             (1<<10)
>>> #define I915_EXEC3_FENCE_OUT            (1<<11)
>>> #define I915_EXEC3_FENCE_SUBMIT         (1<<12)
>>
>> People are likely to object to the submit fence, since the generic
>> mechanism for aligning submissions was rejected.
>>
>
> Ok, again, I can remove it if the UMDs are ok with that.
>
>>>
>>>        __u64 in_out_fence;		/* previously execbuffer2.rsvd2 */
>>
>> In a new ioctl you can afford dedicated fields.
>>
>
> Yes, but as I asked below, I am not sure if we need this or whether
> the timeline fence array extension we have is good enough.
>
>> In any case I suggest you involve the UMD folks in designing it.
>>
>
> Yah.
> Paulo, Lionel, Jason, Daniel, can you comment on these regarding what
> the UMDs will need in execbuf3 and what can be removed?
>
> Thanks,
> Niranjana
>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>>        __u64 extensions;	/* currently only for
>>> DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES */
>>> };
>>>
>>> With this, the user can pass in batch addresses and a count directly,
>>> instead of via an extension (as this RFC series was proposing).
>>>
>>> I have removed many of the flags which were either legacy or not
>>> applicable to VM_BIND mode.
>>> I have also removed fence array support (execbuffer2.cliprects_ptr),
>>> as we have timeline fence array support. Is that fine?
>>> Do we still need FENCE_IN/FENCE_OUT/FENCE_SUBMIT support?
>>>
>>> Anything else that needs to be added or removed?
>>>
>>> Niranjana
>>>
>>>> Niranjana
>>>>
>>>>> -Daniel
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch