From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [RFC] drm/i915: store all mmio bases in intel_engines
Date: Mon, 12 Mar 2018 14:05:49 -0700	[thread overview]
Message-ID: <db4b09e5-bde6-e656-91e7-684a0414bd87@intel.com> (raw)
In-Reply-To: <336af63a-e0e3-d37b-5534-d9590cc9f75c@intel.com>



On 12/03/18 10:48, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/03/18 04:04, Tvrtko Ursulin wrote:
>>
>> On 09/03/2018 19:44, Tvrtko Ursulin wrote:
>>>
>>> On 09/03/2018 18:47, Daniele Ceraolo Spurio wrote:
>>>> On 09/03/18 01:53, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 08/03/2018 18:46, Daniele Ceraolo Spurio wrote:
>>>>>> On 08/03/18 01:31, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 07/03/2018 19:45, Daniele Ceraolo Spurio wrote:
>>>>>>>> The mmio bases we're currently storing in the intel_engines 
>>>>>>>> array are
>>>>>>>> only valid for a subset of gens, so we need to ignore them and use
>>>>>>>> different values in some cases. Instead of doing that, we can 
>>>>>>>> have a
>>>>>>>> table of [starting gen, mmio base] pairs for each engine in
>>>>>>>> intel_engines and select the correct one based on the gen we're 
>>>>>>>> running
>>>>>>>> on in a consistent way.
>>>>>>>>
>>>>>>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>> Signed-off-by: Daniele Ceraolo Spurio 
>>>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/i915/intel_engine_cs.c  | 77 +++++++++++++++++++++------------
>>>>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 -
>>>>>>>>   2 files changed, 49 insertions(+), 29 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>>>> index 4ba139c27fba..1dd92cac8d67 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>>>> @@ -81,12 +81,16 @@ static const struct engine_class_info intel_engine_classes[] = {
>>>>>>>>       },
>>>>>>>>   };
>>>>>>>> +#define MAX_MMIO_BASES 3
>>>>>>>>   struct engine_info {
>>>>>>>>       unsigned int hw_id;
>>>>>>>>       unsigned int uabi_id;
>>>>>>>>       u8 class;
>>>>>>>>       u8 instance;
>>>>>>>> -    u32 mmio_base;
>>>>>>>> +    struct engine_mmio_base {
>>>>>>>> +        u32 gen : 8;
>>>>>>>> +        u32 base : 24;
>>>>>>>> +    } mmio_bases[MAX_MMIO_BASES];
>>>>>>>>       unsigned irq_shift;
>>>>>>>>   };
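
As an illustration of the scheme the commit message describes: the
selection can be a simple walk of the per-engine array, picking the
first entry whose gen is not newer than the running gen. A minimal
sketch; the helper name and the newest-gen-first ordering are
assumptions, not quotes from the patch:

static u32 __engine_mmio_base(const struct engine_mmio_base *bases,
                              unsigned int gen)
{
        int i;

        /* Entries are assumed ordered newest gen first; take the first
         * one the running gen satisfies. */
        for (i = 0; i < MAX_MMIO_BASES; i++)
                if (gen >= bases[i].gen)
                        return bases[i].base;

        return 0; /* no match; a caller would treat this as a bug */
}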
>>>>>>>
>>>>>>> It's not bad, but I am just wondering if it is too complicated 
>>>>>>> versus simply duplicating the tables.
>>>>>>>
>>>>>>> Duplicated tables would certainly be much less code and 
>>>>>>> complexity, but what about the size of pure data?
>>>>>>>
>>>>>>> In this patch sizeof(struct engine_info) grows by MAX_MMIO_BASES
>>>>>>> * sizeof(u32), so by 12, to a total of 30 if I am not mistaken.
>>>>>>> Times NUM_ENGINES that equals 240 bytes for the intel_engines[]
>>>>>>> array with this scheme.
>>>>>>>
>>>>>>
>>>>>> we remove a u32 (the old mmio base), so we only grow by 8 bytes
>>>>>> per engine, but the compiler rounds the struct up to a multiple of
>>>>>> 4 bytes, so 28 per engine, for a total of 224.
>>>>>>
>>>>>>> Separate tables per gens would be:
>>>>>>>
>>>>>>> sizeof(struct engine_info) is 18 bytes.
>>>>>>>
>>>>>>> For < gen6 we'd need 3 engines * 18 = 54
>>>>>>> < gen11 = 5 engines * 18 = 90
>>>>>>>  >= gen11 = 8 engines * 18 = 144
>>>>>>>
>>>>>>> 54 + 90 + 144 = 288 bytes
>>>>>>>
>>>>>>> So a little bit bigger. Hm. Plus we would need to refactor so
>>>>>>> intel_engines[] is not indexed by engine->id but is a contiguous
>>>>>>> array with engine->id stored in the elements. Code to look up the
>>>>>>> compact array should be simpler than the combined new code from
>>>>>>> this patch, especially if you add the test as Chris suggested. So
>>>>>>> separate engine info arrays would win overall, I think, when
>>>>>>> considering size of text + size of data + size of source code.
>>>>>>>
>>>>>>
>>>>>> Not sure. I would exclude the selftest, since it is usually not
>>>>>> compiled for released kernels; that leads to:
>>>>>
>>>>> Yes, but we cannot exclude its source code, since with separate
>>>>> tables the selftest wouldn't be needed at all.
>>>>>
>>>>>> add/remove: 0/0 grow/shrink: 3/1 up/down: 100/-7 (93)
>>>>>> Function                                     old     new   delta
>>>>>> intel_engines                                160     224     +64
>>>>>> __func__                                   13891   13910     +19
>>>>>> intel_engines_init_mmio                     1247    1264     +17
>>>>>> intel_init_bsd_ring_buffer                   142     135      -7
>>>>>> Total: Before=1479626, After=1479719, chg +0.01%
>>>>>>
>>>>>> Total growth is 93, which is less than your estimate for the
>>>>>> growth introduced by replicating the table.
>>>>>>
>>>>>>> But on the other hand your solution might be more future-proof.
>>>>>>> So I don't know. Use the crystal ball to decide? :)
>>>>>>>
>>>>>>
>>>>>> I think we should expect that the total engine count could grow 
>>>>>> with future gens. In this case to me adding a new entry to a 
>>>>>> unified table seems much cleaner (and uses less data) than adding 
>>>>>> a completely new table each time.
>>>>>
>>>>> Okay, I was off in my estimates. But in general I was coming from
>>>>> the angle that every new mmio base difference in this scheme grows
>>>>> the size for all engines. So if just VCS gains one new base, the
>>>>> impact is multiplied by NUM_ENGINES. But maybe separate tables are
>>>>> not the solution either. Perhaps pulling the mmio_base arrays out
>>>>> of struct engine_info into a separate set of static tables, so they
>>>>> could be null-terminated, would be the best of both worlds?
>>>>>
>>>>> struct engine_mmio_base {
>>>>>      u32 gen : 8;
>>>>>      u32 base : 24;
>>>>> };
>>>>>
>>>>> static const struct engine_mmio_base vcs0_mmio_bases[] = {
>>>>>      { .gen = 11, .base = GEN11_BSD_RING_BASE },
>>>>>      { .gen = 6, .base = GEN6_BSD_RING_BASE },
>>>>>      { .gen = 4, .base = BSD_RING_BASE },
>>>>>      { },
>>>>> };
>>>>>
>>>>> And then in intel_engines array, for BSD:
>>>>>
>>>>>     {
>>>>>      ...
>>>>>      .mmio_bases = vcs0_mmio_bases,
>>>>>      ...
>>>>>     },
>>>>>
>>>>> This way we limit data growth only to engines which change their 
>>>>> mmio bases frequently.
>>>>>
>>>>> Just an idea.
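
A null-terminated table like vcs0_mmio_bases above could be walked with
a loop of this shape. A sketch only, with a made-up helper name; the
empty { } entry acts as the sentinel, since a .base of 0 is never a
valid register base:

static u32 lookup_mmio_base(const struct engine_mmio_base *bases,
                            unsigned int gen)
{
        for (; bases->base; bases++)
                if (gen >= bases->gen)
                        return bases->base;

        return 0;
}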
>>>>>
>>>>
>>>> I gave this a try and the code actually grows:
>>>>
>>>> add/remove: 8/0 grow/shrink: 2/0 up/down: 120/0 (120)
>>>> Function                                     old     new   delta
>>>> intel_engines                                224     256     +32
>>>> vcs0_mmio_bases                                -      16     +16
>>>> vecs1_mmio_bases                               -      12     +12
>>>> vecs0_mmio_bases                               -      12     +12
>>>> vcs1_mmio_bases                                -      12     +12
>>>> vcs3_mmio_bases                                -       8      +8
>>>> vcs2_mmio_bases                                -       8      +8
>>>> rcs_mmio_bases                                 -       8      +8
>>>> intel_engines_init_mmio                     1264    1272      +8
>>>> bcs_mmio_bases                                 -       4      +4
>>>> Total: Before=1479719, After=1479839, chg +0.01%
>>>>
>>>> I have no idea what the compiler is doing to grow intel_engines,
>>>> since by replacing an array of 3 u32s with a pointer I would expect
>>>> a shrink of 4 bytes per engine. Anyway, even without the compiler
>>>> weirdness with
>>>
>>> It probably has to align the pointer to 8 bytes, so that creates a
>>> hole. Moving the mmio_base pointer higher up, either to the top or
>>> after the two unsigned ints, should work. Or packing the struct.
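
To illustrate the hole: on a 64-bit target the pointer needs 8-byte
alignment, so with the fields ordered as in the thread the layout comes
out roughly as below. Offsets assume a typical 64-bit ABI and are
illustrative, not measured:

struct engine_info {                        /* pointer in the middle */
        unsigned int hw_id;                 /* offset 0 */
        unsigned int uabi_id;               /* offset 4 */
        u8 class;                           /* offset 8 */
        u8 instance;                        /* offset 9, then 6-byte hole */
        const struct engine_mmio_base *mmio_bases; /* offset 16 */
        unsigned irq_shift;                 /* offset 24, 4 bytes tail pad */
};                                          /* sizeof == 32 */

Moving the pointer to the top removes the hole:

struct engine_info {
        const struct engine_mmio_base *mmio_bases; /* offset 0 */
        unsigned int hw_id;                 /* offset 8 */
        unsigned int uabi_id;               /* offset 12 */
        unsigned irq_shift;                 /* offset 16 */
        u8 class;                           /* offset 20 */
        u8 instance;                        /* offset 21, 2 bytes tail pad */
};                                          /* sizeof == 24 */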
>>>
>>>> this method we would grow the size by at least 4 bytes per engine,
>>>> because we replace an array of 3 u32s (12B) with a pointer (8B) plus
>>>> an array of 2 or more u32s (8B+). I guess we can reconsider if/when
>>>> one engine reaches more than 3 mmio bases.
>>>
>>> Yeah, it's fine. I was thinking that since you are almost there it
>>> makes sense to future-proof it more, since no one will probably
>>> remember to do it later. But OK.
>>
>> One idea to compact this further, in addition to avoiding alignment
>> holes, could be to store the engine_mmio_base directly in the
>> engine_mmio_base pointer when there is a single entry, otherwise use a
>> pointer to a null-terminated array. Actually we could store two
>> entries without table indirection, to be most optimal on 64-bit.
>> Something like:
>>
>> struct engine_mmio_base {
>>      u32 base : 24; /* Must be LSB. */
>>      u32 gen : 8;
>> };
>>
>> union engine_mmio {
>>      struct engine_mmio_base mmio_base[2];
>>      struct engine_mmio_base *mmio_base_list;
>> };
>>
>> #define MMIO_PTR_LIST BIT(0)
>>
>>     {
>>      ... render engine ...
>>      .mmio = { (struct engine_mmio_base){ .base = ..., .gen = ... } },
>>      ...
>>     },
>>     {
>>      ...
>>      .mmio = { .mmio_base_list = &vcs0_mmio_bases | MMIO_PTR_LIST }
> 
> This gives a compiler error (even with the appropriate casting) about
> the value not being constant. I've solved it by doing:
> 
> union engine_mmio {
>      const struct engine_mmio_base bases[MAX_BUILTIN_MMIO_BASES];
>      const struct engine_mmio_base *bases_list;
>      union {
>          const u64 is_list : 1;
>          const u64       : 63;
>      };
> };
>

I retract this statement. The trick with the union compiles fine, but
only one of the two assignments (between bases_list and is_list)
actually ends up in the structure.
I've experimented with other possible solutions using shifts and/or ORs,
but they all lead to a compiler error about the value not being
constant. At this point I'll stick with the original implementation, as
I want to avoid over-engineering this.
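
The union behaviour above is standard C rather than a compiler quirk:
when an initializer names two overlapping members of a union, only the
last one listed takes effect (C11 6.7.9). A minimal standalone
illustration, with made-up names:

union demo {
        const void *list;
        unsigned long is_list : 1;
};

static const char bases[] = "...";

/* Both members share the same storage, so the second initializer
 * overrides the first: d keeps only is_list = 1, and the pointer is
 * silently dropped. */
static const union demo d = {
        .list = bases,
        .is_list = 1,
};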

Daniele

> Code size is almost unchanged, but this still looks like a good
> compromise. I will update the selftest and send patches soon.
> 
> Daniele
> 
>>      ...
>>     },
>>
>> This could be the best of both worlds, with up to two bases built-in, 
>> and engines with more point to an array.
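
A reader-side sketch of how the tag could be tested, reusing the names
from the snippet above; the helper itself is an assumption, not quoted
from anyone's patch. It relies on .base being the LSB: register bases
are even, so bit 0 of an inline entry is always clear, while a tagged
list pointer has it set. Reading the pointer member when the inline
entries were stored is type punning through the union, which is fine in
kernel C:

static const struct engine_mmio_base *
engine_mmio_bases(const union engine_mmio *mmio)
{
        unsigned long ptr = (unsigned long)mmio->mmio_base_list;

        if (ptr & MMIO_PTR_LIST) /* tagged: null-terminated list */
                return (const struct engine_mmio_base *)
                        (ptr & ~(unsigned long)MMIO_PTR_LIST);

        return mmio->mmio_base; /* up to two built-in entries */
}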
>>
>>
>> Regards,
>>
>> Tvrtko
>>
>>
>>
>>

Thread overview: 12+ messages
2018-03-07 19:45 [RFC] drm/i915: store all mmio bases in intel_engines Daniele Ceraolo Spurio
2018-03-07 19:56 ` Chris Wilson
2018-03-07 20:36 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-03-07 21:38 ` ✓ Fi.CI.IGT: " Patchwork
2018-03-08  9:31 ` [RFC] " Tvrtko Ursulin
2018-03-08 18:46   ` Daniele Ceraolo Spurio
2018-03-09  9:53     ` Tvrtko Ursulin
2018-03-09 18:47       ` Daniele Ceraolo Spurio
2018-03-09 19:44         ` Tvrtko Ursulin
2018-03-12 11:04           ` Tvrtko Ursulin
2018-03-12 17:48             ` Daniele Ceraolo Spurio
2018-03-12 21:05               ` Daniele Ceraolo Spurio [this message]
