On 2020-05-26 4:27 p.m., Douglas Gilbert wrote:
> On 2020-05-26 3:10 p.m., Douglas Gilbert wrote:
>> On 2020-05-26 1:53 p.m., Hannes Reinecke wrote:
>>> On 5/26/20 4:27 PM, Matthew Wilcox wrote:
>>>> On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
>>>>> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
>>>>>> You aren't the first person to ask about having a 64-bit lookup on
>>>>>> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
>>>>>> at LinuxCon in Prague.  I have always been reluctant to add such a thing
>>>>>> because the XArray uses quite a naive data type underneath.  It works well 
>>>>>> with
>>>>>> dense arrays but becomes very wasteful of memory for sparse arrays.
>>>>>>
>>>>>> My understanding of SCSI-world is that most devices have a single
>>>>>> LUN 0.  Most devices that have multiple LUNs number them sequentially.
>>>>>> Some devices have either an internal structure or essentially pick random
>>>>>> LUNs for the devices they expose.
>>>>>
>>>>> Not quite. You are correct that most devices have a single LUN 0
>>>>> (think of libata :-), but those with several LUNs typically
>>>>> enumerate them. In most cases the enumeration starts at 0 (or 1,
>>>>> if LUN 0 is used for a management LUN), and reaches up to 256.
>>>>> Some arrays use a different LUN layout, which means that the top
>>>>> two bit of the LUN number are set, and possibly some intermediate
>>>>> numbers, too. But the LUNs themselves are numbered consecutively, too;
>>>>> it's just at a certain offset.
>>>>> I've never seen anyone picking LUN numbers at random.
>>>>
>>>> Ah, OK.  I think for these arrays you'd be better off accepting the cost
>>>> of an extra 4 bytes in the struct scsi_device rather than the cost of
>>>> storing the scsi_device at the LUN.
>>>>
>>>> Let's just work an example where you have a 64-bit LUN with 4 ranges,
>>>> each of 64 entries (this is almost a best-case scenario for the XArray).
>>>> [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].
>>>>
>>>> If we store them sequentially in an allocating XArray, we take up 256 *
>>>> 4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
>>>> four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
>>>> (2780 bytes) for a total of 3804 bytes.
>>>>
>>>> Storing them in at their LUN will allocate a top level node which covers
>>>> bits 60-66, then four nodes, each covering bits of 54-59, another four
>>>> nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
>>>> coming to 23616 bytes.  And the pointer chase to get to each LUN is
>>>> ten deep.  It'll mostly be cached, but still ...
>>>>
>>> Which is my worry, too.
>>> In the end we're having a massively large array space (128bit if we take the 
>>> numbers as they stand today), of which only a _tiny_ fraction is actually 
>>> allocated.
>>> We can try to reduce the array space by eg. restricting channels and targets 
>>> to be 16 bits, and the LUN to be 32 bits.
>>> But then we'll be having a hard time arguing; "Hey, we have a cool new 
>>> feature, which has a really nice interface, but we can't support the same set 
>>> of devices as we have now...".
>>> That surely will go down well.
>>>
>>> Maybe one should look at using an rbtree directly; _that_ could be made to 
>>> work with an 128bit index, and lookup should be fast, too.
>>> Let's see.
>>>
>>>>> But still, the original question still stands: what would be the most
>>>>> efficient way using xarrays here?
>>>>> We have a four-level hierarchy Host:Channel:Target:LUN
>>>>> and we need to lookup devices (and, occasinally, targets) per host.
>>>>> At this time, 'channel' and 'target' are unsigned integer, and
>>>>> LUNs are 64 bit.
>>>>
>>>> It certainly seems sensible to me to have a per-host allocating XArray
>>>> to store the targets that belong to that host.  I imagine you also want
>>>> a per-target XArray for the LUNs that belong to that target.  Do you
>>>> also want a per-host XArray to store the LUNs so you can iterate all
>>>> LUNs per host as a single lookup rather than indirecting through the
>>>> target Xarray?  That's a design decision for you to make.
>>>>
>>> Seeing that I'm trying to use the existing HCTL number as index it's quite 
>>> hard to have a per-host lookup structure, as this would inevitably
>>> require a separate LUN index somewhere.
>>> Which would mean to have a two-level xarray structure, where the first xarray 
>>> hold the scsi targets, and the second one the LUNs per target.
>>> Not super-nice, but doable.
>>>
>>> Still thinking about the rbtree idea, though.
>>
>> You already have one, and it handles its own locking. It is called xarray :-)
>>
>>
>> Circling back to where I started: xarrays that allocate their own indexes.
>>
>> Apart from the LUN ***, the other indexes tend to be monotonic increasing, most
>> of the time, correct Hannes? So thinking about an xarray in a shost object
>> with pointers to starget objects. At run time if say a target disappears,
>> often another appears either with the same target_id or a larger target_id,
>> perhaps larger than any previously issued id. So with careful insertions
>> (xa_alloc() will allow us to precisely position replacements) we can maintain
>> that monotonic ascending order. As I showed in patch 6/6 we can easily
>> track the number of elements in each xarray, lets call it 'n'. For
>> n < 12 (say) we don't care and just do O(n) iterator searches. For n >= 12
>> we check if the xarray is in ascending order, if it isn't, we sort it by
>> the index we want (e.g. channel_id,target_id). For elements that we
>> change positions of, in the xarray, we need to change the variable I called 
>> 'sh_idx' (i.e. the xarray index) in the corresponding starget object.
>>
>> Why do this, to do a binary search when n >= 12, of course. Lookup time
>> is then O(ln(n)) with only around ln(n) cache-unfriendly visits to starget
>> objects.
>>
>> And that sort can be much better than O(n.ln(n)) since at the point of an
>> insertion, if the xarray was known to be sorted before the insertion,
>> then we only need as single iteration through the xarray to place the
>> new element in the correct position. After that it would be handy to
>> xa_swap() function that gave us the new xarray index. Making sure the
>> xarray indexes don't become too sparse might also be addressed in the
>> sort.
>>
>>
>> Question to Matthew:
>> Will xa_find(&xarray, &idx, full_range, 0)
>> return NULL _and_ place the index of the lowest "hole" in that xarray
>> (or 1 beyond the upper limit of the range) in 'idx' ?

Looked at xa_find() in lib/xarray.c and the answer is no. No harm,
we can find holes by watching the index arguments output by
xa_find() and xa_find_after() as we iterate over the array.

>> Note to others: the last argument in a typical xa_find() call
>> is either XA_PRESENT or some other mark the app is using, not 0.
>>
>> Doug Gilbert
>>
>>
>> *** the SCSI REPORT LUNS command doesn't say anything about the _ordering_
>>      of LUNs or even if it is stable from one invocation to the next.
>>      There is nothing to stop the scsi_scan.c code from sorting the LUN
>>      inventory before applying it to the ML object tree.
> 
> Maybe interest Matthew into doing the heavy lifting:
> 
> void xa_compact_and_sort(struct xa_array *xa,
>                           int offset_ul_idx,
>                           int (*compar)(const void *l, const void *r));
> 
> [Function modelled on qsort(3), perhaps qsort_r(3) would be better.]
> 
> The second argument to xa_compact_and_sort() is either:
>      - offsetof(struct struct, sh_idx) where sh_idx is an unsigned long, or
>      - if less than zero then ignore index fixup and don't compact
> 
> The third argument is either:
>       - a compare function that, for example, acts on an ordering for the
>         key: channel_id,target_id and uses it to return <0, ==0 or >=0
>         based on being called with pointers to two starget objects, or
>       - NULL: then just do a compaction if ul_idx >= 0
> 
> So when ul_idx >= 0 xa_compact_and_sort() places the new compacted index
> at each *((unsigned long *)((u8 *)starget) + ul_idx)). So in the case of
> my 1/6 patch the type of sh_idx would change from int to unsigned long.

On further thought, I would add a new bool argument 'do_compact' and
drop that "and don't compact" from offset_ul_idx description. Or
just have separate the xa_compact() and xa_sort() functions.

Attached is a rough idea of what the xa_compact logic might do. It in
turn depends on either a xa_swap() or xa_move() function which we
don't currently have. Naturally I would like those functions to "carry"
the mark info with them (unlike xa_load() and xa_store()) as they
swap or move elements.

A common use case might be when all but the last element are already
sorted, and a newly arrived target (disk) is appended to the array.
So only the last element needs to be re-placed if it is out of order.

Doug Gilbert