All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcel Holtmann <marcel@holtmann.org>
To: ell@lists.01.org
Subject: Re: [PATCH 6/9] hashmap: Add re-entrancy support to foreach function
Date: Mon, 16 Feb 2015 11:03:42 -0800	[thread overview]
Message-ID: <C9C4AA4F-D920-42E6-9519-5121D2FE7691@holtmann.org> (raw)
In-Reply-To: <CABBYNZ+kjhv1coqopMtSOT8a1Hc_UmL1pktv3XKNKruSUMkq7Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6965 bytes --]

Hi Luiz,

>>>>>>>>> glib also crashed with this pattern. Or usually worked ok, as the
>>>>>>>>> removed/added item wasn't always the item used in foreach or the next
>>>>>>>>> item. Fixing this to allow any API call successfully work at any time
>>>>>>>>> requires quite some more work to be done, the above patch by Jukka was
>>>>>>>>> approximately the minimum needed for a remove to work at any one time.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> If you find a good way to fix this in the data structure, great. But
>>>>>>>> the current fix is not acceptable.  We will not be iterating over the
>>>>>>>> _entire_ data structure twice.  The foreach operation is already
>>>>>>>> expensive and too tempting to abuse.
>>>>>>> 
>>>>>>> 
>>>>>>> The patch would iterate the data structure twice only if user did modify
>>>>>>> the hash in the callback func. That is probably not very common case
>>>>>>> anyway.
>>>>>>> 
>>>>>> 
>>>>>> Does not matter. An operation that you expect to take O(n) suddenly becomes
>>>>>> O(2n).  That's just not acceptable.  Remember, we're running on low-power
>>>>>> devices, so our data structures will be optimized for speed. Programmer
>>>>>> convenience is a secondary concern.
>>>>>> 
>>>>>> Anyway, I pushed a documentation clarification to ell/hashmap.c explaining
>>>>>> that the hashmap must be invariant during an ongoing l_hashmap_foreach
>>>>>> operation.
>>>>>> 
>>>>>> So you need to find an alternate approach.  Think through your data
>>>>>> structures carefully.
>>>>> 
>>>>> Ive solved a similar problem with queues in BlueZ, it is very similar
>>>>> to ell queues, the code looks like this now:
>>>>> 
>>>>> http://fpaste.org/185237/14238402/
>>>>> 
>>>>> So it basically protects the entries by taking a reference before
>>>>> calling the callback, and also the queue itself before starting
>>>>> iterating, and it just need a single loop. While it can still be
>>>>> vulnerable to bad usage I still think it worth doing because this case
>>>>> of callback removing the entry itself it very common, there is
>>>>> actually 3 cases that we want to allow queue_remove(entry),
>>>>> queue_remove_all and queue_unref by the callback so we added unit
>>>>> tests to emulate these 3 scenarios.
>>>> 
>>>> and this is most likely code that we have to take out again. And revert all users to handle this by themselves. It is not in line with the goals of ELL. And I did warn about that. The reference counting for each entry is pretty wasteful from a memory point of view. Especially if we are running on a system where every single byte of memory usage counts.
>>>> 
>>>> The goal for BlueZ is to eventually be able to run on top of ELL. This means that we have to be really cautious about what we provide and how. ELL is not just another GLib. It is not a dumping ground. We are looking at really memory restraint systems. There is a high chance that we have to make ELL even modular and provide an option to compile it without certain modules like D-Bus or netlink.
>>>> 
>>>> I am just mentioning this here so that everybody understands what our goals here. We might be utilizing systems where the userspace is small and really limited. What ELL needs to do is provide common functionality for its users, but it does not have to solve world hunger for its users.
>>> 
>>> Sorry but I cannot understand this motive, the reference counting will
>>> happen anyway since it is most common way to protect against callbacks
>>> destroying the very entry, if you don't do in ell the user code will
>>> do it and memory will be consumed whether you like it or not. So you
>>> talk about memory restraint system but your solution may actually
>>> cause more memory to be consumed, outside of ell.
>> 
>> so adding an int ref_count to each queue entry is adding substantive overhead for every single user. Meaning that every single queue entry we have to store has an extra sizeof(int) allocated.
> 
> That if we use an int, we could use uint8_t which is equivalent to
> bool, the reference is just for internal use anyway it does not get
> more than 2 ever.
> 
>> Compare this with a single sizeof(bool) for something like in_notify as we used in some of our code. So the more elements you store in the queue, the more data you are using. And on top of that it is more data for every single user. No matter if it requires the queue to re-entrancy safe or not.
> 
> Not quite, actually there was 4 flags to track this properly in the
> mgmt code, see for yourself:
> https://git.kernel.org/cgit/bluetooth/bluez.git/commit/?id=b72dee02b2e5dec9fbcf6ce97848fb08363a035d

you are not comparing the right pieces here.

One is the struct queue and the other struct queue_entry. If you add anything to struct queue_entry you are punishing every single data block. It is not a one-time penalty. It is for every user. So if some user wants to store 100 entries in the queue, then they get 100 * sizeof(int) extra overhead.

And as I explained for that exact reason, the queue_entry is single linked list. This means that queue_pop_tail is not a favorable operation and we do not support it. When we started ELL and discussed these details, I ended up going through GLib users and see how much this operation would be needed and what it is used for. The result was we do not want that.

I prefer having 4 constant flags in the queue user compared to punishing every single queue entry with any extra overhead.

>> You might have realized that our queue entry struct has on purpose only a next pointer. We decided really early that we want the queue to be a single list. And we accepted that because of that corner-case operations will be highly expensive. This means that while queues are useful for a lot of cases, they will not be useful for all.
>> 
>> We really do not want the massive bloat GLib repeated. The user of the ELL data structures has to know what they are doing. The queue data structure is not one fits all use case.
> 
> Im not even considering GLib as a good example, it actually surfer
> from the very same problem, as you can see there are a lot of corner
> cases and we end up with O(2n) situation Denis was complaining about
> Jukka's changes, anyway all this code seems to be inspired in at_chat
> from oFono which has the very same O(2n) pattern, so perhaps we need a
> specialized notification list? One that doesn't bloat the caller at
> very least.

There will be cases where a dedicated data structure just makes more sense and is more efficient. That is totally acceptable and encouraged.

This essentially goes back to the original argument that Denis made. ELL is not something that can solve all your problems. If the caller/user of the data structures is behaving stupid, then the only thing ELL can do is abort. Which is a behavior we want since then we can collect traces of the faulty behavior.

Regards

Marcel


  reply	other threads:[~2015-02-16 19:03 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-10 14:42 [PATCH 0/9] hashmap fixes Jukka Rissanen
2015-02-10 14:42 ` [PATCH 1/9] hashmap: Add value free function Jukka Rissanen
2015-02-10 14:42 ` [PATCH 2/9] hashmap: Call user supplied value free function in destroy Jukka Rissanen
2015-02-10 19:07   ` Denis Kenzior
2015-02-10 14:42 ` [PATCH 3/9] hashmap: Call user supplied value free function in insert Jukka Rissanen
2015-02-10 19:18   ` Denis Kenzior
2015-02-11  9:27     ` Patrik Flykt
2015-02-11 11:04       ` Tomasz Bursztyka
2015-02-11 13:50       ` Denis Kenzior
2015-02-10 14:42 ` [PATCH 4/9] unit: hashmap: Add value free hash entry test Jukka Rissanen
2015-02-10 14:42 ` [PATCH 5/9] unit: hashmap: Add replace " Jukka Rissanen
2015-02-10 14:42 ` [PATCH 6/9] hashmap: Add re-entrancy support to foreach function Jukka Rissanen
2015-02-10 19:47   ` Denis Kenzior
2015-02-11  9:21     ` Patrik Flykt
2015-02-11 14:06       ` Denis Kenzior
2015-02-12  7:23         ` Jukka Rissanen
2015-02-12 18:02           ` Denis Kenzior
2015-02-12  7:25         ` Jukka Rissanen
2015-02-12 17:55           ` Denis Kenzior
2015-02-13 15:38             ` Luiz Augusto von Dentz
2015-02-13 17:04               ` Denis Kenzior
2015-02-13 17:36               ` Marcel Holtmann
2015-02-16  9:44                 ` Luiz Augusto von Dentz
2015-02-16 16:18                   ` Marcel Holtmann
2015-02-16 18:27                     ` Luiz Augusto von Dentz
2015-02-16 19:03                       ` Marcel Holtmann [this message]
2015-02-17  9:48                 ` Tomasz Bursztyka
2015-02-17 16:41                   ` Denis Kenzior
2015-02-18  8:23                     ` Tomasz Bursztyka
2015-02-10 14:42 ` [PATCH 7/9] unit: hashmap: Re-entrancy tests added Jukka Rissanen
2015-02-10 14:42 ` [PATCH 8/9] hashmap: Add support to finding an element from hash Jukka Rissanen
2015-02-12  8:35   ` Jukka Rissanen
2015-02-13  0:19     ` Denis Kenzior
2015-02-10 14:42 ` [PATCH 9/9] unit: hashmap: Add unit test for l_hashmap_find Jukka Rissanen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C9C4AA4F-D920-42E6-9519-5121D2FE7691@holtmann.org \
    --to=marcel@holtmann.org \
    --cc=ell@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.