[PATCH] futex: Support smaller futexes of one byte or two byte size.

From: Malte Skarupke <malteskarupke@web.de>
To: tglx@linutronix.de, mingo@redhat.com
Cc: peterz@infradead.org, dvhart@infradead.org,
	linux-kernel@vger.kernel.org, malteskarupke@fastmail.fm,
	malteskarupke@web.de
Subject: [PATCH] futex: Support smaller futexes of one byte or two byte size.
Date: Fri, 20 Dec 2019 14:08:59 -0500	[thread overview]
Message-ID: <20191220190901.23465-1-malteskarupke@web.de> (raw)
In-Reply-To: <20191204235238.10764-1-malteskarupke@web.de>

Hello again,

here are two patches for sized futexes. Option 1 is the same behavior as my
first patch, just with the fixes suggested above. Option 2 requires a size
flag for WAKE and REQUEUE.

The reason for sending two changelists is that after thinking about it for a
while, and after implementing both options, I realized that I do care which
option we pick. I still prefer option 1. I would also be OK with going with
option 2, but let me try one final time to convince you with three reasons:

1. The consistent API conversation. There were two reasons voiced for this:
a) In a private email Thomas Gleixner voiced that just having two different
   calls, WAIT and WAKE, where one takes the size and the other does not,
   would be inconsistent. I would like to think about this in terms of another
   popular API where one takes a size and the other does not: malloc() and
   free(). Over the years many allocator implementers would have hoped that
   free() also takes a size because it can save a few instructions, and the
   user usually knows what they are freeing and could pass in the size. But
   what if there was a version of free() that doesn't need the size, but it
   still takes a size argument and only uses it to verify that the correct
   size was passed in: If you pass an incorrect size it errors out and doesn't
   actually free the memory. Would that be a better interface?

   Because that's essentially what I'm implementing here: WAKE doesn't need
   the size, and the only thing it uses the size for is to verify that it is
   correct, and if it's not, WAKE errors out and doesn't actually wake. I
   don't think that that makes it a better API.

b) On this mailing list Peter Zijlstra voiced that my implementation does not
   implement the MONITOR/MWAIT pattern. This wasn't a problem when there was
   only one size of futex, but it is once you can have overlapping futexes of
   different sizes. I can see that it might be nice to implement this pattern,
   but I chose to not do it mainly because mutexes and condition variables
   don't actually need it. They just need the same semantics we had before,
   except for smaller types. There is no need to handle overlapping futexes,
   so I propose a different mental model: Comparisons+hashtable lookup. Under
   that mental model only an exact match on the address between WAIT and WAKE
   will do anything, because otherwise you won't find anything in the
   hashtable. And under that mental model the size only applies to the
   comparison. That's what I implemented and what I prefer. Option 1 is a more
   internally consistent implementation in that mental model.

2. Precedent. When Windows added support for futexes, they supported different
   sizes of futexes from the beginning. In their API WaitOnAddress() is the
   equivalent to our FUTEX_WAIT, and WakeByAddressSingle() and
   WakeByAddressAll() are the equivalents to our FUTEX_WAKE. In the Windows
   API WaitOnAddress() takes a size, but the WakeByAddress calls do not take a
   size. Obviously there may be reasons to do things differently from how
   Windows does it, but it does show that at least some other programmers came
   to the same conclusions that I did.

3. Odd behavior. I made one change compared to what I wrote earlier in the
   discussion. Earlier I said that calling WAIT with two different sizes on
   the same address should probably not be allowed. The reason for that is
   that it can lead to unexpected behavior. If you do a 8 bit WAIT followed by
   a 16 bit WAIT on the same address, then call 8 bit WAKE on the same address
   twice, the first WAKE call will work and return 1, but the second WAKE call
   will return -EINVAL because of a size mismatch.

   I thought that that's just a bit too weird so I said that I wouldn't allow
   WAITs with different sizes. The reason why I changed my mind on that is
   that I would have to change what WAIT does. Instead of just inserting
   itself into a linked list, it would have to iterate through that linked
   list to check if anybody else already went to sleep with a different size.
   So a O(1) operation would turn into a O(n) operation. Since there can be
   condition variables where many threads sleep on the same futex, this would
   actually be fairly common. So I thought that that would be an unacceptable
   change in behavior. (and a similar change would have to be made to REQUEUE)
   So this does mean that in the version where WAKE verifies the size, you can
   get weird behavior if WAIT was called with two different sizes. (or maybe
   if somebody wasn't consistent with their usage of the flags in REQUEUE) I
   don't think this is a huge concern because I don't expect people to have
   different sized futexes on the same address, but it is a weird API quirk.
   If this does cause bugs, they should also be fairly obvious and easy to
   fix. But they would be bugs that you couldn't have with option 1.

So now that I have tried to convince you one final time, you're free to choose
whichever option that you like. And obviously if you have more feedback for
the code, I would also implement those fixes. Also if you want a patch without
the "option 1" or "option 2" text in the description, I can provide that.

Thanks,

Malte