References: <20170825093335.100892-1-maco@android.com> <20170825093335.100892-4-maco@android.com>
From: Martijn Coenen
Date: Fri, 25 Aug 2017 20:47:39 +0200
Subject: Re: [PATCH 03/13] ANDROID: binder: add support for RT prio inheritance.
To: Thomas Gleixner
Cc: Greg KH, John Stultz, Todd Kjos, Arve Hjønnevåg, Amit Pundir, LKML,
    devel@driverdev.osuosl.org, Martijn Coenen, Iliyan Malchev, Colin Cross,
    Peter Zijlstra, Christoph Hellwig

Hi Thomas,

On Fri, Aug 25, 2017 at 5:08 PM, Thomas Gleixner wrote:
> Sorry, but this has not much to do with real priority inheritance.

Can you clarify what "real priority inheritance" is, or are you more
concerned about this particular implementation of it?

>
> It's a poor mans pseudo PI implementation. What I can't see from the sparse
> changelog is how all of this is supposed to work.

Ok, guess the latter :-) I agree with you that I should have included a
more verbose description of how this works, and I should also have
mentioned that I did look at rtmutex and didn't think it could easily be
used for our purposes (more below).

>
> My interpretation of it is, that you need to make sure that the other
> thread in that IPC mechanism gets boosted enough to not block the thread
> which needs a reply or action.

This is true, but it's not the whole picture - more below.

>
> Therefor you create a unreadable maze of capability checks and other things
> which do not make any sense to me due to lack of comments and something
> which explains the big picture.

I thought the code itself was reasonably self-explanatory, but I'm happy
to comment it more. The binder driver has done "priority inheritance"
with nice values for a long time, and this is an extension of that which
more or less works in the same way.

>
> The whole thing looks wrong and engineered sideways circumventing the
> existing facilities and making weird assumptions about priority settings.

I'd be happy to use existing facilities if they can do what we need. Can
you describe the "weird assumptions" in more detail?

> Current state:
>
> 1) thread queues work to worker via binder
>
> 2) thread waits for the work to complete (I deduced that from a stray
>    comment about sync work)
>
> If the waiting thread is a high priority thread it might wait for a long
> time if the worker thread is low priority or SCHED_OTHER.
>
> Desired state:
>
> 1) thread queues work to worker thread via binder
>
> 1a) thread boosts the worker to its own priority
>
> 2) thread waits for the work to complete
>
> 2a) worker completes and drops priority boost
>
> Is that about right?

It is correct for synchronous transactions. Synchronous transactions are
transactions for which the caller blocks until they are completed (e.g.
a reply has been received). The receiving process has a threadpool of
one or more threads waiting for transactions.
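To put your 1a/2a steps in code terms, here is a rough sketch of what
this patchset does to the thread that ends up handling a synchronous
transaction. The struct and helper names are made up for illustration
and this is not the actual driver code (the real thing would apply the
change with something like sched_setscheduler_nocheck() and has to deal
with locking):

    /* Sketch only: what the caller's priority does to the target thread. */
    struct binder_prio_sketch {
            unsigned int sched_policy;  /* SCHED_NORMAL, SCHED_FIFO, ... */
            int prio;                   /* kernel prio, lower value = higher */
    };

    struct target_thread_sketch {
            struct binder_prio_sketch prio;       /* what it runs at now */
            struct binder_prio_sketch saved_prio; /* restored on reply */
    };

    /* 1a) the caller picked 'target' and hands it its own scheduling
     *     settings before queueing the work */
    static void inherit_prio(struct target_thread_sketch *target,
                             struct binder_prio_sketch caller)
    {
            target->saved_prio = target->prio;
            target->prio = caller;
    }

    /* 2a) the reply has been sent: drop back to the saved settings */
    static void restore_prio(struct target_thread_sketch *target)
    {
            target->prio = target->saved_prio;
    }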
Typically those receiving threads call into the binder driver with an
ioctl and then wait for work on a waitqueue. Before this patch series,
all threads (available for new transactions) would wait on the same
waitqueue, so it was hard to do any sort of PI, because we didn't know
which thread was going to wake up. The first patches in this series
change this behavior by getting rid of the process-wide waitqueue - the
caller itself picks a thread to wake up. Anyway, after that things more
or less go as you describe: the thread picks up the work, returns to
userspace so userspace can do the actual work, and eventually a reply
comes back, at which point we unblock the caller (which is blocked on
its own waitqueue).

One place where using an rtmutex becomes tricky is that there may be no
threads waiting for work: all threads could be busy handling
transactions, so we can't pick one to wake up and do the work. In that
case, we push the work to a process-wide work queue, and the first
thread to become available will pick it up and do it. Here a "proxy
rtmutex" doesn't really seem to work, because the proxy doesn't know
the owner when the work is queued. The current implementation in this
patchset just has the thread that picks up the work boost its own
priority in this scenario.

Another reason rtmutex is not straightforward is that we support
something called "node priority inheritance". A node is binder
terminology for an object that you can make binder transactions to. For
some nodes, we like to set a minimum scheduling priority; an example is
the set of nodes in our system_server process. The reason for this is
that binder calls into the system_server process often take critical
userspace locks. Say a thread calls into system_server with priority
130; the system_server binder thread inherits that, runs at 130, takes
a lock in userspace and gets preempted; now, when somebody else tries
to take the same lock, it can get blocked for a long time. So we set a
minimum scheduling priority for system_server nodes at prio 120. This
makes using the rtmutex APIs hard, because we don't necessarily want
the binder thread to have the priority of the caller. I guess you could
say the proper way to fix this is that our userspace mutexes should
also support priority inheritance; I don't think they do today, though
I don't know the exact reason behind it.

The final reason rtmutex is not straightforward to use is asynchronous
transactions. Those are transactions for which the caller does not
block until completion. We push the work, wake up a thread, and then
the caller returns to userspace immediately. In that case we don't
inherit the priority from the caller, but we still want to run at the
minimum node priority. So there really is no caller that can be
"blocked" on a lock.

I will look into the rtmutex code a bit more - to be honest, I hadn't
seen the "proxy" mechanism before, which is why I thought rtmutex
wouldn't even work for synchronous transactions. But if you have
suggestions for how to deal with these other scenarios, I'd be happy to
see if I can rework this.

Thanks,
Martijn
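
P.S.: to make the "minimum node priority" part concrete, the decision
about what the handling thread should run at is roughly the following
(made-up helper, not the patch code; lower kernel prio value means
higher priority, 120 == nice 0):

    #include <stdbool.h>

    /*
     * Priority the thread handling a transaction on a node should get.
     * Sync: inherit from the caller, but never run below the node's
     * minimum (caller at 130, node minimum 120 -> run at 120).
     * Async: there is no blocked caller, so just use the node minimum.
     */
    static int effective_prio(int node_min_prio, int caller_prio, bool sync)
    {
            if (!sync)
                    return node_min_prio;
            return caller_prio < node_min_prio ? caller_prio : node_min_prio;
    }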