References: <1425906560-13798-1-git-send-email-gregkh@linuxfoundation.org>
Date: Thu, 19 Mar 2015 12:26:51 +0100
Subject: Re: [PATCH v4 00/14] Add kdbus implementation
From: David Herrmann
To: Andy Lutomirski
Cc: Greg Kroah-Hartman, Arnd Bergmann, "Eric W. Biederman", One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Linux API, "linux-kernel@vger.kernel.org", Daniel Mack, Djalal Harouni

Hi

On Wed, Mar 18, 2015 at 7:24 PM, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 6:54 AM, David Herrmann wrote:
[...]
>> This program sends unicast messages on kdbus and UDS, exactly the same
>> number of times with the same 8kb payload. No parsing, no marshaling
>> is done, just plain message passing. The interesting spikes are
>> sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
>> should be ignored.
>>
>> sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
>> that we don't copy any payload in RECV, so it scales O(1) compared to
>> message-size.
>>
>> sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_WRITE).
>
> Is that factor of 3 for 8 kb payloads? If so, I expect it's a factor
> of much worse than 3 for small payloads.

Yes, a factor of 3 for 8kb payloads, and ~3.8x for 64-byte payloads (see
the second flamegraph I linked, which contains data for 64-byte payloads,
i.e. basically nothing).

>>> - The time it takes to connect
>>
>> No idea, never measured it.
>> Why is it of interest?
>
> Gah, sorry, bad terminology. I mean the time it takes to send a
> message to a receiver that you haven't sent to before.

Cold message transactions are horribly slow for both kdbus and UDS, and
the performance varies heavily (by a factor of 10). I haven't figured
out whether it's icache/dcache misses, a cold branch predictor, process
memory faults, the scheduler, or something else. What I can say is that
the kdbus paths are more expensive, in both LOC and execution time. We
might be able to improve cold-transaction performance with unlikely()
annotations, shortcuts, etc., but I want much more benchmark data before
I try to outsmart the compiler. It works reasonably well right now.

Note that from an algorithmic view, there's no difference between the
first transaction and a following transaction. All relevant accesses
are O(1).

(Actually, looking at the numbers again, worst-case vs. average-case
message transaction time is exactly 10x for both UDS and kdbus. Skipping
the first couple of transactions, I get <2x. The std-dev is roughly 2%.)

> (The kdbus terminology is weird. You don't send to "endpoints", you
> don't "connect" to other participants, and it's not even clear to me
> what a participant in the bus is called.)

A participant is called a "connection" or "peer" (I prefer "peer"). It
"connects" to a bus via an endpoint of the bus. Endpoints are file-system
entries and can be shared, and usually are.

Unlike binder, kdbus has no notion of peer-to-peer links. That is, there
is never a link (not even a temporary one) between only two peers.
Messages are always sent to the bus, and the bus makes sure only the
addressed recipients get the message.

>>
>>> I'm also interested in whether the current design is actually amenable
>>> to good performance. I'm told that it is, but ISTM there's a lot of
>>> heavyweight stuff going on with each send operation that will be hard
>>> to remove.
>>
>> I disagree. What heavyweight stuff is going on?
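The cold-versus-warm comparison above can be approximated for the UDS side with a small C sketch like the following. It is not the benchmark program discussed in this thread (that program isn't included here): it assumes a single process sending a fixed payload over an AF_UNIX SOCK_SEQPACKET socketpair, records the write()+read() time of each message, and compares the worst case with the average, optionally skipping the first few (cold) samples. The helpers `now_ns`, `bench_uds` and `worst_over_avg` are hypothetical names, and the kdbus send/receive ioctls are deliberately not reproduced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: send @iters messages of @size bytes (same fixed
 * payload, no marshaling) from one end of an AF_UNIX SOCK_SEQPACKET
 * socketpair to the other, storing the write()+read() time of each
 * message in @lat_ns. Returns 0 on success, -1 on error.
 */
static int bench_uds(size_t iters, size_t size, uint64_t *lat_ns)
{
	int fds[2];
	char *tx, *rx;
	int ret = -1;

	assert(lat_ns != NULL && iters > 0 && size > 0);
	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) < 0)
		return -1;

	tx = malloc(size);
	rx = malloc(size);
	if (!tx || !rx)
		goto out;
	memset(tx, 0xab, size);

	for (size_t i = 0; i < iters; i++) {
		uint64_t t0 = now_ns();

		if (write(fds[0], tx, size) != (ssize_t)size)
			goto out;
		if (read(fds[1], rx, size) != (ssize_t)size)
			goto out;
		lat_ns[i] = now_ns() - t0;
	}
	/* Sanity check: the last message arrived intact. */
	ret = memcmp(tx, rx, size) == 0 ? 0 : -1;
out:
	free(tx);
	free(rx);
	close(fds[0]);
	close(fds[1]);
	return ret;
}

/*
 * Hypothetical helper: worst-case latency divided by average latency over
 * lat[skip..n-1]. Passing skip > 0 drops the first (cold) transactions.
 */
static double worst_over_avg(const uint64_t *lat, size_t n, size_t skip)
{
	uint64_t max = 0;
	double sum = 0.0;

	assert(skip < n);
	for (size_t i = skip; i < n; i++) {
		if (lat[i] > max)
			max = lat[i];
		sum += (double)lat[i];
	}
	return sum > 0.0 ? (double)max / (sum / (double)(n - skip)) : 0.0;
}
```

Calling `worst_over_avg(lat, n, 0)` and `worst_over_avg(lat, n, 2)` on the same run separates the cost of the first, cold transactions from steady-state behaviour; running it with `size` set to 8192 and then 64 mirrors the two payload sizes mentioned above. Absolute figures will depend entirely on the machine.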
>
> At least metadata generation, metadata free, and policy db checks seem
> expensive. It could be worth running a bunch of copies of your
> benchmark on different cores and seeing how it scales.

Metadata handling is local to the connection that sends the message. It
does not affect the performance of other bus operations running in
parallel. Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at all.
If a receiver doesn't want metadata, it should not request it (by setting
the receiver-metadata-mask). If a sender doesn't like the overhead, it
should not send the metadata (by setting the sender-metadata-mask).
Metadata is transmitted only if both peers set it in their masks.

The policy-db does indeed look like a bottleneck. Until v2 we had a
policy cache, which I ripped out because it didn't meet our expectations.
There are plans to rewrite it, but it's low priority. The thing is,
policy setup is bus-privileged. As long as it's done in a sane manner
(keeping the number of entries per name minimal), the hash-table-based
name lookup gives suitable performance. It only gets problematic if the
number of entries per name rises, due to the O(n) list traversal. But
even that could be optimized without a policy cache, by merging matching
entries (see kdbus_policy_db_entry_access). Furthermore, the policy-db is
skipped for privileged peers, or if sender and recipient trust each other
(e.g., have the same endpoint+uid). Thus, a trusted transaction skips the
policy-db anyway.

One real bottleneck I see is the name-registry rwlock. Acquiring and
releasing names is still a heavy operation, as it takes the write lock
and thus blocks the whole bus. Again, I have plans to optimize it (SRCU
would be an option, synchronizing on name acquire/release), but that's
an implementation detail.

Thanks
David
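To make the metadata-mask and policy-db points above concrete, here is a toy C model. Every type, flag and function in it (`struct peer`, the `META_*` bits, `meta_to_attach`, `policy_check`, the djb2 bucket hash) is a hypothetical simplification, not the kdbus code; in particular it models only a single "may talk to this name" rule per uid. It illustrates three claims from the reply: metadata is attached as the intersection of the sender and receiver masks, the policy lookup is a hash on the name followed by an O(n) walk of that name's entries, and the walk is skipped entirely for privileged senders or for peers sharing the same endpoint and uid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Toy model; all names are hypothetical, not the kdbus implementation. */
enum { META_CREDS = 1u << 0, META_PIDS = 1u << 1, META_CMDLINE = 1u << 2 };

struct peer {
	uid_t uid;
	int endpoint_id;    /* endpoint the peer connected through */
	bool privileged;
	uint64_t meta_send; /* sender-metadata-mask */
	uint64_t meta_recv; /* receiver-metadata-mask */
};

/* Metadata is attached only if both the sender and the receiver set it. */
static uint64_t meta_to_attach(const struct peer *src, const struct peer *dst)
{
	return src->meta_send & dst->meta_recv;
}

struct policy_entry {
	uid_t uid;                        /* uid allowed to talk to the name */
	const struct policy_entry *next;
};

#define POLICY_BUCKETS 64

struct policy_name {
	const char *name;
	const struct policy_entry *entries; /* O(n) list per name */
	const struct policy_name *next;     /* hash-bucket chain */
};

struct policy_db {
	const struct policy_name *bucket[POLICY_BUCKETS];
};

/* djb2 string hash, reduced to a bucket index. */
static unsigned int hash_name(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % POLICY_BUCKETS;
}

static bool policy_check(const struct policy_db *db, const struct peer *src,
			 const struct peer *dst, const char *dst_name)
{
	/* Trusted transactions never touch the policy db. */
	if (src->privileged ||
	    (src->endpoint_id == dst->endpoint_id && src->uid == dst->uid))
		return true;

	for (const struct policy_name *n = db->bucket[hash_name(dst_name)];
	     n; n = n->next) {
		if (strcmp(n->name, dst_name) != 0)
			continue;
		/* Linear walk: cost grows with the entries for this name. */
		for (const struct policy_entry *e = n->entries; e; e = e->next)
			if (e->uid == src->uid)
				return true;
		return false;
	}
	return false;
}
```

Keeping each name's entry list short is what keeps the inner loop cheap, which matches the advice above about keeping policy entries minimal; merging matching entries would shrink that list further without needing a cache.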