From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754863AbbCSL1A (ORCPT <rfc822;w@1wt.eu>);
	Thu, 19 Mar 2015 07:27:00 -0400
Received: from mail-ig0-f182.google.com ([209.85.213.182]:35275 "EHLO
	mail-ig0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751157AbbCSL0w (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 19 Mar 2015 07:26:52 -0400
MIME-Version: 1.0
In-Reply-To: <CALCETrVWbz7YudNQXQD_2PjC1HR0P0cB_1ea8NiYoQPDfQxERg@mail.gmail.com>
References: <1425906560-13798-1-git-send-email-gregkh@linuxfoundation.org>
	<CALCETrUjqTH14fZCeqU_OBoKX_6x4B=712WTeSqSAztpgXGiDw@mail.gmail.com>
	<CANq1E4Tf_6Cn+sxY=BPEbRpEr5WY+-rRX5gipz-_=4PNLa9bnQ@mail.gmail.com>
	<CALCETrVWbz7YudNQXQD_2PjC1HR0P0cB_1ea8NiYoQPDfQxERg@mail.gmail.com>
Date: Thu, 19 Mar 2015 12:26:51 +0100
Message-ID: <CANq1E4RA1ok=3Z5W2LkzqaUOKtbMVX_Joxe7_LRbLnQgfRUEYA@mail.gmail.com>
Subject: Re: [PATCH v4 00/14] Add kdbus implementation
From: David Herrmann <dh.herrmann@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Arnd Bergmann <arnd@arndb.de>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
        Tom Gundersen <teg@jklm.no>, Jiri Kosina <jkosina@suse.cz>,
        Linux API <linux-api@vger.kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Daniel Mack <daniel@zonque.org>, Djalal Harouni <tixxdz@opendz.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi

On Wed, Mar 18, 2015 at 7:24 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Mar 18, 2015 at 6:54 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
[...]
>> This program sends unicast messages on kdbus and UDS, exactly the same
>> number of times with the same 8kb payload. No parsing, no marshaling
>> is done, just plain message passing. The interesting spikes are
>> sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
>> should be ignored.
>>
>> sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
>> that we don't copy any payload in RECV, so it scales O(1) compared to
>> message-size.
>>
>> sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_WRITE).
>
> Is that factor of 3 for 8 kb payloads?  If so, I expect it's a factor
> of much worse than 3 for small payloads.

Yes, a factor of 3x for 8kb payloads, and ~3.8x for 64byte payloads
(see the second flamegraph I linked, which contains the data for
64byte payloads, i.e. basically no payload at all).

>>>  - The time it takes to connect
>>
>> No idea, never measured it. Why is it of interest?
>
> Gah, sorry, bad terminology.  I mean the time it takes to send a
> message to a receiver that you haven't sent to before.

Cold message transactions are horribly slow for both kdbus and UDS,
and the performance varies heavily (by a factor of 10x). I haven't
figured out whether it's icache/dcache misses, a cold branch
predictor, process memory faults, the scheduler, or something else.

What I can say is that the kdbus paths are more expensive, in both LOC
and execution time. We might be able to improve the cold-transaction
performance with _unlikely_() annotations, shortcuts, etc. But I want
much more benchmark data before I try to outsmart the compiler. It
works reasonably well right now.
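
For reference, a minimal sketch of what such branch annotations look
like. The macros have the same shape as the kernel's likely()/unlikely()
helpers and rely on the GCC/Clang builtin __builtin_expect();
demo_send() and its parameters are hypothetical and not taken from the
kdbus code.

```c
#include <stddef.h>

/* Same shape as the kernel's likely()/unlikely() helpers (GCC/Clang). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical send path: validate the arguments, then copy the payload.
 * Returns the number of bytes copied, or -1 on invalid input.
 */
long demo_send(const unsigned char *src, size_t len,
               unsigned char *dst, size_t cap)
{
    size_t i;

    /* Error cases are rare; hint that they belong off the hot path. */
    if (unlikely(src == NULL || dst == NULL))
        return -1;
    if (unlikely(len > cap))
        return -1;

    for (i = 0; i < len; i++)
        dst[i] = src[i];
    return (long)len;
}
```

The annotations only influence code layout and static branch
prediction hints; whether they help cold paths at all is exactly what
more benchmark data would have to show.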

Note that from an algorithmic view, there's no difference between the
first transaction and a following transaction. All relevant accesses
are O(1).

(Actually, looking at the numbers again, worst-case vs. average-case
in message transactions is exactly 10x for both UDS and kdbus. Skipping
the first couple of transactions, I get <2x. std-dev is roughly 2%.)
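
A small sketch of how a worst-case vs. average-case ratio like this can
be computed, with and without the warm-up transactions. The struct and
function names are hypothetical, and this is not the benchmark code
that produced the numbers above.

```c
#include <stddef.h>

struct latency_summary {
    double mean;   /* average latency of the considered samples */
    double worst;  /* largest latency of the considered samples */
    double ratio;  /* worst / mean */
};

/*
 * Summarize samples[skip..n-1], i.e. ignore the first 'skip' (cold)
 * transactions. Returns 0 on success, -1 if no samples remain.
 */
int latency_summarize(const double *samples, size_t n, size_t skip,
                      struct latency_summary *out)
{
    double sum = 0.0, worst = 0.0;
    size_t i;

    if (samples == NULL || out == NULL || skip >= n)
        return -1;

    for (i = skip; i < n; i++) {
        sum += samples[i];
        if (samples[i] > worst)
            worst = samples[i];
    }

    out->mean = sum / (double)(n - skip);
    out->worst = worst;
    out->ratio = out->mean > 0.0 ? worst / out->mean : 0.0;
    return 0;
}
```

Calling it once with skip = 0 and once with a small non-zero skip on
the same sample array shows how much of the worst-case figure comes
from the first, cold transactions.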

> (The kdbus terminology is weird.  You don't send to "endpoints", you
> don't "connect" to other participants, and it's not even clear to me
> what a participant in the bus is called.)

A participant is called a "connection" or "peer" (I prefer 'peer'). It
"connects" to a bus via an endpoint of the bus. Endpoints are
file-system entries and can be shared, and usually are.
Unlike binder, kdbus has no concept of peer-to-peer links. That is,
there is never a link (not even a temporary one) between only two
peers. Messages are always sent to the bus, and the bus makes sure
only the addressed recipients will get the message.

>>
>>> I'm also interested in whether the current design is actually amenable
>>> to good performance.  I'm told that it is, but ISTM there's a lot of
>>> heavyweight stuff going on with each send operation that will be hard
>>> to remove.
>>
>> I disagree. What heavyweight stuff is going on?
>
> At least metadata generation, metadata free, and policy db checks seem
> expensive.  It could be worth running a bunch of copies of your
> benchmark on different cores and seeing how it scales.

Metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel. Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Metadata is only transmitted if both peers set
the metadata mask.
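
As an illustration of the mask rule, a minimal sketch: the set of
metadata items attached to a message is the intersection of what the
sender offers and what the receiver requests. The flag names and the
helper below are hypothetical and are not the real kdbus constants.

```c
#include <stdint.h>

/* Hypothetical metadata item flags; NOT the real kdbus constants. */
#define DEMO_META_CREDS   (UINT64_C(1) << 0)
#define DEMO_META_PIDS    (UINT64_C(1) << 1)
#define DEMO_META_CMDLINE (UINT64_C(1) << 2)

/*
 * An item is attached only if the sender allows it AND the receiver
 * asked for it. If either side clears a bit, that item is neither
 * collected nor transmitted.
 */
uint64_t demo_meta_to_attach(uint64_t sender_mask, uint64_t receiver_mask)
{
    return sender_mask & receiver_mask;
}
```

A sender that sets an empty mask therefore never pays for metadata
collection, regardless of what the receiver requests.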

The policy-db does indeed look like a bottleneck. Until v2 we used to
have a policy-cache, which I ripped out as it didn't meet our
expectations. There are plans to rewrite it, but it's low-priority.
The thing is, policy-setup is bus-privileged. As long as it's done in
a sane manner (keeping the entries per name minimal), the hash-table
based name-lookup gives suitable performance. It only gets problematic
if the number of entries per name rises, due to O(n) list-traversal.
But even that could be optimized without a policy cache, by merging
matching entries (see kdbus_policy_db_entry_access). Furthermore, the
policy-db is skipped for privileged peers or if sender and recipient
trust each other (e.g., have the same endpoint+uid). Thus, if you have
a trusted transaction, the policy-db is skipped anyway.
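
A rough sketch of the lookup shape described above, assuming a chained
hash table keyed by name with a per-name list of access entries. All
types and functions here are hypothetical simplifications for
illustration; this is not the kdbus policy-db implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUCKETS 16

struct demo_access {            /* one policy entry for a name */
    unsigned uid;
    bool may_talk;
    struct demo_access *next;
};

struct demo_name {              /* hash-table node, one per name */
    const char *name;
    struct demo_access *entries;
    struct demo_name *next;
};

struct demo_policy_db {
    struct demo_name *buckets[DEMO_BUCKETS];
};

struct demo_peer {
    unsigned uid;
    int endpoint_id;
    bool privileged;
};

static unsigned demo_hash(const char *s)
{
    unsigned h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % DEMO_BUCKETS;
}

void demo_db_add_name(struct demo_policy_db *db, struct demo_name *n)
{
    unsigned b = demo_hash(n->name);

    n->next = db->buckets[b];
    db->buckets[b] = n;
}

bool demo_may_send(const struct demo_policy_db *db,
                   const struct demo_peer *src,
                   const struct demo_peer *dst,
                   const char *dst_name)
{
    const struct demo_name *n;
    const struct demo_access *a;

    /* Fast paths: the policy-db is not consulted at all. */
    if (src->privileged)
        return true;
    if (src->uid == dst->uid && src->endpoint_id == dst->endpoint_id)
        return true;

    /* Hash lookup of the name, then O(n) walk over its entries. */
    for (n = db->buckets[demo_hash(dst_name)]; n; n = n->next) {
        if (strcmp(n->name, dst_name) != 0)
            continue;
        for (a = n->entries; a; a = a->next)
            if (a->uid == src->uid && a->may_talk)
                return true;
        break;
    }
    return false;
}
```

The inner loop is the part that degrades as the number of entries per
name grows; merging matching entries would shorten that list without
reintroducing a cache.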

One real bottleneck I see is the name-registry rwlock.
Acquiring/releasing names is still a heavy operation, as it blocks the
whole bus due to acquiring the write-lock. Again, I have plans to
optimize it (srcu would be an idea, syncing on name-acquire/release),
but it's an implementation detail.

Thanks
David