From: David Miller
To: jolsa@redhat.com
Cc: acme@kernel.org, linux-kernel@vger.kernel.org, namhyung@kernel.org, jolsa@kernel.org
Subject: Re: [PATCH RFC] hist lookups
Date: Fri, 02 Nov 2018 23:30:03 -0700 (PDT)
Message-Id: <20181102.233003.1814045087128749000.davem@davemloft.net>
In-Reply-To: <20181031.090816.2117345408719881030.davem@davemloft.net>

From: David Miller
Date: Wed, 31 Oct 2018 09:08:16 -0700 (PDT)

> From: Jiri Olsa
> Date: Wed, 31 Oct 2018 16:39:07 +0100
>
>> it'd be great to make hist processing faster, but is your main target here
>> to get the load out of the reader thread, so we don't lose events during the
>> hist processing?
>>
>> we could queue events directly from reader thread into another thread and
>> keep it (the reader thread) free of processing, focusing only on event
>> reading/passing
>
> Indeed, we could create threads that take samples from the thread processing
> the ring buffers, and insert them into the histogram.

So I played around with some ideas like this and ran into some dead ends.

I ran each mmap ring's processing in a separate thread.

This doesn't help at all: the problem is that all the threads serialize
at the pthread lock for the histogram part of the work, and the
histogram part dominates the cost of processing each sample.

Nevertheless, I started work on formally threading all of the code that
the mmap threads operate on, such as symbol processing, and while doing
so I came to the conclusion that pushing only the histogram processing
to a separate thread poses its own set of big challenges.

To make this work we would have to turn a piece of transient on-stack
state (the processed event) into allocated persistent state. These
persistent event structures get queued up to the histogram thread(s).
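To make the memory concern concrete, here is a minimal sketch of that
hand-off. It assumes a hypothetical `struct sample_event` standing in
for perf's real sample data and a simple mutex-protected linked-list
FIFO; neither is taken from the perf code. The reader side copies each
transient on-stack event into a malloc()ed node before queueing it. The
hypothetical simulate() helper runs both sides in one thread so the
result is deterministic, with the histogram side draining `consume`
events per round while the reader produces `produce`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the processed event (not perf's layout). */
struct sample_event {
	unsigned long ip;
	unsigned int pid;
};

/* Heap-allocated copy of an event, linked into a FIFO. */
struct queued_event {
	struct sample_event ev;
	struct queued_event *next;
};

struct event_queue {
	pthread_mutex_t lock;   /* reader and histogram thread share this */
	struct queued_event *head, *tail;
	size_t depth;           /* events currently queued (memory held) */
	size_t max_depth;       /* high-water mark of depth */
};

static void queue_init(struct event_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
	q->depth = q->max_depth = 0;
}

/* Reader side: the on-stack event must be copied to the heap so it can
 * outlive the ring-buffer processing loop. Returns 0 on success. */
static int queue_push(struct event_queue *q, const struct sample_event *ev)
{
	struct queued_event *qe = malloc(sizeof(*qe));

	if (!qe)
		return -1;
	qe->ev = *ev;
	qe->next = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = qe;
	else
		q->head = qe;
	q->tail = qe;
	if (++q->depth > q->max_depth)
		q->max_depth = q->depth;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Histogram side: detach the oldest event; the caller frees it.
 * Returns NULL when the queue is empty. */
static struct queued_event *queue_pop(struct event_queue *q)
{
	struct queued_event *qe;

	pthread_mutex_lock(&q->lock);
	qe = q->head;
	if (qe) {
		q->head = qe->next;
		if (!q->head)
			q->tail = NULL;
		q->depth--;
	}
	pthread_mutex_unlock(&q->lock);
	return qe;
}

/* Produce `produce` events and consume up to `consume` per round;
 * returns the peak number of events held in the queue. */
static size_t simulate(size_t rounds, size_t produce, size_t consume)
{
	struct event_queue q;
	struct queued_event *qe;
	size_t r, i, peak;

	queue_init(&q);
	for (r = 0; r < rounds; r++) {
		for (i = 0; i < produce; i++) {
			/* transient event, lives only on the stack */
			struct sample_event ev = { .ip = r * produce + i, .pid = 1 };

			if (queue_push(&q, &ev))
				abort();
		}
		for (i = 0; i < consume; i++)
			free(queue_pop(&q)); /* free(NULL) is a no-op */
	}
	peak = q.max_depth;
	while ((qe = queue_pop(&q)) != NULL)
		free(qe);
	assert(q.depth == 0 && q.head == NULL);
	pthread_mutex_destroy(&q.lock);
	return peak;
}
```

With simulate(1000, 4, 3) the high-water mark is 1003 queued events:
the backlog grows by one event per round for as long as the consumer
stays behind, and nothing bounds it short of dropping events or
stalling the reader.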
Therefore, if the histogram thread(s) can't keep up (and, as per my
experiment above, it is easy to enter this state because the histogram
code itself is going to run serially with the histogram lock held),
this persistent event memory will just grow larger and larger. We
would have to find some way to parallelize the histogram code to make
any kind of threading worthwhile.
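The serialization point from the first experiment can be sketched the
same way. This is a hypothetical model, not perf's histogram code:
NR_WORKERS threads (one per mmap ring) each map samples to buckets of
a fixed-size array. With `use_shared` set, every update goes through
one global pthread_mutex_t, so only one thread is ever inside the
histogram. The other mode gives each worker a private histogram and
merges them after pthread_join(); that per-thread-then-merge layout is
only one possible way to parallelize the update and is not something
proposed here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_BUCKETS 64 /* hypothetical fixed-size histogram */
#define NR_WORKERS 4  /* hypothetical: one worker per mmap ring */

struct histogram {
	unsigned long count[NR_BUCKETS];
};

/* Shared histogram; every update must hold hist_lock. */
static struct histogram shared_hist;
static pthread_mutex_t hist_lock = PTHREAD_MUTEX_INITIALIZER;

struct worker {
	pthread_t tid;
	unsigned long nr_samples;
	uint64_t seed;          /* per-worker sample generator state */
	int use_shared;         /* 1: shared histogram + lock, 0: private */
	struct histogram local; /* private histogram, merged after join */
};

/* Stand-in for "process one sample": a 64-bit LCG picks a bucket. */
static unsigned int sample_bucket(uint64_t *seed)
{
	*seed = *seed * 6364136223846793005ULL + 1442695040888963407ULL;
	return (unsigned int)((*seed >> 33) % NR_BUCKETS);
}

static unsigned long hist_total(const struct histogram *h)
{
	unsigned long total = 0;
	int b;

	for (b = 0; b < NR_BUCKETS; b++)
		total += h->count[b];
	return total;
}

static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	unsigned long i;

	for (i = 0; i < w->nr_samples; i++) {
		unsigned int b = sample_bucket(&w->seed);

		if (w->use_shared) {
			/* All workers funnel through this one lock, so the
			 * histogram updates execute one at a time. */
			pthread_mutex_lock(&hist_lock);
			shared_hist.count[b]++;
			pthread_mutex_unlock(&hist_lock);
		} else {
			w->local.count[b]++; /* private: no lock needed */
		}
	}
	return NULL;
}

/* Run NR_WORKERS threads over per_worker samples each and store the
 * final histogram (shared, or merged from the private ones) in *out. */
static void run(int use_shared, unsigned long per_worker, struct histogram *out)
{
	struct worker w[NR_WORKERS];
	int i, b;

	memset(&shared_hist, 0, sizeof(shared_hist));
	memset(out, 0, sizeof(*out));
	for (i = 0; i < NR_WORKERS; i++) {
		memset(&w[i], 0, sizeof(w[i]));
		w[i].nr_samples = per_worker;
		w[i].seed = (uint64_t)i + 1;
		w[i].use_shared = use_shared;
		if (pthread_create(&w[i].tid, NULL, worker_fn, &w[i]) != 0)
			abort();
	}
	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(w[i].tid, NULL);

	if (use_shared) {
		*out = shared_hist;
	} else {
		/* Merge step: runs once, after all workers have finished. */
		for (i = 0; i < NR_WORKERS; i++)
			for (b = 0; b < NR_BUCKETS; b++)
				out->count[b] += w[i].local.count[b];
	}
	assert(hist_total(out) == NR_WORKERS * per_worker);
}
```

Both modes produce identical bucket counts; the difference is that the
shared mode takes the lock once per sample. Per-thread histograms trade
that contention for extra memory and a merge step, and they only help
if the per-sample work itself can run without a shared lock.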