Date: Tue, 11 Oct 2016 05:57:53 -0400
From: Steven Rostedt <rostedt@goodmis.org>
To: Joel Fernandes <joelaf@google.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/7] pstore: Improve performance of ftrace backend with
 ramoops
Message-ID: <20161011055753.2690178f@grimm.local.home>
In-Reply-To: <1475904515-24970-1-git-send-email-joelaf@google.com>
References: <1475904515-24970-1-git-send-email-joelaf@google.com>
X-Mailer: Claws Mail 3.14.0 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri,  7 Oct 2016 22:28:27 -0700
Joel Fernandes <joelaf@google.com> wrote:

> Here's an early RFC for a patch series on improving ftrace throughput with
> ramoops. I am hoping to get some early comments so I'm releasing it in advance.
> It is functional and tested.
> 
> Currently ramoops uses a single zone to store function traces. To make this
> work, it has to use locking to synchronize accesses to the buffers. Recently
> the synchronization was completely moved from a cmpxchg mechanism to raw
> spinlocks due to difficulties in using cmpxchg on uncached memory and also on
> RAMs behind PCIe. [1] This change further dropped the performance of the
> ramoops pstore backend by more than half in my tests.
> 
> This patch series improves throughput by around 280% over the current state by
> creating a ramoops persistent zone for each CPU and avoiding locking altogether
> for ftrace. At init time, the persistent zones are then merged together.
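
A minimal userspace sketch of the per-CPU zone idea described above, not code
from the series. The names demo_zone, demo_zones_init and demo_zone_write are
hypothetical, and the sketch assumes ncpus > 0 and that only the owning CPU
ever writes to its zone, which is what would let the write path skip locking:

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_DEMO 4          /* hypothetical CPU count for the sketch */

/* Hypothetical per-CPU zone: a slice of the shared region plus a write offset. */
struct demo_zone {
    uint8_t *buf;   /* start of this CPU's slice */
    size_t size;    /* bytes available to this CPU */
    size_t head;    /* next write offset; touched only by the owning CPU */
};

/* Split one region evenly into ncpus zones; returns the per-zone size. */
static size_t demo_zones_init(struct demo_zone *zones, uint8_t *region,
                              size_t total, unsigned int ncpus)
{
    size_t per_cpu = total / ncpus;

    for (unsigned int i = 0; i < ncpus; i++) {
        zones[i].buf = region + (size_t)i * per_cpu;
        zones[i].size = per_cpu;
        zones[i].head = 0;
    }
    return per_cpu;
}

/* Append a record to the zone owned by cpu, wrapping like a ring buffer.
 * No lock is taken: the sketch assumes only cpu ever writes zones[cpu]. */
static void demo_zone_write(struct demo_zone *zones, unsigned int cpu,
                            const void *rec, size_t len)
{
    struct demo_zone *z = &zones[cpu];
    const uint8_t *src = rec;

    for (size_t i = 0; i < len; i++) {
        z->buf[z->head] = src[i];
        z->head = (z->head + 1) % z->size;
    }
}
```

With a 64-byte region and four CPUs, each zone would get 16 bytes, which is
the space side of the trade-off the cover letter mentions.
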
> 
> Here are some tests to show the improvements.  Tested using a qemu quad core
> x86_64 instance with -mem-path to persist the guest RAM to a file. I measured
> average throughput of dd over 30 seconds:
> 
> dd if=/dev/zero | pv | dd of=/dev/null
> 
> Without this patch series: 24 MB/s
> With per-CPU buffers and counter increment: 91.5 MB/s (improvement by ~ 281%)
> With per-CPU buffers and trace_clock: 51.9 MB/s
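
For reference, the ~281% figure reads as a relative gain over the 24 MB/s
baseline, i.e. (91.5 - 24) / 24. The helper below is only an illustration of
that arithmetic (improvement_pct is a hypothetical name); applied to the
trace_clock number, the same formula gives roughly 116%:

```c
/* Relative improvement of after_mbps over before_mbps, in percent. */
static double improvement_pct(double before_mbps, double after_mbps)
{
    return (after_mbps - before_mbps) / before_mbps * 100.0;
}
```
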
> 
> Some more considerations:
> 1. In order to merge the individual buffers, I am using racy counters since I
> didn't want to sacrifice throughput for perfect timestamps. Using
> trace_clock() for timestamps did the job, but gave almost half the throughput
> of counter-based timestamps.
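
A rough sketch of how such a merge could work, assuming each per-CPU zone's
records are already ordered by their counter value. This is not the code from
the series; demo_rec, demo_rec_list and demo_merge are hypothetical names.
Because the counters are racy, two CPUs may record the same value, so the
sketch breaks ties in favor of the lower zone index:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the counter value stands in for a timestamp. */
struct demo_rec {
    uint64_t ts;        /* counter sampled when the record was written */
    unsigned int cpu;   /* CPU that wrote the record */
};

/* One per-CPU zone's records, assumed already ordered by ts. */
struct demo_rec_list {
    const struct demo_rec *recs;
    size_t len;
};

/* Merge nzones lists into out (capacity >= sum of all lens); returns the
 * number of records written. pos is caller-provided scratch of nzones
 * entries. Ties go to the lower zone index. */
static size_t demo_merge(const struct demo_rec_list *zones, size_t nzones,
                         size_t *pos, struct demo_rec *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nzones; i++)
        pos[i] = 0;

    for (;;) {
        size_t best = nzones;   /* nzones means "nothing left" */

        for (size_t i = 0; i < nzones; i++) {
            if (pos[i] == zones[i].len)
                continue;       /* this zone is exhausted */
            if (best == nzones ||
                zones[i].recs[pos[i]].ts < zones[best].recs[pos[best]].ts)
                best = i;
        }
        if (best == nzones)
            break;
        out[n++] = zones[best].recs[pos[best]++];
    }
    return n;
}
```

The linear scan per output record is fine for a handful of CPUs; a heap would
be the usual choice if the zone count were large.
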
> 
> 2. Since the patches divide the available ftrace persistent space by the number
> of CPUs, less space will now be available per CPU. However, the user is free to
> disable per-CPU behavior and revert to the old behavior by specifying the
> PSTORE_PER_CPU flag.  It's a space vs. performance trade-off, so if the user has
> enough space and not a lot of CPUs, then using per-CPU persistent buffers makes
> sense for better performance.
> 
> 3. Without using any counters or timestamps, the throughput is even higher
> (~140 MB/s), but the buffers cannot be merged.
> 
> [1] https://lkml.org/lkml/2016/9/8/375

From a tracing point of view, I have no qualms with this patch set.

-- Steve

> 
> Joel Fernandes (7):
>   pstore: Make spinlock per zone instead of global
>   pstore: locking: dont lock unless caller asks to
>   pstore: Remove case of PSTORE_TYPE_PMSG write using deprecated
>     function
>   pstore: Make ramoops_init_przs generic for other prz arrays
>   ramoops: Split ftrace buffer space into per-CPU zones
>   pstore: Add support to store timestamp counter in ftrace records
>   pstore: Merge per-CPU ftrace zones into one zone for output
> 
>  fs/pstore/ftrace.c         |   3 +
>  fs/pstore/inode.c          |   7 +-
>  fs/pstore/internal.h       |  34 -------
>  fs/pstore/ram.c            | 234 +++++++++++++++++++++++++++++++++++----------
>  fs/pstore/ram_core.c       |  30 +++---
>  include/linux/pstore.h     |  69 +++++++++++++
>  include/linux/pstore_ram.h |   6 +-
>  7 files changed, 280 insertions(+), 103 deletions(-)
>