Date: Mon, 20 Jun 2016 15:29:27 +0800
From: xinhui <xinhui.pan@linux.vnet.ibm.com>
To: Byungchul Park <byungchul.park@lge.com>, peterz@infradead.org, mingo@kernel.org
CC: linux-kernel@vger.kernel.org, npiggin@suse.de, walken@google.com, ak@suse.de, tglx@inhelltoy.tec.linutronix.de
Subject: Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
References: <1466398527-1122-1-git-send-email-byungchul.park@lge.com> <1466398527-1122-13-git-send-email-byungchul.park@lge.com>
In-Reply-To: <1466398527-1122-13-git-send-email-byungchul.park@lge.com>
Message-Id: <57679B57.40905@linux.vnet.ibm.com>

On 2016/06/20 12:55, Byungchul Park wrote:
> Currently, the x86 implementation of save_stack_trace() walks the whole
> stack region word by word regardless of what trace->max_entries is.
> However, it is unnecessary to keep walking once the caller's requirement
> has already been fulfilled, say, once trace->nr_entries >=
> trace->max_entries is true.
>
> For example, the CONFIG_LOCKDEP_CROSSRELEASE implementation calls
> save_stack_trace() with max_entries = 5 frequently. I measured its
> overhead and printed the difference of sched_clock() on my QEMU x86
> machine.
>
> The latency improved by over 70% when trace->max_entries = 5.
>
> [snip]
>
> +static int save_stack_end(void *data)
> +{
> +	struct stack_trace *trace = data;
> +	return trace->nr_entries >= trace->max_entries;
> +}
> +
>  static const struct stacktrace_ops save_stack_ops = {
>  	.stack		= save_stack_stack,
>  	.address	= save_stack_address,

Then why not check the return value of ->address()?  A return value of -1
indicates there is no room left to store any more entries.  (A rough
sketch of that alternative follows at the end of this mail.)

>  	.walk_stack	= print_context_stack,
> +	.end_walk	= save_stack_end,
>  };
>
>  static const struct stacktrace_ops save_stack_ops_nosched = {
>
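
For reference, below is a rough sketch of the alternative suggested above.
It is based on my reading of the print_context_stack() walker in
arch/x86/kernel/dumpstack.c around that time, with the frame-pointer and
ftrace-graph bookkeeping omitted; checking the ->address() return value in
the loop is the proposal, not existing code:

/*
 * Sketch only, not a tested patch: stop the walk as soon as
 * ->address() reports failure.  save_stack_address() already returns
 * -1 once trace->nr_entries >= trace->max_entries, so no extra
 * ->end_walk() callback would be needed for the stacktrace case.
 */
unsigned long
print_context_stack(struct thread_info *tinfo,
		    unsigned long *stack, unsigned long bp,
		    const struct stacktrace_ops *ops, void *data,
		    unsigned long *end, int *graph)
{
	while (valid_stack_ptr(tinfo, stack, sizeof(*stack), end)) {
		unsigned long addr = *stack;

		/* Bail out early once the consumer has no more room. */
		if (__kernel_text_address(addr) &&
		    ops->address(data, addr, 0))
			break;

		stack++;
	}
	return bp;
}

The trade-off versus the ->end_walk() approach is that every ->address()
implementation's return value becomes meaningful, but no new callback has
to be added to struct stacktrace_ops.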