From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S934568AbcBDQxH (ORCPT ); Thu, 4 Feb 2016 11:53:07 -0500
Received: from mail.neosystem.cz ([94.23.169.88]:63450 "EHLO
 mail.neosystem.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S932135AbcBDQxE (ORCPT ); Thu, 4 Feb 2016 11:53:04 -0500
X-Greylist: delayed 560 seconds by postgrey-1.27 at vger.kernel.org;
 Thu, 04 Feb 2016 11:53:03 EST
Date: Thu, 4 Feb 2016 17:39:31 +0100
From: Daniel Bilik
To: Jan Kara
Cc: Thomas Gleixner , Mike Galbraith , LKML , stable@vger.kernel.org
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-Id: <20160204173931.4735a8de14fc0bde6c114321@neosystem.cz>
In-Reply-To: <20160204112044.GE4956@quack.suse.cz>
References: <20160126093400.GV24938@quack.suse.cz>
 <20160126111438.GA731@pathway.suse.cz>
 <56B1C9E4.4020400@suse.cz>
 <20160203122855.GB6762@dhcp22.suse.cz>
 <20160203162441.GE14091@mtj.duckdns.org>
 <1454518913.6148.15.camel@gmail.com>
 <20160203170652.GI14091@mtj.duckdns.org>
 <1454580263.3407.114.camel@gmail.com>
 <20160204112044.GE4956@quack.suse.cz>
Organization: neosystem.cz
X-Mailer: Sylpheed 3.4.3 (GTK+ 2.24.29; x86_64-portbld-dragonfly4.5)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 4 Feb 2016 12:20:44 +0100
Jan Kara wrote:

> Thanks for backport Thomas and to Mike for persistence :). I've asked my
> friend seeing crashes with 3.18.25 to try whether this patch fixes the
> issues. It may take some time so stay tuned...

Patch tested, and it really fixes the crash we were experiencing on
3.18.25 with commit 874bbfe+. But it seems to introduce a (rather scary)
regression. The tested host shows abnormal CPU usage in both kernel and
userland under the same load and traffic pattern.

One picture is worth a thousand words, so I've taken snapshots of our
graphs, see here:

http://neosystem.cz/test/linux-3.18.25/

The host was running 3.18.25 with commit 874bbfe+ (1e7af29+ on
3.18-stable) reverted. With this commit included, it crashed within
minutes. Around 13:30 we booted 3.18.25 with commit 874bbfe+ included and
with the patch from Thomas. And around 15:40 we booted the host with the
previous kernel again, just to make sure this abnormal behaviour was
really caused by the test kernel.

Also interesting: in addition to the high CPU usage, there is an
abnormally high number of zombie processes reported by the system.

HTH.

--
Daniel Bilik
neosystem.cz