From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751623AbdHaWKS (ORCPT ); Thu, 31 Aug 2017 18:10:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42474 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751001AbdHaWKQ (ORCPT ); Thu, 31 Aug 2017 18:10:16 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 644F44E4C6 Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=dzickus@redhat.com Date: Thu, 31 Aug 2017 18:10:14 -0400 From: Don Zickus To: Thomas Gleixner Cc: LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Borislav Petkov , Sebastian Siewior , Nicholas Piggin , Chris Metcalf , Ulrich Obergfell Subject: Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Message-ID: <20170831221014.b3tlpk3p6zxwclmy@redhat.com> References: <20170831071558.995235362@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170831071558.995235362@linutronix.de> User-Agent: NeoMutt/20170428-dirty (1.8.2) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Thu, 31 Aug 2017 22:10:16 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote: > The lockup detector is broken is several ways: > > - It's deadlock prone vs. CPU hotplug in various ways. Some of these > are due to recursive cpus_read_lock() others are due to > cpus_read_lock() from CPU hotplug callbacks which immediately lock > the machine because cpus are write locked. > > - The handling of the cpu hotplug threads happens sideways to the > smpboot thread infrastructure, which is racy and pointless > > - The handling of the user space sysctl interface is a complete > trainwreck as it fiddles directly with variables which can be > modified or evaluated by the running watchdogs. > > - The perf event initialization is a steaming pile of duct tape as it > idiotically tries to create perf events over and over even if perf is > not functional (no hardware, ....). To avoid excessive dmesg spam it > contains magic printk ratelimiting along with either wrong or useless > messages. > > - The code structure is horrible as ifdef sections are scattered all > over the place which makes it unreadable > > - There is more wreckage, but see the changelogs for the ugly details. > > Before I get utterly grumpy, I just pretend that I don't give a sh*t! > > The following series sanitizes the facility and addresses the problems. Hi Thomas, Thanks for the patchset. I agree with most your issues you complained about, just wasn't smart enough to figure out the right way to solve them. Despite your aggressive comments, I will review the code to see if it covers the scenarios that have popped up over the years and run some testing on my side. Probably need a few days to do that. Cheers, Don > > Thanks, > > tglx > --- > arch/parisc/kernel/process.c | 2 > arch/powerpc/kernel/watchdog.c | 22 - > arch/x86/events/intel/core.c | 11 > include/linux/nmi.h | 121 +++---- > include/linux/smpboot.h | 4 > kernel/cpu.c | 6 > kernel/smpboot.c | 22 - > kernel/sysctl.c | 22 - > kernel/watchdog.c | 638 ++++++++++++++--------------------------- > kernel/watchdog_hld.c | 193 ++++++------ > 10 files changed, 433 insertions(+), 608 deletions(-) > >