From: Prarit Bhargava <prarit@redhat.com>
To: "Long, Wai Man" <waiman.long@hpe.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>, Borislav Petkov <bp@suse.de>,
	Andy Lutomirski <luto@kernel.org>,
	"Norton, Scott J" <scott.norton@hpe.com>,
	"Hatch, Douglas B (HPE Servers - Linux)" <doug.hatch@hpe.com>,
	"Wright, Randy (HPE Servers Linux)" <rwright@hpe.com>
Subject: Re: [RESEND PATCH v4] x86/hpet: Reduce HPET counter read contention
Date: Wed, 10 Aug 2016 15:01:12 -0400
Message-ID: <57AB79F8.8080306@redhat.com>
In-Reply-To: <CS1PR84MB0312E73E9046C70343752C7CF11D0@CS1PR84MB0312.NAMPRD84.PROD.OUTLOOK.COM>



On 08/10/2016 02:37 PM, Long, Wai Man wrote:
> Hi,
> 
> I would like to restart the discussion about the merits of this patch.
> 
> This patch was created in response to a problem we had on the 16-socket Broadwell-EX systems (up to 768 logical CPUs) that were under development. About 10% of kernel boots experienced soft lockups:
> 
> [   71.618132] NetLabel: Initializing
> [   71.621967] NetLabel:  domain hash size = 128
> [   71.626848] NetLabel:  protocols = UNLABELED CIPSOv4
> [   71.632418] NetLabel:  unlabeled traffic allowed by default
> [   71.638679] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
> [   71.646504] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
> [   71.655313] Switching to clocksource hpet
> [   95.679135] BUG: soft lockup - CPU#144 stuck for 23s! [swapper/144:0]
> [   95.693363] BUG: soft lockup - CPU#145 stuck for 23s! [swapper/145:0]
> [   95.694203] Modules linked in:
> [   95.694697] CPU: 145 PID: 0 Comm: swapper/145 Not tainted 3.10.0-327.el7.x86_64 #1
> [   95.695580] BUG: soft lockup - CPU#582 stuck for 23s! [swapper/582:0]
> [   95.696145] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.001.006 SFW: 041.063.152 01/16/2016
> [   95.698128] BUG: soft lockup - CPU#357 stuck for 23s! [swapper/357:0]
> [   95.699774] task: ffff8cf7fecf4500 ti: ffff89787c748000 task.ti: ffff89787c748000
> 
> During bootup, there is a short window in which the clocksource is switched to hpet to calibrate the TSCs; it is switched back to tsc once the calibration is done. The soft lockups may happen during this short window.
> 
> Prarit also hit this problem on a smaller Intel box with 96 cores (192 threads). Maybe he can supply more information about what he saw.
> 

I've hit this on a system with 192 threads.  The TSC is functional and has
passed the TSC sync checks during boot.  When the HPET is used to resynchronize
the TSC, I occasionally see:

PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
hpet0: 8 comparators, 64-bit 24.000000 MHz counter
Switched to clocksource hpet

followed by the same NMI flood that Waiman described.  After some debugging I
came to the same conclusion as Waiman: the HPET becomes a point of contention
on systems where many threads access it rapidly.

After applying his patch, the problem no longer occurs.
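The contention described above comes from many CPUs each doing their own slow read of a single shared counter. One general way to reduce it is read combining: one thread performs the read while concurrent readers wait for it and reuse the published value. The sketch below is a userspace C11 illustration under stated assumptions. It is not the code from Waiman's patch. An atomic counter stands in for the HPET, and `slow_counter_read()`, `struct shared_read`, `combined_read()` and `run_demo()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a slow, globally shared counter such as the
 * HPET main counter: every call counts as one "hardware" access. */
static _Atomic uint64_t fake_hw_counter;

static uint64_t slow_counter_read(void)
{
    for (volatile int i = 0; i < 200; i++)
        ; /* simulate the latency of an uncached MMIO read */
    return atomic_fetch_add(&fake_hw_counter, 1) + 1;
}

/* Shared state for read combining (hypothetical layout). */
struct shared_read {
    _Atomic unsigned seq;   /* odd while one thread is reading the counter */
    _Atomic uint64_t value; /* last value published by a reader */
};

static uint64_t combined_read(struct shared_read *s)
{
    unsigned seq = atomic_load(&s->seq);
    for (;;) {
        if ((seq & 1u) == 0) {
            /* No read in flight: try to become the single reader. */
            if (atomic_compare_exchange_weak(&s->seq, &seq, seq + 1u)) {
                uint64_t v = slow_counter_read();
                atomic_store(&s->value, v);      /* publish the value ... */
                atomic_store(&s->seq, seq + 2u); /* ... then mark it done */
                return v;
            }
            continue; /* CAS failed and reloaded seq: re-examine it */
        }
        /* Another thread owns the read: only it can change an odd seq.
         * Wait for it to publish, then reuse its result. */
        while (atomic_load(&s->seq) == seq)
            sched_yield(); /* userspace stand-in for a CPU pause */
        return atomic_load(&s->value);
    }
}

struct worker_arg {
    struct shared_read *s;
    int iters;
    bool monotonic; /* false if this thread ever saw the value decrease */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    uint64_t prev = 0;
    a->monotonic = true;
    for (int i = 0; i < a->iters; i++) {
        uint64_t v = combined_read(a->s);
        if (v < prev)
            a->monotonic = false;
        prev = v;
    }
    return NULL;
}

/* Runs nthreads readers and returns how many "hardware" reads they caused. */
static uint64_t run_demo(int nthreads, int iters, bool *all_monotonic)
{
    assert(nthreads > 0 && nthreads <= 64);
    struct shared_read s;
    pthread_t tid[64];
    struct worker_arg args[64];

    atomic_init(&s.seq, 0u);
    atomic_init(&s.value, 0);
    atomic_store(&fake_hw_counter, 0);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_arg){ .s = &s, .iters = iters };
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    *all_monotonic = true;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        if (!args[i].monotonic)
            *all_monotonic = false;
    }
    return atomic_load(&fake_hw_counter);
}
```

With a single thread, every call performs its own read. With several threads, the read count returned by `run_demo()` can fall below the number of calls, because waiters reuse the in-flight result instead of touching the counter themselves. Kernel code would typically spin with `cpu_relax()` rather than calling `sched_yield()`.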

P.
