Subject: Re: [RFC PATCH 14/16] irq: Add support for core-wide protection of IRQ and softirq
From: "Li, Aubrey"
Date: Fri, 10 Jul 2020 20:19:24 +0800
To: Vineeth Remanan Pillai, Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen, mingo@kernel.org, tglx@linutronix.de, pjt@google.com, torvalds@linux-foundation.org
Cc: "Joel Fernandes (Google)", linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com, fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com, Phil Auld, Aaron Lu, Aubrey Li, Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, Joel Fernandes, vineethrp@gmail.com, Chen Yu, Christian Brauner, Tim Chen, "Paul E. McKenney"

Hi Joel/Vineeth,

On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
> From: "Joel Fernandes (Google)"
>
> With the current core scheduling patchset, non-threaded IRQ and softirq
> victims can leak data from their hyperthread to a sibling hyperthread
> running an attacker.
>
> For MDS, it is possible for the IRQ and softirq handlers to leak data to
> either host or guest attackers. For L1TF, it is possible to leak to guest
> attackers. There is no possible mitigation involving flushing of buffers
> to avoid this, since the attacker and the victim execute concurrently on
> two or more HTs.
>
> The solution in this patch is to monitor the outermost core-wide
> irq_enter() and irq_exit() executed by any sibling. In between these two,
> we mark the core as being in a special core-wide IRQ state.
>
> On IRQ entry, if we detect that a sibling is running untrusted code, we
> send it a reschedule IPI so that the sibling transitions through its
> irq_exit() and does any waiting there, until the IRQ being protected
> finishes.
>
> We also monitor the per-CPU outermost irq_exit(). If, during the per-CPU
> outermost irq_exit(), the core is still in the special core-wide IRQ
> state, we busy-wait until the core exits this state. This combination of
> per-CPU and core-wide IRQ states handles any combination of irq_enter()s
> and irq_exit()s happening on the siblings of the core, in any order.
>
> Lastly, we also check in the schedule loop whether we are about to
> schedule an untrusted process while the core is in such a state. This is
> possible if a trusted thread enters the scheduler by yielding the CPU;
> that path involves no transition through irq_exit() where waiting could
> happen, so we have to wait explicitly there.
>
> Every attempt is made to avoid unnecessary busy-waiting, and in testing
> on real-world ChromeOS use cases, it has not shown a performance drop. In
> ChromeOS, with this and the rest of the core scheduling patchset, we see
> around a 300% improvement in key-press latencies into Google Docs when
> camera streaming is running simultaneously (90th percentile latency drops
> from ~150ms to ~50ms).
>
> This feature is controlled by the build-time config option
> CONFIG_SCHED_CORE_IRQ_PAUSE and is enabled by default. There is also a
> kernel boot parameter, 'sched_core_irq_pause', to enable/disable the
> feature at boot time (enabled by default).
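To make sure we are reading the mechanism the same way before the report
below, here is my minimal user-space sketch of the entry/exit pairing
described above. C11 atomics and a thread-local variable stand in for the
kernel's per-CPU machinery; every identifier is illustrative, none of this
is the actual patch code:

/*
 * Sketch of the core-wide IRQ pause idea (illustrative names only).
 */
#include <stdatomic.h>

/* Shared by all sibling hyperthreads of one core. */
struct core_irq_state {
	atomic_int nesting;	/* outstanding irq_enter()s, core-wide */
};

/* Per-CPU IRQ nesting depth (a per-CPU variable in the kernel). */
static _Thread_local int local_nesting;

static void sketch_irq_enter(struct core_irq_state *core)
{
	local_nesting++;
	if (atomic_fetch_add(&core->nesting, 1) == 0) {
		/*
		 * Outermost core-wide entry: the core is now in the
		 * protected state.  The patch additionally IPIs any
		 * sibling running untrusted code here, so that it
		 * funnels into its own irq_exit() and waits there.
		 */
	}
}

static void sketch_irq_exit(struct core_irq_state *core)
{
	atomic_fetch_sub(&core->nesting, 1);
	if (--local_nesting == 0) {
		/*
		 * This CPU's outermost exit: if any sibling is still
		 * inside a protected IRQ, busy-wait until the core-wide
		 * state clears, so untrusted code never overlaps a
		 * sibling's handler.  (The schedule-loop check covers
		 * entry to untrusted tasks that never passes through
		 * irq_exit().)
		 */
		while (atomic_load(&core->nesting) != 0)
			;	/* cpu_relax() in real kernel code */
	}
}

If I read it right, the spin in the outermost per-CPU irq_exit() and the
schedule-loop check are the only two places a CPU can end up waiting.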
We saw a lot of soft lockups on the screen when we tested v6:

[ 186.527883] watchdog: BUG: soft lockup - CPU#86 stuck for 22s! [uperf:5551]
[ 186.535884] watchdog: BUG: soft lockup - CPU#87 stuck for 22s! [uperf:5444]
[ 186.555883] watchdog: BUG: soft lockup - CPU#89 stuck for 22s! [uperf:5547]
[ 187.547884] rcu: INFO: rcu_sched self-detected stall on CPU
[ 187.553760] rcu: 40-....: (14997 ticks this GP) idle=49a/1/0x4000000000000002 softirq=1711/1711 fqs=7279
[ 187.564685] NMI watchdog: Watchdog detected hard LOCKUP on cpu 14
[ 187.564723] NMI watchdog: Watchdog detected hard LOCKUP on cpu 38

The problem goes away when we revert this patch. We are running multiple
uperf threads (thread count equal to the CPU count) in a cgroup with core
scheduling enabled, and the lockups are 100% reproducible on our side.

Just wondering whether this is already known before we dig into it.

Thanks,
-Aubrey