From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS, USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7013FC43381 for ; Tue, 19 Mar 2019 15:37:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3AFA420850 for ; Tue, 19 Mar 2019 15:37:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="qO1EBngv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727745AbfCSPho (ORCPT ); Tue, 19 Mar 2019 11:37:44 -0400 Received: from merlin.infradead.org ([205.233.59.134]:42310 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726579AbfCSPhn (ORCPT ); Tue, 19 Mar 2019 11:37:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=Vpnb2AMxvNgluY7lzcdhoE5yoHvclf4Jh9el29LaULE=; b=qO1EBngv7evdNHD5cKO8j69nP unZzGDBk8VCPnHHpvyuSBNSPK16EDdyVDsMKO98Dd0RlnNC7/wtT6eVephmSl6tjTPLQhlyVgn8Ip UvW+lPrE/sQFca+tpHPlqOB1VudF5L4VBB/pM7YR11njZy2SH7CJDJxgp/CR0QF54wIMO/k5fqM8b wuSsCpnKI2Et66HxKzCLAUV1+DHDhAWpXM90NEej+kYNb5P1PMp/U6obJyxpQJkCZCZ/ejPhPNV7k RRDZSrqvx2xErLGN0SwHVbGvmBtHfl9+NnPvqKiddaqvLZU8sQnt0iFQxV0Hkam+RrfNqcm3Ed0pq Z0Nbwm1Cw==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by merlin.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux)) id 1h6Go7-0001lq-8f; Tue, 19 Mar 2019 15:37:39 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id CC97025E45416; Tue, 19 Mar 2019 16:37:37 +0100 (CET) Date: Tue, 19 Mar 2019 16:37:37 +0100 From: Peter Zijlstra To: Mel Gorman Cc: Ingo Molnar , Valentin Schneider , linux-kernel@vger.kernel.org Subject: Re: [PATCH] sched: Do not re-read h_load_next during hierarchical load calculation v2 Message-ID: <20190319153737.GK5996@hirez.programming.kicks-ass.net> References: <20190319123610.nsivgf3mjbjjesxb@techsingularity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190319123610.nsivgf3mjbjjesxb@techsingularity.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 19, 2019 at 12:36:10PM +0000, Mel Gorman wrote: > Changelog since v1 > o Use WRITE_ONCE > o Add Fixes: > o Add reviewed-by for the READ_ONCE part as I considered it to still be > ok even after the WRITE_ONCE > > A NULL pointer dereference bug was reported on a distribution kernel but > the same issue should be present on mainline kernel. It occured on s390 > but should not be arch-specific. A partial oops looks like > > [775277.408564] Unable to handle kernel pointer dereference in virtual kernel address space > ... > [775277.408759] Call Trace: > [775277.408763] ([<0002c11c56899c61>] 0x2c11c56899c61) > [775277.408766] [<0000000000177bb4>] try_to_wake_up+0xfc/0x450 > [775277.408773] [<000003ff81ede872>] vhost_poll_wakeup+0x3a/0x50 [vhost] > [775277.408777] [<0000000000194ae4>] __wake_up_common+0xbc/0x178 > [775277.408779] [<0000000000194f86>] __wake_up_common_lock+0x9e/0x160 > [775277.408780] [<00000000001950de>] __wake_up_sync_key+0x4e/0x60 > [775277.408785] [<00000000005d911e>] sock_def_readable+0x5e/0x98 > > The bug hits any time between 1 hour to 3 days. The dereference occurs > in update_cfs_rq_h_load when accumulating h_load. The problem is that > cfq_rq->h_load_next is not protected by any locking and can be updated > by parallel calls to task_h_load. Depending on the compiler, code may be > generated that re-reads cfq_rq->h_load_next after the check for NULL and > then oops when reading se->avg.load_avg. The dissassembly showed that it > was possible to reread h_load_next after the check for NULL. > > While this does not appear to be an issue for later compilers, it's still > an accident if the correct code is generated. Full locking in this path > would have high overhead so this patch uses READ_ONCE to read h_load_next > only once and check for NULL before dereferencing. It was confirmed that > there were no further oops after 10 days of testing. > > As Peter pointed out, it is also necessary to use WRITE_ONCE to avoid any > potential problems with store tearing. > > Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()") > [peterz@infradead.org: Use WRITE_ONCE to protect against store tearing] > Signed-off-by: Mel Gorman > Reviewed-by: Valentin Schneider > Cc: stable@vger.kernel.org Thanks!