References: <20190111081401.GA5080@hori1.linux.bs1.fc.nec.co.jp> <20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp>
From: Dan Williams
Date: Wed, 16 Jan 2019 08:55:52 -0800
Subject: Re: [PATCH] mm: hwpoison: use do_send_sig_info() instead of force_sig() (Re: PMEM error-handling forces SIGKILL causes kernel panic)
To: Naoya Horiguchi
Cc: Andrew Morton, linux-mm@kvack.org, Jane Chu, linux-nvdimm, Linux Kernel Mailing List
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 16, 2019 at 1:33 AM Naoya Horiguchi wrote:
>
> [ CCed Andrew and linux-mm ]
>
> On Fri, Jan 11, 2019 at 08:14:02AM +0000, Horiguchi Naoya (堀口 直也) wrote:
> > Hi Dan, Jane,
> >
> > Thanks for the report.
> >
> > On Wed, Jan 09, 2019 at 03:49:32PM -0800, Dan Williams wrote:
> > > [ switch to text mail, add lkml and Naoya ]
> > >
> > > On Wed, Jan 9, 2019 at 12:19 PM Jane Chu wrote:
> > ...
> > > > 3. The hardware consists of the latest revision CPU and an Intel NVDIMM. We suspected
> > > >    the CPU was faulty because it generated MCEs over PMEM UEs at an unlikely high
> > > >    rate for any reasonable NVDIMM (a few per 24 hours).
> > > >
> > > > After swapping the CPU, the problem stopped reproducing.
> > > >
> > > > But one could argue that perhaps the faulty CPU exposed a small race window
> > > > from collect_procs() to unmap_mapping_range() and to kill_procs(), and hence
> > > > caught the kernel PMEM error handler off guard.
> > >
> > > There's definitely a race, and the implementation is buggy, as can be
> > > seen in __exit_signal:
> > >
> > >         sighand = rcu_dereference_check(tsk->sighand,
> > >                                         lockdep_tasklist_lock_is_held());
> > >         spin_lock(&sighand->siglock);
> > >
> > > ...the memory-failure path needs to hold the proper locks before it
> > > can assume that dereferencing tsk->sighand is valid.
> > >
> > > > Also note, the same workload on the same faulty CPU was run on Linux prior to
> > > > the 4.19 PMEM error handling and did not encounter a kernel crash, probably because
> > > > the prior HWPOISON handler did not force SIGKILL?
> > >
> > > Before 4.19 this test should result in a machine-check reboot, not
> > > much better than a kernel crash.
> > >
> > > > Should we not force the SIGKILL, or find a way to close the race window?
> > >
> > > The race should be closed by holding the proper tasklist and RCU read lock(s).
> >
> > This reasoning and proposal sound right to me. I'm trying to reproduce
> > this race (for the non-pmem case), but have had no luck so far. I'll investigate more.
>
> I wrote/tested a patch for this issue.
> I think that switching the signal API effectively provides the proper locking.
>
> Thanks,
> Naoya Horiguchi
> ---
> From 16dbf6105ff4831f73276d79d5df238ab467de76 Mon Sep 17 00:00:00 2001
> From: Naoya Horiguchi
> Date: Wed, 16 Jan 2019 16:59:27 +0900
> Subject: [PATCH] mm: hwpoison: use do_send_sig_info() instead of force_sig()
>
> Currently memory_failure() is racy against process exit, which can
> result in a kernel crash due to a NULL pointer dereference.
>
> The root cause is that memory_failure() uses force_sig() to forcibly
> kill asynchronous (meaning not in the current context) processes. As
> discussed in thread https://lkml.org/lkml/2010/6/8/236 years ago for
> OOM fixes, this is not the right thing to do. OOM solves this issue by
> using do_send_sig_info() as done in commit d2d393099de2 ("signal:
> oom_kill_task: use SEND_SIG_FORCED instead of force_sig()"), so this
> patch proposes doing the same for hwpoison. do_send_sig_info()
> properly takes siglock via lock_task_sighand(), so it is free from
> the reported race.
>
> I confirmed that the reported bug reproduces when some delay is
> inserted in kill_procs(), and that it never reproduces with this patch.
>
> Note that memory_failure() can send another type of signal using
> force_sig_mceerr(), and the reported race shouldn't happen there
> because force_sig_mceerr() is called only for synchronous processes
> (i.e. BUS_MCEERR_AR happens only when some process accesses the
> corrupted memory.)
>
> Reported-by: Jane Chu
> Cc: Dan Williams
> Cc: stable@vger.kernel.org
> Signed-off-by: Naoya Horiguchi
> ---

Looks good to me.

Reviewed-by: Dan Williams

...but it would still be good to get a Tested-by from Jane.
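The locking argument in the patch can be illustrated with a small userspace model. This is a minimal sketch under stated assumptions, not kernel code: struct task, struct sighand, task_init(), task_exit(), lock_sighand() and send_kill() are all hypothetical stand-ins, loosely modelled on __exit_signal(), lock_task_sighand() and do_send_sig_info(). The exit path clears the sighand pointer while holding siglock; the sender loads the pointer, takes siglock, and re-checks that the pointer is unchanged, retrying if it raced with exit and returning -ESRCH if the task is already gone. force_sig(), by contrast, dereferences tsk->sighand with no such check, which is where the reported NULL dereference comes from. The model never frees the sighand object, standing in for the RCU lifetime guarantees the kernel relies on.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sighand_struct. */
struct sighand {
    pthread_mutex_t siglock;  /* plays the role of sighand->siglock */
    int queued;               /* number of signals delivered so far */
};

/* Hypothetical stand-in for struct task_struct: only the sighand pointer. */
struct task {
    _Atomic(struct sighand *) sighand;  /* becomes NULL once the task exits */
};

static void task_init(struct task *t, struct sighand *sh)
{
    assert(sh != NULL);
    pthread_mutex_init(&sh->siglock, NULL);
    sh->queued = 0;
    atomic_init(&t->sighand, sh);
}

/* Exit path, loosely modelled on __exit_signal(): detach sighand while
 * holding siglock. The sighand object is never freed in this model,
 * standing in for the kernel's RCU lifetime rules. */
static void task_exit(struct task *t)
{
    struct sighand *sh = atomic_load(&t->sighand);

    if (sh == NULL)
        return;
    pthread_mutex_lock(&sh->siglock);
    atomic_store(&t->sighand, NULL);
    pthread_mutex_unlock(&sh->siglock);
}

/* Loosely modelled on lock_task_sighand(): returns sighand with siglock
 * held, or NULL if the task has already exited. */
static struct sighand *lock_sighand(struct task *t)
{
    for (;;) {
        struct sighand *sh = atomic_load(&t->sighand);

        if (sh == NULL)
            return NULL;                    /* task already exited */
        pthread_mutex_lock(&sh->siglock);
        if (sh == atomic_load(&t->sighand))
            return sh;                      /* still attached: keep the lock */
        pthread_mutex_unlock(&sh->siglock); /* raced with exit: retry */
    }
}

/* Loosely modelled on do_send_sig_info(): deliver only under siglock,
 * and report -ESRCH instead of dereferencing a detached sighand. */
static int send_kill(struct task *t)
{
    struct sighand *sh = lock_sighand(t);

    if (sh == NULL)
        return -ESRCH;
    sh->queued++;
    pthread_mutex_unlock(&sh->siglock);
    return 0;
}
```

In this model, calling send_kill() after task_exit() returns -ESRCH rather than touching a NULL pointer, which is the property the patch obtains by switching from force_sig() to do_send_sig_info().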