From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-io0-x232.google.com (mail-io0-x232.google.com [IPv6:2607:f8b0:4001:c06::232])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by ml01.01.org (Postfix) with ESMTPS id E09C921A04823
	for ; Mon, 1 May 2017 08:52:23 -0700 (PDT)
Received: by mail-io0-x232.google.com with SMTP id p80so122597407iop.3
	for ; Mon, 01 May 2017 08:52:23 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <149355594185.9917.1577772489949690281.stgit@dwillia2-desk3.amr.corp.intel.com>
	<1493652871.30303.15.camel@hpe.com>
From: Dan Williams
Date: Mon, 1 May 2017 08:52:22 -0700
Message-ID:
Subject: Re: [PATCH] libnvdimm: rework region badblocks clearing
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
To: "Kani, Toshimitsu"
Cc: "linux-kernel@vger.kernel.org" , "linux-nvdimm@lists.01.org"
List-ID:

On Mon, May 1, 2017 at 8:43 AM, Dan Williams wrote:
> On Mon, May 1, 2017 at 8:34 AM, Kani, Toshimitsu wrote:
>> On Sun, 2017-04-30 at 05:39 -0700, Dan Williams wrote:
>>> Toshi noticed that the new support for a region-level badblocks
>>> missed the case where errors are cleared due to BTT I/O.
>>>
>>> An initial attempt to fix this ran into a "sleeping while atomic"
>>> warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
>>> satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
>>> However, that lock is not needed since we are not acting on any data
>>> that is subject to change due to a change of state of the bus /
>>> region. The badblocks instance has its own internal lock to handle
>>> mutations of the error list.
>>>
>>> So, to make it clear that we are just acting on region devices and
>>> don't need the lock, rename __nvdimm_bus_badblocks_clear() to
>>> nvdimm_clear_badblocks_regions(). Eliminate the lock and consolidate
>>> all routines in drivers/nvdimm/bus.c. Also, make some cleanups to
>>> remove unnecessary casts, make the calling convention of
>>> nvdimm_clear_badblocks_regions() clearer by replacing struct resource
>>> with the minimal struct clear_badblocks_context, and use the
>>> DEVICE_ATTR macro.
>>
>> Hi Dan,
>>
>> I was testing the change with CONFIG_DEBUG_ATOMIC_SLEEP set this time,
>> and hit the following BUG with BTT. This is a separate issue (not
>> introduced by this patch), but it shows that we have an issue with the
>> DSM call path as well.
>
> Ah, great find, thanks! We don't see this in the unit tests because
> the nfit_test infrastructure takes no sleeping actions in its
> simulated DSM path. Outside of converting btt to use sleeping locks
> I'm not sure I see a path forward. I wonder how bad the performance
> impact of that would be? Perhaps with opportunistic spinning it won't
> be so bad, but I don't see another choice.

It's worse than that. Part of the performance optimization of BTT I/O
was to avoid locking altogether when we could rely on a per-cpu BTT
lane, so that would also need to be removed.
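
For readers following the thread, here is a rough userspace model of the
trade-off described above. It is only a sketch under stated assumptions:
struct lane_ctx, acquire_lane(), release_lane() and may_sleep() are
hypothetical names made up for illustration, not the kernel's API, and
the preemption state is modeled with a plain counter. The point is that
a lane picked per cpu needs no lock only while the task stays pinned to
that cpu, which is exactly the state in which sleeping (for example in
a DSM call) is not allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are NOT the kernel API. */
struct lane_ctx {
	unsigned int num_lanes;	/* lanes the device provides */
	unsigned int num_cpus;	/* cpus that may issue I/O */
	int atomic_depth;	/* > 0 while a lane is held (models preemption off) */
};

struct lane {
	unsigned int index;	/* lane chosen for this I/O */
	bool needs_lock;	/* true when several cpus can share the lane */
};

/* Pick a lane for 'cpu'; the caller is "atomic" until release_lane(). */
static struct lane acquire_lane(struct lane_ctx *ctx, unsigned int cpu)
{
	struct lane l;

	assert(ctx->num_lanes > 0);
	ctx->atomic_depth++;
	if (ctx->num_lanes < ctx->num_cpus) {
		/* Fewer lanes than cpus: lanes are shared, so a lock is needed. */
		l.index = cpu % ctx->num_lanes;
		l.needs_lock = true;
	} else {
		/* One lane per cpu: exclusive as long as we stay on this cpu. */
		l.index = cpu;
		l.needs_lock = false;
	}
	return l;
}

static void release_lane(struct lane_ctx *ctx)
{
	assert(ctx->atomic_depth > 0);
	ctx->atomic_depth--;
}

/* Sleeping is only legal when no lane is held. */
static bool may_sleep(const struct lane_ctx *ctx)
{
	return ctx->atomic_depth == 0;
}
```

In this model, any operation that can block between acquire_lane() and
release_lane() sees may_sleep() return false. Making that path sleepable
would mean giving up the lock-free per-cpu case as well, not just
swapping the lock type in the shared case.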