From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755610AbdEAQia (ORCPT ); Mon, 1 May 2017 12:38:30 -0400
Received: from mail-it0-f43.google.com ([209.85.214.43]:37191 "EHLO
	mail-it0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752749AbdEAQi2 (ORCPT ); Mon, 1 May 2017 12:38:28 -0400
MIME-Version: 1.0
In-Reply-To: <1493655607.30303.19.camel@hpe.com>
References: <149355594185.9917.1577772489949690281.stgit@dwillia2-desk3.amr.corp.intel.com>
	<1493652871.30303.15.camel@hpe.com>
	<1493655131.30303.17.camel@hpe.com>
	<1493655607.30303.19.camel@hpe.com>
From: Dan Williams
Date: Mon, 1 May 2017 09:38:27 -0700
Message-ID:
Subject: Re: [PATCH] libnvdimm: rework region badblocks clearing
To: "Kani, Toshimitsu"
Cc: "linux-kernel@vger.kernel.org", "linux-nvdimm@lists.01.org",
	"Jiang, Dave", "Verma, Vishal L"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 1, 2017 at 9:20 AM, Kani, Toshimitsu wrote:
> On Mon, 2017-05-01 at 09:16 -0700, Dan Williams wrote:
>> On Mon, May 1, 2017 at 9:12 AM, Kani, Toshimitsu wrote:
>> > On Mon, 2017-05-01 at 08:52 -0700, Dan Williams wrote:
>> > > On Mon, May 1, 2017 at 8:43 AM, Dan Williams wrote:
>> > > > On Mon, May 1, 2017 at 8:34 AM, Kani, Toshimitsu wrote:
>> > > > > On Sun, 2017-04-30 at 05:39 -0700, Dan Williams wrote:
>> >
>> >  :
>> > > > >
>> > > > > Hi Dan,
>> > > > >
>> > > > > I was testing the change with CONFIG_DEBUG_ATOMIC_SLEEP set
>> > > > > this time, and hit the following BUG with BTT.  This is a
>> > > > > separate issue (not introduced by this patch), but it shows
>> > > > > that we have an issue with the DSM call path as well.
>> > > >
>> > > > Ah, great find, thanks! We don't see this in the unit tests
>> > > > because the nfit_test infrastructure takes no sleeping actions
>> > > > in its simulated DSM path. Outside of converting btt to use
>> > > > sleeping locks I'm not sure I see a path forward. I wonder how
>> > > > bad the performance impact of that would be? Perhaps with
>> > > > opportunistic spinning it won't be so bad, but I don't see
>> > > > another choice.
>> > >
>> > > It's worse than that. Part of the performance optimization of BTT
>> > > I/O was to avoid locking altogether when we could rely on a BTT
>> > > lane percpu, so that would also need to be removed.
>> >
>> > I do not have a good idea either, but I'd rather disable this
>> > clearing in the regular BTT write path than adding sleeping locks
>> > to BTT.  Clearing a bad block in the BTT write path is
>> > difficult/challenging since it allocates a new block.
>>
>> Actually, that may make things easier. Can we teach BTT to track
>> error blocks and clear them before they are reassigned?
>
> I was thinking the same after sending it.  I think we should be able to
> do that.

Ok, but we obviously can't develop something that detailed while the
merge window is open, so I think that means we need to revert commit
e88da7998d7d "Revert 'libnvdimm: band aid btt vs clear poison
locking'" and leave BTT I/O-error-clearing disabled for this cycle and
try again for 4.13.