From: Dan Williams <dan.j.williams@intel.com>
To: "Kani, Toshimitsu" <toshi.kani@hpe.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH] libnvdimm: rework region badblocks clearing
Date: Mon, 1 May 2017 08:52:22 -0700
Message-ID: <CAPcyv4gxuez8BvtWwwcTXCKpzKp+oGQrRQG8+2iQ6XqRAfT_3g@mail.gmail.com>
In-Reply-To: <CAPcyv4g4G_fJf6OOGpWbpyMdNioPWEFoJ2eO7+dSTcep+OerFw@mail.gmail.com>

On Mon, May 1, 2017 at 8:43 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, May 1, 2017 at 8:34 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> On Sun, 2017-04-30 at 05:39 -0700, Dan Williams wrote:
>>> Toshi noticed that the new support for a region-level badblocks
>>> missed the case where errors are cleared due to BTT I/O.
>>>
>>> An initial attempt to fix this ran into a "sleeping while atomic"
>>> warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
>>> satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
>>> However, that lock is not needed since we are not acting on any data
>>> that is subject to change due to a change of state of the bus /
>>> region. The badblocks instance has its own internal lock to handle
>>> mutations of the error list.
>>>
>>> So, to make it clear that we are just acting on region devices and
>>> don't need the lock, rename __nvdimm_bus_badblocks_clear() to
>>> nvdimm_clear_badblocks_regions(). Eliminate the lock and consolidate
>>> all routines in drivers/nvdimm/bus.c. Also, make some cleanups to
>>> remove unnecessary casts, make the calling convention of
>>> nvdimm_clear_badblocks_regions() clearer by replacing struct resource
>>> with the minimal struct clear_badblocks_context, and use the
>>> DEVICE_ATTR macro.
>>
>> Hi Dan,
>>
>> I was testing the change with CONFIG_DEBUG_ATOMIC_SLEEP set this time,
>> and hit the following BUG with BTT.  This is a separate issue (not
>> introduced by this patch), but it shows that we have an issue with the
>> DSM call path as well.
>
> Ah, great find, thanks! We don't see this in the unit tests because
> the nfit_test infrastructure takes no sleeping actions in its
> simulated DSM path. Outside of converting btt to use sleeping locks
> I'm not sure I see a path forward. I wonder how bad the performance
> impact of that would be? Perhaps with opportunistic spinning it won't
> be so bad, but I don't see another choice.

It's worse than that. Part of the BTT I/O performance optimization
was to avoid locking altogether whenever each CPU can rely on its own
per-CPU BTT lane, so that optimization would also need to be removed.
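
To make the lane point concrete, here is a minimal sketch of the idea as
a toy Python model. It is not the kernel code: acquire_lane(), its
parameters, and the returned needs_lock flag are hypothetical names
chosen for illustration. The assumption is that when there are at least
as many lanes as CPUs, each CPU gets a private lane and needs no lock,
and that CPUs otherwise share lanes and must serialize.

```python
from typing import Tuple


def acquire_lane(cpu: int, num_lanes: int, nr_cpus: int) -> Tuple[int, bool]:
    """Return (lane, needs_lock) for the CPU issuing the I/O.

    Hypothetical model only. If lanes are scarcer than CPUs, several
    CPUs map onto the same lane, so the lane must be protected by a
    lock. Otherwise the lane index is simply the CPU id and the
    lock-free fast path applies.
    """
    if num_lanes < nr_cpus:
        # Shared lane: callers must serialize on a per-lane lock.
        return cpu % num_lanes, True
    # Private lane: safe without a lock only while the task stays on
    # this CPU, which is why this path cannot tolerate sleeping.
    return cpu, False
```

The lock-free branch only holds while the caller stays pinned to its
CPU, so allowing the I/O path to sleep would mean giving up that branch
as well as converting the lock in the shared branch.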


