linux-kernel.vger.kernel.org archive mirror
* [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
@ 2018-03-11  8:20 Martin Steigerwald
  2018-03-11 14:37 ` Hans de Goede
  2018-03-19  9:42 ` Thorsten Leemhuis
  0 siblings, 2 replies; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-11  8:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Thorsten Leemhuis, Tejun Heo, Hans de Goede

[-- Attachment #1: Type: text/plain, Size: 3040 bytes --]

Hello.

Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
with SMART checks occasionally failing like this:

smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending checks
udisksd[24408]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00 50 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................#012 (g-io-error-quark, 0)
merkaba udisksd[24408]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00 00 00    ................#0120010: 00 00 00 00  50 00 00 00  00 00 00 00  00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
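
A note for reading these log lines: the `#012` sequences are most likely syslog's octal escape for an embedded newline (012 octal is LF), so each udisksd message is really a short multi-line hex dump. A minimal Python sketch, assuming that escaping convention, which turns the escapes back into line breaks. The function name `unescape_syslog` is made up for this example and is not part of udisks or smartmontools:

```python
import re

# Matches syslog-style control-character escapes such as "#012" (octal for LF).
_ESCAPE = re.compile(r"#([0-3][0-7]{2})")


def _decode(match):
    """Decode one '#ooo' escape, but only if it names a control character."""
    code = int(match.group(1), 8)
    # Leave things like "#123" (a printable character) untouched.
    return chr(code) if code < 0x20 else match.group(0)


def unescape_syslog(line):
    """Replace '#ooo' octal escapes for control characters with the real character."""
    return _ESCAPE.sub(_decode, line)


if __name__ == "__main__":
    sample = (
        "Unexpected sense data returned:#0120000: 0e 09 0c 00  00 00 ff 00"
        "#0120010: 00 00 00 00  00 00 00 00#012 (g-io-error-quark, 0)"
    )
    print(unescape_syslog(sample))
```

Piping the journal or syslog output through such a filter should show the sense data as rows starting with the offsets `0000:` and `0010:`.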

(Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)

However when I then check manually with smartctl -a | -x | -H the device
reports SMART data just fine.

As smartd correctly detects that the device is in sleep mode, this may be a
userspace issue in udisksd.

Also, on some boot attempts the boot hangs with a message like "could not
connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
on two LVs (each on one of the SSDs), a configuration that requires a manual
adaptation of the initramfs in order to boot (basically vgchange -ay before
btrfs device scan).

I wonder whether that has to do with the new SATA LPM policy stuff, but as
I had issues with

 3 => Medium power with Device Initiated PM enabled

(machine did not boot, which could also have been caused by me accidentally
removing all TCP/IP network support in the kernel with that setting)

I set it back to

CONFIG_SATA_MOBILE_LPM_POLICY=0

(firmware settings)

The only other significant change I am aware of is that I switched from the SLAB
to the SLUB allocator, as I think Debian recently did with their kernels.

I attach the complete configuration as xz.

Please understand that I am not keen on doing a bisect, as it can take quite a
while for the issue to appear and I will be holding a Linux training next
week. If you have any other suggestions, please tell me.

I found a thread in LKML about another Crucial SSD not working with more
aggressive LPM settings, yet my current 4.16-rc4 kernel runs with LPM policy
0 which should be safe ([PATCH] libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs).

Also, regarding 3 => Medium power with Device Initiated PM enabled, I am not yet
sure which of the two SSDs may cause trouble.

Also posted as bug report:

Bug 199077 - [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
https://bugzilla.kernel.org/show_bug.cgi?id=199077

Thanks,
-- 
Martin

[-- Attachment #2: config-4.16.0-rc4-tp520-btrfstrim+.xz --]
[-- Type: application/x-xz, Size: 26744 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-11  8:20 [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts Martin Steigerwald
@ 2018-03-11 14:37 ` Hans de Goede
  2018-03-11 16:28   ` Martin Steigerwald
                     ` (2 more replies)
  2018-03-19  9:42 ` Thorsten Leemhuis
  1 sibling, 3 replies; 18+ messages in thread
From: Hans de Goede @ 2018-03-11 14:37 UTC (permalink / raw)
  To: Martin Steigerwald, Linux Kernel Mailing List
  Cc: Thorsten Leemhuis, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 2783 bytes --]

Hi Martin,

On 11-03-18 09:20, Martin Steigerwald wrote:
> I wonder whether that has to do with the new SATA LPM policy stuff, but as
> I had issues with
> 
>   3 => Medium power with Device Initiated PM enabled
> 
> (machine did not boot, which could also have been caused by me accidentally
> removing all TCP/IP network support in the kernel with that setting)
> 
> I set it back to
> 
> CONFIG_SATA_MOBILE_LPM_POLICY=0
> 
> (firmware settings)

Right, so with that setting the LPM policy changes are effectively
disabled and cannot explain your SMART issues.

Still I would like to zoom in on this part of your bug report, because
for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
and AFAIK Ubuntu has similar plans.

I suspect that the issue you were seeing with CONFIG_SATA_MOBILE_LPM_POLICY=3
was with the Crucial disk? I've attached a patch for you to test, which
disables LPM for your model of Crucial SSD (but keeps it on for the Intel disk).
If you can confirm that with that patch you can run with
CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues, that would be great.

Regards,

Hans

[-- Attachment #2: 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch --]
[-- Type: text/x-patch, Size: 1425 bytes --]

From 551654d311d91c3cecde233eda86686f5d786fc2 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sun, 11 Mar 2018 15:32:00 +0100
Subject: [PATCH] libata: Apply NOLPM quirk to Crucial M500 480GB SSDs

There have been reports of the Crucial M500 480GB model not working
with LPM set to min_power / med_power_with_dipm level.

It has not been tested with medium_power, but that typically has no
measurable power-savings.

This commit adds a NOLPM quirk to avoid LPM causing issues with these SSDs.

Cc: stable@vger.kernel.org
Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/ata/libata-core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d8be0fe548f7..197e2c7f560e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4535,6 +4535,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 						ATA_HORKAGE_ZERO_AFTER_TRIM |
 						ATA_HORKAGE_NOLPM, },
 
+	/* The 480GB version of the M500 has both queued TRIM and LPM issues */
+	{ "Crucial_CT480M500*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
+						ATA_HORKAGE_ZERO_AFTER_TRIM |
+						ATA_HORKAGE_NOLPM, },
+
 	/* devices that don't properly handle queued TRIM commands */
 	{ "Micron_M500_*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
 						ATA_HORKAGE_ZERO_AFTER_TRIM, },
-- 
2.14.3
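
To illustrate how the new entry is meant to take effect: libata compares the drive's model string against the glob patterns in this table. The Python sketch below mimics that lookup with `fnmatch` as a stand-in for the kernel's own glob matcher. The two-entry table, the first-match-wins behaviour and the function name `horkage_for` are simplifying assumptions for illustration, and the model strings are taken from the udisks paths in the report, not read from real hardware:

```python
from fnmatch import fnmatchcase

# Entries copied from the diff above; the real table in libata-core.c is much
# longer. Flag names are shortened (ATA_HORKAGE_ prefix dropped).
BLACKLIST = [
    ("Crucial_CT480M500*", {"NO_NCQ_TRIM", "ZERO_AFTER_TRIM", "NOLPM"}),
    ("Micron_M500_*", {"NO_NCQ_TRIM", "ZERO_AFTER_TRIM"}),
]


def horkage_for(model):
    """Return the quirk flags of the first pattern matching the model string.

    Assumption: the first matching entry wins, which is why a more specific
    entry has to be listed before a more general one.
    """
    for pattern, flags in BLACKLIST:
        if fnmatchcase(model, pattern):
            return flags
    return set()


if __name__ == "__main__":
    for model in ("Crucial_CT480M500SSD3", "INTEL_SSDSA2CW300G3"):
        print(model, sorted(horkage_for(model)))
```

Under these assumptions the Crucial M500 480GB picks up NOLPM while the Intel SSD matches no entry and keeps LPM enabled.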


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-11 14:37 ` Hans de Goede
@ 2018-03-11 16:28   ` Martin Steigerwald
  2018-03-11 16:41     ` Hans de Goede
  2018-03-13 13:08   ` Martin Steigerwald
  2018-03-14 11:01   ` Martin Steigerwald
  2 siblings, 1 reply; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-11 16:28 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hans de Goede - 11.03.18, 15:37:
> Hi Martin,
> 
> Right, so with that setting the LPM policy changes are effectively
> disabled and cannot explain your SMART issues.
> 
> Still I would like to zoom in on this part of your bug report, because
> for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
> and AFAIK Ubuntu has similar plans.
> 
> I suspect that the issue you were seeing with
> CONFIG_SATA_MOBILE_LPM_POLICY=3 was with the Crucial disk? I've attached
> a patch for you to test, which disables LPM for your model of Crucial SSD
> (but keeps it on for the Intel disk). If you can confirm that with that
> patch you can run with CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues,
> that would be great.

I think I can do that during the week with 4.16-rc5 then.

Is it possible to override that setting at boot time with a kernel parameter? 
It would make it easier to switch between the policies for testing.

I didn't see anything regarding that in the help of the kernel option.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-11 16:28   ` Martin Steigerwald
@ 2018-03-11 16:41     ` Hans de Goede
  0 siblings, 0 replies; 18+ messages in thread
From: Hans de Goede @ 2018-03-11 16:41 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hi,

On 11-03-18 17:28, Martin Steigerwald wrote:
> Hans de Goede - 11.03.18, 15:37:
>> Hi Martin,
>>
>> Right, so with that setting the LPM policy changes are effectively
>> disabled and cannot explain your SMART issues.
>>
>> Still I would like to zoom in on this part of your bug report, because
>> for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
>> and AFAIK Ubuntu has similar plans.
>>
>> I suspect that the issue you were seeing with
>> CONFIG_SATA_MOBILE_LPM_POLICY=3 was with the Crucial disk? I've attached
>> a patch for you to test, which disables LPM for your model of Crucial SSD
>> (but keeps it on for the Intel disk). If you can confirm that with that
>> patch you can run with CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues,
>> that would be great.
> 
> I think I can do that during the week with 4.16-rc5 then.

Great, thank you.

> Is it possible to override that setting at boot time with a kernel parameter?

Yes, the Kconfig option only determines the default value of the
ahci.mobile_lpm_policy cmdline option; setting that default is all it does.
Changing the cmdline option has the same result as changing the Kconfig
option and rebuilding.
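
As an illustration of how one might check which policy is actually in effect after booting with, say, ahci.mobile_lpm_policy=3 on the kernel command line, here is a minimal Python sketch. The sysfs locations it reads (the ahci module parameter file and the per-host link_power_management_policy attribute) are assumptions about a typical system and may be absent, for example when ahci is not loaded. The function name `read_lpm_state` is made up for this example:

```python
from pathlib import Path


def read_lpm_state(sysfs_root="/sys"):
    """Collect the ahci module default and the per-host LPM policy strings.

    sysfs_root is a parameter only so the function can be tried against a
    fake directory tree; on a real system the default "/sys" is used.
    """
    root = Path(sysfs_root)
    state = {"module_default": None, "hosts": {}}

    # Assumed location of the value set via ahci.mobile_lpm_policy=N.
    param = root / "module" / "ahci" / "parameters" / "mobile_lpm_policy"
    if param.is_file():
        state["module_default"] = param.read_text().strip()

    # Per-host policy, e.g. "max_performance" or "med_power_with_dipm".
    pattern = "class/scsi_host/host*/link_power_management_policy"
    for policy in sorted(root.glob(pattern)):
        state["hosts"][policy.parent.name] = policy.read_text().strip()
    return state


if __name__ == "__main__":
    print(read_lpm_state())
```

Comparing the output across boots with different ahci.mobile_lpm_policy values would show whether the override was picked up, without rebuilding the kernel.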

> It would make it easier to switch between the policies for testing.

Ack, as said the plan is to make CONFIG_SATA_MOBILE_LPM_POLICY=3 the default for
Fedora 28, so it is in my own interest to make it easy to test different
settings :)

Regards,

Hans

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-11 14:37 ` Hans de Goede
  2018-03-11 16:28   ` Martin Steigerwald
@ 2018-03-13 13:08   ` Martin Steigerwald
  2018-03-13 14:32     ` Ming Lei
  2018-03-14 11:01   ` Martin Steigerwald
  2 siblings, 1 reply; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-13 13:08 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo,
	linux-block, Ming Lei, Bart Van Assche

Hans de Goede - 11.03.18, 15:37:
> Hi Martin,
> 
> Right, so with that setting the LPM policy changes are effectively
> disabled and cannot explain your SMART issues.

Yes, I now got a photo of one of those boot failures I mentioned, and it seems
to be related to blk-mq, as the backtrace contains "blk_mq_terminate_expired".

I have added the screenshot to my bug report.

[Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and 
boot failures with blk_mq_terminate_expired in backtrace
https://bugzilla.kernel.org/show_bug.cgi?id=199077

Hans, I will test your LPM policy horkage for Crucial m500 patch at a later 
time. I first wanted to add the photo of the boot failure to the bug report.

Ming and Bart, I added you to Cc because I dealt with you regarding another
blk-mq report; please feel free to adapt.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-13 13:08   ` Martin Steigerwald
@ 2018-03-13 14:32     ` Ming Lei
  2018-03-13 14:56       ` Bart Van Assche
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2018-03-13 14:32 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Hans de Goede, Linux Kernel Mailing List, Thorsten Leemhuis,
	Tejun Heo, linux-block, Bart Van Assche, linux-scsi,
	Martin K. Petersen, James E.J. Bottomley

On Tue, Mar 13, 2018 at 02:08:23PM +0100, Martin Steigerwald wrote:
> Hans de Goede - 11.03.18, 15:37:
> > Hi Martin,
> > 
> > Right, so with that setting the LPM policy changes are effectively
> > disabled and cannot explain your SMART issues.
> 
> Yes, I now got a photo of one of those boot failures I mentioned, and it seems
> to be related to blk-mq, as the backtrace contains "blk_mq_terminate_expired".
> 
> I have added the screenshot to my bug report.
> 
> [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and 
> boot failures with blk_mq_terminate_expired in backtrace
> https://bugzilla.kernel.org/show_bug.cgi?id=199077
> 
> Hans, I will test your LPM policy horkage for Crucial m500 patch at a later 
> time. I first wanted to add the photo of the boot failure to the bug report.
> 
> Ming and Bart, I added you to Cc because I dealt with you regarding another
> blk-mq report; please feel free to adapt.

Looks like RIP points to scsi_times_out+0x17/0x1d0, maybe a SCSI regression?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-13 14:32     ` Ming Lei
@ 2018-03-13 14:56       ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2018-03-13 14:56 UTC (permalink / raw)
  To: martin, ming.lei
  Cc: linux-kernel, linux-block, hdegoede, martin.petersen, linux-scsi,
	regressions, tj, jejb

On Tue, 2018-03-13 at 22:32 +0800, Ming Lei wrote:
> On Tue, Mar 13, 2018 at 02:08:23PM +0100, Martin Steigerwald wrote:
> > Ming and Bart, I added you to Cc because I dealt with you regarding another
> > blk-mq report; please feel free to adapt.
> 
> Looks like RIP points to scsi_times_out+0x17/0x1d0, maybe a SCSI regression?

I think that it's much more likely that this is a block layer regression. See
e.g. "[PATCH v2] blk-mq: Fix race between resetting the timer and completion
handling" (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18338.html).

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-11 14:37 ` Hans de Goede
  2018-03-11 16:28   ` Martin Steigerwald
  2018-03-13 13:08   ` Martin Steigerwald
@ 2018-03-14 11:01   ` Martin Steigerwald
  2018-03-14 11:05     ` Hans de Goede
  2018-03-15 10:48     ` Martin Steigerwald
  2 siblings, 2 replies; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-14 11:01 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hans de Goede - 11.03.18, 15:37:
> Hi Martin,
> 
> Right, so with that setting the LPM policy changes are effectively
> disabled and cannot explain your SMART issues.
> 
> Still I would like to zoom in on this part of your bug report, because
> for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
> and AFAIK Ubuntu has similar plans.
> 
> I suspect that the issue you were seeing with
> CONFIG_SATA_MOBILE_LPM_POLICY=3 was with the Crucial disk? I've attached
> a patch for you to test, which disables LPM for your model of Crucial SSD
> (but keeps it on for the Intel disk). If you can confirm that with that
> patch you can run with CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues,
> that would be great.

With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully 
booted three times in a row. So feel free to add tested-by.

Let's see whether the blk_mq_terminate_expired or the smartd/udisks error 
messages reappear with rc5. I still think they are a different issue.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-14 11:01   ` Martin Steigerwald
@ 2018-03-14 11:05     ` Hans de Goede
  2018-03-14 12:48       ` Martin Steigerwald
  2018-03-15 10:48     ` Martin Steigerwald
  1 sibling, 1 reply; 18+ messages in thread
From: Hans de Goede @ 2018-03-14 11:05 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hi,

On 14-03-18 12:01, Martin Steigerwald wrote:
> Hans de Goede - 11.03.18, 15:37:
>> Hi Martin,
>>
>> Right, so at that settings the LPM policy changes are effectively
>> disabled and cannot explain your SMART issues.
>>
>> Still I would like to zoom in on this part of your bug report, because
>> for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
>> and AFAIK Ubuntu has similar plans.
>>
>> I suspect that the issue you were seeing with
>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk ? I've attached
>> a patch for you to test, which disabled LPM for your model Crucial SSD (but
>> keeps it on for the Intel disk) if you can confirm that with that patch you
>> can run with
>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
> 
> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
> booted three times in a row. So feel free to add tested-by.

Thanks.

To be clear, you're talking about 4.16-rc5 with the patch I made to
blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?

Regards,

Hans

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-14 11:05     ` Hans de Goede
@ 2018-03-14 12:48       ` Martin Steigerwald
  2018-03-18 21:34         ` Hans de Goede
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-14 12:48 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hans de Goede - 14.03.18, 12:05:
> Hi,
> 
> On 14-03-18 12:01, Martin Steigerwald wrote:
> > Hans de Goede - 11.03.18, 15:37:
> >> Hi Martin,
> >> 
> >> On 11-03-18 09:20, Martin Steigerwald wrote:
> >>> Hello.
> >>> 
> >>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> >>> with SMART checks occassionally failing like this:
> >>> 
> >>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
> >>> checks
> >>> udisksd[24408]: Error performing housekeeping for drive
> >>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating
> >>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
> >>> data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00 50
> >>> 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00 
> >>> 00
> >>> 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
> >>> udisksd[24408]: Error performing housekeeping for drive
> >>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error
> >>> updating
> >>> SMART dat a: Error sending ATA command CHECK POWER MODE: Unexpected
> >>> sense
> >>> data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00 00
> >>> 00    ................#0120010: 00 0 0 00 00  50 00 00 00  00 00 00 00
> >>> 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
> >>> 
> >>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
> >>> 
> >>> However when I then check manually with smartctl -a | -x | -H the device
> >>> reports SMART data just fine.
> >>> 
> >>> As smartd correctly detects that device is in sleep mode, this may be an
> >>> userspace issue in udisksd.
> >>> 
> >>> Also at some boot attempts the boot hangs with a message like "could not
> >>> connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
> >>> on to LVs (each on one of the SSDs). A configuration that requires a
> >>> manual
> >>> adaption to InitRAMFS in order to boot (basically vgchange -ay before
> >>> btrfs device scan).
> >>> 
> >>> I wonder whether that has to do with the new SATA LPM policy stuff, but
> >>> as
> >>> I had issues with
> >>> 
> >>> 3 => Medium power with Device Initiated PM enabled
> >>> 
> >>> (machine did not boot, which could also have been caused by me
> >>> accidentally
> >>> removing all TCP/IP network support in the kernel with that setting)
> >>> 
> >>> I set it back to
> >>> 
> >>> CONFIG_SATA_MOBILE_LPM_POLICY=0
> >>> 
> >>> (firmware settings)
> >> 
> >> Right, so at that settings the LPM policy changes are effectively
> >> disabled and cannot explain your SMART issues.
> >> 
> >> Still I would like to zoom in on this part of your bug report, because
> >> for Fedora 28 we are planning to ship with
> >> CONFIG_SATA_MOBILE_LPM_POLICY=3
> >> and AFAIK Ubuntu has similar plans.
> >> 
> >> I suspect that the issue you were seeing with
> >> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk ? I've
> >> attached
> >> a patch for you to test, which disabled LPM for your model Crucial SSD
> >> (but
> >> keeps it on for the Intel disk) if you can confirm that with that patch
> >> you
> >> can run with
> >> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
> > 
> > With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
> > booted three times in a row. So feel free to add tested-by.
> 
> Thanks.
> 
> To be clear, you're talking about 4.16-rc5 with the patch I made to
> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?

4.16-rc5 with your

0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch

patch.

Thanks,
-- 
Martin

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-14 11:01   ` Martin Steigerwald
  2018-03-14 11:05     ` Hans de Goede
@ 2018-03-15 10:48     ` Martin Steigerwald
  1 sibling, 0 replies; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-15 10:48 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Hans de Goede, Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Martin Steigerwald - 14.03.18, 12:01:
> Hans de Goede - 11.03.18, 15:37:
> > Hi Martin,
> > 
> > On 11-03-18 09:20, Martin Steigerwald wrote:
> > > Hello.
> > > 
> > > Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> > > with SMART checks occassionally failing like this:
> > > 
> > > smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
> > > checks
> > > udisksd[24408]: Error performing housekeeping for drive
> > > /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating
> > > SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
> > > data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00 50
> > > 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00 
> > > 00
> > > 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
> > > udisksd[24408]: Error performing housekeeping for drive
> > > /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error
> > > updating
> > > SMART dat a: Error sending ATA command CHECK POWER MODE: Unexpected
> > > sense
> > > data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00 00
> > > 00    ................#0120010: 00 0 0 00 00  50 00 00 00  00 00 00 00
> > > 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
> > > 
> > > (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
> > > 
> > > However when I then check manually with smartctl -a | -x | -H the device
> > > reports SMART data just fine.
> > > 
> > > As smartd correctly detects that device is in sleep mode, this may be an
> > > userspace issue in udisksd.
> > > 
> > > Also at some boot attempts the boot hangs with a message like "could not
> > > connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
> > > on to LVs (each on one of the SSDs). A configuration that requires a
> > > manual
> > > adaption to InitRAMFS in order to boot (basically vgchange -ay before
> > > btrfs device scan).
> > > 
> > > I wonder whether that has to do with the new SATA LPM policy stuff, but
> > > as
> > > I had issues with
> > > 
> > >   3 => Medium power with Device Initiated PM enabled
> > > 
> > > (machine did not boot, which could also have been caused by me
> > > accidentally
> > > removing all TCP/IP network support in the kernel with that setting)
> > > 
> > > I set it back to
> > > 
> > > CONFIG_SATA_MOBILE_LPM_POLICY=0
> > > 
> > > (firmware settings)
> > 
> > Right, so at that settings the LPM policy changes are effectively
> > disabled and cannot explain your SMART issues.
> > 
> > Still I would like to zoom in on this part of your bug report, because
> > for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
> > and AFAIK Ubuntu has similar plans.
> > 
> > I suspect that the issue you were seeing with
> > CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk ? I've attached
> > a patch for you to test, which disabled LPM for your model Crucial SSD
> > (but
> > keeps it on for the Intel disk) if you can confirm that with that patch
> > you
> > can run with
> > CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
> 
> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
> booted three times in a row. So feel free to add tested-by.
>
> Let´s see whether the blk_mq_terminate_expired or the smartd/udisks error
> messages reappear with rc5. I still think they are a different issue.

As expected, these two other issues still happen with 4.16-rc5.

Thanks,
-- 
Martin

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-14 12:48       ` Martin Steigerwald
@ 2018-03-18 21:34         ` Hans de Goede
  2018-03-18 22:06           ` Martin Steigerwald
  0 siblings, 1 reply; 18+ messages in thread
From: Hans de Goede @ 2018-03-18 21:34 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hi,

On 14-03-18 13:48, Martin Steigerwald wrote:
> Hans de Goede - 14.03.18, 12:05:
>> Hi,
>>
>> On 14-03-18 12:01, Martin Steigerwald wrote:
>>> Hans de Goede - 11.03.18, 15:37:
>>>> Hi Martin,
>>>>
>>>> On 11-03-18 09:20, Martin Steigerwald wrote:
>>>>> Hello.
>>>>>
>>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
>>>>> with SMART checks occassionally failing like this:
>>>>>
>>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
>>>>> checks
>>>>> udisksd[24408]: Error performing housekeeping for drive
>>>>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating
>>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
>>>>> data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00 50
>>>>> 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00
>>>>> 00
>>>>> 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
>>>>> udisksd[24408]: Error performing housekeeping for drive
>>>>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error
>>>>> updating
>>>>> SMART dat a: Error sending ATA command CHECK POWER MODE: Unexpected
>>>>> sense
>>>>> data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00 00
>>>>> 00    ................#0120010: 00 0 0 00 00  50 00 00 00  00 00 00 00
>>>>> 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
>>>>>
>>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
>>>>>
>>>>> However when I then check manually with smartctl -a | -x | -H the device
>>>>> reports SMART data just fine.
>>>>>
>>>>> As smartd correctly detects that device is in sleep mode, this may be an
>>>>> userspace issue in udisksd.
>>>>>
>>>>> Also at some boot attempts the boot hangs with a message like "could not
>>>>> connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
>>>>> on to LVs (each on one of the SSDs). A configuration that requires a
>>>>> manual
>>>>> adaption to InitRAMFS in order to boot (basically vgchange -ay before
>>>>> btrfs device scan).
>>>>>
>>>>> I wonder whether that has to do with the new SATA LPM policy stuff, but
>>>>> as
>>>>> I had issues with
>>>>>
>>>>> 3 => Medium power with Device Initiated PM enabled
>>>>>
>>>>> (machine did not boot, which could also have been caused by me
>>>>> accidentally
>>>>> removing all TCP/IP network support in the kernel with that setting)
>>>>>
>>>>> I set it back to
>>>>>
>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
>>>>>
>>>>> (firmware settings)
>>>>
>>>> Right, so at that settings the LPM policy changes are effectively
>>>> disabled and cannot explain your SMART issues.
>>>>
>>>> Still I would like to zoom in on this part of your bug report, because
>>>> for Fedora 28 we are planning to ship with
>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
>>>> and AFAIK Ubuntu has similar plans.
>>>>
>>>> I suspect that the issue you were seeing with
>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk ? I've
>>>> attached
>>>> a patch for you to test, which disabled LPM for your model Crucial SSD
>>>> (but
>>>> keeps it on for the Intel disk) if you can confirm that with that patch
>>>> you
>>>> can run with
>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
>>>
>>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
>>> booted three times in a row. So feel free to add tested-by.
>>
>> Thanks.
>>
>> To be clear, you're talking about 4.16-rc5 with the patch I made to
>> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?
> 
> 4.16-rc5 with your
> 
> 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch

I was about to submit this upstream and was planning on extending it to
also cover the 960GB version, which led me to do a quick Google search.
Judging from the results, it seems that there are multiple firmware
versions of this SSD out there and I wonder if you are perhaps running
an older version of the firmware. If you do:

dmesg | grep Crucial_CT480M500

You should see something like this:

ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133

I'm interested in the "MU03" part, what is that in your case?
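
A minimal sketch of pulling just that field out of the probe line, assuming
the line keeps the "model, firmware, max UDMA" layout shown above. The helper
name fw_rev is made up for illustration:

```shell
# fw_rev: read libata probe lines on stdin and print the firmware
# revision, i.e. the field between the model name and "max UDMA/...".
# (fw_rev is a hypothetical helper name, not an existing tool.)
fw_rev() {
    sed -n 's/^.*ATA-[0-9]*: [^,]*, *\([^,]*\),.*$/\1/p'
}

# Example with the sample line from above:
echo 'ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133' | fw_rev
```

On a live system the input would come from
"dmesg | grep Crucial_CT480M500 | fw_rev" instead of the echo.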

Note I'm not saying we should not do the NOLPM quirk, but maybe we
can limit it to older firmware.

Regards,

Hans

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-18 21:34         ` Hans de Goede
@ 2018-03-18 22:06           ` Martin Steigerwald
  2018-03-19  9:32             ` Hans de Goede
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-18 22:06 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hi Hans.

Hans de Goede - 18.03.18, 22:34:
> On 14-03-18 13:48, Martin Steigerwald wrote:
> > Hans de Goede - 14.03.18, 12:05:
> >> Hi,
> >> 
> >> On 14-03-18 12:01, Martin Steigerwald wrote:
> >>> Hans de Goede - 11.03.18, 15:37:
> >>>> Hi Martin,
> >>>> 
> >>>> On 11-03-18 09:20, Martin Steigerwald wrote:
> >>>>> Hello.
> >>>>> 
> >>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> >>>>> with SMART checks occassionally failing like this:
> >>>>> 
> >>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
> >>>>> checks
> >>>>> udisksd[24408]: Error performing housekeeping for drive
> >>>>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error
> >>>>> updating
> >>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected
> >>>>> sense
> >>>>> data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00
> >>>>> 50
> >>>>> 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00
> >>>>> 00
> >>>>> 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
> >>>>> udisksd[24408]: Error performing housekeeping for drive
> >>>>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error
> >>>>> updating
> >>>>> SMART dat a: Error sending ATA command CHECK POWER MODE: Unexpected
> >>>>> sense
> >>>>> data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00
> >>>>> 00
> >>>>> 00    ................#0120010: 00 0 0 00 00  50 00 00 00  00 00 00 00
> >>>>> 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
> >>>>> 
> >>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad
> >>>>> T520)
> >>>>> 
> >>>>> However when I then check manually with smartctl -a | -x | -H the
> >>>>> device
> >>>>> reports SMART data just fine.
> >>>>> 
> >>>>> As smartd correctly detects that device is in sleep mode, this may be
> >>>>> an
> >>>>> userspace issue in udisksd.
> >>>>> 
> >>>>> Also at some boot attempts the boot hangs with a message like "could
> >>>>> not
> >>>>> connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
> >>>>> on to LVs (each on one of the SSDs). A configuration that requires a
> >>>>> manual
> >>>>> adaption to InitRAMFS in order to boot (basically vgchange -ay before
> >>>>> btrfs device scan).
> >>>>> 
> >>>>> I wonder whether that has to do with the new SATA LPM policy stuff,
> >>>>> but
> >>>>> as
> >>>>> I had issues with
> >>>>> 
> >>>>> 3 => Medium power with Device Initiated PM enabled
> >>>>> 
> >>>>> (machine did not boot, which could also have been caused by me
> >>>>> accidentally
> >>>>> removing all TCP/IP network support in the kernel with that setting)
> >>>>> 
> >>>>> I set it back to
> >>>>> 
> >>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
> >>>>> 
> >>>>> (firmware settings)
> >>>> 
> >>>> Right, so at that settings the LPM policy changes are effectively
> >>>> disabled and cannot explain your SMART issues.
> >>>> 
> >>>> Still I would like to zoom in on this part of your bug report, because
> >>>> for Fedora 28 we are planning to ship with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
> >>>> and AFAIK Ubuntu has similar plans.
> >>>> 
> >>>> I suspect that the issue you were seeing with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk ? I've
> >>>> attached
> >>>> a patch for you to test, which disabled LPM for your model Crucial SSD
> >>>> (but
> >>>> keeps it on for the Intel disk) if you can confirm that with that patch
> >>>> you
> >>>> can run with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
> >>> 
> >>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system
> >>> successfully
> >>> booted three times in a row. So feel free to add tested-by.
> >> 
> >> Thanks.
> >> 
> >> To be clear, you're talking about 4.16-rc5 with the patch I made to
> >> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?
> > 
> > 4.16-rc5 with your
> > 
> > 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch
> 
> I was about to submit this upstream and was planning on extending it to
> also cover the 960GB version, which led me to do a quick Google search.
> Judging from the results, it seems that there are multiple firmware
> versions of this SSD out there and I wonder if you are perhaps running
> an older version of the firmware. If you do:
> 
> dmesg | grep Crucial_CT480M500
> 
> You should see something like this:
> 
> ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
> 
> I'm interested in the "MU03" part, what is that in your case?

Although I never updated the firmware, I do have MU03:

% lsscsi | grep Crucial
[2:0:0:0]    disk    ATA      Crucial_CT480M50 MU03  /dev/sdb

% dmesg | grep Crucial_CT480M500
[    2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133

> Note I'm not saying we should not do the NOLPM quirk, but maybe we
> can limit it to older firmware.

Thanks,
-- 
Martin

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-18 22:06           ` Martin Steigerwald
@ 2018-03-19  9:32             ` Hans de Goede
  0 siblings, 0 replies; 18+ messages in thread
From: Hans de Goede @ 2018-03-19  9:32 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo

Hi,

On 18-03-18 23:06, Martin Steigerwald wrote:
> Hi Hans.
> 
> Hans de Goede - 18.03.18, 22:34:
>> On 14-03-18 13:48, Martin Steigerwald wrote:
>>> Hans de Goede - 14.03.18, 12:05:
>>>> Hi,
>>>>
>>>> On 14-03-18 12:01, Martin Steigerwald wrote:
>>>>> Hans de Goede - 11.03.18, 15:37:
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 11-03-18 09:20, Martin Steigerwald wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
>>>>>>> with SMART checks occassionally failing like this:
>>>>>>>
>>>>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
>>>>>>> checks
>>>>>>> udisksd[24408]: Error performing housekeeping for drive
>>>>>>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error
>>>>>>> updating
>>>>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected
>>>>>>> sense
>>>>>>> data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00
>>>>>>> 50
>>>>>>> 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00
>>>>>>> 00
>>>>>>> 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
>>>>>>> udisksd[24408]: Error performing housekeeping for drive
>>>>>>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error
>>>>>>> updating
>>>>>>> SMART dat a: Error sending ATA command CHECK POWER MODE: Unexpected
>>>>>>> sense
>>>>>>> data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00
>>>>>>> 00
>>>>>>> 00    ................#0120010: 00 0 0 00 00  50 00 00 00  00 00 00 00
>>>>>>> 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
>>>>>>>
>>>>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad
>>>>>>> T520)
>>>>>>>
>>>>>>> However when I then check manually with smartctl -a | -x | -H the
>>>>>>> device
>>>>>>> reports SMART data just fine.
>>>>>>>
>>>>>>> As smartd correctly detects that device is in sleep mode, this may be
>>>>>>> an
>>>>>>> userspace issue in udisksd.
>>>>>>>
>>>>>>> Also at some boot attempts the boot hangs with a message like "could
>>>>>>> not
>>>>>>> connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
>>>>>>> on to LVs (each on one of the SSDs). A configuration that requires a
>>>>>>> manual
>>>>>>> adaption to InitRAMFS in order to boot (basically vgchange -ay before
>>>>>>> btrfs device scan).
>>>>>>>
>>>>>>> I wonder whether that has to do with the new SATA LPM policy stuff,
>>>>>>> but
>>>>>>> as
>>>>>>> I had issues with
>>>>>>>
>>>>>>> 3 => Medium power with Device Initiated PM enabled
>>>>>>>
>>>>>>> (machine did not boot, which could also have been caused by me
>>>>>>> accidentally
>>>>>>> removing all TCP/IP network support in the kernel with that setting)
>>>>>>>
>>>>>>> I set it back to
>>>>>>>
>>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
>>>>>>>
>>>>>>> (firmware settings)
>>>>>>
>>>>>> Right, so at that settings the LPM policy changes are effectively
>>>>>> disabled and cannot explain your SMART issues.
>>>>>>
>>>>>> Still I would like to zoom in on this part of your bug report, because
>>>>>> for Fedora 28 we are planning to ship with
>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
>>>>>> and AFAIK Ubuntu has similar plans.
>>>>>>
>>>>>> I suspect that the issue you were seeing with
>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk ? I've
>>>>>> attached
>>>>>> a patch for you to test, which disabled LPM for your model Crucial SSD
>>>>>> (but
>>>>>> keeps it on for the Intel disk) if you can confirm that with that patch
>>>>>> you
>>>>>> can run with
>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
>>>>>
>>>>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system
>>>>> successfully
>>>>> booted three times in a row. So feel free to add tested-by.
>>>>
>>>> Thanks.
>>>>
>>>> To be clear, you're talking about 4.16-rc5 with the patch I made to
>>>> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?
>>>
>>> 4.16-rc5 with your
>>>
>>> 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch
>>
>> I was about to submit this upstream and was planning on extending it to
>> also cover the 960GB version, which led me to do a quick Google search.
>> Judging from the results, it seems that there are multiple firmware
>> versions of this SSD out there and I wonder if you are perhaps running
>> an older version of the firmware. If you do:
>>
>> dmesg | grep Crucial_CT480M500
>>
>> You should see something like this:
>>
>> ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
>>
>> I'm interested in the "MU03" part, what is that in your case?
> 
> Although I never updated the firmware, I do have MU03:
> 
> % lsscsi | grep Crucial
> [2:0:0:0]    disk    ATA      Crucial_CT480M50 MU03  /dev/sdb
> 
> % dmesg | grep Crucial_CT480M500
> [    2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133

Thanks. So there is an MU05 update:

www.crucial.com/wcsstore/CrucialSAS/firmware/M500/MU05/crucial-m500-iso-firmware-update-mu05-en.pdf

Which according to its changelog features:
"Improved drive latency performance in applications with SMART polling"

Which is not relevant to the LPM issues you are seeing, but seems relevant to
the other issues you are seeing.

Unfortunately the MU05 update does not seem to specifically address any
LPM issues, so I'm just going to do the blacklist for all 480GB+ models
for now (my experience with other Crucial models is that smaller variants
seem to not suffer from LPM issues).
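
As a rough illustration of what matching "all 480GB+ models" by model string
could look like: this is a shell sketch only, not the libata code. The
function name and the 960GB pattern are assumptions; only the 480GB model
string appears in this thread.

```shell
# needs_nolpm: succeed if the given model string matches one of the
# (assumed) M500 480GB/960GB patterns, fail otherwise.
needs_nolpm() {
    case "$1" in
        Crucial_CT480M500*|Crucial_CT960M500*) return 0 ;;
        *) return 1 ;;
    esac
}

# The two drives from this thread:
for model in Crucial_CT480M500SSD3 INTEL_SSDSA2CW300G3; do
    if needs_nolpm "$model"; then
        echo "$model: LPM disabled"
    else
        echo "$model: LPM allowed"
    fi
done
```

Matching on a model prefix rather than on the firmware revision keeps the
rule independent of MU03 vs. MU05, which fits the changelog not mentioning
LPM at all.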

Regards,

Hans

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-11  8:20 [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts Martin Steigerwald
  2018-03-11 14:37 ` Hans de Goede
@ 2018-03-19  9:42 ` Thorsten Leemhuis
  2018-03-19  9:50   ` Hans de Goede
  1 sibling, 1 reply; 18+ messages in thread
From: Thorsten Leemhuis @ 2018-03-19  9:42 UTC (permalink / raw)
  To: Martin Steigerwald, Linux Kernel Mailing List; +Cc: Tejun Heo, Hans de Goede

Hi! On 11.03.2018 09:20, Martin Steigerwald wrote:
>
> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> with SMART checks occassionally failing like this:

Martin (or someone else): Could you give a status update? I have this
issue on my list of regressions, but it's hard to follow as two
different issues seem to be discussed. Or is it just one issue? Did the
patch/discussion that Bart pointed to help? Is the issue still showing
up in rc6?

Ciao, Thorsten

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-19  9:42 ` Thorsten Leemhuis
@ 2018-03-19  9:50   ` Hans de Goede
  2018-03-19 12:35     ` Martin Steigerwald
  2018-04-10 17:30     ` Martin Steigerwald
  0 siblings, 2 replies; 18+ messages in thread
From: Hans de Goede @ 2018-03-19  9:50 UTC (permalink / raw)
  To: Thorsten Leemhuis, Martin Steigerwald, Linux Kernel Mailing List
  Cc: Tejun Heo

Hi Thorsten,

On 19-03-18 10:42, Thorsten Leemhuis wrote:
> Hi! On 11.03.2018 09:20, Martin Steigerwald wrote:
>>
>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
>> with SMART checks occassionally failing like this:
> 
> Martin (or someone else): Could you give a status update? I have this
> issue on my list of regressions, but it's hard to follow as two
> different issues seem to be discussed. Or is it just one issue? Did the
> patch/discussion that Bart pointed to help? Is the issue still showing
> up in rc6?

You're right, there are 2 issues here:

1) The Crucial M500 SSD (at least the 480GB MU03 firmware version) does
not like enabling SATA link power-management at a level of min_power
or at the new(ish) med_power_with_dipm level. This problem exists in
older kernels too, so this is not really a regression.

New in 4.16 is a Kconfig option to enable SATA LPM by default, which
makes this existing problem much more noticeable. Not sure if you want
to count this as a regression. Either way I'm preparing and sending
out a patch fixing this (by blacklisting LPM for this model SSD) right
now.

2) There seem to be some latency issues in the MU03 version of the
firmware, triggered by polling SMART data, which causes lvmetad to
timeout in some cases. Note I'm not involved in that part of this
thread, but I believe that issue is currently unresolved.
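
To see which policy is actually in effect on a given machine, a rough sketch
along these lines may help. It assumes the libata sysfs attribute
link_power_management_policy under /sys/class/scsi_host; the function name
and the optional base-directory argument are only there for illustration:

```shell
# show_lpm_policy: print the current SATA link power management policy
# for every SCSI host that exposes one.
# $1 (optional): base directory, defaults to /sys/class/scsi_host.
show_lpm_policy() {
    base="${1:-/sys/class/scsi_host}"
    for f in "$base"/host*/link_power_management_policy; do
        # Skip hosts without the attribute (or an unmatched glob).
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$host" "$(cat "$f")"
    done
}

show_lpm_policy
```

A host reporting min_power or med_power_with_dipm is running at one of the
levels mentioned in 1). Writing max_performance to the same file (as root)
should turn LPM off for that link at runtime.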

Regards,

Hans

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-19  9:50   ` Hans de Goede
@ 2018-03-19 12:35     ` Martin Steigerwald
  2018-04-10 17:30     ` Martin Steigerwald
  1 sibling, 0 replies; 18+ messages in thread
From: Martin Steigerwald @ 2018-03-19 12:35 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Thorsten Leemhuis, Linux Kernel Mailing List, Tejun Heo

Hi Thorsten.

Hans de Goede - 19.03.18, 10:50:
> On 19-03-18 10:42, Thorsten Leemhuis wrote:
> > Hi! On 11.03.2018 09:20, Martin Steigerwald wrote:
> >> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> > 
> >> with SMART checks occassionally failing like this:
> > Martin (or someone else): Could you give a status update? I have this
> > issue on my list of regressions, but it's hard to follow as two
> > different issues seem to be discussed. Or is it just one issue? Did the

There are at least two issues.

> > patch/discussion that Bart pointed to help? Is the issue still showing
> > up in rc6?
> 
> You're right, there are 2 issues here:
> 
> 1) The Crucial M500 SSD (at least the 480GB MU03 firmware version) does
> not like enabling SATA link power-management at a level of min_power
> or at the new(ish) med_power_with_dipm level. This problem exists in
> older kernels too, so this is not really a regression.
> 
> New in 4.16 is a Kconfig option to enable SATA LPM by default, which
> makes this existing problem much more noticeable. Not sure if you want
> to count this as a regression. Either way I'm preparing and sending
> out a patch fixing this (by blacklisting LPM for this model SSD) right
> now.

Yes, and this is fixed by the nolpm quirk patch of Hans.

> 2) There seem to be some latency issues in the MU03 version of the
> firmware, triggered by polling SMART data, which causes lvmetad to
> timeout in some cases. Note I'm not involved in that part of this
> thread, but I believe that issue is currently unresolved.

Additionally, I occasionally get a failure on boot / resume from hibernation in
blk_mq_terminate_expired. But I tend to believe that this is the same issue.

This is still unresolved as of 4.16-rc6 + the nolpm quirk patch from Hans and the
"Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()"
patch by Bart: I already had this occasional boot failure with that combination.

The patch by Bart seems to be related to another issue of the blk-mq quiescing 
stuff.

Thanks,
-- 
Martin

* Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
  2018-03-19  9:50   ` Hans de Goede
  2018-03-19 12:35     ` Martin Steigerwald
@ 2018-04-10 17:30     ` Martin Steigerwald
  1 sibling, 0 replies; 18+ messages in thread
From: Martin Steigerwald @ 2018-04-10 17:30 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Thorsten Leemhuis, Linux Kernel Mailing List, Tejun Heo, Bart Van Assche

Hans de Goede - 19.03.18, 10:50:
> > Martin (or someone else): Could you give a status update? I have this
> > issue on my list of regressions, but it's hard to follow as two
> > different issues seem to be discussed. Or is it just one issue? Did the
> > patch/discussion that Bart pointed to help? Is the issue still showing
> > up in rc6?
> 
> You're right, there are 2 issues here:
[…]
> 2) There seem to be some latency issues in the MU03 version of the
> firmware, triggered by polling SMART data, which causes lvmetad to
> timeout in some cases. Note I'm not involved in that part of this
> thread, but I believe that issue is currently unresolved.

The second issue consists of what Hans described plus an occasional hang on boot
when resuming from hibernation to disk.

The second issue is still unfixed as of 4.16 + [PATCH v2] block: Change a
rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}() from Bart Van
Assche, which Jens Axboe accepted [1].

[1] https://patchwork.kernel.org/patch/10294287/

I am currently compiling 4.16.1, but I do not expect a change, as there is nothing
about the blk-mq subsystem in the changelog as far as I saw.

I will also update

[Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data during 
runtime and boot failures with blk_mq_terminate_expired in backtrace
https://bugzilla.kernel.org/show_bug.cgi?id=199077

with the current state. The bug report contains a screenshot of one
of the boot hangs. I had two more on Monday, but did not take the chance to
take another photo. I will do so next time if it is convenient enough and
compare whether it reveals anything more than my first photo.

Thanks,
-- 
Martin


Thread overview: 18+ messages
2018-03-11  8:20 [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts Martin Steigerwald
2018-03-11 14:37 ` Hans de Goede
2018-03-11 16:28   ` Martin Steigerwald
2018-03-11 16:41     ` Hans de Goede
2018-03-13 13:08   ` Martin Steigerwald
2018-03-13 14:32     ` Ming Lei
2018-03-13 14:56       ` Bart Van Assche
2018-03-14 11:01   ` Martin Steigerwald
2018-03-14 11:05     ` Hans de Goede
2018-03-14 12:48       ` Martin Steigerwald
2018-03-18 21:34         ` Hans de Goede
2018-03-18 22:06           ` Martin Steigerwald
2018-03-19  9:32             ` Hans de Goede
2018-03-15 10:48     ` Martin Steigerwald
2018-03-19  9:42 ` Thorsten Leemhuis
2018-03-19  9:50   ` Hans de Goede
2018-03-19 12:35     ` Martin Steigerwald
2018-04-10 17:30     ` Martin Steigerwald
