Note: Resending in plain-text mode because the HTML-formatted version was not delivered (it was treated as spam). Sorry!
Steps used to bisect kernel:
git bisect start
# bad: [18ed766f3642fa75262885462d3052ad7c8c87a2] Linux 5.10.140
git bisect bad cc656e7275577e3527b8fe2fea34fe9bd4a83b3b
# good: [665ee746071bf02ce8b7b9d729c8beab704393c2] Linux 5.10.139
git bisect good 722106da9469536bef4015092d6fa19b6ca71cbc
# good: [c08a104a8bce832f6e7a4e8d9ac091777b9982ea] netfilter: nf_tables: disallow binding to already bound chain
git bisect good c08a104a8bce832f6e7a4e8d9ac091777b9982ea
# good: [b2d352ed4d489f9fdfe5caf7d5e62bd2c310f0a8] btrfs: replace: drop assert for suspended replace
git bisect good b2d352ed4d489f9fdfe5caf7d5e62bd2c310f0a8
# good: [62af37c5cd7f5fd071086cab645844bf5bcdc0ef] mm/hugetlb: fix hugetlb not supporting softdirty tracking
git bisect good 62af37c5cd7f5fd071086cab645844bf5bcdc0ef
# good: [3ddbd0907f6d202e2cfd7d5b5f6ceed9361282fc] blk-mq: fix io hung due to missing commit_rqs
git bisect good 3ddbd0907f6d202e2cfd7d5b5f6ceed9361282fc
# bad: [8d5c106fe216bf16080d7070c37adf56a9227e60] scsi: ufs: core: Enable link lost interrupt
git bisect bad 8d5c106fe216bf16080d7070c37adf56a9227e60
jason@storage-server:~/linux-build/linux$ git bisect good
8d5c106fe216bf16080d7070c37adf56a9227e60 is the first bad commit
commit 8d5c106fe216bf16080d7070c37adf56a9227e60
Author: Kiwoong Kim <kwmad.kim@samsung.com>
Date: Tue Aug 2 10:42:31 2022 +0900
scsi: ufs: core: Enable link lost interrupt
commit 6d17a112e9a63ff6a5edffd1676b99e0ffbcd269 upstream.
Link lost is treated as fatal error with commit c99b9b230149 ("scsi: ufs:
Treat link loss as fatal error"), but the event isn't registered as
interrupt source. Enable it.
Link: https://lore.kernel.org/r/1659404551-160958-1-git-send-email-kwmad.kim@samsung.com
Fixes: c99b9b230149 ("scsi: ufs: Treat link loss as fatal error")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/scsi/ufs/ufshci.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
Hardware:
Dell R730xd
2 x E5-2667v4 CPUs
8 x 16GB DDR4-2400 Registered ECC DIMMs
SAS Controller with impacted drives - LSI 9207-8e running P20 IT firmware
07:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
The drives are attached to two Netapp DS4246 4U 24-bay 3.5" enclosures
SMART output from one of the impacted drives:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.139-custom6] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD160EMFZ-11AFXA0
Serial Number: 2BH4US9N
LU WWN Device Id: 5 000cca 295d049a8
Firmware Version: 81.00A81
User Capacity: 16,000,900,661,248 bytes [16.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 published, ANSI INCITS 529-2018
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Sep 14 21:43:40 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 101) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1767) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 001 Pre-fail Always - 0
2 Throughput_Performance 0x0004 133 133 054 Old_age Offline - 108
3 Spin_Up_Time 0x0007 083 083 001 Pre-fail Always - 353 (Average 352)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 81
5 Reallocated_Sector_Ct 0x0033 100 100 001 Pre-fail Always - 0
7 Seek_Error_Rate 0x000a 100 100 001 Old_age Always - 0
8 Seek_Time_Performance 0x0004 140 140 020 Old_age Offline - 15
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 12424
10 Spin_Retry_Count 0x0012 100 100 001 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 81
22 Unknown_Attribute 0x0023 100 100 025 Pre-fail Always - 100
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 610
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 610
194 Temperature_Celsius 0x0002 050 050 000 Old_age Always - 33 (Min/Max 20/47)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 24 -
# 2 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
LSI utility output from good kernel:
LSI Logic MPT Configuration Utility, Version 1.71, Sep 18, 2013
1 MPT Port found
Port Name Chip Vendor/Type/Rev MPT Rev Firmware Rev IOC
1. ioc0 LSI Logic SAS2308 D1 200 14000700 0
ioc0 is SCSI host 1
B___T___L Type Operating System Device Name
0 0 0 Disk /dev/sdd [1:0:0:0]
0 1 0 Disk /dev/sde [1:0:1:0]
0 2 0 Disk /dev/sdf [1:0:2:0]
0 3 0 Disk /dev/sdg [1:0:3:0]
0 4 0 Disk /dev/sdh [1:0:4:0]
0 5 0 Disk /dev/sdi [1:0:5:0]
0 6 0 Disk /dev/sdj [1:0:6:0]
0 7 0 Disk /dev/sdk [1:0:7:0]
0 8 0 Disk /dev/sdl [1:0:8:0]
0 9 0 Disk /dev/sdm [1:0:9:0]
0 10 0 Disk /dev/sdn [1:0:10:0]
0 11 0 Disk /dev/sdo [1:0:11:0]
0 12 0 Disk /dev/sdp [1:0:12:0]
0 13 0 Disk /dev/sdq [1:0:13:0]
0 14 0 Disk /dev/sdr [1:0:14:0]
0 15 0 Disk /dev/sds [1:0:15:0]
0 16 0 Disk /dev/sdt [1:0:16:0]
0 17 0 Disk /dev/sdu [1:0:17:0]
0 18 0 Disk /dev/sdv [1:0:18:0]
0 19 0 Disk /dev/sdw [1:0:19:0]
0 20 0 Disk /dev/sdx [1:0:20:0]
0 21 0 Disk /dev/sdy [1:0:21:0]
0 22 0 Disk /dev/sdz [1:0:22:0]
0 23 0 Disk /dev/sdaa [1:0:23:0]
0 24 0 EnclServ
0 25 0 Disk /dev/sdab [1:0:25:0]
0 26 0 Disk /dev/sdac [1:0:26:0]
0 27 0 Disk /dev/sdad [1:0:27:0]
0 28 0 Disk /dev/sdae [1:0:28:0]
0 29 0 Disk /dev/sdaf [1:0:29:0]
0 30 0 Disk /dev/sdag [1:0:30:0]
0 31 0 Disk /dev/sdah [1:0:31:0]
0 32 0 Disk /dev/sdai [1:0:32:0]
0 33 0 Disk /dev/sdaj [1:0:33:0]
0 34 0 Disk /dev/sdak [1:0:34:0]
0 35 0 Disk /dev/sdal [1:0:35:0]
0 36 0 Disk /dev/sdam [1:0:36:0]
0 37 0 Disk /dev/sdan [1:0:37:0]
0 38 0 Disk /dev/sdao [1:0:38:0]
0 39 0 Disk /dev/sdap [1:0:39:0]
0 40 0 Disk /dev/sdaq [1:0:40:0]
0 41 0 Disk /dev/sdar [1:0:41:0]
0 42 0 Disk /dev/sdas [1:0:42:0]
0 43 0 Disk /dev/sdat [1:0:43:0]
0 44 0 Disk /dev/sdau [1:0:44:0]
0 45 0 Disk /dev/sdav [1:0:45:0]
0 46 0 Disk /dev/sdaw [1:0:46:0]
0 47 0 Disk /dev/sdax [1:0:47:0]
0 48 0 Disk /dev/sday [1:0:48:0]
0 49 0 EnclServ
On an impacted kernel, the drives that drop off are re-enumerated when they reappear and are given numbering above 1:0:48:0, because drives that are detached and reattached do not reuse their prior numbering.
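For anyone trying to reproduce this, a rough way to spot re-enumerated drives is to filter the same listing for disks whose target number is higher than the highest one seen on the good kernel (48 in the output above). This is only a sketch: the function name reenumerated_disks is made up for illustration, and it assumes the input has the same column layout as the listing above (bus, target, LUN, type, device name, [host:channel:target:lun]).

```shell
#!/bin/sh
# Print disks whose target number (second column) is above a threshold.
# Assumption: input lines look like the listing above, e.g.
#   " 0 48 0 Disk /dev/sday [1:0:48:0]"
# The default threshold of 48 is the highest disk target on the good kernel.
reenumerated_disks() {
    max_target="${1:-48}"
    # Only "Disk" rows are considered; header and EnclServ rows are skipped.
    awk -v max="$max_target" '$4 == "Disk" && $2 + 0 > max { print $5, $6 }'
}
```

Saving the listing from an impacted kernel to a file and running "reenumerated_disks 48 < listing.txt" (listing.txt being a placeholder name) should print one line per drive that dropped off and came back. An empty result means no disk was seen above the threshold.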