* Question
@ 2020-02-07 15:49 o1bigtenor
  2020-02-07 15:53 ` Question Reindl Harald
  2020-02-07 22:50 ` Question Sarah Newman
  0 siblings, 2 replies; 12+ messages in thread
From: o1bigtenor @ 2020-02-07 15:49 UTC (permalink / raw)
  To: Linux-RAID

Greetings

Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
(11) system.
mdadm - v4.1 - 2018-10-01 is the version being used.

Some weirdness is happening - - - vis a vis - - - I have one directory
(not small) that has disappeared. I last accessed said directory
(still have the pdf open which is how I could get this information)
'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
section of the file in question.

It has been suggested to me that I make the array read-only until this is resolved.
I have space on the array on a different system to recover this array.
Suggestions on how to do both of the above would be appreciated.

Checked the drives that make up the array using smartctl and the
results are (I removed text from each result that did not seem to have
any bearing on the test results (I likely used the wrong 'code')):

# smartctl -a /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-2-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD10EFRX-68FYTN0
Serial Number:    WD-WCC4J3TC25D3
LU WWN Device Id: 5 0014ee 20cd9cd93
Firmware Version: 82.00A82
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb  7 08:35:51 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   132   130   021    Pre-fail  Always       -       4391
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       165
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       31314
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       148
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       81
193 Load_Cycle_Count        0x0032   187   187   000    Old_age   Always       -       39644
194 Temperature_Celsius     0x0022   115   101   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

# smartctl -a /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-2-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD10EFRX-68FYTN0
Serial Number:    WD-WCC4J6TTLZTE
LU WWN Device Id: 5 0014ee 2b784490e
Firmware Version: 82.00A82
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb  7 08:36:49 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   132   130   021    Pre-fail  Always       -       4375
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       151
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   056   056   000    Old_age   Always       -       32529
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       149
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       80
193 Load_Cycle_Count        0x0032   187   187   000    Old_age   Always       -       39651
194 Temperature_Celsius     0x0022   115   099   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

# smartctl -a /dev/sde
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-2-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD10EFRX-68FYTN0
Serial Number:    WD-WCC4J6XRY5AN
LU WWN Device Id: 5 0014ee 20cd940c6
Firmware Version: 82.00A82
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb  7 08:37:21 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   131   129   021    Pre-fail  Always       -       4441
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       152
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   056   056   000    Old_age   Always       -       32530
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       150
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       83
193 Load_Cycle_Count        0x0032   187   187   000    Old_age   Always       -       39603
194 Temperature_Celsius     0x0022   113   094   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

# smartctl -a /dev/sdf
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-2-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD10EFRX-68FYTN0
Serial Number:    WD-WCC4J4XV62F4
LU WWN Device Id: 5 0014ee 20cd9d7d1
Firmware Version: 82.00A82
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb  7 08:37:44 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       8
  3 Spin_Up_Time            0x0027   135   134   021    Pre-fail  Always       -       4241
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       165
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       30121
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       148
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       82
193 Load_Cycle_Count        0x0032   187   187   000    Old_age   Always       -       39528
194 Temperature_Celsius     0x0022   115   097   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
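
For anyone skimming the four reports: a minimal sketch of how the handful
of attributes that usually matter for media or cabling trouble could be
pulled out of `smartctl -A` output. The function name smart_summary and
the choice of attribute IDs (1, 5, 197, 198, 199) are assumptions made
for illustration, not something from the posts. In the reports above the
only non-zero raw values among these are Raw_Read_Error_Rate 8 on
/dev/sdf and UDMA_CRC_Error_Count 1 on /dev/sde.

```shell
#!/bin/sh
# smart_summary: read `smartctl -A` output on stdin and print the name and
# raw value (10th column) of a few attributes. The ID list is an assumption;
# adjust it for your drives.
smart_summary() {
    awk '$1 ~ /^(1|5|197|198|199)$/ && NF >= 10 { printf "%-24s %s\n", $2, $10 }'
}

# Usage (needs root and smartmontools; device names as in the post):
#   for d in /dev/sdb /dev/sdc /dev/sde /dev/sdf; do
#       echo "== $d"; smartctl -A "$d" | smart_summary
#   done
```

Note this only works on unwrapped output, where each attribute sits on
one line; mail clients that fold long lines will break the column count.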

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question
  2020-02-07 15:49 Question o1bigtenor
@ 2020-02-07 15:53 ` Reindl Harald
  2020-02-07 16:26   ` Question o1bigtenor
  2020-02-07 22:50 ` Question Sarah Newman
  1 sibling, 1 reply; 12+ messages in thread
From: Reindl Harald @ 2020-02-07 15:53 UTC (permalink / raw)
  To: o1bigtenor, Linux-RAID



Am 07.02.20 um 16:49 schrieb o1bigtenor:
> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
> (11) system.
> mdadm - v4.1 - 2018-10-01 is the version being used.
> 
> Some weirdness is happening - - - vis a vis - - - I have one directory
> (not small) that has disappeared. I last accessed said directory
> (still have the pdf open which is how I could get this information)
> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
> section of the file in question.
> 
> Has been suggested to me that I make the array read only until this is resolved.
> I have space on the the array on a different system to recover this array.
> Suggestions on how to do both of the above would be aprreciated

directories on a filesystem on top of a RAID don't disappear silently -
my bet is a simple drag&drop move or deletion aka PEBCAK


* Re: Question
  2020-02-07 15:53 ` Question Reindl Harald
@ 2020-02-07 16:26   ` o1bigtenor
  2020-02-07 17:30     ` Question Reindl Harald
  0 siblings, 1 reply; 12+ messages in thread
From: o1bigtenor @ 2020-02-07 16:26 UTC (permalink / raw)
  To: Reindl Harald; +Cc: Linux-RAID

On Fri, Feb 7, 2020 at 9:53 AM Reindl Harald <h.reindl@thelounge.net> wrote:
>
>
>
> Am 07.02.20 um 16:49 schrieb o1bigtenor:
> > Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
> > (11) system.
> > mdadm - v4.1 - 2018-10-01 is the version being used.
> >
> > Some weirdness is happening - - - vis a vis - - - I have one directory
> > (not small) that has disappeared. I last accessed said directory
> > (still have the pdf open which is how I could get this information)
> > 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
> > section of the file in question.
> >
> > Has been suggested to me that I make the array read only until this is resolved.
> > I have space on the the array on a different system to recover this array.
> > Suggestions on how to do both of the above would be aprreciated
>
> directories on a filesystem on top of a RAID don't disappear silently -
> my bet is a simple drag&drop move or deletion aka PEBCAK

I checked the bash history, and in about 500 items there is no mention of
such. Looked in the log files and can't find anything either. Quite
puzzling - - - - that's why I'm asking here.

And yes - - - I am aware that all too often I'm the problem. I've
gotten a lot more careful than I was even 5 years ago - - - grin.

Thanks


* Re: Question
  2020-02-07 16:26   ` Question o1bigtenor
@ 2020-02-07 17:30     ` Reindl Harald
  2020-02-07 18:00       ` Question o1bigtenor
  2020-02-07 19:27       ` Question Wols Lists
  0 siblings, 2 replies; 12+ messages in thread
From: Reindl Harald @ 2020-02-07 17:30 UTC (permalink / raw)
  To: o1bigtenor; +Cc: Linux-RAID



Am 07.02.20 um 17:26 schrieb o1bigtenor:
> On Fri, Feb 7, 2020 at 9:53 AM Reindl Harald <h.reindl@thelounge.net> wrote:
>>
>> Am 07.02.20 um 16:49 schrieb o1bigtenor:
>>> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
>>> (11) system.
>>> mdadm - v4.1 - 2018-10-01 is the version being used.
>>>
>>> Some weirdness is happening - - - vis a vis - - - I have one directory
>>> (not small) that has disappeared. I last accessed said directory
>>> (still have the pdf open which is how I could get this information)
>>> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
>>> section of the file in question.
>>>
>>> Has been suggested to me that I make the array read only until this is resolved.
>>> I have space on the the array on a different system to recover this array.
>>> Suggestions on how to do both of the above would be aprreciated
>>
>> directories on a filesystem on top of a RAID don't disappear silently -
>> my bet is a simple drag&drop move or deletion aka PEBCAK
> 
> I checked with bash - - history  and in about 500 items there is no mention of
> such. Looked in log files and can't find anything either. Quite
> puzzling - - - -
> that's why I'm asking here.
> 
> And yes - - - I am aware that all too often I'm the problem. I've
> gotten a lot more
> careful that I was even 5 years ago - - - grin.

if you have no system error showing any evidence, you are in the wrong
place here - even if it were true, without any error message you are in
the wrong place here, because it's the same as asking whether your
neighbour's dog pissed on your wall

even with a filesystem error you are in the wrong place here, and it's
impossible that the raid forgets about exactly one folder, because that
layer doesn't know about folders, and the FS would puke when running
fsck if a complete area disappeared - period


* Re: Question
  2020-02-07 17:30     ` Question Reindl Harald
@ 2020-02-07 18:00       ` o1bigtenor
  2020-02-07 19:27       ` Question Wols Lists
  1 sibling, 0 replies; 12+ messages in thread
From: o1bigtenor @ 2020-02-07 18:00 UTC (permalink / raw)
  To: Reindl Harald; +Cc: Linux-RAID

On Fri, Feb 7, 2020 at 11:30 AM Reindl Harald <h.reindl@thelounge.net> wrote:
>
>
>
> Am 07.02.20 um 17:26 schrieb o1bigtenor:
> > On Fri, Feb 7, 2020 at 9:53 AM Reindl Harald <h.reindl@thelounge.net> wrote:
> >>
> >> Am 07.02.20 um 16:49 schrieb o1bigtenor:
> >>> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
> >>> (11) system.
> >>> mdadm - v4.1 - 2018-10-01 is the version being used.
> >>>
> >>> Some weirdness is happening - - - vis a vis - - - I have one directory
> >>> (not small) that has disappeared. I last accessed said directory
> >>> (still have the pdf open which is how I could get this information)
> >>> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
> >>> section of the file in question.
> >>>
> >>> Has been suggested to me that I make the array read only until this is resolved.
> >>> I have space on the the array on a different system to recover this array.
> >>> Suggestions on how to do both of the above would be aprreciated
> >>
> >> directories on a filesystem on top of a RAID don't disappear silently -
> >> my bet is a simple drag&drop move or deletion aka PEBCAK
> >
> > I checked with bash - - history  and in about 500 items there is no mention of
> > such. Looked in log files and can't find anything either. Quite
> > puzzling - - - -
> > that's why I'm asking here.
> >
> > And yes - - - I am aware that all too often I'm the problem. I've
> > gotten a lot more
> > careful that I was even 5 years ago - - - grin.
>
> if you shave no system error showing any evidence you are wrong here -
> even if it would be true witout any errors message you are wrong here
> because it's the same as asking if your neighbours dog pissed on your wall
>
> even with a filesystem error you are wrong here and it's impossible that
> the raid forget about exactly one folder because that layer don't now
> about folders and the FS would puke if a complete area desiapperas at
> running fsck - period

I understand that this is a highly unusual occurrence.

That's why I'm asking here.

Regards


* Re: Question
  2020-02-07 17:30     ` Question Reindl Harald
  2020-02-07 18:00       ` Question o1bigtenor
@ 2020-02-07 19:27       ` Wols Lists
  1 sibling, 0 replies; 12+ messages in thread
From: Wols Lists @ 2020-02-07 19:27 UTC (permalink / raw)
  To: Reindl Harald, o1bigtenor; +Cc: Linux-RAID

On 07/02/20 17:30, Reindl Harald wrote:
> 
> 
> Am 07.02.20 um 17:26 schrieb o1bigtenor:
>> On Fri, Feb 7, 2020 at 9:53 AM Reindl Harald <h.reindl@thelounge.net> wrote:
>>>
>>> Am 07.02.20 um 16:49 schrieb o1bigtenor:
>>>> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
>>>> (11) system.
>>>> mdadm - v4.1 - 2018-10-01 is the version being used.
>>>>
>>>> Some weirdness is happening - - - vis a vis - - - I have one directory
>>>> (not small) that has disappeared. I last accessed said directory
>>>> (still have the pdf open which is how I could get this information)
>>>> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
>>>> section of the file in question.
>>>>
>>>> Has been suggested to me that I make the array read only until this is resolved.
>>>> I have space on the the array on a different system to recover this array.
>>>> Suggestions on how to do both of the above would be aprreciated
>>>
>>> directories on a filesystem on top of a RAID don't disappear silently -
>>> my bet is a simple drag&drop move or deletion aka PEBCAK
>>
>> I checked with bash - - history  and in about 500 items there is no mention of
>> such. Looked in log files and can't find anything either. Quite
>> puzzling - - - -
>> that's why I'm asking here.
>>
>> And yes - - - I am aware that all too often I'm the problem. I've
>> gotten a lot more
>> careful that I was even 5 years ago - - - grin.
> 
> if you shave no system error showing any evidence you are wrong here -
> even if it would be true witout any errors message you are wrong here
> because it's the same as asking if your neighbours dog pissed on your wall

You're rather dogmatic here ... I agree if you're not getting any system
error then it's unlikely to be the system. Though I would ask has the OP
run a check on the array itself - if raid thinks the array is good then
it probably is.
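
A minimal sketch of what "run a check on the array" could look like
through the md sysfs interface. The path /sys/block/md0/md is an
assumption (the thread later names the device md0), and the function
names start_check and check_status are made up for illustration.

```shell
#!/bin/sh
# MD_SYSFS is an assumption; point it at your own array's md directory.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

start_check() {
    # Writing "check" asks md to read all members and compare the copies
    # without repairing anything.
    echo check > "$MD_SYSFS/sync_action"
}

check_status() {
    # sync_action goes back to "idle" once the pass has finished;
    # mismatch_cnt then reports how many sectors did not agree.
    printf 'action=%s mismatches=%s\n' \
        "$(cat "$MD_SYSFS/sync_action")" "$(cat "$MD_SYSFS/mismatch_cnt")"
}
```

Run start_check as root, watch progress in /proc/mdstat, and call
check_status when it is done. A clean result says the mirrors agree with
each other; it says nothing about what the filesystem on top contains.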
> 
> even with a filesystem error you are wrong here and it's impossible that
> the raid forget about exactly one folder because that layer don't now
> about folders and the FS would puke if a complete area desiapperas at
> running fsck - period
> 
And here you are TOO dogmatic. Ever heard of file corruption where PART
of a file gets corrupted? And it's nothing to do with the disk. The
entry for the directory could easily have got damaged.

And I've had the exact same thing seem to happen where PEBKAC is highly
unlikely - unless pebkac can magically acquire root permissions on a
system where I almost never do things as root ... and like it seems the
OP, drag and drop is very unlikely also as I rarely have a file manager
running.

So yes, I know exactly where the OP is coming from, and I'm just as
confused as to what's going on, especially as I deliberately change
permissions on stuff that's meant to be write-once to make sure that it
*IS* write-once.

Cheers,
Wol


* Re: Question
  2020-02-07 15:49 Question o1bigtenor
  2020-02-07 15:53 ` Question Reindl Harald
@ 2020-02-07 22:50 ` Sarah Newman
  2020-02-07 23:21   ` Question o1bigtenor
  1 sibling, 1 reply; 12+ messages in thread
From: Sarah Newman @ 2020-02-07 22:50 UTC (permalink / raw)
  To: o1bigtenor, Linux-RAID

On 2/7/20 7:49 AM, o1bigtenor wrote:
> Greetings
> 
> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
> (11) system.
> mdadm - v4.1 - 2018-10-01 is the version being used.
> 
> Some weirdness is happening - - - vis a vis - - - I have one directory
> (not small) that has disappeared. I last accessed said directory
> (still have the pdf open which is how I could get this information)
> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
> section of the file in question.

I assume you've looked at lsof?

https://www.linux.com/news/bring-back-deleted-files-lsof/
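
The idea behind that article, as a small self-contained demo: a file
that has been deleted but is still held open by some process stays
reachable through /proc/<pid>/fd/<n> until the last descriptor is
closed. Everything below runs on a scratch file in a temporary
directory; the file names are hypothetical and nothing on the array is
touched. It requires a Linux /proc.

```shell
#!/bin/sh
tmp=$(mktemp -d)
echo "important data" > "$tmp/report.txt"

exec 3< "$tmp/report.txt"   # some process (here: this shell) holds the file open
rm "$tmp/report.txt"        # the name is gone, the inode is not

# On a real system such files can be listed with:  lsof +L1
# (or: lsof | grep deleted) to get the PID and FD. Here they are $$ and 3.
cp "/proc/$$/fd/3" "$tmp/recovered.txt"
exec 3<&-                   # closing the last descriptor frees the inode

cat "$tmp/recovered.txt"
```

This only helps for files that a running process still has open, so it
is worth trying before closing applications or rebooting.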

If it is a software problem, it is just as likely, if not more likely,
to be a file system problem rather than a raid problem. You don't
mention what file system. You're possibly also looking at data in the
in-memory disk cache rather than what's actually stored on disk, given
there's been no reboot.

Is there anything suspicious in dmesg?


* Re: Question
  2020-02-07 22:50 ` Question Sarah Newman
@ 2020-02-07 23:21   ` o1bigtenor
  2020-02-07 23:41     ` Question Sarah Newman
  0 siblings, 1 reply; 12+ messages in thread
From: o1bigtenor @ 2020-02-07 23:21 UTC (permalink / raw)
  To: Sarah Newman; +Cc: Linux-RAID

On Fri, Feb 7, 2020 at 4:50 PM Sarah Newman <srn@prgmr.com> wrote:
>
> On 2/7/20 7:49 AM, o1bigtenor wrote:
> > Greetings
> >
> > Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
> > (11) system.
> > mdadm - v4.1 - 2018-10-01 is the version being used.
> >
> > Some weirdness is happening - - - vis a vis - - - I have one directory
> > (not small) that has disappeared. I last accessed said directory
> > (still have the pdf open which is how I could get this information)
> > 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
> > section of the file in question.
>

Greetings


> I assume you've looked at lsof?

No I hadn't - - - - thanks for the tip.
Only a few thousand lines in a terminal - - - - - but nothing that I was
looking for.
>
> https://www.linux.com/news/bring-back-deleted-files-lsof/
>
> If it is a software problem, it just as likely, if not more likely, that it is a file system problem rather than a raid problem. You don't mention
> what file system. You're possibly also actually looking at data in the in-memory disk cache rather than what's actually stored on disk given there's
> been no reboot.

The array (raid-10) is on ext4.
>
> Is there anything suspicious in dmesg?

I hadn't looked at the messages files in /var/log so I went back to
date in question.
Didn't see anything there either.

What about doing this:

Make the array read-only.
Copy the whole array using dd to a larger array on a different machine
(good overnight job).
Then run something like testdisk on the whole array.
The last would largely be a waste of time as what has
disappeared is one of about 40 upper level directories  and
it likely contained about 10 to 50 GB of files (dunno how many
levels of directories though - - - I use LOTS).

I'm looking for a reasonably solid method of trying to recover
this directory and all of its contents (about 8 years worth of
putting things into it so replicating it - - - - tough!).

Thanks for the assistance!


* Re: Question
  2020-02-07 23:21   ` Question o1bigtenor
@ 2020-02-07 23:41     ` Sarah Newman
  2020-02-08  0:56       ` Was Re: Question - - - - now: issue resolved o1bigtenor
  0 siblings, 1 reply; 12+ messages in thread
From: Sarah Newman @ 2020-02-07 23:41 UTC (permalink / raw)
  To: o1bigtenor; +Cc: Linux-RAID

On 2/7/20 3:21 PM, o1bigtenor wrote:
> On Fri, Feb 7, 2020 at 4:50 PM Sarah Newman <srn@prgmr.com> wrote:
>>
>> On 2/7/20 7:49 AM, o1bigtenor wrote:
>>> Greetings
>>>
>>> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
>>> (11) system.
>>> mdadm - v4.1 - 2018-10-01 is the version being used.
>>>
>>> Some weirdness is happening - - - vis a vis - - - I have one directory
>>> (not small) that has disappeared. I last accessed said directory
>>> (still have the pdf open which is how I could get this information)
>>> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
>>> section of the file in question.
>>
> 
> Greetings
> 
> 
>> I assume you've looked at lsof?
> 
> No I hadn't - - - - thanks for the tip.
> only a few thousand line in a terminal - - - - - but nothing what I was
> looking for.
>>
>> https://www.linux.com/news/bring-back-deleted-files-lsof/
>>
>> If it is a software problem, it just as likely, if not more likely, that it is a file system problem rather than a raid problem. You don't mention
>> what file system. You're possibly also actually looking at data in the in-memory disk cache rather than what's actually stored on disk given there's
>> been no reboot.
> 
> The array (raid-10) is on ext4.
>>
>> Is there anything suspicious in dmesg?
> 
> I hadn't looked at the messages files in /var/log so I went back to
> date in question.
> Didn't see anything there either.

I said the command dmesg, not /var/log.

If systemd-journald is broken, or your file system is broken, you could have tons of error messages in dmesg and nothing logged to disk.
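
A minimal sketch of narrowing the kernel ring buffer down to storage
messages. The function name storage_msgs and the pattern are
assumptions; the pattern is a starting point, not an exhaustive list.

```shell
#!/bin/sh
# storage_msgs: keep only lines that mention ext4, an md device, an ATA
# port, or an I/O error. Reads dmesg output on stdin.
storage_msgs() {
    grep -iE 'EXT4-fs|md[0-9]+|ata[0-9]+|I/O error'
}

# Usage (may need root if kernel.dmesg_restrict is set):
#   dmesg -T | storage_msgs
```

Because dmesg reads the ring buffer in memory, this still works when
nothing is reaching the log files on disk.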

> 
> What about doing this:
> 
> Made the array read only.
> Copy the whole array using dd to a larger array on a different machine
> (good overnight job).
> Then run something like testdisk on the whole array.
> The last would largely be a waste of time as what has
> disappeared is one of about 40 upper level directories  and
> it likely contained about 10 to 50 GB of files (dunno how many
> levels of directories though - - - I use LOTS).
> 
> I'm looking for a reasonably solid method of trying to recover
> this directory and all of its contents (about 8 years worth of
> putting things into it so replicating it - - - - tough!).

Making the original data read-only and operating on a copy of it is a reasonable idea.

You probably want http://extundelete.sourceforge.net/ though I would first try

find / -name "somefileyouknowthenameof"

just to make sure it hasn't been moved elsewhere by accident. That seems
like the most likely scenario given the lack of error messages, unless
no messages at all have been logged due to previously mentioned issues.
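
A slightly more forgiving variant of that search, as a sketch: it
ignores case, stays on one filesystem, and hides permission-denied
noise. The function name find_by_name and the example paths are
placeholders.

```shell
#!/bin/sh
# find_by_name ROOT PATTERN: case-insensitive name search below ROOT.
# -xdev keeps find from wandering into other mounted filesystems.
find_by_name() {
    find "$1" -xdev -iname "$2" 2>/dev/null
}

# Usage ("/mnt/raid" is a placeholder for the array's mount point):
#   find_by_name /mnt/raid 'somefileyouknowthenameof*'
```

Starting at the array's mount point instead of / keeps the run short; if
that finds nothing, widen the search.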

--Sarah


* Was Re: Question - - - - now: issue resolved
  2020-02-07 23:41     ` Question Sarah Newman
@ 2020-02-08  0:56       ` o1bigtenor
  2020-02-08  1:24         ` Sarah Newman
  0 siblings, 1 reply; 12+ messages in thread
From: o1bigtenor @ 2020-02-08  0:56 UTC (permalink / raw)
  To: Sarah Newman; +Cc: Linux-RAID

On Fri, Feb 7, 2020 at 5:41 PM Sarah Newman <srn@prgmr.com> wrote:
>
> On 2/7/20 3:21 PM, o1bigtenor wrote:
> > On Fri, Feb 7, 2020 at 4:50 PM Sarah Newman <srn@prgmr.com> wrote:
> >>
> >> On 2/7/20 7:49 AM, o1bigtenor wrote:
> >>> Greetings
> >>>
> >>> Running a Raid-10 array made up of 4 - 1 TB drives on a debian testing
> >>> (11) system.
> >>> mdadm - v4.1 - 2018-10-01 is the version being used.
> >>>
> >>> Some weirdness is happening - - - vis a vis - - - I have one directory
> >>> (not small) that has disappeared. I last accessed said directory
> >>> (still have the pdf open which is how I could get this information)
> >>> 'Last accessed 2020-01-19 6:32 A.M.'  as indicated in the 'Properties'
> >>> section of the file in question.
> >>
> >
> > Greetings
> >
> >
> >> I assume you've looked at lsof?
> >
> > No I hadn't - - - - thanks for the tip.
> > only a few thousand line in a terminal - - - - - but nothing what I was
> > looking for.
> >>
> >> https://www.linux.com/news/bring-back-deleted-files-lsof/
> >>
> >> If it is a software problem, it just as likely, if not more likely, that it is a file system problem rather than a raid problem. You don't mention
> >> what file system. You're possibly also actually looking at data in the in-memory disk cache rather than what's actually stored on disk given there's
> >> been no reboot.
> >
> > The array (raid-10) is on ext4.
> >>
> >> Is there anything suspicious in dmesg?
> >
> > I hadn't looked at the messages files in /var/log so I went back to
> > date in question.
> > Didn't see anything there either.
>
> I said the command dmesg, not /var/log.
>
> If systemd-journald is broken, or your file system is broken, you could have tons of error messages in dmesg and nothing logged to disk.
>

Found one couplet - - - it might be applicable (please advise):

[12458.717443] EXT4-fs (md0p1): warning: maximal mount count reached,
running e2fsck is recommended
[12459.215097] EXT4-fs (md0p1): mounted filesystem with ordered data
mode. Opts: (null)
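
For context, a hedged reading of that couplet: the first line is ext4
noting that the filesystem has been mounted as many times as its
configured maximum, so a check is recommended; the second line shows the
mount itself succeeded. A minimal sketch for looking at the counters
behind the warning follows. The function name mount_counts is made up;
the device name is the one from the dmesg lines.

```shell
#!/bin/sh
# mount_counts: pick the two mount counters out of `tune2fs -l` output
# supplied on stdin.
mount_counts() {
    grep -E '^(Mount count|Maximum mount count):'
}

# Usage (needs root):
#   tune2fs -l /dev/md0p1 | mount_counts
# A forced check should only be run on an unmounted filesystem:
#   umount /dev/md0p1 && e2fsck -f /dev/md0p1
```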

> >
> > What about doing this:
> >
> > Made the array read only.
> > Copy the whole array using dd to a larger array on a different machine
> > (good overnight job).
> > Then run something like testdisk on the whole array.
> > The last would largely be a waste of time as what has
> > disappeared is one of about 40 upper level directories  and
> > it likely contained about 10 to 50 GB of files (dunno how many
> > levels of directories though - - - I use LOTS).
> >
> > I'm looking for a reasonably solid method of trying to recover
> > this directory and all of its contents (about 8 years worth of
> > putting things into it so replicating it - - - - tough!).
>
> Making the original data read-only and operating on a copy of it is a reasonable idea.

Reading the 'man' page I think the command to change the array to read-only
would be mdadm /name/of/array --readonly

Is that correct?
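
For reference, mdadm usually takes the option before the device, so the common form is "mdadm --readonly /dev/mdX" (undone with "mdadm --readwrite /dev/mdX"). The sketch below only prints the commands for review and runs nothing. It assumes the array is /dev/md0 with the file system on /dev/md0p1 (the names in the dmesg output); the image path /mnt/backup/md0.img and the function name are hypothetical.

```shell
# Print (not run) the commands for freezing an md array and imaging it.
# Assumptions: device names are examples; the image path is hypothetical.
readonly_copy_plan() {
    array="${1:-/dev/md0}"
    part="${2:-/dev/md0p1}"
    image="${3:-/mnt/backup/md0.img}"

    # Unmount first (assumed necessary: a mounted file system keeps the array busy).
    echo "umount $part"
    # Mark the array read-only; --readwrite reverses this.
    echo "mdadm --readonly $array"
    # Confirm the array state before copying.
    echo "cat /proc/mdstat"
    # Raw copy of the whole array to an image file on other storage.
    echo "dd if=$array of=$image bs=4M status=progress"
}

readonly_copy_plan /dev/md0 /dev/md0p1 /mnt/backup/md0.img
```

The printed commands need root when run by hand, and the destination must have at least as much free space as the array.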
>
> You probably want http://extundelete.sourceforge.net/ though I would first try
>
> find / -name "somefileyouknowthenameof"
>
> just to make sure it hasn't been moved elsewhere by accident. That seems like the most likely scenario given the lack of error messages, unless no
> messages at all have been logged due to previously mentioned issues.
>
Ran the suggested - - - - - well - - - - somehow I managed to drop the directory
into a much smaller one. Dunno how that happened or any details (if someone
cares to give method(s) and means for determining that I would be very
grateful!) but I have now found the missing directory and its contents
seem to be intact.
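
As a rough sketch of how such a move could be tracked down another time: search for the directory by name, then look at its change time. The function names, search root, and directory name are placeholders. The ctime hint rests on the assumption that a rename updates the moved directory's change time, which is typical on ext4 but is not a record of who or what moved it.

```shell
# Find directories with a given name under a root, staying on one file system.
find_dir() {
    root="$1"
    name="$2"
    find "$root" -xdev -type d -name "$name" -print 2>/dev/null
}

# Show the inode change time (ctime) of a path; assumes a stat that
# supports -c format strings, as GNU coreutils does.
show_change_time() {
    stat -c '%z  %n' "$1"
}

# Typical use (placeholder names):
#   find_dir / "name-of-missing-directory"
#   show_change_time /path/printed/by/find_dir
```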

I did understand that maybe asking on the linux-raid 'exchange' might not
have been the 'best' place to do so but this seemed quite weird and the
directory was on a raid-array and I thought that maybe this could be a signal
that there were more issues brewing. That seems now not to be the case.

Thank you to those who took the time to give suggestions or ideas with
the end result that the issue was resolved!!

Thanks a muchly!!!!
(hope that the subject change and adding that the issue was resolved
is appropriate for this group)


* Re: Was Re: Question - - - - now: issue resolved
  2020-02-08  0:56       ` Was Re: Question - - - - now: issue resolved o1bigtenor
@ 2020-02-08  1:24         ` Sarah Newman
  2020-02-08  2:16           ` o1bigtenor
  0 siblings, 1 reply; 12+ messages in thread
From: Sarah Newman @ 2020-02-08  1:24 UTC (permalink / raw)
  To: o1bigtenor; +Cc: Linux-RAID

On 2/7/20 4:56 PM, o1bigtenor wrote:
> On Fri, Feb 7, 2020 at 5:41 PM Sarah Newman <srn@prgmr.com> wrote:

>> I said the command dmesg, not /var/log.
>>
>> If systemd-journald is broken, or your file system is broken, you could have tons of error messages in dmesg and nothing logged to disk.
>>
> 
> Found one couplet - - - it might be applicable (please advise):
> 
> [12458.717443] EXT4-fs (md0p1): warning: maximal mount count reached, running e2fsck is recommended
> [12459.215097] EXT4-fs (md0p1): mounted filesystem with ordered data mode. Opts: (null)

What it says. You might want to run fsck at some point. Don't do it while the file system is mounted. If fsck wants to make changes, back up your data 
first.
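
A minimal sketch of a check that makes no changes, assuming the file system is on /dev/md0p1 as in the dmesg lines above. The helper names are hypothetical. e2fsck is run with -f (force a check) and -n (open read-only and answer "no" to every prompt), and the wrapper refuses to run while the device is listed in /proc/mounts.

```shell
# Return 0 if the device appears as a mount source in the mounts table.
# The second argument lets a different table be supplied (default /proc/mounts).
is_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Run a forced, read-only e2fsck, but only on an unmounted device.
check_fs_readonly() {
    dev="$1"
    if is_mounted "$dev"; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    e2fsck -fn "$dev"
}

# Typical use, as root, after unmounting:
#   check_fs_readonly /dev/md0p1
```

If this reports problems, that is the point at which to back up before rerunning e2fsck without -n.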

>> just to make sure it hasn't been moved elsewhere by accident. That seems like the most likely scenario given the lack of error messages, unless no
>> messages at all have been logged due to previously mentioned issues.
>>
> Ran the suggested - - - - - well - - - - somehow I managed to drop the directory
> into a much smaller one. Dunno how that happened or any details (if someone
> cares to give method(s) and means for determining that I would be very
> grateful!) but I have now found the missing directory and its contents
> seem to be intact.
> 
> I did understand that maybe asking on the linux-raid 'exchange' might not
> have been the 'best' place to do so but this seemed quite weird and the
> directory was on a raid-array and I thought that maybe this could be a signal
> that there were more issues brewing. That seems now not to be the case.

If something in hardware or the kernel is having issues, almost always you will see error messages.

It's also better to do quick and easy checks first even if you don't have a hypothesis for what would have led to that state.


* Re: Was Re: Question - - - - now: issue resolved
  2020-02-08  1:24         ` Sarah Newman
@ 2020-02-08  2:16           ` o1bigtenor
  0 siblings, 0 replies; 12+ messages in thread
From: o1bigtenor @ 2020-02-08  2:16 UTC (permalink / raw)
  To: Sarah Newman; +Cc: Linux-RAID

On Fri, Feb 7, 2020 at 7:24 PM Sarah Newman <srn@prgmr.com> wrote:
>
> On 2/7/20 4:56 PM, o1bigtenor wrote:
> > On Fri, Feb 7, 2020 at 5:41 PM Sarah Newman <srn@prgmr.com> wrote:
>
> >> I said the command dmesg, not /var/log.
> >>
> >> If systemd-journald is broken, or your file system is broken, you could have tons of error messages in dmesg and nothing logged to disk.
> >>
> >
> > Found one couplet - - - it might be applicable (please advise):
> >
> > [12458.717443] EXT4-fs (md0p1): warning: maximal mount count reached, running e2fsck is recommended
> > [12459.215097] EXT4-fs (md0p1): mounted filesystem with ordered data mode. Opts: (null)
>
> What it says. You might want to run fsck at some point. Don't do it while the file system is mounted. If fsck wants to make changes, back up your data
> first.
>
> >> just to make sure it hasn't been moved elsewhere by accident. That seems like the most likely scenario given the lack of error messages, unless no
> >> messages at all have been logged due to previously mentioned issues.
> >>
> > Ran the suggested - - - - - well - - - - somehow I managed to drop the directory
> > into a much smaller one. Dunno how that happened or any details (if someone
> > cares to give method(s) and means for determining that I would be very
> > grateful!) but I have now found the missing directory and its contents
> > seem to be intact.
> >
> > I did understand that maybe asking on the linux-raid 'exchange' might not
> > have been the 'best' place to do so but this seemed quite weird and the
> > directory was on a raid-array and I thought that maybe this could be a signal
> > that there were more issues brewing. That seems now not to be the case.
>
> If something in hardware or the kernel is having issues, almost always you will see error messages.
>
> It's also better to do quick and easy checks first even if you don't have a hypothesis for what would have led to that state.

Thanks for the help!


Thread overview: 12+ messages
2020-02-07 15:49 Question o1bigtenor
2020-02-07 15:53 ` Question Reindl Harald
2020-02-07 16:26   ` Question o1bigtenor
2020-02-07 17:30     ` Question Reindl Harald
2020-02-07 18:00       ` Question o1bigtenor
2020-02-07 19:27       ` Question Wols Lists
2020-02-07 22:50 ` Question Sarah Newman
2020-02-07 23:21   ` Question o1bigtenor
2020-02-07 23:41     ` Question Sarah Newman
2020-02-08  0:56       ` Was Re: Question - - - - now: issue resolved o1bigtenor
2020-02-08  1:24         ` Sarah Newman
2020-02-08  2:16           ` o1bigtenor
