* upgrading LSI SAS9211-8i fw IR->IT
@ 2017-10-24 11:47 Eyal Lebedinsky
  2017-10-24 12:14 ` Roman Mamedov
  2017-10-26  3:31 ` [sucess?] " Eyal Lebedinsky
  0 siblings, 2 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-24 11:47 UTC (permalink / raw)
  To: list linux-raid

[-- Attachment #1: Type: text/plain, Size: 1991 bytes --]

[This is a resend, as plain text and with reduced size attachment...]

Following some excitement with my 4yo+ controller I acquired a new one.
I now want to upgrade the fw from IR to IT.

I did some reading, which suggests the process is simple and straightforward - if I am lucky.
The issue seems to be that the flashing program does not run on all mobos.

I think that my server (Intel BOXDH77KC) is not booting UEFI.
Anyway, I plan to do the upgrade elsewhere, probably on my workstation (Gigabyte GA-G33M-DS2R).
Both are rather old and will be upgraded within a year. I decided to give it a test.

I disconnected the (only) disk it has and installed the LSI controller.
[No problem here except that using the on-board VGA video (until now I used an add-on video card)
I see that the letters q-z do not show properly in FreeDOS, but are OK during POST]

I started reading here:
    https://forums.servethehome.com/index.php?threads/tutorial-updating-ibm-m1015-lsi-9211-8i-firmware-on-uefi-systems.11462/
where one needs to switch between UEFI and DOS mode. Probably unsuitable for me.

I then proceeded here
    http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/
where the blog mentions some hurdles along the way.

I used files from two packages:
    LSI-9211-8i.zip
    9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip

Booted OK from a FreeDOS 1.2 USB disk. Ran 'sas2flsh.exe -list' and got the attached screen.
Seems to me that this worked. At least I did not get the
      ERROR: Failed to initialize PAL. Exiting program.
message, in which case I would need to use the efi mode flasher.

Q1) Does this mean that I am clear to proceed?

Now I am ready to do a flash erase followed by a flash program.
However, LSI warns that a failure after the erase and before the program will leave a dead (unrecoverable) card.

Q2) How do I check that I can safely do BOTH steps?

TIA

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

[-- Attachment #2: 20171024_145524-small.jpg --]
[-- Type: image/jpeg, Size: 39991 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: upgrading LSI SAS9211-8i fw IR->IT
  2017-10-24 11:47 upgrading LSI SAS9211-8i fw IR->IT Eyal Lebedinsky
@ 2017-10-24 12:14 ` Roman Mamedov
  2017-10-24 13:04   ` Eyal Lebedinsky
  2017-10-26  3:31 ` [sucess?] " Eyal Lebedinsky
  1 sibling, 1 reply; 16+ messages in thread
From: Roman Mamedov @ 2017-10-24 12:14 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Tue, 24 Oct 2017 22:47:48 +1100
Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:

> I now want to upgrade the fw from IR to IT.

Is there any practical reason for doing that?

I have a SAS9212-4i with a very ancient IR firmware -- and the thing is, it
just works perfectly as a dumb SATA controller, including full SMART access to
connected disks -- what else is there to ask from it?

(no TRIM pass-through, but that's kind of "by design" with these cards, and I
doubt it's fixed in the new firmware)

If I'm not mistaken it's IT that is the simpler firmware for when you don't
need the hardware RAID, but considering the above, and how cumbersome and risky
the flashing process is, what am I losing by staying with IR?

Anyways here's one more HOWTO on flashing that may or may not be more helpful
than the ones you listed:
https://wiki.hackspherelabs.com/index.php?title=LSI_Raid_Firmware_Bios_Flashing

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: upgrading LSI SAS9211-8i fw IR->IT
  2017-10-24 12:14 ` Roman Mamedov
@ 2017-10-24 13:04   ` Eyal Lebedinsky
  2017-10-24 17:59     ` Roman Mamedov
  0 siblings, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-24 13:04 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: list linux-raid

On 24/10/17 23:14, Roman Mamedov wrote:
> On Tue, 24 Oct 2017 22:47:48 +1100
> Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
> 
>> I now want to upgrade the fw from IR to IT.
> 
> Is there any practical reason for doing that?

I want to avoid the risk of the IR fw writing any metadata to the disks which already
hold a software RAID.

Is it not the case that the IR fw may mess with the disks (I read a comment suggesting it might)?

Regardless, I want to update to fw 20 - the card is on 18 now.

> I have a SAS9212-4i with a very ancient IR firmware -- and the thing is, it
> just works perfectly as a dumb SATA controller, including full SMART access to
> connected disks -- what else is there to ask from it?

This is a fair comment which I will consider. ATM I avoid booting this card with the
disks connected.

> (no TRIM pass-through, but that's kind of "by design" with these cards, and I
> doubt it's fixed in the new firmware)

Not an issue for me.

> If I'm not mistaken it's IT that is the simpler firmware for when you don't
> need the hardware RAID, but considering the above, and how cumbersome and risky
> the flashing process is, what am I losing by staying with IR?

Yes, I expect the IT fw is simpler, and I hope more stable too as it mostly stays
out of the way.

> Anyways here's one more HOWTO on flashing that may or may not be more helpful
> than the ones you listed:
> https://wiki.hackspherelabs.com/index.php?title=LSI_Raid_Firmware_Bios_Flashing

Thanks for the link.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: upgrading LSI SAS9211-8i fw IR->IT
  2017-10-24 13:04   ` Eyal Lebedinsky
@ 2017-10-24 17:59     ` Roman Mamedov
  0 siblings, 0 replies; 16+ messages in thread
From: Roman Mamedov @ 2017-10-24 17:59 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Wed, 25 Oct 2017 00:04:37 +1100
Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:

> I want to avoid the risk of the IR fw writing any metadata to the disks which already
> hold a software RAID.
> 
> Is it not the case that the IR fw may mess with the disks (I read a comment suggesting it might)?

I believe when HW RAID controllers do that, they set up HPA to reserve a small
area at the end, and write the metadata there. But I just checked, and there is no
HPA on any of the 3 disks that I have connected.


$ sudo hdparm -N /dev/sdi

/dev/sdi:
 max sectors   = 3907029168/3907029168, HPA is disabled

$ sudo hdparm -N /dev/sdh

/dev/sdh:
 max sectors   = 3907029168/3907029168, HPA is disabled

$ sudo hdparm -N /dev/sdg

/dev/sdg:
 max sectors   = 3907029168/3907029168, HPA is disabled


You can test by connecting some disk with data you don't care about, and
checking if its contents get modified (especially at the end), or if it gets
HPA enabled.
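
One quick way to do that comparison (a sketch; /dev/sdX and the 1 MiB window
at the end of the disk are only illustrative):

	# record HPA state and a checksum of the last 1 MiB before and after
	# letting the card see the disk, then compare the two runs
	sudo hdparm -N /dev/sdX
	END=$(( $(sudo blockdev --getsz /dev/sdX) - 2048 ))   # size in 512-byte sectors
	sudo dd if=/dev/sdX bs=512 skip=$END count=2048 2>/dev/null | sha256sum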

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess?] upgrading LSI SAS9211-8i fw IR->IT
  2017-10-24 11:47 upgrading LSI SAS9211-8i fw IR->IT Eyal Lebedinsky
  2017-10-24 12:14 ` Roman Mamedov
@ 2017-10-26  3:31 ` Eyal Lebedinsky
  2017-10-27 23:09   ` Eyal Lebedinsky
  2017-11-02 10:51   ` [sucess] " Eyal Lebedinsky
  1 sibling, 2 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-26  3:31 UTC (permalink / raw)
  To: list linux-raid

On 24/10/17 22:47, Eyal Lebedinsky wrote:
> [This is a resend, as plain text and with reduced size attachment...]
> 
> Following some excitement with my 4yo+ controller I acquired a new one.
> I now want to upgrade the fw from IR to IT.
> 
> I did some reading, which suggests the process is simple and straightforward - if I am lucky.
> The issue seems to be that the flashing program does not run on all mobos.
> 
> I think that my server (Intel BOXDH77KC) is not booting UEFI.
> Anyway, I plan to do the upgrade elsewhere, probably on my workstation (Gigabyte GA-G33M-DS2R).
> Both are rather old and will be upgraded within a year. I decided to give it a test.
> 
> I disconnected the (only) disk it has and installed the LSI controller.
> [No problem here except that using the on-board VGA video (until now I used an add-on video card)
> I see that the letters q-z do not show properly in FreeDOS, but are OK during POST]
> 
> I started reading here:
>     https://forums.servethehome.com/index.php?threads/tutorial-updating-ibm-m1015-lsi-9211-8i-firmware-on-uefi-systems.11462/
> where one needs to switch between UEFI and DOS mode. Probably unsuitable for me.
> 
> I then proceeded here
>     http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/
> where the blog mentions some hurdles along the way.
> 
> I used files from two packages:
>     LSI-9211-8i.zip
>     9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip
> 
> Booted OK from a FreeDOS 1.2 USB disk. Ran 'sas2flsh.exe -list' and got the attached screen.
> Seems to me that this worked. At least I did not get the
>       ERROR: Failed to initialize PAL. Exiting program.
> message, in which case I would need to use the efi mode flasher.
> 
> Q1) Does this mean that I am clear to proceed?
> 
> Now I am ready to do a flash erase followed by a flash program.
> However, LSI warns that a failure after the erase and before the program will leave a dead (unrecoverable) card.
> 
> Q2) How do I check that I can safely do BOTH steps?
> 
> TIA

I progressed slowly and I think that I finally succeeded.

1) Prepared a bootable USB disk (full FreeDOS 1.2)
2) Copied the required files to the USB disk.
3) Installed the LSI card in a PC after disconnecting all the disks
4) Booted from the USB and tried
	sas2flsh -list
    It worked.
5) Upgraded from fw IR 18 to IR 20
	sas2flsh -o -f 2118ir.bin -b mptsas2.rom
    It worked again, so now I was ready to attempt a flash clear and upgrade to IT
6) Cleared the flash
	sas2flsh -o -e 6
    After it said it was clearing, I got no more messages for over 10 minutes.
    This was my worst fear, as the card would now be bricked.
    ^C had no effect. Ctrl-Alt-Del had no effect. The machine was dead.
    Is this proof that there IS a God?

[BTW, this machine was known to lock up at times. I thought it was the system (Linux)
  but it now seems to be a more fundamental issue]

7) Re-booted from the USB and ran
	sas2flsh -list
    A message came up saying the card was not operational, but surprisingly
    it proceeded to say a firmware image was required and asked for a file name.
    I entered '2118it.bin' and it flashed successfully, as a subsequent '-list' confirmed.

At the end I rebooted once more and all looked good. Naturally there is no BIOS
programmed (I could flash it but decided that the IT fw probably does not require it)
	Q1) is this correct?

Comparing the '-list' details before and after the process, on top of the new fw
(and no BIOS) I noticed that the "SAS Address" changed.
	Q2) Should I reconfigure the card with the original address with
		sas2flsh -o -sasadd 500605B-#-####-####
	    I assume it is only used as a global unique ID.
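
(For reference, restoring the address would presumably just be the following,
with the x's standing in for the rest of the original 16-hex-digit address
noted from the earlier 'sas2flsh -list' output or from the sticker on the
card - a sketch, not something I have run:)

	sas2flsh -o -sasadd 500605Bxxxxxxxxx
	sas2flsh -list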

As a final test I plan to boot the actual server with this card and 3 sacrificial
disks (now all zeroed) attached, to confirm that nothing is written to the disks.

As an aside, I now do not see the text corruption I mentioned earlier, so it was
probably the BIOS causing it and not FreeDOS.

cheers

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess?] upgrading LSI SAS9211-8i fw IR->IT
  2017-10-26  3:31 ` [sucess?] " Eyal Lebedinsky
@ 2017-10-27 23:09   ` Eyal Lebedinsky
  2017-10-29 15:02     ` Brad Campbell
  2017-11-02 10:51   ` [sucess] " Eyal Lebedinsky
  1 sibling, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-27 23:09 UTC (permalink / raw)
  To: list linux-raid

On 26/10/17 14:31, Eyal Lebedinsky wrote:

[trimmed fw upgrade notes]

> As a final test I plan to boot the actual server with this card and 3 sacrificial
> disks (now all zeroed) attached, to confirm that nothing is written to the disks.

I installed the card with three disks that were zeroed and booted Fedora 26. The disks showed up OK.
I rebooted the machine (removing the disks) and then checked all the disks; they were still zeroed.

I am ready to commission this controller after next week's backup, but would still like answers to
some questions.

Q1) I did not flash a rom file, do I need to do so (with the IT fw)?

Q2) The upgrade changed the SAS Address, should I reprogram the original address?

Q3) Below are the relevant messages from the test, do they look good?
     Is the "overriding NVDATA EEDPTagMode setting" OK?
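
(As an aside, the firmware version and SAS address can also be read back from
Linux - a sketch, assuming the mpt2sas/mpt3sas sysfs attributes are present
under host8 as in the log below:)

	cat /sys/class/scsi_host/host8/version_fw
	cat /sys/class/scsi_host/host8/version_bios
	cat /sys/class/scsi_host/host8/host_sas_address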

cheers
	Eyal

kernel: mpt3sas version 15.100.00.00 loaded
kernel: mpt3sas 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
kernel: mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8162760 kB)
kernel: mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 4, max_msix_vectors: -1
kernel: mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 30
kernel: mpt2sas_cm0: iomem(0x00000000f1040000), mapped(0xffffb20541444000), size(16384)
kernel: mpt2sas_cm0: ioport(0x000000000000b000), size(256)
kernel: mpt2sas_cm0: Allocated physical memory: size(7579 kB)
kernel: mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
kernel: mpt2sas_cm0: Scatter Gather Elements per IO(128)
kernel: mpt2sas_cm0: overriding NVDATA EEDPTagMode setting
kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
kernel: mpt2sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
kernel: scsi host8: Fusion MPT SAS Host
kernel: mpt2sas_cm0: sending port enable !!
kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b0013ca580), phys(8)
kernel: scsi 8:0:0:0: Direct-Access     ATA      SAMSUNG HD400LJ  0-15 PQ: 0 ANSI: 6
kernel: scsi 8:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221101000000), phy(1), device_name(0x0000000000000000)
kernel: scsi 8:0:0:0: SATA: enclosure_logical_id(0x500605b0013ca580), slot(2)
kernel: scsi 8:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
kernel: scsi 8:0:1:0: Direct-Access     ATA      SAMSUNG HD401LJ  0-15 PQ: 0 ANSI: 6
kernel: scsi 8:0:1:0: SATA: handle(0x000a), sas_addr(0x4433221102000000), phy(2), device_name(0x0000000000000000)
kernel: scsi 8:0:1:0: SATA: enclosure_logical_id(0x500605b0013ca580), slot(1)
kernel: scsi 8:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
kernel: scsi 8:0:2:0: Direct-Access     ATA      WDC WD3200JD-00K 5J08 PQ: 0 ANSI: 6
kernel: scsi 8:0:2:0: SATA: handle(0x000b), sas_addr(0x4433221103000000), phy(3), device_name(0x0000000000000000)
kernel: scsi 8:0:2:0: SATA: enclosure_logical_id(0x500605b0013ca580), slot(0)
kernel: scsi 8:0:2:0: atapi(n), ncq(n), asyn_notify(n), smart(y), fua(n), sw_preserve(n)
kernel: mpt2sas_cm0: port enable: SUCCESS
kernel: sd 8:0:0:0: Attached scsi generic sg10 type 0
kernel: sd 8:0:1:0: Attached scsi generic sg11 type 0
kernel: sd 8:0:2:0: Attached scsi generic sg12 type 0
kernel: sd 8:0:2:0: [sdl] 625140335 512-byte logical blocks: (320 GB/298 GiB)
kernel: sd 8:0:0:0: [sdj] 781422768 512-byte logical blocks: (400 GB/373 GiB)
kernel: sd 8:0:1:0: [sdk] 781422768 512-byte logical blocks: (400 GB/373 GiB)
kernel: sd 8:0:2:0: [sdl] Write Protect is off
kernel: sd 8:0:2:0: [sdl] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 8:0:0:0: [sdj] Write Protect is off
kernel: sd 8:0:2:0: [sdl] Attached SCSI disk
kernel: sd 8:0:0:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 8:0:1:0: [sdk] Write Protect is off
kernel: sd 8:0:1:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 8:0:0:0: [sdj] Attached SCSI disk
kernel: sd 8:0:1:0: [sdk] Attached SCSI disk

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess?] upgrading LSI SAS9211-8i fw IR->IT
  2017-10-27 23:09   ` Eyal Lebedinsky
@ 2017-10-29 15:02     ` Brad Campbell
  0 siblings, 0 replies; 16+ messages in thread
From: Brad Campbell @ 2017-10-29 15:02 UTC (permalink / raw)
  To: Eyal Lebedinsky, list linux-raid

On 28/10/17 07:09, Eyal Lebedinsky wrote:

> I am ready to commission this controller after next week's backup, but
> would still like answers to
> some questions.
>
> Q1) I did not flash a rom file, do I need to do so (with the IT fw)?

Not unless you plan on trying to boot from one. I have the BIOS zeroed 
on all my cards as it speeds up POST.

> Q2) The upgrade changed the SAS Address, should I reprogram the original
> address?

I always reset the address, but mainly because I don't want 2 cards in 
the machine with the same address.

> Q3) Below are the relevant messages from the test, do they look good?
>     Is the "overriding NVDATA EEDPTagMode setting" OK?

Can't answer that one. A quick squiz at the source suggests it's nothing
particularly evil though.

Brad
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-10-26  3:31 ` [sucess?] " Eyal Lebedinsky
  2017-10-27 23:09   ` Eyal Lebedinsky
@ 2017-11-02 10:51   ` Eyal Lebedinsky
  2017-11-03  0:54     ` Brad Campbell
  1 sibling, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-02 10:51 UTC (permalink / raw)
  To: list linux-raid

On 26/10/17 14:31, Eyal Lebedinsky wrote:
> On 24/10/17 22:47, Eyal Lebedinsky wrote:

[trimmed the story of reflashing the LSI to the latest IT fw]

> As a final test I plan to boot the actual server with this card and 3 sacrificial
> disks (now all zeroed) attached, to confirm that nothing is written to the disks.

For the record, in closing this thread:
- I did the above test and confirmed that the disks are not written to by the driver.
- I replaced the HighPoint HBA with the LSI and simply moved the SFF-8087 across.
   The array came up without issue.
- I ran a full raid 'check' - zero mismatches.
   This was a good surprise because I had three cases of full array failures with the HighPoint and
   expected some corruption.
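
(A sketch of driving such a 'check' and reading the result back through the
md sysfs interface, run as root - md0 here stands for the actual array name:)

	echo check > /sys/block/md0/md/sync_action    # start a read-only scrub
	cat /proc/mdstat                              # watch progress
	cat /sys/block/md0/md/mismatch_cnt            # 0 = no mismatches found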

What I noted so far:
- the heat sink feels very hot even after short usage.
- the disks were now in a different order, pretty much reverse order.
   I am not sure the order will remain fixed (by port number?) or variable (as the disks spin up).

I thank all who responded, I found it very helpful.

cheers
	Eyal

> As an aside, I now do not see the text corruption I mentioned earlier, so it was
> probably the BIOS causing it and not FreeDOS.
> 
> cheers
 >	Eyal

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-02 10:51   ` [sucess] " Eyal Lebedinsky
@ 2017-11-03  0:54     ` Brad Campbell
  2017-11-03  2:31       ` Eyal Lebedinsky
  0 siblings, 1 reply; 16+ messages in thread
From: Brad Campbell @ 2017-11-03  0:54 UTC (permalink / raw)
  To: Eyal Lebedinsky, list linux-raid

On 02/11/17 18:51, Eyal Lebedinsky wrote:

> What I noted so far:
> - the heat sink feels very hot even after short usage.

Yeah, they do get warm. Best to make sure you have a bit of airflow over 
them. It doesn't take much air movement to keep temps in check.

> - the disks were now in a different order, pretty much reverse order.
>    I am not sure the order will remain fixed (by port number?) or 
> variable (as the disks spin up).

The driver scans them in port/slot order, and apparently in order of 
increasing pci address in the case of multiple cards. In my case where I 
have staggered spinup enabled it spins them up in groups and then waits 
for them in order, so things don't tend to move around unless you shift 
hardware about or a drive fails.

Make sure you do a periodic lsdrv just for records sake, but as yet I've 
not needed it. I keep a spreadsheet which lists which drive S/N is in 
which physical slot so when something happens I can just look up which 
drive needs to be popped without risk of pulling the wrong disk.
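
A rough substitute if lsdrv isn't handy (a sketch; assumes smartmontools is
installed and only covers /dev/sda..sdz):

	# print each device with its serial, then the by-path names that encode
	# the HBA PCI address and phy/slot
	for d in /dev/sd?; do
		printf '%-10s %s\n' "$d" \
			"$(sudo smartctl -i "$d" | awk -F: '/Serial Number/ {gsub(/ /,"",$2); print $2}')"
	done
	ls -l /dev/disk/by-path/ | grep -v -- -part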

I swapped out a set of highpoint controllers for these LSI units back in 
2011 and it was the best thing I ever did for storage speed and reliability.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-03  0:54     ` Brad Campbell
@ 2017-11-03  2:31       ` Eyal Lebedinsky
  2017-11-03  3:03         ` Brad Campbell
  2017-11-06 20:15         ` Wolfgang Denk
  0 siblings, 2 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-03  2:31 UTC (permalink / raw)
  To: Brad Campbell, list linux-raid

On 03/11/17 11:54, Brad Campbell wrote:
> On 02/11/17 18:51, Eyal Lebedinsky wrote:
> 
>> What I noted so far:
>> - the heat sink feels very hot even after short usage.
> 
> Yeah, they do get warm. Best to make sure you have a bit of airflow over them. It doesn't take much air movement to keep temps in check.
> 
>> - the disks were now in a different order, pretty much reverse order.
>>    I am not sure the order will remain fixed (by port number?) or variable (as the disks spin up).
> 
> The driver scans them in port/slot order, and apparently in order of increasing pci address in the case of multiple cards. In my case where I have staggered spinup enabled it spins them up in groups and then waits for them in order, so things don't tend to move around unless you shift hardware about or a drive fails.

I have only one card. However, I did not reconnect the drives to the SFF-8087 but merely moved the two harnesses
to the new card.

I recorded the S/N of the disks that were detected as c,d,e,f,g,h,i together with their physical
location (I had many disk failures/replacements in the 4 years life of this array).

Now, with the LSI, looking at the S/N I see the disks detected as h,g,f,e,d,c,i. I would understand
if two 4-way connectors were swapped, but this (mostly) reverse order?

I expect that the two sockets on the card are in a different order, and the four lanes on each SFF
connector are also in reverse order.

BTW, sdi was always very slow to spin up, so maybe this is why it comes last rather than before sdd
as it would if it followed the same reverse ordering.

> Make sure you do a periodic lsdrv just for records sake, but as yet I've not needed it. I keep a spreadsheet which lists which drive S/N is in which physical slot so when something happens I can just look up which drive needs to be popped without risk of pulling the wrong disk.

The HighPoint had a habit of resetting a nearby disk when hot-removing another so I always did the swapping
offline. Maybe the LSI is better? My 7 disk array (4TB WD blacks) had 11 replacements so far (4 years) ...

> I swapped out a set of highpoint controllers for these LSI units back in 2011 and it was the best thing I ever did for storage speed and reliability.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-03  2:31       ` Eyal Lebedinsky
@ 2017-11-03  3:03         ` Brad Campbell
  2017-11-03  3:39           ` Eyal Lebedinsky
  2017-11-06 20:15         ` Wolfgang Denk
  1 sibling, 1 reply; 16+ messages in thread
From: Brad Campbell @ 2017-11-03  3:03 UTC (permalink / raw)
  To: Eyal Lebedinsky, list linux-raid

On 03/11/17 10:31, Eyal Lebedinsky wrote:

> The HighPoint had a habit of resetting a nearby disk when hot-removing 
> another so I always did the swapping
> offline. Maybe the LSI is better? My 7 disk array (4TB WD blacks) had 11 
> replacements so far (4 years) ...
> 

I've swapped out everything from individual disks to entire arrays with 
the machine running. If possible and I care about the disk I'll take 
care to spin it down with hdparm first, but regardless the LSI 
controllers have behaved flawlessly.

The only issue I see periodically is that when a SMART poll coincides with
some seriously heavy activity I might get one or more messages like the
following in dmesg:

[1936062.640198] mpt2sas_cm0: log_info(0x31120303): originator(PL), 
code(0x12), sub_code(0x0303)

Generally I only see those during my monthly array scrub where every 
disk in every array is going hammer and tongs simultaneously and I've 
never seen an issue related to those messages.

Unrelated, and with regard to SSDs: for TRIM to work you need "returns
deterministic" *and* "returns zero" set for the card to enable TRIM. I
have some Intel 330s that do, and some Samsung 830s that don't.
Apparently the 840 Pro was the only Samsung drive that did the business.
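
Those two bits can be checked from Linux before committing to a drive (a
sketch; the exact wording in the hdparm output may vary by version):

	sudo hdparm -I /dev/sdX | grep -i -E 'trim|deterministic'
	# look for "Data Set Management TRIM supported" and
	# "Deterministic read ZEROs after TRIM" in the result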

When I do my next SSD upgrade (these are only 60% gone after 6 years) 
I'll seek out drives that have the right features.

I recently upgraded one of my base servers and had to replace an 8 port 
LSI with a 16 port (2x2008 with 1x2016). I did the same firmware upgrade 
and there have been no performance or reliability issues. I really like 
the LSI cards.

Odd to hear of your disk issues. I just swapped out 7 WD Green drives 
with 6 years on them. I started with 10 and lost 1 to early life, and 2 
to grown defects in the last year or so (classic bathtub curve). They 
were early units though that still had TLER enabled on them. The only 
drives I've had mass attrition on were Seagate/Maxtor 1TB 7200.11, and 
they were known as not great units.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-03  3:03         ` Brad Campbell
@ 2017-11-03  3:39           ` Eyal Lebedinsky
  0 siblings, 0 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-03  3:39 UTC (permalink / raw)
  To: Brad Campbell, list linux-raid

On 03/11/17 14:03, Brad Campbell wrote:
> On 03/11/17 10:31, Eyal Lebedinsky wrote:
> 
>> The HighPoint had a habit of resetting a nearby disk when hot-removing another so I always did the swapping
>> offline. Maybe the LSI is better? My 7 disk array (4TB WD blacks) had 11 replacements so far (4 years) ...
>>
> 
> I've swapped out everything from individual disks to entire arrays with the machine running. If possible and I care about the disk I'll take care to spin it down with hdparm first, but regardless the LSI controllers have behaved flawlessly.
> 
> The only issue I see periodically is that when a SMART poll coincides with some seriously heavy activity I might get one or more messages like the following in dmesg:
> 
> [1936062.640198] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
> 
> Generally I only see those during my monthly array scrub where every disk in every array is going hammer and tongs simultaneously and I've never seen an issue related to those messages.

I will keep an eye on such messages.

A message I saw once is this one:
	kernel: mpt3sas 0000:01:00.0: invalid short VPD tag 00 at offset 1
which a web search suggests is effectively a warning.
It coincides with an 'lspci' that runs nightly.

On another machine I do at times see the same from a NIC:
	kernel: r8169 0000:04:00.0: invalid short VPD tag 00 at offset 1
Again, probably from 'lspci'.

> Unrelated, and with regard to SSDs: for TRIM to work you need "returns deterministic" *and* "returns zero" set for the card to enable TRIM. I have some Intel 330s that do, and some Samsung 830s that don't. Apparently the 840 Pro was the only Samsung drive that did the business.
> 
> When I do my next SSD upgrade (these are only 60% gone after 6 years) I'll seek out drives that have the right features.
> 
> I recently upgraded one of my base servers and had to replace an 8 port LSI with a 16 port (2x2008 with 1x2016). I did the same firmware upgrade and there have been no performance or reliability issues. I really like the LSI cards.
> 
> Odd to hear of your disk issues. I just swapped out 7 WD Green drives with 6 years on them. I started with 10 and lost 1 to early life, and 2 to grown defects in the last year or so (classic bathtub curve). They were early units though that still had TLER enabled on them. The only drives I've had mass attrition on were Seagate/Maxtor 1TB 7200.11, and they were known as not great units.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-03  2:31       ` Eyal Lebedinsky
  2017-11-03  3:03         ` Brad Campbell
@ 2017-11-06 20:15         ` Wolfgang Denk
  2017-11-06 21:38           ` Eyal Lebedinsky
  1 sibling, 1 reply; 16+ messages in thread
From: Wolfgang Denk @ 2017-11-06 20:15 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: Brad Campbell, list linux-raid

Dear Eyal,

In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au> you wrote:
>
> (I had many disk failures/replacements in the 4 years life of this array).

Reading this makes me wonder if you checked your environment for
other influences.  There must be some reason for an exceptionally high
number of failures.

I remember we also had a nightmare of disk errors in the rack on the
2nd floor of our building - which disappeared after moving the rack
into the basement.  I can't prove it, but I blame it on vibrations.
We have a heavy-traffic train line less than 50 meters away, and
disks (classic, magnetic ones) definitely do not like vibrations -
see [1].  Maybe you have other influences you did not check for yet?

[1] https://www.youtube.com/watch?v=tDacjrSCeq4

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Speculation is always more interesting than facts.
                                    - Terry Pratchett, _Making_Money_

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-06 20:15         ` Wolfgang Denk
@ 2017-11-06 21:38           ` Eyal Lebedinsky
  2017-11-06 21:48             ` Phil Turmel
  0 siblings, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-06 21:38 UTC (permalink / raw)
  To: list linux-raid; +Cc: Wolfgang Denk, Brad Campbell

On 07/11/17 07:15, Wolfgang Denk wrote:
> Dear Eyal,
> 
> In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au> you wrote:
>>
>> (I had many disk failures/replacements in the 4 years life of this array).
> 
> Reading this makes me wonder if you checked your environment for
> other influences.  There must be some reason for an exceptionally high
> number of failures.
> 
> I remember we also had a nightmare of disk errors in the rack on the
> 2nd floor of our building - which disappeared after moving the rack
> into the basement.  I can't prove it, but I blame it on vibrations.
> We have a heavy-traffic train line less than 50 meters away, and
> disks (classic, magnetic ones) definitely do not like vibrations -
> see [1].  Maybe you have other influences you did not check for yet?

Interesting Wolfgang,

- This array is at home, a relatively quiet place.
- I monitor the disks' temperatures and they are OK.
- The machine runs off a UPS which can be a source of bad power
   (if the PS does not filter it out).
- The HBA may be somehow bothering the disks?

The disks are under warranty until late next year so there is
time to see if the disks do better with the LSI.

BTW, two of the RMAs were for disks that arrived DOA (as RMAs).
I  do not have high regard for the WD blacks. The failures were
spread across the last 4 years (so not infant mortality).

If nothing else, this experience made me comfortable with software
raid, and encouraged me to stick to my backup schedule.

cheers

> [1] https://www.youtube.com/watch?v=tDacjrSCeq4
> 
> Best regards,
> 
> Wolfgang Denk

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-06 21:38           ` Eyal Lebedinsky
@ 2017-11-06 21:48             ` Phil Turmel
  2017-11-06 21:59               ` Eyal Lebedinsky
  0 siblings, 1 reply; 16+ messages in thread
From: Phil Turmel @ 2017-11-06 21:48 UTC (permalink / raw)
  To: Eyal Lebedinsky, list linux-raid; +Cc: Wolfgang Denk, Brad Campbell

On 11/06/2017 04:38 PM, Eyal Lebedinsky wrote:
> On 07/11/17 07:15, Wolfgang Denk wrote:
>> Dear Eyal,
>> 
>> In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au>
>> you wrote:
>>> 
>>> (I had many disk failures/replacements in the 4 years life of
>>> this array).
>> 
>> Reading this makes me wonder if you checked your environment for
>> other influences.  There must be some reason for an exceptionally
>> high number of failures.
>> 
>> I remember we also had a nightmare of disk errors in the rack on the
>> 2nd floor of our building - which disappeared after moving the
>> rack into the basement.  I can't prove it, but I blame it on
>> vibrations. We have a heavy-traffic train line less than 50 meters
>> away, and disks (classic, magnetic ones) definitely do not like
>> vibrations - see [1].  Maybe you have other influences you did not
>> check for yet?
> 
> Interesting Wolfgang,
> 
> - This array is at home, a relatively quiet place.
> - I monitor the disks' temperatures and they are OK.
> - The machine runs off a UPS which can be a source of bad power
>    (if the PS does not filter it out).
> - The HBA may be somehow bothering the disks?
> 
> The disks are under warranty until late next year so there is time to
> see if the disks do better with the LSI.
> 
> BTW, two of the RMAs were for disks that arrived DOA (as RMAs). I  do
> not have high regard for the WD blacks. The failures were spread
> across the last 4 years (so not infant mortality).
> 
> If nothing else, this experience made me comfortable with software 
> raid, and encouraged me to stick to my backup schedule.

That's a really bad failure rate.  But they're WD Blacks, which if I
recall correctly, do not support scterc.  Did you deal with your driver
timeouts?  If not, those drives probably weren't really dead.  Just not
raid-compatible out-of-the-box.

Phil

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
  2017-11-06 21:48             ` Phil Turmel
@ 2017-11-06 21:59               ` Eyal Lebedinsky
  0 siblings, 0 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-06 21:59 UTC (permalink / raw)
  To: Phil Turmel, list linux-raid; +Cc: Wolfgang Denk, Brad Campbell

On 07/11/17 08:48, Phil Turmel wrote:
> On 11/06/2017 04:38 PM, Eyal Lebedinsky wrote:
>> On 07/11/17 07:15, Wolfgang Denk wrote:
>>> Dear Eyal,
>>>
>>> In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au>
>>> you wrote:
>>>>
>>>> (I had many disk failures/replacements in the 4 years life of
>>>> this array).
>>>
>>> Reading this makes me wonder if you checked your environment for
>>> other influences.  There must be some reason for an exceptionally
>>> high number of failures.
>>>
>>> I remember we also had a nightmare of disk errors in the rack on the
>>> 2nd floor of our building - which disappeared after moving the
>>> rack into the basement.  I can't prove it, but I blame it on
>>> vibrations. We have a heavy-traffic train line less than 50 meters
>>> away, and disks (classic, magnetic ones) definitely do not like
>>> vibrations - see [1].  Maybe you have other influences you did not
>>> check for yet?
>>
>> Interesting Wolfgang,
>>
>> - This array is at home, a relatively quiet place.
>> - I monitor the disks' temperatures and they are OK.
>> - The machine runs off a UPS which can be a source of bad power
>>    (if the PS does not filter it out).
>> - The HBA may be somehow bothering the disks?
>>
>> The disks are under warranty until late next year so there is time to
>> see if the disks do better with the LSI.
>>
>> BTW, two of the RMAs were for disks that arrived DOA (as RMAs). I  do
>> not have high regard for the WD blacks. The failures were spread
>> across the last 4 years (so not infant mortality).
>>
>> If nothing else, this experience made me comfortable with software
>> raid, and encouraged me to stick to my backup schedule.
> 
> That's a really bad failure rate.  But they're WD Blacks, which if I
> recall correctly, do not support scterc.  Did you deal with your driver
> timeouts?

Yes, my rc.local does
	# echo 180 >"/sys/block/$disk/device/timeout"
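
For anyone following along, the usual fuller form of such a fragment looks
roughly like this (a sketch only - the device list, the 7 s ERC value and the
180 s fallback are illustrative, and whether smartctl's exit status reliably
reflects SCT ERC support may depend on the smartctl version):

	# set a short SCT ERC where the drive supports it, otherwise give the
	# kernel a long command timeout so the drive can finish its own recovery
	for dev in /dev/sd[a-z]; do
		disk=${dev##*/}
		if smartctl -l scterc,70,70 "$dev" >/dev/null 2>&1; then
			echo "$disk: SCT ERC set to 7.0 s"
		else
			echo 180 > /sys/block/$disk/device/timeout
			echo "$disk: no SCT ERC, kernel timeout set to 180 s"
		fi
	done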

The disks were let go after developing bad sectors (more than once at an
increasing rate). I did not have a case of a disk being kicked out due to lack
of scterc.

I *will* get more suitable disks next time.

> If not, those drives probably weren't really dead.  Just not
> raid-compatible out-of-the-box.
> 
> Phil

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-11-06 21:59 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-24 11:47 upgrading LSI SAS9211-8i fw IR->IT Eyal Lebedinsky
2017-10-24 12:14 ` Roman Mamedov
2017-10-24 13:04   ` Eyal Lebedinsky
2017-10-24 17:59     ` Roman Mamedov
2017-10-26  3:31 ` [sucess?] " Eyal Lebedinsky
2017-10-27 23:09   ` Eyal Lebedinsky
2017-10-29 15:02     ` Brad Campbell
2017-11-02 10:51   ` [sucess] " Eyal Lebedinsky
2017-11-03  0:54     ` Brad Campbell
2017-11-03  2:31       ` Eyal Lebedinsky
2017-11-03  3:03         ` Brad Campbell
2017-11-03  3:39           ` Eyal Lebedinsky
2017-11-06 20:15         ` Wolfgang Denk
2017-11-06 21:38           ` Eyal Lebedinsky
2017-11-06 21:48             ` Phil Turmel
2017-11-06 21:59               ` Eyal Lebedinsky
