linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
@ 2020-01-02 22:05 Guido Trentalancia
  0 siblings, 0 replies; 17+ messages in thread
From: Guido Trentalancia @ 2020-01-02 22:05 UTC (permalink / raw)
  To: linux-scsi

Many obsolete hard drives do not support the Synchronize Cache SCSI
command. Such command is generally issued during fsync() calls which
at the moment therefore fail with the ILLEGAL_REQUEST sense key.

Since this failure is currently treated as critical in the kernel SCSI
disk driver, such obsolete hard drives cannot be used anymore: they
cannot be formatted, mounted and/or checked using tools such as e2fsprogs.

Because there is nothing which can be done if the drive does not support
such command, such ILLEGAL_REQUEST should be treated as non-critical so
that the underlying operation does not fail and the obsolete hard drive
can be used normally.

This patch disables the Write Cache feature as a precaution on hard
drives which do not support the Synchronize Cache command and therefore
the cache flushing functionality.

Fixes bug: https://bugzilla.kernel.org/show_bug.cgi?id=203635

Signed-off-by: Guido Trentalancia <guido@trentalancia.com>
---
 drivers/scsi/sd.c |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff -pru a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	2019-03-17 18:22:04.822720851 +0100
+++ b/drivers/scsi/sd.c	2019-03-20 17:41:44.526957307 +0100
@@ -22,6 +22,10 @@
  *	 - Badari Pulavarty <pbadari@us.ibm.com>, Matthew Wilcox 
  *	   <willy@debian.org>, Kurt Garloff <garloff@suse.de>: 
  *	   Support 32k/1M disks.
+ *	 - Guido Trentalancia <guido@trentalancia.com> ignore Synchronize
+ *	   Cache command failures on hard-drives that do not support it
+ *	   and disable the Write Cache functionality on such devices as a
+ *	   precaution: this allows to keep using several obsolete drives.
  *
  *	Logging policy (needs CONFIG_SCSI_LOGGING defined):
  *	 - setting up transfer: SCSI_LOG_HLQUEUE levels 1 and 2
@@ -1633,6 +1637,20 @@ static int sd_sync_cache(struct scsi_dis
 	}
 
 	if (res) {
+		/*
+		 * sshdr.sense_key == ILLEGAL_REQUEST means this drive
+		 * doesn't support sync. There's not much to do and
+		 * sync shouldn't fail.
+		 */
+		if (sshdr->sense_key == ILLEGAL_REQUEST && sshdr->asc == 0x20) {
+			if (sdkp->WCE) {
+				sdkp->WCE = 0;
+				sd_printk(KERN_NOTICE, sdkp, "Drive does not support Synchronize Cache(10) command: disabling write cache.\n");
+				sd_set_flush_flag(sdkp);
+			}
+			return 0;
+		}
+
 		sd_print_result(sdkp, "Synchronize Cache(10) failed", res);
 
 		if (driver_byte(res) == DRIVER_SENSE)
@@ -2022,6 +2040,17 @@ static int sd_done(struct scsi_cmnd *SCp
 					req->rq_flags |= RQF_QUIET;
 				}
 				break;
+			case SYNCHRONIZE_CACHE:
+				if (sshdr.asc == 0x20) {
+					if (sdkp->WCE) {
+						sdkp->WCE = 0;
+						sd_printk(KERN_NOTICE, sdkp, "Drive does not support Synchronize Cache(10) command: disabling write cache.\n");
+						sd_set_flush_flag(sdkp);
+					}
+					SCpnt->result = 0;
+					good_bytes = scsi_bufflen(SCpnt);
+				}
+				break;
 			}
 		}
 		break;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 13:04                 ` Damien Le Moal
@ 2021-03-01 13:12                   ` Guido Trentalancia
  0 siblings, 0 replies; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 13:12 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

On Mon, 01/03/2021 at 13.04 +0000, Damien Le Moal wrote:
> Yes, I believe that: your patch disables the write cache ! So no
> synchronize
> cache command, no caching, no data loss. All good.

[...]

> Because the drive does not have synchronize cache while caching data,
> which is
> crazy.

[...]

> Sure, because you end up with WCE=0. All good.

[...]

> I understand the situation perfectly. I am not doubting your result.
> I looked at
> your patch and understand it. It is sensible, but does not plug all
> the possible
> holes with such weird drive. So that is not a good solution to me.

It is the only possible solution to the best of my knowledge.

> The alternative, safe this one, is a udev rule disabling your drive
> write cache
> or you setting the permanent drive config with WCE=0. That will have
> *exactly*
> the same effect as your patch: things will work just fine. Try that
> solution
> please. No need for a kernel patch.

That is a cumbersome trick, not a safe and stable solution.

The simplest point of failure of such trick is that an unexperienced
user plugs in one of such drives without knowing the trick and data
corruption occurs.

Another very big disadvantage is that on multiple systems the same
configuration needs to be replicated over and over on all systems.

Guido

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:57               ` Damien Le Moal
@ 2021-03-01 13:05                 ` Guido Trentalancia
  0 siblings, 0 replies; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 13:05 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

On Mon, 01/03/2021 at 12.57 +0000, Damien Le Moal wrote:
> On 2021/03/01 21:52, Guido Trentalancia wrote:
> > As already said, I have tested the patch for over a year now and I
> > have
> > never experienced the problem that you are foreseeing !
> 
> That may be true for your specific drive, but since we are in
> uncharted
> (non-standard) territory here, we cannot say that this will hold for
> all such
> weird drives out there.

I do not work in the hard drive industry, so I cannot make assumptions
about all existing models.

All patches should normally undergo further testing and this is no
exception.

I would stress once again that the proposed patch disables write
caching as soon as it realizes that the drive does not support the Sync
Cache command, so it has been designed to be extra safe.

> > The current alternative is data corruption each time that the drive
> > is
> > mounted and the inability to use it.
> > 
> > So, the patch is the way forward for using such drives plug and
> > play
> > without cumbersome configuration such as disabling the write cache,
> > which advanced users can always make.

[...]

> As mentioned, the alternative is a udev rule to disable write cache.
> Or if the
> drive supports that, permanently save WCE=0 setting in the drive
> config so that
> it always come up with write cache disabled. No kernel patch needed,
> and you
> will note that this is also exactly the same as what your patch does,
> without
> waiting for an error.

The above is cumbersome, the kernel should support such drives plug and
play, without causing data corruption which is happening at the moment.

Guido

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:57               ` Guido Trentalancia
@ 2021-03-01 13:04                 ` Damien Le Moal
  2021-03-01 13:12                   ` Guido Trentalancia
  0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2021-03-01 13:04 UTC (permalink / raw)
  To: Guido Trentalancia, linux-scsi

On 2021/03/01 21:57, Guido Trentalancia wrote:
> On Mon, 01/03/2021 at 12.51 +0000, Damien Le Moal wrote:
>> On 2021/03/01 21:39, Guido Trentalancia wrote:
>>> If the system is shut down before the sync or drive unmounting and
>>> the
>>> write cache is enabled, there might be the loss of data in the
>>> cache,
>>> but this is because of the way the drive is designed.
>>
>> That drive is not usable. Even the best journaling file system would
>> get corrupted.
> 
> It is usable and an ext3 filesystem has not been causing any problem
> for over a year now.

Yes, I believe that: your patch disables the write cache ! So no synchronize
cache command, no caching, no data loss. All good.

> 
>>> I believe the kernel should support the drive as it is - plug and
>>> play
>>> - without requiring cumbersome configurations.
>>
>> No. That would be lying to the user. The user expect things to work.
>> Not data
>> corruptions.
> 
> Data corruption is what occurs with the current kernel when one of such
> drives is mounted.

Because the drive does not have synchronize cache while caching data, which is
crazy.

> With the patch the drive can be used normally and no data corruption
> occurs with the drive that I have tested.

Sure, because you end up with WCE=0. All good.

> This is a proposed patch and nobody is lying, I can report the specific
> drive model that I have tested and the tests can be easily replicated
> to confirm my findings.

I understand the situation perfectly. I am not doubting your result. I looked at
your patch and understand it. It is sensible, but does not plug all the possible
holes with such weird drive. So that is not a good solution to me.

The alternative, safe this one, is a udev rule disabling your drive write cache
or you setting the permanent drive config with WCE=0. That will have *exactly*
the same effect as your patch: things will work just fine. Try that solution
please. No need for a kernel patch.

> 
> Guido
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:51             ` Damien Le Moal
@ 2021-03-01 12:57               ` Guido Trentalancia
  2021-03-01 13:04                 ` Damien Le Moal
  0 siblings, 1 reply; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 12:57 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

On Mon, 01/03/2021 at 12.51 +0000, Damien Le Moal wrote:
> On 2021/03/01 21:39, Guido Trentalancia wrote:
> > If the system is shut down before the sync or drive unmounting and
> > the
> > write cache is enabled, there might be the loss of data in the
> > cache,
> > but this is because of the way the drive is designed.
> 
> That drive is not usable. Even the best journaling file system would
> get corrupted.

It is usable and an ext3 filesystem has not been causing any problem
for over a year now.

> > I believe the kernel should support the drive as it is - plug and
> > play
> > - without requiring cumbersome configurations.
> 
> No. That would be lying to the user. The user expect things to work.
> Not data
> corruptions.

Data corruption is what occurs with the current kernel when one of such
drives is mounted.

With the patch the drive can be used normally and no data corruption
occurs with the drive that I have tested.

This is a proposed patch and nobody is lying, I can report the specific
drive model that I have tested and the tests can be easily replicated
to confirm my findings.

Guido

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:52             ` Guido Trentalancia
@ 2021-03-01 12:57               ` Damien Le Moal
  2021-03-01 13:05                 ` Guido Trentalancia
  0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2021-03-01 12:57 UTC (permalink / raw)
  To: Guido Trentalancia, linux-scsi

On 2021/03/01 21:52, Guido Trentalancia wrote:
> On Mon, 01/03/2021 at 12.42 +0000, Damien Le Moal wrote:
>> I checked the standards again. It turns out that SYNCHRONIZE CACHE
>> command is
>> optional in SBC... Aouch. Got so used to have that one on any drive
>> that I
>> thought it was mandatory.
>>
>> Well, it certainly is mandatory if the drive has a write cache, which
>> seems to
>> be the case for you.
>>
>> The problem with your patch though is that you disable write caching
>> when you
>> see an ILLEGAL REQUEST/INVALID OPCODE termination of synchronize
>> cache. Which
>> means that the drive was already used, written too and the cache has
>> likely
>> dirty data and I do not see any method to guarantee that that data
>> makes it to
>> persistent media before shutdown. Imagine if that was the synchronize
>> cache sent
>> before shutdown.
> 
> As already said, I have tested the patch for over a year now and I have
> never experienced the problem that you are foreseeing !

That may be true for your specific drive, but since we are in uncharted
(non-standard) territory here, we cannot say that this will hold for all such
weird drives out there.

> The current alternative is data corruption each time that the drive is
> mounted and the inability to use it.
> 
> So, the patch is the way forward for using such drives plug and play
> without cumbersome configuration such as disabling the write cache,
> which advanced users can always make.
> 
>> So the only reasonable solution for such drive is to use it with
>> write cache
>> disabled from the start.
>> down.
> 
> It works well even with write caching enabled.
> 
> If you have an alternative patch to propose, go ahead otherwise I would
> push for getting this merged and sorting out the issue for good.

As mentioned, the alternative is a udev rule to disable write cache. Or if the
drive supports that, permanently save WCE=0 setting in the drive config so that
it always come up with write cache disabled. No kernel patch needed, and you
will note that this is also exactly the same as what your patch does, without
waiting for an error.

> 
> Guido
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:42           ` Damien Le Moal
@ 2021-03-01 12:52             ` Guido Trentalancia
  2021-03-01 12:57               ` Damien Le Moal
  0 siblings, 1 reply; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 12:52 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

On Mon, 01/03/2021 at 12.42 +0000, Damien Le Moal wrote:
> I checked the standards again. It turns out that SYNCHRONIZE CACHE
> command is
> optional in SBC... Aouch. Got so used to have that one on any drive
> that I
> thought it was mandatory.
> 
> Well, it certainly is mandatory if the drive has a write cache, which
> seems to
> be the case for you.
> 
> The problem with your patch though is that you disable write caching
> when you
> see an ILLEGAL REQUEST/INVALID OPCODE termination of synchronize
> cache. Which
> means that the drive was already used, written too and the cache has
> likely
> dirty data and I do not see any method to guarantee that that data
> makes it to
> persistent media before shutdown. Imagine if that was the synchronize
> cache sent
> before shutdown.

As already said, I have tested the patch for over a year now and I have
never experienced the problem that you are foreseeing !

The current alternative is data corruption each time that the drive is
mounted and the inability to use it.

So, the patch is the way forward for using such drives plug and play
without cumbersome configuration such as disabling the write cache,
which advanced users can always make.

> So the only reasonable solution for such drive is to use it with
> write cache
> disabled from the start.
> down.

It works well even with write caching enabled.

If you have an alternative patch to propose, go ahead otherwise I would
push for getting this merged and sorting out the issue for good.

Guido

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:39           ` Guido Trentalancia
@ 2021-03-01 12:51             ` Damien Le Moal
  2021-03-01 12:57               ` Guido Trentalancia
  0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2021-03-01 12:51 UTC (permalink / raw)
  To: Guido Trentalancia, linux-scsi

On 2021/03/01 21:39, Guido Trentalancia wrote:
> If the system is shut down before the sync or drive unmounting and the
> write cache is enabled, there might be the loss of data in the cache,
> but this is because of the way the drive is designed.

That drive is not usable. Even the best journaling file system would get corrupted.

> I believe the kernel should support the drive as it is - plug and play
> - without requiring cumbersome configurations.

No. That would be lying to the user. The user expect things to work. Not data
corruptions.

> Disabling the write cache for increased data security is up to the user
> eventually requiring an increased level of safety against data loss.

Sure. Some do that, most of the time because their application does not have
journaling capabilities. But with your drive, that is mandatory to ensure
correct operation, regardless of the host software using the drive.

> In those earlier days, most operating systems were giving the user very
> specific warning about the risks of write caching. Such warnings have
> now been removed because of the advent of more recent hard-drive
> specifications supporting new features such as the Synchronize Cache
> command, but in my opinion this has nothing to do with the fact that
> earlier drives should be supported in a plug and play fashion as they
> were in earlier days.

As I explained at length, write caching and synchronize cache command are old.
The warning you mention are because of all the poor drive firmware
implementations of synchronize cache command that existed at that time.

> 
> Guido
> 
> On Mon, 01/03/2021 at 12.27 +0000, Damien Le Moal wrote:
>> On 2021/03/01 20:33, Guido Trentalancia wrote:
>>> I have just checked the drive and I can now confirm that you are
>>> right
>>> about the capabilities reported: it is indeed reporting Write
>>> Caching.
>>>
>>> However the point is that the current kernel behaviour is wrong and
>>> leading to data corruption on such drives, as the sync function
>>> fails
>>> due to missing Synchronize Cache command.
>>
>> Without sync working, how could you ever guarantee that even a clean
>> shutdown of
>> the host does not result in data loss ? I am not even talking about
>> power
>> failures and crashes here. A simple, clean "shutdown -h now". Without
>> issuing a
>> synchronize cache command after flushing the page cache and
>> unmounting the FS,
>> you will loose data and corrupt things. 100% guaranteed.
>>
>> As James explained in different terms, it is the other way around:
>> the kernel is
>> correct and behaves according to the fact that the drive is saying "I
>> am caching
>> written sectors". That implies that synchronize cache *is* supported,
>> per the
>> standards. The corruption errors you are seeing are due to the drive
>> being silly
>> and failing synchronize cache commands, resulting in the cached data
>> *not* going
>> to persistent media, and loss of data on power down or crash. This is
>> not a bad
>> behavior. On the contrary, not seeing any data corruption would only
>> mean that
>> you are being lucky.
>>
>>> Once again, this is because of very old HD specifications that were
>>> implementing Write Caching without that command.
>>
>> If the drive is scanned and initialized by sd/libata, it means that
>> it is
>> reporting following a supported standard version (SPC, SBC, ACS).
>> Probably an
>> old version, but still a standard. Synchronize cache command (for
>> scsi) and
>> flush cache/flush cache ext (for ATA) are not recent additions to
>> SCSI & ATA.
>> These have been around for a long time. It does not matter that the
>> drive is old
>> and following an old version. It should support cache flushing. Refer
>> to the ACS
>> specs. It is clearly noted that: "If the volatile write cache is
>> disabled or no volatile write cache is present, the device shall
>> indicate
>> command completion without error.". Get the point ? That drive
>> firmware is
>> simply broken and missing a critical command.
>>
>>> The way forward is to treat the command failure as non-critical
>>> (see
>>> attached patch) because clearly it's not implemented in all drives,
>>> but
>>> only more recent ones.
>>
>> Nope. Simply disable the write cache. Try "hdparm -W 0 /dev/sdX" or
>> "sdparm
>> --clear=WCE /dev/sdX" to disable it. And a udev rule can do that for
>> you on boot
>> too. Failures and data corruption will go away. But the performance
>> will likely
>> go to the trash bin too...
>>
>> A little bit of history: it used to be a thing, many many years ago,
>> to see
>> drives that had synchronize cache commands implemented as "nop" or
>> not
>> implemented at all. Unscrupulous vendors would do that to get better
>> performance
>> results for their drives with benchmarks. Because synchronize cache
>> can be very
>> costly and can take several seconds to complete. Using such drives in
>> big RAID
>> arrays with power failure protections would be fine, but any other
>> use case by
>> any regular user would create data corruption situations very
>> quickly. As noted
>> above, even a simple clean shutdown would lose data ! Such bad
>> practice ended
>> fairly quickly. For some reasons these bad drives failed to sell very
>> well :)
>>
>>>
>>> Guido
>>>
>>> On Mon, 01/03/2021 at 07.38 +0000, Damien Le Moal wrote:
>>>> On 2021/03/01 16:12, Guido Trentalancia wrote:
>>>>> Hi James,
>>>>>
>>>>> thanks for getting back on this issue.
>>>>>
>>>>> I have tested this patch for over a year and it works
>>>>> flawlessly
>>>>> without any data corruption !
>>>>>
>>>>> On such kind of drives the actual situation is just the
>>>>> opposite as
>>>>> you
>>>>> describe: data corruption occurs when not using this patch !
>>>>>
>>>>> I do not agree with you: if a drive does not support
>>>>> Synchronize
>>>>> Cache
>>>>> command, there is no point in treating the failure as critical
>>>>> and
>>>>> by
>>>>> all means the failure must be ignored, as there is nothing
>>>>> which
>>>>> can be
>>>>> done about it and it should not be treated as a failure !
>>>>
>>>> If the drive does not support synchronize cache, then the drive
>>>> should *not*
>>>> report write caching either. If it does, then the kernel will
>>>> issue
>>>> synchronize
>>>> cache commands and that command failing indicates the drive is
>>>> broken/lying ==
>>>> junk and should not be trusted.
>>>>
>>>> The user can trivially remedy to this situation by force
>>>> disabling
>>>> the write
>>>> cache: no more synchronize cache commands will be issued and no
>>>> more
>>>> failures.
>>>> No need to patch the kernel for that. And if the drive does not
>>>> allow
>>>> disabling
>>>> write caching, then I seriously recommend replacing it :)
>>>>
>>>>>
>>>>> However, if you are willing to propose an alternative patch,
>>>>> I'd be
>>>>> happy to test it and report back, as long as this bug is fixed
>>>>> in
>>>>> the
>>>>> shortest time possible.
>>>>>
>>>>> Guido
>>>>>
>>>>> On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
>>>>>> On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
>>>>>>> Many obsolete hard drives do not support the Synchronize
>>>>>>> Cache
>>>>>>> SCSI
>>>>>>> command. Such command is generally issued during fsync()
>>>>>>> calls
>>>>>>> which
>>>>>>> at the moment therefore fail with the ILLEGAL_REQUEST sense
>>>>>>> key.
>>>>>>
>>>>>> It should be that all drives that don't support sync cache
>>>>>> also
>>>>>> don't
>>>>>> have write back caches, which means we don't try to do a
>>>>>> cache
>>>>>> sync
>>>>>> on
>>>>>> them.  The only time you we ever try to sync the cache is if
>>>>>> the
>>>>>> device
>>>>>> advertises a write back cache, in which case the sync cache
>>>>>> command
>>>>>> is
>>>>>> mandatory.
>>>>>>
>>>>>> I'm sure some SATA manufacturers somewhere cut enough corners
>>>>>> to
>>>>>> produce an illegally spec'd drive like this, but your
>>>>>> proposed
>>>>>> remedy
>>>>>> is unviable: you can't ignore a cache failure on flush
>>>>>> barriers
>>>>>> which
>>>>>> will cause data corruption.  You have to disable barriers on
>>>>>> the
>>>>>> filesystem to get correct operation and be very careful about
>>>>>> power
>>>>>> down.
>>>>>>
>>>>>> James
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:27         ` Damien Le Moal
  2021-03-01 12:39           ` Guido Trentalancia
@ 2021-03-01 12:42           ` Damien Le Moal
  2021-03-01 12:52             ` Guido Trentalancia
  1 sibling, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2021-03-01 12:42 UTC (permalink / raw)
  To: Guido Trentalancia, linux-scsi

On 2021/03/01 21:30, Damien Le Moal wrote:
> On 2021/03/01 20:33, Guido Trentalancia wrote:
>> I have just checked the drive and I can now confirm that you are right
>> about the capabilities reported: it is indeed reporting Write Caching.
>>
>> However the point is that the current kernel behaviour is wrong and
>> leading to data corruption on such drives, as the sync function fails
>> due to missing Synchronize Cache command.
> 
> Without sync working, how could you ever guarantee that even a clean shutdown of
> the host does not result in data loss ? I am not even talking about power
> failures and crashes here. A simple, clean "shutdown -h now". Without issuing a
> synchronize cache command after flushing the page cache and unmounting the FS,
> you will loose data and corrupt things. 100% guaranteed.
> 
> As James explained in different terms, it is the other way around: the kernel is
> correct and behaves according to the fact that the drive is saying "I am caching
> written sectors". That implies that synchronize cache *is* supported, per the
> standards. The corruption errors you are seeing are due to the drive being silly
> and failing synchronize cache commands, resulting in the cached data *not* going
> to persistent media, and loss of data on power down or crash. This is not a bad
> behavior. On the contrary, not seeing any data corruption would only mean that
> you are being lucky.
> 
>> Once again, this is because of very old HD specifications that were
>> implementing Write Caching without that command.
> 
> If the drive is scanned and initialized by sd/libata, it means that it is
> reporting following a supported standard version (SPC, SBC, ACS). Probably an
> old version, but still a standard. Synchronize cache command (for scsi) and
> flush cache/flush cache ext (for ATA) are not recent additions to SCSI & ATA.
> These have been around for a long time. It does not matter that the drive is old
> and following an old version. It should support cache flushing. Refer to the ACS
> specs. It is clearly noted that: "If the volatile write cache is
> disabled or no volatile write cache is present, the device shall indicate
> command completion without error.". Get the point ? That drive firmware is
> simply broken and missing a critical command.
> 
>> The way forward is to treat the command failure as non-critical (see
>> attached patch) because clearly it's not implemented in all drives, but
>> only more recent ones.
> 
> Nope. Simply disable the write cache. Try "hdparm -W 0 /dev/sdX" or "sdparm
> --clear=WCE /dev/sdX" to disable it. And a udev rule can do that for you on boot
> too. Failures and data corruption will go away. But the performance will likely
> go to the trash bin too...
> 
> A little bit of history: it used to be a thing, many many years ago, to see
> drives that had synchronize cache commands implemented as "nop" or not
> implemented at all. Unscrupulous vendors would do that to get better performance
> results for their drives with benchmarks. Because synchronize cache can be very
> costly and can take several seconds to complete. Using such drives in big RAID
> arrays with power failure protections would be fine, but any other use case by
> any regular user would create data corruption situations very quickly. As noted
> above, even a simple clean shutdown would lose data ! Such bad practice ended
> fairly quickly. For some reasons these bad drives failed to sell very well :)


I checked the standards again. It turns out that SYNCHRONIZE CACHE command is
optional in SBC... Aouch. Got so used to have that one on any drive that I
thought it was mandatory.

Well, it certainly is mandatory if the drive has a write cache, which seems to
be the case for you.

The problem with your patch though is that you disable write caching when you
see an ILLEGAL REQUEST/INVALID OPCODE termination of synchronize cache. Which
means that the drive was already used, written too and the cache has likely
dirty data and I do not see any method to guarantee that that data makes it to
persistent media before shutdown. Imagine if that was the synchronize cache sent
before shutdown.

So the only reasonable solution for such drive is to use it with write cache
disabled from the start.

> 
>>
>> Guido
>>
>> On Mon, 01/03/2021 at 07.38 +0000, Damien Le Moal wrote:
>>> On 2021/03/01 16:12, Guido Trentalancia wrote:
>>>> Hi James,
>>>>
>>>> thanks for getting back on this issue.
>>>>
>>>> I have tested this patch for over a year and it works flawlessly
>>>> without any data corruption !
>>>>
>>>> On such kind of drives the actual situation is just the opposite as
>>>> you
>>>> describe: data corruption occurs when not using this patch !
>>>>
>>>> I do not agree with you: if a drive does not support Synchronize
>>>> Cache
>>>> command, there is no point in treating the failure as critical and
>>>> by
>>>> all means the failure must be ignored, as there is nothing which
>>>> can be
>>>> done about it and it should not be treated as a failure !
>>>
>>> If the drive does not support synchronize cache, then the drive
>>> should *not*
>>> report write caching either. If it does, then the kernel will issue
>>> synchronize
>>> cache commands and that command failing indicates the drive is
>>> broken/lying ==
>>> junk and should not be trusted.
>>>
>>> The user can trivially remedy to this situation by force disabling
>>> the write
>>> cache: no more synchronize cache commands will be issued and no more
>>> failures.
>>> No need to patch the kernel for that. And if the drive does not allow
>>> disabling
>>> write caching, then I seriously recommend replacing it :)
>>>
>>>>
>>>> However, if you are willing to propose an alternative patch, I'd be
>>>> happy to test it and report back, as long as this bug is fixed in
>>>> the
>>>> shortest time possible.
>>>>
>>>> Guido
>>>>
>>>> On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
>>>>> On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
>>>>>> Many obsolete hard drives do not support the Synchronize Cache
>>>>>> SCSI
>>>>>> command. Such command is generally issued during fsync() calls
>>>>>> which
>>>>>> at the moment therefore fail with the ILLEGAL_REQUEST sense
>>>>>> key.
>>>>>
>>>>> It should be that all drives that don't support sync cache also
>>>>> don't
>>>>> have write back caches, which means we don't try to do a cache
>>>>> sync
>>>>> on
>>>>> them.  The only time you we ever try to sync the cache is if the
>>>>> device
>>>>> advertises a write back cache, in which case the sync cache
>>>>> command
>>>>> is
>>>>> mandatory.
>>>>>
>>>>> I'm sure some SATA manufacturers somewhere cut enough corners to
>>>>> produce an illegally spec'd drive like this, but your proposed
>>>>> remedy
>>>>> is unviable: you can't ignore a cache failure on flush barriers
>>>>> which
>>>>> will cause data corruption.  You have to disable barriers on the
>>>>> filesystem to get correct operation and be very careful about
>>>>> power
>>>>> down.
>>>>>
>>>>> James
>>>>>
>>>>>
>>>
>>>
>>
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 12:27         ` Damien Le Moal
@ 2021-03-01 12:39           ` Guido Trentalancia
  2021-03-01 12:51             ` Damien Le Moal
  2021-03-01 12:42           ` Damien Le Moal
  1 sibling, 1 reply; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 12:39 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

If the system is shut down before the sync or drive unmounting and the
write cache is enabled, there might be the loss of data in the cache,
but this is because of the way the drive is designed.

I believe the kernel should support the drive as it is - plug and play
- without requiring cumbersome configurations.

Disabling the write cache for increased data security is up to the user
eventually requiring an increased level of safety against data loss.

In those earlier days, most operating systems were giving the user very
specific warning about the risks of write caching. Such warnings have
now been removed because of the advent of more recent hard-drive
specifications supporting new features such as the Synchronize Cache
command, but in my opinion this has nothing to do with the fact that
earlier drives should be supported in a plug and play fashion as they
were in earlier days.

Guido

On Mon, 01/03/2021 at 12.27 +0000, Damien Le Moal wrote:
> On 2021/03/01 20:33, Guido Trentalancia wrote:
> > I have just checked the drive and I can now confirm that you are
> > right
> > about the capabilities reported: it is indeed reporting Write
> > Caching.
> > 
> > However the point is that the current kernel behaviour is wrong and
> > leading to data corruption on such drives, as the sync function
> > fails
> > due to missing Synchronize Cache command.
> 
> Without sync working, how could you ever guarantee that even a clean
> shutdown of
> the host does not result in data loss ? I am not even talking about
> power
> failures and crashes here. A simple, clean "shutdown -h now". Without
> issuing a
> synchronize cache command after flushing the page cache and
> unmounting the FS,
> you will loose data and corrupt things. 100% guaranteed.
> 
> As James explained in different terms, it is the other way around:
> the kernel is
> correct and behaves according to the fact that the drive is saying "I
> am caching
> written sectors". That implies that synchronize cache *is* supported,
> per the
> standards. The corruption errors you are seeing are due to the drive
> being silly
> and failing synchronize cache commands, resulting in the cached data
> *not* going
> to persistent media, and loss of data on power down or crash. This is
> not a bad
> behavior. On the contrary, not seeing any data corruption would only
> mean that
> you are being lucky.
> 
> > Once again, this is because of very old HD specifications that were
> > implementing Write Caching without that command.
> 
> If the drive is scanned and initialized by sd/libata, it means that
> it is
> reporting following a supported standard version (SPC, SBC, ACS).
> Probably an
> old version, but still a standard. Synchronize cache command (for
> scsi) and
> flush cache/flush cache ext (for ATA) are not recent additions to
> SCSI & ATA.
> These have been around for a long time. It does not matter that the
> drive is old
> and following an old version. It should support cache flushing. Refer
> to the ACS
> specs. It is clearly noted that: "If the volatile write cache is
> disabled or no volatile write cache is present, the device shall
> indicate
> command completion without error.". Get the point ? That drive
> firmware is
> simply broken and missing a critical command.
> 
> > The way forward is to treat the command failure as non-critical
> > (see
> > attached patch) because clearly it's not implemented in all drives,
> > but
> > only more recent ones.
> 
> Nope. Simply disable the write cache. Try "hdparm -W 0 /dev/sdX" or
> "sdparm
> --clear=WCE /dev/sdX" to disable it. And a udev rule can do that for
> you on boot
> too. Failures and data corruption will go away. But the performance
> will likely
> go to the trash bin too...
> 
> A little bit of history: it used to be a thing, many many years ago,
> to see
> drives that had synchronize cache commands implemented as "nop" or
> not
> implemented at all. Unscrupulous vendors would do that to get better
> performance
> results for their drives with benchmarks. Because synchronize cache
> can be very
> costly and can take several seconds to complete. Using such drives in
> big RAID
> arrays with power failure protections would be fine, but any other
> use case by
> any regular user would create data corruption situations very
> quickly. As noted
> above, even a simple clean shutdown would lose data ! Such bad
> practice ended
> fairly quickly. For some reasons these bad drives failed to sell very
> well :)
> 
> > 
> > Guido
> > 
> > On Mon, 01/03/2021 at 07.38 +0000, Damien Le Moal wrote:
> > > On 2021/03/01 16:12, Guido Trentalancia wrote:
> > > > Hi James,
> > > > 
> > > > thanks for getting back on this issue.
> > > > 
> > > > I have tested this patch for over a year and it works
> > > > flawlessly
> > > > without any data corruption !
> > > > 
> > > > On such kind of drives the actual situation is just the
> > > > opposite as
> > > > you
> > > > describe: data corruption occurs when not using this patch !
> > > > 
> > > > I do not agree with you: if a drive does not support
> > > > Synchronize
> > > > Cache
> > > > command, there is no point in treating the failure as critical
> > > > and
> > > > by
> > > > all means the failure must be ignored, as there is nothing
> > > > which
> > > > can be
> > > > done about it and it should not be treated as a failure !
> > > 
> > > If the drive does not support synchronize cache, then the drive
> > > should *not*
> > > report write caching either. If it does, then the kernel will
> > > issue
> > > synchronize
> > > cache commands and that command failing indicates the drive is
> > > broken/lying ==
> > > junk and should not be trusted.
> > > 
> > > The user can trivially remedy to this situation by force
> > > disabling
> > > the write
> > > cache: no more synchronize cache commands will be issued and no
> > > more
> > > failures.
> > > No need to patch the kernel for that. And if the drive does not
> > > allow
> > > disabling
> > > write caching, then I seriously recommend replacing it :)
> > > 
> > > > 
> > > > However, if you are willing to propose an alternative patch,
> > > > I'd be
> > > > happy to test it and report back, as long as this bug is fixed
> > > > in
> > > > the
> > > > shortest time possible.
> > > > 
> > > > Guido
> > > > 
> > > > On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
> > > > > On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
> > > > > > Many obsolete hard drives do not support the Synchronize
> > > > > > Cache
> > > > > > SCSI
> > > > > > command. Such command is generally issued during fsync()
> > > > > > calls
> > > > > > which
> > > > > > at the moment therefore fail with the ILLEGAL_REQUEST sense
> > > > > > key.
> > > > > 
> > > > > It should be that all drives that don't support sync cache
> > > > > also
> > > > > don't
> > > > > have write back caches, which means we don't try to do a
> > > > > cache
> > > > > sync
> > > > > on
> > > > > them.  The only time you we ever try to sync the cache is if
> > > > > the
> > > > > device
> > > > > advertises a write back cache, in which case the sync cache
> > > > > command
> > > > > is
> > > > > mandatory.
> > > > > 
> > > > > I'm sure some SATA manufacturers somewhere cut enough corners
> > > > > to
> > > > > produce an illegally spec'd drive like this, but your
> > > > > proposed
> > > > > remedy
> > > > > is unviable: you can't ignore a cache failure on flush
> > > > > barriers
> > > > > which
> > > > > will cause data corruption.  You have to disable barriers on
> > > > > the
> > > > > filesystem to get correct operation and be very careful about
> > > > > power
> > > > > down.
> > > > > 
> > > > > James
> > > > > 
> > > > > 
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01 11:33       ` Guido Trentalancia
@ 2021-03-01 12:27         ` Damien Le Moal
  2021-03-01 12:39           ` Guido Trentalancia
  2021-03-01 12:42           ` Damien Le Moal
  0 siblings, 2 replies; 17+ messages in thread
From: Damien Le Moal @ 2021-03-01 12:27 UTC (permalink / raw)
  To: Guido Trentalancia, linux-scsi

On 2021/03/01 20:33, Guido Trentalancia wrote:
> I have just checked the drive and I can now confirm that you are right
> about the capabilities reported: it is indeed reporting Write Caching.
> 
> However the point is that the current kernel behaviour is wrong and
> leading to data corruption on such drives, as the sync function fails
> due to missing Synchronize Cache command.

Without sync working, how could you ever guarantee that even a clean shutdown of
the host does not result in data loss ? I am not even talking about power
failures and crashes here. A simple, clean "shutdown -h now". Without issuing a
synchronize cache command after flushing the page cache and unmounting the FS,
you will loose data and corrupt things. 100% guaranteed.

As James explained in different terms, it is the other way around: the kernel is
correct and behaves according to the fact that the drive is saying "I am caching
written sectors". That implies that synchronize cache *is* supported, per the
standards. The corruption errors you are seeing are due to the drive being silly
and failing synchronize cache commands, resulting in the cached data *not* going
to persistent media, and loss of data on power down or crash. This is not a bad
behavior. On the contrary, not seeing any data corruption would only mean that
you are being lucky.

> Once again, this is because of very old HD specifications that were
> implementing Write Caching without that command.

If the drive is scanned and initialized by sd/libata, it means that it is
reporting following a supported standard version (SPC, SBC, ACS). Probably an
old version, but still a standard. Synchronize cache command (for scsi) and
flush cache/flush cache ext (for ATA) are not recent additions to SCSI & ATA.
These have been around for a long time. It does not matter that the drive is old
and following an old version. It should support cache flushing. Refer to the ACS
specs. It is clearly noted that: "If the volatile write cache is
disabled or no volatile write cache is present, the device shall indicate
command completion without error.". Get the point ? That drive firmware is
simply broken and missing a critical command.

> The way forward is to treat the command failure as non-critical (see
> attached patch) because clearly it's not implemented in all drives, but
> only more recent ones.

Nope. Simply disable the write cache. Try "hdparm -W 0 /dev/sdX" or "sdparm
--clear=WCE /dev/sdX" to disable it. And a udev rule can do that for you on boot
too. Failures and data corruption will go away. But the performance will likely
go to the trash bin too...

A little bit of history: it used to be a thing, many many years ago, to see
drives that had synchronize cache commands implemented as "nop" or not
implemented at all. Unscrupulous vendors would do that to get better performance
results for their drives with benchmarks. Because synchronize cache can be very
costly and can take several seconds to complete. Using such drives in big RAID
arrays with power failure protections would be fine, but any other use case by
any regular user would create data corruption situations very quickly. As noted
above, even a simple clean shutdown would lose data ! Such bad practice ended
fairly quickly. For some reasons these bad drives failed to sell very well :)

> 
> Guido
> 
> On Mon, 01/03/2021 at 07.38 +0000, Damien Le Moal wrote:
>> On 2021/03/01 16:12, Guido Trentalancia wrote:
>>> Hi James,
>>>
>>> thanks for getting back on this issue.
>>>
>>> I have tested this patch for over a year and it works flawlessly
>>> without any data corruption !
>>>
>>> On such kind of drives the actual situation is just the opposite as
>>> you
>>> describe: data corruption occurs when not using this patch !
>>>
>>> I do not agree with you: if a drive does not support Synchronize
>>> Cache
>>> command, there is no point in treating the failure as critical and
>>> by
>>> all means the failure must be ignored, as there is nothing which
>>> can be
>>> done about it and it should not be treated as a failure !
>>
>> If the drive does not support synchronize cache, then the drive
>> should *not*
>> report write caching either. If it does, then the kernel will issue
>> synchronize
>> cache commands and that command failing indicates the drive is
>> broken/lying ==
>> junk and should not be trusted.
>>
>> The user can trivially remedy to this situation by force disabling
>> the write
>> cache: no more synchronize cache commands will be issued and no more
>> failures.
>> No need to patch the kernel for that. And if the drive does not allow
>> disabling
>> write caching, then I seriously recommend replacing it :)
>>
>>>
>>> However, if you are willing to propose an alternative patch, I'd be
>>> happy to test it and report back, as long as this bug is fixed in
>>> the
>>> shortest time possible.
>>>
>>> Guido
>>>
>>> On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
>>>> On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
>>>>> Many obsolete hard drives do not support the Synchronize Cache
>>>>> SCSI
>>>>> command. Such command is generally issued during fsync() calls
>>>>> which
>>>>> at the moment therefore fail with the ILLEGAL_REQUEST sense
>>>>> key.
>>>>
>>>> It should be that all drives that don't support sync cache also
>>>> don't
>>>> have write back caches, which means we don't try to do a cache
>>>> sync
>>>> on
>>>> them.  The only time you we ever try to sync the cache is if the
>>>> device
>>>> advertises a write back cache, in which case the sync cache
>>>> command
>>>> is
>>>> mandatory.
>>>>
>>>> I'm sure some SATA manufacturers somewhere cut enough corners to
>>>> produce an illegally spec'd drive like this, but your proposed
>>>> remedy
>>>> is unviable: you can't ignore a cache failure on flush barriers
>>>> which
>>>> will cause data corruption.  You have to disable barriers on the
>>>> filesystem to get correct operation and be very careful about
>>>> power
>>>> down.
>>>>
>>>> James
>>>>
>>>>
>>
>>
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01  7:38     ` Damien Le Moal
  2021-03-01 11:33       ` Guido Trentalancia
@ 2021-03-01 12:07       ` Guido Trentalancia
  1 sibling, 0 replies; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 12:07 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

Damien,

what tells you the drive is reporting write cache ? I am not sure about
this...

Consider, we are talking about rather old drives.

I suppose the point is that they were designed with very different
specifications compared to recent drives, surely including the absence
of Synchronize Cache but possibly Write Caching too: the point of
failure is at the Synchronize Cache issuing code, so I patched there.

Whatever the case and as already pointed out, if you are willing to
propose an alternative patch, I will be happy to test it and report
back, as long as the issue is resolved as soon as possible.

Guido

On Mon, 01/03/2021 at 07.38 +0000, Damien Le Moal wrote:
> On 2021/03/01 16:12, Guido Trentalancia wrote:
> > Hi James,
> > 
> > thanks for getting back on this issue.
> > 
> > I have tested this patch for over a year and it works flawlessly
> > without any data corruption !
> > 
> > On such kind of drives the actual situation is just the opposite as
> > you
> > describe: data corruption occurs when not using this patch !
> > 
> > I do not agree with you: if a drive does not support Synchronize
> > Cache
> > command, there is no point in treating the failure as critical and
> > by
> > all means the failure must be ignored, as there is nothing which
> > can be
> > done about it and it should not be treated as a failure !
> 
> If the drive does not support synchronize cache, then the drive
> should *not*
> report write caching either. If it does, then the kernel will issue
> synchronize
> cache commands and that command failing indicates the drive is
> broken/lying ==
> junk and should not be trusted.
> 
> The user can trivially remedy to this situation by force disabling
> the write
> cache: no more synchronize cache commands will be issued and no more
> failures.
> No need to patch the kernel for that. And if the drive does not allow
> disabling
> write caching, then I seriously recommend replacing it :)
> 
> > 
> > However, if you are willing to propose an alternative patch, I'd be
> > happy to test it and report back, as long as this bug is fixed in
> > the
> > shortest time possible.
> > 
> > Guido
> > 
> > On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
> > > On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
> > > > Many obsolete hard drives do not support the Synchronize Cache
> > > > SCSI
> > > > command. Such command is generally issued during fsync() calls
> > > > which
> > > > at the moment therefore fail with the ILLEGAL_REQUEST sense
> > > > key.
> > > 
> > > It should be that all drives that don't support sync cache also
> > > don't
> > > have write back caches, which means we don't try to do a cache
> > > sync
> > > on
> > > them.  The only time you we ever try to sync the cache is if the
> > > device
> > > advertises a write back cache, in which case the sync cache
> > > command
> > > is
> > > mandatory.
> > > 
> > > I'm sure some SATA manufacturers somewhere cut enough corners to
> > > produce an illegally spec'd drive like this, but your proposed
> > > remedy
> > > is unviable: you can't ignore a cache failure on flush barriers
> > > which
> > > will cause data corruption.  You have to disable barriers on the
> > > filesystem to get correct operation and be very careful about
> > > power
> > > down.
> > > 
> > > James
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01  7:38     ` Damien Le Moal
@ 2021-03-01 11:33       ` Guido Trentalancia
  2021-03-01 12:27         ` Damien Le Moal
  2021-03-01 12:07       ` Guido Trentalancia
  1 sibling, 1 reply; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01 11:33 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi

I have just checked the drive and I can now confirm that you are right
about the capabilities reported: it is indeed reporting Write Caching.

However the point is that the current kernel behaviour is wrong and
leading to data corruption on such drives, as the sync function fails
due to missing Synchronize Cache command.

Once again, this is because of very old HD specifications that were
implementing Write Caching without that command.

The way forward is to treat the command failure as non-critical (see
attached patch) because clearly it's not implemented in all drives, but
only more recent ones.

Guido

On Mon, 01/03/2021 at 07.38 +0000, Damien Le Moal wrote:
> On 2021/03/01 16:12, Guido Trentalancia wrote:
> > Hi James,
> > 
> > thanks for getting back on this issue.
> > 
> > I have tested this patch for over a year and it works flawlessly
> > without any data corruption !
> > 
> > On such kind of drives the actual situation is just the opposite as
> > you
> > describe: data corruption occurs when not using this patch !
> > 
> > I do not agree with you: if a drive does not support Synchronize
> > Cache
> > command, there is no point in treating the failure as critical and
> > by
> > all means the failure must be ignored, as there is nothing which
> > can be
> > done about it and it should not be treated as a failure !
> 
> If the drive does not support synchronize cache, then the drive
> should *not*
> report write caching either. If it does, then the kernel will issue
> synchronize
> cache commands and that command failing indicates the drive is
> broken/lying ==
> junk and should not be trusted.
> 
> The user can trivially remedy to this situation by force disabling
> the write
> cache: no more synchronize cache commands will be issued and no more
> failures.
> No need to patch the kernel for that. And if the drive does not allow
> disabling
> write caching, then I seriously recommend replacing it :)
> 
> > 
> > However, if you are willing to propose an alternative patch, I'd be
> > happy to test it and report back, as long as this bug is fixed in
> > the
> > shortest time possible.
> > 
> > Guido
> > 
> > On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
> > > On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
> > > > Many obsolete hard drives do not support the Synchronize Cache
> > > > SCSI
> > > > command. Such command is generally issued during fsync() calls
> > > > which
> > > > at the moment therefore fail with the ILLEGAL_REQUEST sense
> > > > key.
> > > 
> > > It should be that all drives that don't support sync cache also
> > > don't
> > > have write back caches, which means we don't try to do a cache
> > > sync
> > > on
> > > them.  The only time you we ever try to sync the cache is if the
> > > device
> > > advertises a write back cache, in which case the sync cache
> > > command
> > > is
> > > mandatory.
> > > 
> > > I'm sure some SATA manufacturers somewhere cut enough corners to
> > > produce an illegally spec'd drive like this, but your proposed
> > > remedy
> > > is unviable: you can't ignore a cache failure on flush barriers
> > > which
> > > will cause data corruption.  You have to disable barriers on the
> > > filesystem to get correct operation and be very careful about
> > > power
> > > down.
> > > 
> > > James
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-03-01  7:06   ` Guido Trentalancia
@ 2021-03-01  7:38     ` Damien Le Moal
  2021-03-01 11:33       ` Guido Trentalancia
  2021-03-01 12:07       ` Guido Trentalancia
  0 siblings, 2 replies; 17+ messages in thread
From: Damien Le Moal @ 2021-03-01  7:38 UTC (permalink / raw)
  To: Guido Trentalancia, James Bottomley, linux-scsi

On 2021/03/01 16:12, Guido Trentalancia wrote:
> Hi James,
> 
> thanks for getting back on this issue.
> 
> I have tested this patch for over a year and it works flawlessly
> without any data corruption !
> 
> On such kind of drives the actual situation is just the opposite as you
> describe: data corruption occurs when not using this patch !
> 
> I do not agree with you: if a drive does not support Synchronize Cache
> command, there is no point in treating the failure as critical and by
> all means the failure must be ignored, as there is nothing which can be
> done about it and it should not be treated as a failure !

If the drive does not support synchronize cache, then the drive should *not*
report write caching either. If it does, then the kernel will issue synchronize
cache commands and that command failing indicates the drive is broken/lying ==
junk and should not be trusted.

The user can trivially remedy to this situation by force disabling the write
cache: no more synchronize cache commands will be issued and no more failures.
No need to patch the kernel for that. And if the drive does not allow disabling
write caching, then I seriously recommend replacing it :)

> 
> However, if you are willing to propose an alternative patch, I'd be
> happy to test it and report back, as long as this bug is fixed in the
> shortest time possible.
> 
> Guido
> 
> On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
>> On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
>>> Many obsolete hard drives do not support the Synchronize Cache SCSI
>>> command. Such command is generally issued during fsync() calls
>>> which
>>> at the moment therefore fail with the ILLEGAL_REQUEST sense key.
>>
>> It should be that all drives that don't support sync cache also don't
>> have write back caches, which means we don't try to do a cache sync
>> on
>> them.  The only time you we ever try to sync the cache is if the
>> device
>> advertises a write back cache, in which case the sync cache command
>> is
>> mandatory.
>>
>> I'm sure some SATA manufacturers somewhere cut enough corners to
>> produce an illegally spec'd drive like this, but your proposed remedy
>> is unviable: you can't ignore a cache failure on flush barriers which
>> will cause data corruption.  You have to disable barriers on the
>> filesystem to get correct operation and be very careful about power
>> down.
>>
>> James
>>
>>
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-02-28 16:37 ` James Bottomley
@ 2021-03-01  7:06   ` Guido Trentalancia
  2021-03-01  7:38     ` Damien Le Moal
  0 siblings, 1 reply; 17+ messages in thread
From: Guido Trentalancia @ 2021-03-01  7:06 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

Hi James,

thanks for getting back on this issue.

I have tested this patch for over a year and it works flawlessly
without any data corruption !

On such kind of drives the actual situation is just the opposite as you
describe: data corruption occurs when not using this patch !

I do not agree with you: if a drive does not support Synchronize Cache
command, there is no point in treating the failure as critical and by
all means the failure must be ignored, as there is nothing which can be
done about it and it should not be treated as a failure !

However, if you are willing to propose an alternative patch, I'd be
happy to test it and report back, as long as this bug is fixed in the
shortest time possible.

Guido

On Sun, 28/02/2021 at 08.37 -0800, James Bottomley wrote:
> On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
> > Many obsolete hard drives do not support the Synchronize Cache SCSI
> > command. Such command is generally issued during fsync() calls
> > which
> > at the moment therefore fail with the ILLEGAL_REQUEST sense key.
> 
> It should be that all drives that don't support sync cache also don't
> have write back caches, which means we don't try to do a cache sync
> on
> them.  The only time you we ever try to sync the cache is if the
> device
> advertises a write back cache, in which case the sync cache command
> is
> mandatory.
> 
> I'm sure some SATA manufacturers somewhere cut enough corners to
> produce an illegally spec'd drive like this, but your proposed remedy
> is unviable: you can't ignore a cache failure on flush barriers which
> will cause data corruption.  You have to disable barriers on the
> filesystem to get correct operation and be very careful about power
> down.
> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
  2021-02-28  9:01 Guido Trentalancia
@ 2021-02-28 16:37 ` James Bottomley
  2021-03-01  7:06   ` Guido Trentalancia
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2021-02-28 16:37 UTC (permalink / raw)
  To: Guido Trentalancia, linux-scsi

On Sun, 2021-02-28 at 10:01 +0100, Guido Trentalancia wrote:
> Many obsolete hard drives do not support the Synchronize Cache SCSI
> command. Such command is generally issued during fsync() calls which
> at the moment therefore fail with the ILLEGAL_REQUEST sense key.

It should be that all drives that don't support sync cache also don't
have write back caches, which means we don't try to do a cache sync on
them.  The only time you we ever try to sync the cache is if the device
advertises a write back cache, in which case the sync cache command is
mandatory.

I'm sure some SATA manufacturers somewhere cut enough corners to
produce an illegally spec'd drive like this, but your proposed remedy
is unviable: you can't ignore a cache failure on flush barriers which
will cause data corruption.  You have to disable barriers on the
filesystem to get correct operation and be very careful about power
down.

James



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it
@ 2021-02-28  9:01 Guido Trentalancia
  2021-02-28 16:37 ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Guido Trentalancia @ 2021-02-28  9:01 UTC (permalink / raw)
  To: linux-scsi

Many obsolete hard drives do not support the Synchronize Cache SCSI
command. Such command is generally issued during fsync() calls which
at the moment therefore fail with the ILLEGAL_REQUEST sense key.

Since this failure is currently treated as critical in the kernel SCSI
disk driver, such obsolete hard drives cannot be used anymore: they
cannot be formatted, mounted and/or checked using tools such as e2fsprogs.

Because there is nothing which can be done if the drive does not support
such command, such ILLEGAL_REQUEST should be treated as non-critical so
that the underlying operation does not fail and the obsolete hard drive
can be used normally.

This patch disables the Write Cache feature and therefore the cache
flushing functionality, as a precaution on hard drives which do not
support the Synchronize Cache command.

Fixes bug: https://bugzilla.kernel.org/show_bug.cgi?id=203635

Signed-off-by: Guido Trentalancia <guido@trentalancia.com>
---
 drivers/scsi/sd.c |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff -pru linux-5.0.2-orig/drivers/scsi/sd.c linux-5.0.2/drivers/scsi/sd.c
--- linux-5.0.2-orig/drivers/scsi/sd.c	2019-03-17 18:22:04.822720851 +0100
+++ linux-5.0.2/drivers/scsi/sd.c	2019-03-20 17:41:44.526957307 +0100
@@ -22,6 +22,10 @@
  *	 - Badari Pulavarty <pbadari@us.ibm.com>, Matthew Wilcox 
  *	   <willy@debian.org>, Kurt Garloff <garloff@suse.de>: 
  *	   Support 32k/1M disks.
+ *	 - Guido Trentalancia <guido@trentalancia.com> ignore Synchronize
+ *	   Cache command failures on hard-drives that do not support it
+ *	   and disable the Write Cache functionality on such devices as a
+ *	   precaution: this allows to keep using several obsolete drives.
  *
  *	Logging policy (needs CONFIG_SCSI_LOGGING defined):
  *	 - setting up transfer: SCSI_LOG_HLQUEUE levels 1 and 2
@@ -1633,6 +1637,20 @@ static int sd_sync_cache(struct scsi_dis
 	}
 
 	if (res) {
+		/*
+		 * sshdr.sense_key == ILLEGAL_REQUEST means this drive
+		 * doesn't support sync. There's not much to do and
+		 * sync shouldn't fail.
+		 */
+		if (sshdr->sense_key == ILLEGAL_REQUEST && sshdr->asc == 0x20) {
+			if (sdkp->WCE) {
+				sdkp->WCE = 0;
+				sd_printk(KERN_NOTICE, sdkp, "Drive does not support Synchronize Cache(10) command: disabling write cache.\n");
+				sd_set_flush_flag(sdkp);
+			}
+			return 0;
+		}
+
 		sd_print_result(sdkp, "Synchronize Cache(10) failed", res);
 
 		if (driver_byte(res) == DRIVER_SENSE)
@@ -2022,6 +2040,17 @@ static int sd_done(struct scsi_cmnd *SCp
 					req->rq_flags |= RQF_QUIET;
 				}
 				break;
+			case SYNCHRONIZE_CACHE:
+				if (sshdr.asc == 0x20) {
+					if (sdkp->WCE) {
+						sdkp->WCE = 0;
+						sd_printk(KERN_NOTICE, sdkp, "Drive does not support Synchronize Cache(10) command: disabling write cache.\n");
+						sd_set_flush_flag(sdkp);
+					}
+					SCpnt->result = 0;
+					good_bytes = scsi_bufflen(SCpnt);
+				}
+				break;
 			}
 		}
 		break;

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-01 13:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-02 22:05 [PATCH RESEND v2] scsi: ignore Synchronize Cache command failures to keep using drives not supporting it Guido Trentalancia
2021-02-28  9:01 Guido Trentalancia
2021-02-28 16:37 ` James Bottomley
2021-03-01  7:06   ` Guido Trentalancia
2021-03-01  7:38     ` Damien Le Moal
2021-03-01 11:33       ` Guido Trentalancia
2021-03-01 12:27         ` Damien Le Moal
2021-03-01 12:39           ` Guido Trentalancia
2021-03-01 12:51             ` Damien Le Moal
2021-03-01 12:57               ` Guido Trentalancia
2021-03-01 13:04                 ` Damien Le Moal
2021-03-01 13:12                   ` Guido Trentalancia
2021-03-01 12:42           ` Damien Le Moal
2021-03-01 12:52             ` Guido Trentalancia
2021-03-01 12:57               ` Damien Le Moal
2021-03-01 13:05                 ` Guido Trentalancia
2021-03-01 12:07       ` Guido Trentalancia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).