linux-kernel.vger.kernel.org archive mirror
* BLKZEROOUT not zeroing md dev on VMDK
@ 2016-05-18 22:39 Sitsofe Wheeler
  2016-05-27  4:18 ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Sitsofe Wheeler @ 2016-05-18 22:39 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Darrick J. Wong, Arvind Kumar, VMware PV-Drivers,
	linux-raid, linux-scsi, linux-block, linux-kernel

Hi,

With Ubuntu's 4.4.0-22-generic kernel and a Fedora 23
4.6.0-1.vanilla.knurd.1.fc23.x86_64 kernel I've found that the
BLKZEROOUT ioctl can malfunction and not zero data.

When BLKZEROOUT is issued to an MD device atop a PVSCSI controller
supplied VMDK from ESXi 6.0 the call returns immediately and with a zero
return code. Unfortunately, inspecting the data on the MD device shows
that it has not been zeroed and is in fact untouched. The easiest way to
see this behaviour is to boot the VM, create an mdadm device atop
/dev/sd?, scribble some non-zero value on the disk and then use
blkdiscard --zeroout /dev/md??? . If you then inspect the MD disk (e.g.
with hexdump) you will still see the old data and using POSIX_FADV_DONTNEED
on the MD device doesn't change the outcome.

The only clue I've seen is that
/sys/block/sd?/queue/write_same_max_bytes starts out being 33553920 but
after a WRITE SAME is issued it becomes 0. If the MD device is created
after write_same_max_bytes has become 0 on the backing disk then
BLKZEROOUT seems to work correctly.
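
For anyone who wants to check this without blkdiscard and hexdump, here is
a minimal sketch of the same test in Python. It assumes a 64-bit Linux
system; the helper names (pack_range, zeroout, first_nonzero) are made up
for illustration and are not part of any existing tool. The ioctl number is
the _IO(0x12, 127) definition of BLKZEROOUT from <linux/fs.h>.

```python
import fcntl
import os
import struct

# _IO(0x12, 127) from <linux/fs.h>: (0x12 << 8) | 127
BLKZEROOUT = 0x127F


def pack_range(start, length):
    """Pack the byte range as the two native uint64 values BLKZEROOUT expects."""
    return struct.pack("QQ", start, length)


def zeroout(path, start, length):
    """Ask the kernel to zero [start, start + length) on a block device."""
    fd = os.open(path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKZEROOUT, pack_range(start, length))
    finally:
        os.close(fd)


def first_nonzero(path, start, length, chunk=1 << 20):
    """Return the offset of the first non-zero byte in the range, else None."""
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        pos, end = start, start + length
        while pos < end:
            data = f.read(min(chunk, end - pos))
            if not data:
                break  # hit the end of the device or file early
            rest = data.lstrip(b"\0")
            if rest:
                return pos + len(data) - len(rest)
            pos += len(data)
    return None
```

Run as root against a scratch device (this destroys data), something like
zeroout("/dev/md0", 0, 1 << 20) followed by first_nonzero("/dev/md0", 0,
1 << 20), where /dev/md0 is only a placeholder. On an affected setup the
ioctl raises no error yet first_nonzero still returns an offset rather
than None. The reads go through the page cache.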

-- 
Sitsofe | http://sucs.org/~sits/


* Re: BLKZEROOUT not zeroing md dev on VMDK
  2016-05-18 22:39 BLKZEROOUT not zeroing md dev on VMDK Sitsofe Wheeler
@ 2016-05-27  4:18 ` Darrick J. Wong
  2016-05-27  4:45   ` Sitsofe Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2016-05-27  4:18 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Shaohua Li, Jens Axboe, Arvind Kumar, VMware PV-Drivers,
	linux-raid, linux-scsi, linux-block, linux-kernel

On Wed, May 18, 2016 at 11:39:30PM +0100, Sitsofe Wheeler wrote:
> Hi,
> 
> With Ubuntu's 4.4.0-22-generic kernel and a Fedora 23
> 4.6.0-1.vanilla.knurd.1.fc23.x86_64 kernel I've found that the
> BLKZEROOUT ioctl can malfunction and not zero data.
> 
> When BLKZEROOUT is issued to an MD device atop a PVSCSI controller
> supplied VMDK from ESXi 6.0 the call returns immediately and with a zero
> return code. Unfortunately, inspecting the data on the MD device shows
> that it has not been zeroed and is in fact untouched. The easiest way to
> see this behaviour is to boot the VM, create an mdadm device atop
> /dev/sd?, scribble some non-zero value on the disk and then use
> blkdiscard --zeroout /dev/md??? . If you then inspect the MD disk (e.g.
> with hexdump) you will still see the old data and using POSIX_FADV_DONTNEED
> on the MD device doesn't change the outcome.
> 
> The only clue I've seen is that
> /sys/block/sd?/queue/write_same_max_bytes starts out being 33553920 but
> after a WRITE SAME is issued it becomes 0. If the MD device is created
> after write_same_max_bytes has become 0 on the backing disk then
> BLKZEROOUT seems to work correctly.

It's possible that the pvscsi device advertised WRITE SAME, but if the device
sends back ILLEGAL REQUEST then the SCSI disk driver will set
write_same_max_bytes=0.  Subsequent BLKZEROOUT attempts will then issue writes
of zeroes to the drive.
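
One way to watch for this is to compare the limit on the md device with
the limits on its member disks. The sketch below is only an illustration:
the function names are hypothetical, and it assumes the
queue/write_same_max_bytes attribute is present for every device passed
in, as it is on the kernels discussed in this thread.

```python
from pathlib import Path


def write_same_max_bytes(disk, sysfs_root="/sys/block"):
    """Read queue/write_same_max_bytes for a disk such as 'sdb' or 'md0'."""
    attr = Path(sysfs_root) / disk / "queue" / "write_same_max_bytes"
    return int(attr.read_text().strip())


def stale_limits(md, members, sysfs_root="/sys/block"):
    """List members reporting 0 while the md device still reports non-zero."""
    if write_same_max_bytes(md, sysfs_root) == 0:
        return []
    return [m for m in members if write_same_max_bytes(m, sysfs_root) == 0]
```

For example, stale_limits("md0", ["sdb"]) (device names are placeholders)
returns ["sdb"] when the backing disk has already dropped to 0 but the md
device on top of it still advertises a non-zero limit.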

--D

> 
> -- 
> Sitsofe | http://sucs.org/~sits/


* Re: BLKZEROOUT not zeroing md dev on VMDK
  2016-05-27  4:18 ` Darrick J. Wong
@ 2016-05-27  4:45   ` Sitsofe Wheeler
  2016-05-27  9:30     ` Tom Yan
  0 siblings, 1 reply; 7+ messages in thread
From: Sitsofe Wheeler @ 2016-05-27  4:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Shaohua Li, Jens Axboe, Arvind Kumar, VMware PV-Drivers,
	linux-raid, linux-scsi, linux-block, linux-kernel

On 27 May 2016 at 05:18, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> It's possible that the pvscsi device advertised WRITE SAME, but if the device
> sends back ILLEGAL REQUEST then the SCSI disk driver will set
> write_same_max_bytes=0.  Subsequent BLKZEROOUT attempts will then issue writes
> of zeroes to the drive.

Thanks for following up on this but that's not what happens on the md
device - you can go on to issue as many BLKZEROOUT requests as you
like but the md disk is never zeroed nor is an error returned.

I filed a bug at https://bugzilla.kernel.org/show_bug.cgi?id=118581
(see https://bugzilla.kernel.org/show_bug.cgi?id=118581#c6 for
alternative reproduction steps that use scsi_debug and can be reworked
to impact device mapper) and Shaohua Li noted that
blkdev_issue_write_same could return 0 even when the disk didn't
support write same (see
https://bugzilla.kernel.org/show_bug.cgi?id=118581#c8 ).

Shaohua went on to create a patch for this ("block: correctly fallback
for zeroout" - https://patchwork.kernel.org/patch/9137311/ ) which has
yet to be reviewed.
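
To make the failure mode concrete, here is a toy model in Python. It is
not kernel code and does not reproduce the patch: ToyDisk,
issue_write_same, zeroout and the report_unsupported switch are all
invented for illustration, and using EOPNOTSUPP as the error value is an
assumption. It only shows why a "try WRITE SAME, else write zeroes"
fallback cannot trigger if the first step reports success after the
device rejected the command.

```python
import errno


class ToyDisk:
    """In-memory stand-in for a disk that advertises WRITE SAME."""

    def __init__(self, size, honours_write_same):
        self.data = bytearray(b"\xaa" * size)  # pre-existing non-zero data
        self.honours_write_same = honours_write_same

    def write_same_zero(self, start, length):
        """Return True on success, False if the device rejects the command."""
        if not self.honours_write_same:
            return False
        self.data[start:start + length] = bytes(length)
        return True


def issue_write_same(disk, start, length, report_unsupported):
    """Return 0 on success; on rejection, either report or swallow the error."""
    if disk.write_same_zero(start, length):
        return 0
    # The failure mode under discussion: the rejection is swallowed.
    return -errno.EOPNOTSUPP if report_unsupported else 0


def zeroout(disk, start, length, report_unsupported):
    """Try WRITE SAME first; fall back to explicit zero writes on error."""
    if issue_write_same(disk, start, length, report_unsupported) == 0:
        return 0
    disk.data[start:start + length] = bytes(length)
    return 0
```

With report_unsupported=False and a disk that rejects WRITE SAME, zeroout
returns 0 and the 0xaa pattern is left in place, which matches the
symptom reported here. With report_unsupported=True the fallback runs and
the range ends up zeroed.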

-- 
Sitsofe | http://sucs.org/~sits/


* Re: BLKZEROOUT not zeroing md dev on VMDK
  2016-05-27  4:45   ` Sitsofe Wheeler
@ 2016-05-27  9:30     ` Tom Yan
  2016-06-01  5:04       ` Sitsofe Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Yan @ 2016-05-27  9:30 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Darrick J. Wong, Shaohua Li, Jens Axboe, Arvind Kumar,
	VMware PV-Drivers, linux-raid, linux-scsi, linux-block,
	linux-kernel

There seems to be some sort of race condition between
blkdev_issue_zeroout() and the scsi disk driver (disabling write same
after an illegal request). On my UAS drive, sometimes `blkdiscard -z
/dev/sdX` will return right away, even though if I then check
`write_same_max_bytes` it has turned 0. Sometimes it will just write
zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before
I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as
expected).

Not sure if it is directly related to the case here though.

On 27 May 2016 at 12:45, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
> On 27 May 2016 at 05:18, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>
>> It's possible that the pvscsi device advertised WRITE SAME, but if the device
>> sends back ILLEGAL REQUEST then the SCSI disk driver will set
>> write_same_max_bytes=0.  Subsequent BLKZEROOUT attempts will then issue writes
>> of zeroes to the drive.
>
> Thanks for following up on this but that's not what happens on the md
> device - you can go on to issue as many BLKZEROOUT requests as you
> like but the md disk is never zeroed nor is an error returned.
>
> I filed a bug at https://bugzilla.kernel.org/show_bug.cgi?id=118581
> (see https://bugzilla.kernel.org/show_bug.cgi?id=118581#c6 for
> alternative reproduction steps that use scsi_debug and can be reworked
> to impact device mapper) and Shaohua Li noted that
> blkdev_issue_write_same could return 0 even when the disk didn't
> support write same (see
> https://bugzilla.kernel.org/show_bug.cgi?id=118581#c8 ).
>
> Shaohua went on to create a patch for this ("block: correctly fallback
> for zeroout" - https://patchwork.kernel.org/patch/9137311/ ) which has
> yet to be reviewed.
>
> --
> Sitsofe | http://sucs.org/~sits/


* Re: BLKZEROOUT not zeroing md dev on VMDK
  2016-05-27  9:30     ` Tom Yan
@ 2016-06-01  5:04       ` Sitsofe Wheeler
  2016-06-15 18:17         ` Arvind Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Sitsofe Wheeler @ 2016-06-01  5:04 UTC (permalink / raw)
  To: Tom Yan
  Cc: Darrick J. Wong, Shaohua Li, Jens Axboe, Arvind Kumar,
	VMware PV-Drivers, linux-raid, linux-scsi, linux-block,
	linux-kernel

On 27 May 2016 at 10:30, Tom Yan <tom.ty89@gmail.com> wrote:
> There seems to be some sort of race condition between
> blkdev_issue_zeroout() and the scsi disk driver (disabling write same
> after an illegal request). On my UAS drive, sometimes `blkdiscard -z
> /dev/sdX` will return right away, even though if I then check
> `write_same_max_bytes` it has turned 0. Sometimes it will just write
> zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before
> I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as
> expected).
>
> Not sure if it is directly related to the case here though.

I'm not aware of hitting that particular problem myself directly on
the underlying "SCSI" device but the patch on
https://patchwork.kernel.org/patch/9137311/ should be able to resolve
that issue. Could you test it and follow up on
http://permalink.gmane.org/gmane.linux.kernel/2229377 ? I'm hoping
more testing reports will lead to the patch being reviewed and
accepted sooner rather than later as it's currently stalled...

-- 
Sitsofe | http://sucs.org/~sits/


* Re: BLKZEROOUT not zeroing md dev on VMDK
  2016-06-01  5:04       ` Sitsofe Wheeler
@ 2016-06-15 18:17         ` Arvind Kumar
  2016-06-15 21:33           ` Sitsofe Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Arvind Kumar @ 2016-06-15 18:17 UTC (permalink / raw)
  To: Sitsofe Wheeler, Tom Yan
  Cc: Darrick J. Wong, Shaohua Li, Jens Axboe, VMware PV-Drivers,
	linux-raid, linux-scsi, linux-block, linux-kernel,
	Petr Vandrovec

It is possibly some race. We saw a WRITE SAME related issue in the past for
which Petr sent out a patch, but it looks like the patch didn't make it. :(

https://groups.google.com/forum/#!topic/linux.kernel/1WGDSlyY0y0

Thanks!
Arvind
________________________________________
From: Sitsofe Wheeler <sitsofe@gmail.com>
Sent: Tuesday, May 31, 2016 10:04 PM
To: Tom Yan
Cc: Darrick J. Wong; Shaohua Li; Jens Axboe; Arvind Kumar; VMware PV-Drivers; linux-raid@vger.kernel.org; linux-scsi@vger.kernel.org; linux-block@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: BLKZEROOUT not zeroing md dev on VMDK

On 27 May 2016 at 10:30, Tom Yan <tom.ty89@gmail.com> wrote:
> There seems to be some sort of race condition between
> blkdev_issue_zeroout() and the scsi disk driver (disabling write same
> after an illegal request). On my UAS drive, sometimes `blkdiscard -z
> /dev/sdX` will return right away, even though if I then check
> `write_same_max_bytes` it has turned 0. Sometimes it will just write
> zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before
> I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as
> expected).
>
> Not sure if it is directly related to the case here though.

I'm not aware of hitting that particular problem myself directly on
the underlying "SCSI" device but the patch on
https://patchwork.kernel.org/patch/9137311/ should be able to resolve
that issue. Could you test it and follow up on
http://permalink.gmane.org/gmane.linux.kernel/2229377 ? I'm hoping
more testing reports will lead to the patch being reviewed and
accepted sooner rather than later as it's currently stalled...

--
Sitsofe | http://sucs.org/~sits/


* Re: BLKZEROOUT not zeroing md dev on VMDK
  2016-06-15 18:17         ` Arvind Kumar
@ 2016-06-15 21:33           ` Sitsofe Wheeler
  0 siblings, 0 replies; 7+ messages in thread
From: Sitsofe Wheeler @ 2016-06-15 21:33 UTC (permalink / raw)
  To: Arvind Kumar
  Cc: Tom Yan, Darrick J. Wong, Shaohua Li, Jens Axboe,
	VMware PV-Drivers, linux-raid, linux-scsi, linux-block,
	linux-kernel, Petr Vandrovec

On Wed, Jun 15, 2016 at 06:17:37PM +0000, Arvind Kumar wrote:
> It is possibly some race. We saw a WRITE SAME related issue in past
> for which Petr sent out a patch but looks like the patch didn't make
> it. :(
> 
> https://groups.google.com/forum/#!topic/linux.kernel/1WGDSlyY0y0

Indeed - the investigation you folks did is linked to within the
upstream Bugzilla bug (see
https://bugzilla.kernel.org/show_bug.cgi?id=118581#c2 ). Hopefully this
issue will be resolved but there's still some debate over on
http://thread.gmane.org/gmane.linux.kernel/2236800 . The trouble is that
it is causing real problems in stable kernels (data not being correctly
zeroed) today...

-- 
Sitsofe | http://sucs.org/~sits/

