LKML Archive on lore.kernel.org
* BUG: aio/direct-io data corruption in 4.7
@ 2016-09-12 18:38 Jonathan Nicklin
  2016-09-21 14:15 ` Christoph Hellwig
From: Jonathan Nicklin @ 2016-09-12 18:38 UTC
  To: linux-kernel

In 4.7.2, the kernel is acknowledging block writes that have not
completed to disk. To reproduce: create an MD array, run FIO (direct +
libaio), and pull all drives. FIO will continue to run without
receiving I/O errors. I have also reproduced the bug using physical
drives. In this case, only a limited number of I/Os are incorrectly
acknowledged; FIO eventually receives an I/O error after the device
reference is removed.

The root cause of the problem is that dio_complete() does not
correctly propagate I/O errors in the is_async case. Specifically,
generic_write_sync() appears to be overwriting the return status
destined for ki_complete().
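
[Annotation] The overwrite described above can be illustrated with a
small user-space model. This is only a sketch, not the kernel code:
model_write_sync(), complete_unguarded() and complete_guarded() are
hypothetical stand-ins, and the model assumes that the sync helper
returns the byte count whenever the sync itself does not fail. The
guarded variant shows one possible shape of a fix; it is not
necessarily the patch discussed in this thread.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a write-sync helper. Assumption: it
 * returns the byte count when no sync is needed or the sync succeeds.
 */
static long model_write_sync(long count)
{
	return count;
}

/*
 * Models the reported behavior: the helper's return value replaces
 * the I/O status unconditionally, so a negative errno is lost before
 * it reaches the completion callback.
 */
static long complete_unguarded(long io_status, long transferred)
{
	long ret = io_status;	/* e.g. -EIO reported for the write */

	ret = model_write_sync(transferred);	/* error overwritten here */
	return ret;
}

/*
 * Models one possible guard: call the helper only when the I/O
 * itself succeeded, so an error status is passed through unchanged.
 */
static long complete_guarded(long io_status, long transferred)
{
	long ret = io_status;

	if (ret > 0)
		ret = model_write_sync(transferred);
	return ret;
}
```

In this model, complete_unguarded(-EIO, 4096) yields 4096, which is
what a caller would see as a successful 4096-byte write, while
complete_guarded(-EIO, 4096) yields -EIO.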

This bug appears to have been introduced by the following commit:

Description: "fs: simplify the generic_write_sync prototype"
Committed: Apr 7, 2016
Hash: e259221763a40403d5bb232209998e8c45804ab8
Affects: 4.7-rc1 - master

I have confirmed a fix for the AIO/Direct-IO failure condition but
have not reviewed the rest of the changes associated with that commit.
If you would like a small patch for direct-io.c, let me know.

Regards,
-Jonathan

* Re: BUG: aio/direct-io data corruption in 4.7
@ 2018-11-05 15:16 Gregory Shapiro
  2018-11-06  7:28 ` Jack Wang
From: Gregory Shapiro @ 2018-11-05 15:16 UTC
  To: hch, jnicklin
  Cc: linux-kernel, linux-fsdevel, gregory.shapiro, Gregory Shapiro

Hello, my name is Gregory Shapiro and I am a newbie on this list.
I recently encountered data corruption: the kernel acknowledged a
write (the "io_getevents" system call returned the correct number of
bytes), but the underlying write to disk had failed.
After investigating the problem, I found that it is identical to the
direct-io.c issue mentioned in the thread below:
https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
Is there a reason the proposed patch was not applied to the kernel?
When can I expect it to be applied?
Thanks,
 Gregory
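
[Annotation] For context, an application typically judges an AIO
write by the `res` value of the completion event, which by convention
is a negative errno on failure and the number of bytes transferred
otherwise. The helper below is a minimal sketch of such a check;
classify_aio_result() and the aio_outcome enum are hypothetical names
introduced for illustration only.

```c
#include <stddef.h>

enum aio_outcome { AIO_FAILED, AIO_SHORT, AIO_COMPLETE };

/*
 * Classify the result of one completed AIO write.
 * res:      value of the completion event's `res` field
 *           (negative errno on failure, else bytes transferred)
 * expected: number of bytes the application asked to write
 */
static enum aio_outcome classify_aio_result(long long res, size_t expected)
{
	if (res < 0)
		return AIO_FAILED;
	if ((unsigned long long)res < expected)
		return AIO_SHORT;
	return AIO_COMPLETE;
}
```

With the behavior reported above, a failed write arrives with `res`
equal to the requested length, so a check like this one classifies it
as AIO_COMPLETE and the application has no way to notice the failure.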

Thread overview: 6+ messages
2016-09-12 18:38 BUG: aio/direct-io data corruption in 4.7 Jonathan Nicklin
2016-09-21 14:15 ` Christoph Hellwig
2018-11-05 15:16 Gregory Shapiro
2018-11-06  7:28 ` Jack Wang
2018-11-06 11:31   ` Gregory Shapiro
2018-11-09 15:44     ` Jack Wang
