From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Wed, 18 Jul 2018 13:40:07 +0200
From: Jan Kara
To: Johannes Thumshirn
Cc: Ming Lei , Martin Wilck , Ming Lei ,
	Jens Axboe , Hannes Reinecke ,
	Christoph Hellwig , "linux-block@vger.kernel.org" ,
	jack@suse.com, kent.overstreet@gmail.com
Subject: Re: Silent data corruption in blkdev_direct_IO()
Message-ID: <20180718114007.huwriszokmcksqs6@quack2.suse.cz>
References: <20180718024758.GB11151@ming.t460p>
 <54436062eee1e10644b536ae3c8c40f94da3ccbd.camel@suse.com>
 <20180718075440.GA15254@ming.t460p>
 <20180718092014.65k4dvg2ezrpbnzn@linux-x5ow.site>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20180718092014.65k4dvg2ezrpbnzn@linux-x5ow.site>
List-ID:

On Wed 18-07-18 11:20:15, Johannes Thumshirn wrote:
> On Wed, Jul 18, 2018 at 03:54:46PM +0800, Ming Lei wrote:
> > Please go ahead and take care of it since you have the test cases.
>
> Speaking of which, do we already know how it is triggered and can we
> cook up a blktests testcase for it? This would be more than helpful
> for all parties.

Using multiple iovecs with writev / readv trivially triggers the case of
IO that is done partly as direct and partly as buffered. However, neither
Martin nor I was able to trigger the data corruption the customer is
seeing with KVM, since the generic code tries to maintain data integrity
even when the IO is mixed.

It should be possible to trigger the corruption by having two processes
write to the same PAGE_SIZE region of a block device, just at different
offsets. If the first process happens to use direct IO while the second
ends up doing a read-modify-write cycle through the page cache, the first
write can end up being lost. I'll check whether something like this is
able to reproduce the corruption...

								Honza
--
Jan Kara
SUSE Labs, CR