From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ewan D. Milne"
Subject: Re: SG does not ignore dxferp (direct io + mmap)
Date: Fri, 02 Dec 2016 14:29:37 -0500
Message-ID: <1480706977.28416.189.camel@localhost.localdomain>
References: <1479752642.19792.43.camel@localhost.localdomain>
 <20161122083759.xeifuex3xxfimuwz@linux-x5ow.site>
 <1479839407.28416.21.camel@localhost.localdomain>
 <2146476957.2165908.1479927335303.JavaMail.zimbra@redhat.com>
 <1479932524.28416.43.camel@localhost.localdomain>
 <20161125080758.5bh5jkcgvhw3xuvb@linux-x5ow.site>
 <1194718949.74785.1480096576577.JavaMail.zimbra@redhat.com>
 <1480523188.28416.94.camel@localhost.localdomain>
 <20161202122133.GA14247@infradead.org>
 <1480685383.28416.147.camel@localhost.localdomain>
 <9af9525a-be53-c3ee-6977-dd7aa30385f9@suse.de>
Reply-To: emilne@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:57392 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751618AbcLBT3j (ORCPT ); Fri, 2 Dec 2016 14:29:39 -0500
In-Reply-To: <9af9525a-be53-c3ee-6977-dd7aa30385f9@suse.de>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Hannes Reinecke
Cc: Christoph Hellwig, "Martin K. Petersen", Johannes Thumshirn,
 Laurence Oberman, Eyal Ben David, dgilbert@interlog.com,
 linux-scsi@vger.kernel.org

On Fri, 2016-12-02 at 15:10 +0100, Hannes Reinecke wrote:
> On 12/02/2016 02:29 PM, Ewan D. Milne wrote:
> > On Fri, 2016-12-02 at 04:21 -0800, Christoph Hellwig wrote:
> >> On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
> >>> Specifically, the problem appears to be caused by the removal of
> >>> the setting of bio->bi_bdev, which would previously be set to NULL.
> >>> If I add:
> >>
> >> Very odd. For one I would expect it to be NULL anyway, second
> >> I don't see why the behavior changed.
> >> But given that this reverts
> >> to the original assignment and makes things work I'll happily hack it
> >> to get things working again:
> >>
> >> Acked-by: Christoph Hellwig
> >
> > Yeah, I'm not sure I understand this either, apart from the change
> > adjusting the code to effectively do what it used to and making the
> > test case work. I'm reluctant to cc: stable yet, let me look at this
> > a bit more and I'll post the actual patch soon.
> >
> Plus we found that this is basically a timing issue; we've found that
> supposedly fixed bugs will crop up after ~4k iterations.
> (Johannes did a _lot_ of testing here :-)
> So just because the bug failed to materialize can also mean that you
> simply didn't test long enough.
>

Yes, and following the code paths it isn't completely clear how this
leads to the single zero-byte corruption; I am continuing to investigate.
There may very well be more than one problem.

On the kernel versions I tested where I got a failure, it was a solid
failure: it never worked no matter how many times I tried. But I did not
exhaustively test the apparently successful kernel versions. Not thousands
of times, anyway.

-Ewan