From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:46648 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751392AbcBGDxf (ORCPT ); Sat, 6 Feb 2016 22:53:35 -0500
Date: Sun, 7 Feb 2016 03:53:31 +0000
From: Al Viro
To: Mike Marshall
Cc: Linus Torvalds, linux-fsdevel, Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160207035331.GZ17997@ZenIV.linux.org.uk>
References: <20160123214006.GO17997@ZenIV.linux.org.uk>
 <20160124001615.GT17997@ZenIV.linux.org.uk>
 <20160124040529.GX17997@ZenIV.linux.org.uk>
 <20160130173413.GE17997@ZenIV.linux.org.uk>
 <20160130182731.GF17997@ZenIV.linux.org.uk>
 <20160206194210.GX17997@ZenIV.linux.org.uk>
 <20160207013835.GY17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160207013835.GY17997@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Sun, Feb 07, 2016 at 01:38:35AM +0000, Al Viro wrote:

> > > As for the WARN_ONs, the waitqueue one is easy to hit when the
> > > client-core stops and restarts, you can see here where precopy_buffers
> > > started whining about the client-core, you can see that the client
> > > core restarted when the debug mask got sent back over, and then
> > > the WARN_ON in waitqueue gets hit:
> > > > [ 1239.198976] precopy_buffers: Failed to copy-in buffers. Please make
> > > sure that the pvfs2-client is running. -14
> > Very interesting...
>
> Looks like there's another bug in restart handling. Namely, restart happening
> on write() tries to fetch more data from iter, without bothering to rewind to
> where it used to be. That's where those -EFAULT are coming from. Easy to fix,
> fortunately - on top of the double-free fix, apply the following:

BTW, could you try to reproduce that WARN_ON with these two patches added and
with bufmap debugging turned on?
Both the double-free and the lack of rewinding are real; I can see scenarios where they would trigger, and I'm pretty sure that the latter is triggering in your reproducer. Moreover, I'm absolutely sure that spurious dropping of bufmap references is happening there; what I'm not sure of is whether it was on this double-free or on something else...
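
To make the rewinding problem concrete, here is a rough userspace sketch of the failure mode. It is not the patch referred to above and not the orangefs code: struct toy_iter and every toy_* function are hypothetical stand-ins, and the only things taken from the thread are the shape of the bug (a restarted write() pulling from an iter that was already consumed) and the -EFAULT (-14) result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the iter: a flat buffer plus a cursor. */
struct toy_iter {
	const char *base;
	size_t len;
	size_t pos;
};

/* Copy up to n bytes out of the iter and advance its cursor. */
static size_t toy_copy_from_iter(void *dst, size_t n, struct toy_iter *it)
{
	assert(it->pos <= it->len);
	size_t avail = it->len - it->pos;
	size_t copied = n < avail ? n : avail;

	memcpy(dst, it->base + it->pos, copied);
	it->pos += copied;
	return copied;
}

/* Fill the staging buffer; a short copy is reported as -EFAULT. */
static int toy_precopy(void *dst, size_t n, struct toy_iter *it)
{
	return toy_copy_from_iter(dst, n, it) == n ? 0 : -EFAULT;
}

/*
 * One write attempt followed by a simulated restart of the operation.
 * With rewind == 0 the second attempt starts from wherever the first
 * one left the cursor; with rewind != 0 the saved position is restored
 * before the data is fetched again.
 */
static int toy_write_with_restart(struct toy_iter *it, char *staging,
				  size_t n, int rewind)
{
	struct toy_iter saved = *it;	/* snapshot before consuming anything */
	int ret = toy_precopy(staging, n, it);

	if (ret)
		return ret;

	/* Pretend the daemon went away here and the request is reissued. */
	if (rewind)
		*it = saved;

	return toy_precopy(staging, n, it);
}
```

In this toy model, calling toy_write_with_restart() on a 16-byte iter with rewind == 0 fails the second copy with -EFAULT, since nothing is left to fetch; with rewind != 0 the retry sees the same 16 bytes as the first attempt and succeeds.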