From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lf0-f43.google.com ([209.85.215.43]:33814 "EHLO
    mail-lf0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S2994275AbcBSWLc (ORCPT );
    Fri, 19 Feb 2016 17:11:32 -0500
Received: by mail-lf0-f43.google.com with SMTP id j78so62624023lfb.1
    for ; Fri, 19 Feb 2016 14:11:30 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20160219002539.GX17997@ZenIV.linux.org.uk>
References: <20160217231524.GQ17997@ZenIV.linux.org.uk>
    <20160218000439.GR17997@ZenIV.linux.org.uk>
    <20160218111122.GS17997@ZenIV.linux.org.uk>
    <20160218205206.GW17997@ZenIV.linux.org.uk>
    <20160219002539.GX17997@ZenIV.linux.org.uk>
Date: Fri, 19 Feb 2016 17:11:29 -0500
Message-ID:
Subject: Re: Orangefs ABI documentation
From: Mike Marshall
To: Al Viro
Cc: Martin Brandenburg, Linus Torvalds, linux-fsdevel, Stephen Rothwell,
    Mike Marshall
Content-Type: text/plain; charset=UTF-8
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Yay! The problem is fixed. Boo! Now a new problem is uncovered,
I don't have a handle on it yet.

Now it is possible to create a broken file on the orangefs server
across a restart of the client-core.

dbench: (808) open ./clients/client0/~dmtmp/PWRPNT/PPTC112.TMP failed
for handle 10042 (No such file or directory)

ls -l /pvfsmnt/clients/client0/~dmtmp/PWRPNT
ls: cannot access /pvfsmnt/clients/client0/~dmtmp/PWRPNT/PPTC112.TMP:
No such file or directory
total 1364
-rw-------. 1 root root  85026 Feb 19 14:53 NEWPCB.PPT
-rw-------. 1 root root 260096 Feb 19 14:52 PCBENCHM.PPT
??????????? ? ?    ?         ?            ? PPTC112.TMP
-rw-------. 1 root root 260096 Feb 19 14:51 PPTOOLS1.PPA
-rw-------. 1 root root 260096 Feb 19 14:51 TIPS.PPT
-rw-------. 1 root root 260096 Feb 19 14:51 TRIDOTS.POT
-rw-------. 1 root root 260096 Feb 19 14:51 ZD16.BMP

The filename comes back from the server in the readdir buffer. I can
reproduce this, so I'll have to work the problem some more
to find more information.
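A minimal sketch of how the symptom above could be checked from the client side: list a directory and report every name that comes back in the listing but cannot be stat'ed. This is a Python illustration only, not part of orangefs or dbench, and the function name find_broken_entries is hypothetical.

```python
import os


def find_broken_entries(directory):
    """Return the names a directory listing reports but lstat() cannot resolve.

    Hypothetical helper for illustration: it mirrors what `ls -l` shows as a
    row of question marks, i.e. readdir returned the name but stat failed
    with ENOENT.
    """
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            # lstat() so that a dangling symlink is not mistaken for a broken entry
            os.lstat(path)
        except FileNotFoundError:
            broken.append(name)
    return sorted(broken)
```

Called on the directory from the listing, for example find_broken_entries("/pvfsmnt/clients/client0/~dmtmp/PWRPNT"), it would be expected to name PPTC112.TMP while the problem is present, and to return an empty list on a healthy directory.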
First place I'll look is the khandle code ...

Anywho... The fixed version of the client-core for the other problem
is in this SVN repository:

http://www.orangefs.org/svn/orangefs/branches/trunk.kernel.update/

As far as orangefs for-next is concerned... I don't see how to update
it without destroying the top few commit messages in the commit
history. I plan to update the kernel.org orangefs for-next tree to
look exactly like the "current" branch of my github tree, unless
someone says not to:

github.com/hubcapsc/linux/tree/current

Latest commit c1223ca

-Mike

On Thu, Feb 18, 2016 at 7:25 PM, Al Viro wrote:
> On Thu, Feb 18, 2016 at 04:50:11PM -0500, Mike Marshall wrote:
>> As part of the attempt to go upstream, this "hubcap" guy you see
>> in the comments worked on a thing that changes 64bit userspace handles
>> back and forth into 128bit kernel handles... we did this because
>> one day, when we have orangefs3, we will be using 128bit uuid-derived
>> handles, and we believe it is our responsibility to not break the
>> upstream kernel module.
>>
>> Anywho, I bet you are right Al, he messed up this part of it...
>> I'll look and see if that is really so, and get it fixed.
>>
>> -Mike "hubcap"
>
> OK... I'll fold the trivial braino fix (op_is_cancel() checking the wrong
> thing) into "orangefs: delay freeing slot until cancel completes" where it
> had been introduced, but the rest of it is probably too far and will have
> to be a couple of commits on top of that queue. Had it been just my tree,
> I probably would still reorder and fold, but I know that my habits in that
> respect are rather extreme.
>
> FWIW, the scenario spotted by Martin wouldn't cause any real problems, but
> only because by the time we ended copying to/from daemon service_operation()
> couldn't have reached resubmit - it only happens if there had been a purge
> and that can't happen while somebody is inside a control device method.
>
> So the original code had been correct, but it was more brittle than
> I'd like *and* making sure that nobody else sees an op by the time
> orangefs_clean_interrupted_operation() returns is a good thing.
>
> New logics gives that, and avoids the need to play with refcounts on ops.
>
> I've pushed that into #orangefs-untested; if that works, please switch your
> for-next to it.
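
For reference, the round trip between 64bit userspace handles and 128bit kernel handles described in the quoted text can be sketched as below. This is only an illustration under stated assumptions: the byte layout (the 64bit value in the low eight bytes, little-endian, with the rest zeroed) and the names handle_to_khandle and khandle_to_handle are hypothetical and are not taken from the orangefs sources.

```python
KHANDLE_SIZE = 16  # 128 bits, matching the "128bit kernel handles" in the thread


def handle_to_khandle(handle):
    """Widen a 64-bit handle to a 16-byte value (layout is an assumption)."""
    if not 0 <= handle < 2**64:
        raise ValueError("handle does not fit in 64 bits")
    # Assumed layout: low 8 bytes carry the handle, high 8 bytes are zero.
    return handle.to_bytes(8, "little") + bytes(KHANDLE_SIZE - 8)


def khandle_to_handle(khandle):
    """Narrow a 16-byte value back to a 64-bit handle, refusing lossy input."""
    if len(khandle) != KHANDLE_SIZE:
        raise ValueError("khandle must be exactly 16 bytes")
    if any(khandle[8:]):
        # A true 128-bit handle cannot be narrowed without losing information.
        raise ValueError("upper 64 bits are set")
    return int.from_bytes(khandle[:8], "little")
```

With this layout, khandle_to_handle(handle_to_khandle(10042)) gives back 10042, the handle from the dbench message. A mistake in either direction of such a conversion would make a lookup by handle miss even though the name is still returned by readdir.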