From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f180.google.com ([209.85.217.180]:32873 "EHLO mail-lb0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1422697AbcBZUV0 (ORCPT ); Fri, 26 Feb 2016 15:21:26 -0500 Received: by mail-lb0-f180.google.com with SMTP id ep2so20761203lbb.0 for ; Fri, 26 Feb 2016 12:21:25 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <20160218205206.GW17997@ZenIV.linux.org.uk> <20160219002539.GX17997@ZenIV.linux.org.uk> <20160219222221.GB17997@ZenIV.linux.org.uk> <20160220133651.GG17997@ZenIV.linux.org.uk> Date: Fri, 26 Feb 2016 15:21:24 -0500 Message-ID: Subject: Re: Orangefs ABI documentation From: Mike Marshall To: Al Viro Cc: Martin Brandenburg , Linus Torvalds , linux-fsdevel , Stephen Rothwell , Mike Marshall Content-Type: text/plain; charset=UTF-8 Sender: linux-fsdevel-owner@vger.kernel.org List-ID: I have updated orangefs.txt to reflect all the work Al has done recently. There are also several other cosmetic and comment updates in the code, along with the time patches that Arnd Bergmann sent today. I would appreciate any corrections or comments on the new parts in orangefs.txt, thanks! git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux.git for-next -Mike On Tue, Feb 23, 2016 at 4:58 PM, Mike Marshall wrote: > Ok, I understand these last couple of problems better. > > If the client-core crashes (kill -9 in my test cases) in > the middle of a rename or an unlink (and maybe some other > operations, these are the ones I have captured and studied) a > couple of things can happen. > > In both cases, you get: > > service_operation > queue the operation > wait_for_matching_downcall = -EAGAIN > queue the operation > wait_for_matching_downcall = 0 > out: > > Sometimes when the operation is first queued, the client-core > will be in the middle of the state machine code and the operation > will be half done when the client-core dies, and the object that > was being operated on will be broken. In other words, it is possible > for a userspace program using the Orangefs native API to corrupt > the filesystem if it crashes in a critical area. Do other userspace > filesystems have this same problem? > > Other times, when the operation is first queued, the client-core > gets the operation fully launched, and then dies. Then, when > the operation is queued up again, the operation fails on -ENOENT. > You can't rename a to b if a has already been renamed to b. You can't > unlink a if there is no a. > > For the first case I don't see how there's anything that can be done. > The filesystem is corrupted. Its not toast or anything, but there's > a directory somewhere with a broken file in it. > > I have made a patch that appears to actually work and cause no bad > side effects for the second case. Al has come colorful phrases, > like "this is too ugly to live" and some others. What you think about > this patch, Al ... > > The d_drop is how I implemented the idea you had at first, Al, I'm > not sure now if it helps or hurts or is a no-op. > > # git --no-pager diff > diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c > index b3ae374..6d953b1 100644 > --- a/fs/orangefs/namei.c > +++ b/fs/orangefs/namei.c > @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir, > __func__, > dentry->d_name.name); > ret = PTR_ERR(inode); > + d_drop(dentry); > goto out; > } > > @@ -246,12 +247,22 @@ static int orangefs_unlink(struct inode *dir, > struct dentry *dentry) > > op_release(new_op); > > - if (!ret) { > + /* > + * We would never have gotten here if the object didn't > + * exist when we started down this path. There's a race > + * condition where if a restart of the client-core > + * coincides just right with an in-progress unlink a > + * file can get deleted on the server and be gone > + * when service-operation does the retry... > + */ > + if ((!ret) || (ret == -ENOENT)) { > drop_nlink(inode); > > SetMtimeFlag(parent); > dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb); > mark_inode_dirty_sync(dir); > + > + ret = 0; > } > return ret; > } > @@ -433,6 +444,17 @@ static int orangefs_rename(struct inode *old_dir, > "orangefs_rename: got downcall status %d\n", > ret); > > + /* > + * We would never have gotten here if the object didn't > + * exist when we started down this path. There's a race > + * condition where if a restart of the client-core > + * coincides just right with an in-progress rename a > + * file can get renamed on the server and be gone > + * when service-operation does the retry... > + */ > + if (ret == -ENOENT) > + ret = 0; > + > if (new_dentry->d_inode) > new_dentry->d_inode->i_ctime = CURRENT_TIME; > > > On Mon, Feb 22, 2016 at 4:22 PM, Mike Marshall wrote: >> I did this and the problem seems fixed: >> >> # git diff >> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c >> index b3ae374..249bda5 100644 >> --- a/fs/orangefs/namei.c >> +++ b/fs/orangefs/namei.c >> @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir, >> __func__, >> dentry->d_name.name); >> ret = PTR_ERR(inode); >> + d_drop(dentry); >> goto out; >> } >> >> Of course, this has uncovered yet another reproducible problem: >> >> 710055 orangefs_unlink: called on PPTB1E4.TMP >> 710058 service_operation: orangefs_unlink ffff880014828000 >> >> right in here I think the rm is >> being processed in the server just >> as the client-core has died. >> >> 710534 wait_for_matching_downcall: operation purged ffff880014828000 >> 710538 service_operation: orangefs_unlink ffff880014828000 >> 710539 service_operation:client core is NOT in service >> >> right in here I think stuff starts >> working again and we're going >> to unsuccessfully try to process >> the rm again. >> >> 710646 wait_for_matching_downcall returned 0 for ffff880014828000 >> >> happy, because we got the matching downcall >> >> 710647 service_operation orangefs_unlink returning -2 for ffff880014828000 >> 710648 orangefs_unlink: service_operation returned -2 >> >> sad, because we got ENOENT on second rm >> >> 710649 Releasing OP ffff880014828000 >> >> so... the userspace process (dbench in this case) thinks >> the rm failed, but it didn't. >> >> >> >> On Mon, Feb 22, 2016 at 11:20 AM, Mike Marshall wrote: >>> > Looks like I'd screwed up checking last time. >>> >>> Probably not that ... my branch did diverge over the course >>> of the few days that we were thrashing around in the kernel trying >>> to fix what I had broken two years ago in userspace. >>> >>> I can relate to why you were motivated to remove the thrashing >>> around from the git history, but your git-foo is much stronger >>> than mine. I wanted to try and get my branch back into line using >>> a methodology that I understand to keep from ending up like >>> this fellow: >>> >>> http://myweb.clemson.edu/~hubcap/harris.jpg >>> >>> I'm glad it worked out... my kernel.org for-next branch is updated now. >>> >>> so, I'll keep working the problem, using your d_drop idea first off... >>> I'll be back with more information, and hopefully even have it fixed, soon... >>> >>> -Mike >>> >>> On Sat, Feb 20, 2016 at 8:36 AM, Al Viro wrote: >>>> On Sat, Feb 20, 2016 at 07:14:26AM -0500, Mike Marshall wrote: >>>> >>>>> Your orangefs-untested branch has 5625087 commits. My "current" branch >>>>> has 5625087 commits. In each all of the commit signatures match, except >>>>> for the most recent 15 commits. The last 15 commits in my "current" >>>>> branch were made from your orangefs-untested branch with "git format-patch" >>>>> and applied to my "current" branch with "git am -s". "git log -p" shows that >>>>> my most recent 15 commits differ from your most recent 15 commits by >>>>> the addition of my "sign off" line. >>>> >>>> *blinks* >>>> *checks* >>>> >>>> OK, ignore what I asked, then. Looks like I'd screwed up checking last time. >>>> >>>>> I will absolutely update my kernel.org for-next branch with the procedure you >>>>> outlined, because you said so. >>>>> >>>>> I wish I understood it better, though... I can only guess at this point that >>>>> the procedure you outlined will do some desirable thing to git metadata...? >>>> >>>> None whatsoever, ignore it.