From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lf0-f54.google.com ([209.85.215.54]:34272 "EHLO
 mail-lf0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755613AbcBWV6G (ORCPT ); Tue, 23 Feb 2016 16:58:06 -0500
Received: by mail-lf0-f54.google.com with SMTP id j78so124544212lfb.1
 for ; Tue, 23 Feb 2016 13:58:05 -0800 (PST)
MIME-Version: 1.0
In-Reply-To:
References: <20160218205206.GW17997@ZenIV.linux.org.uk>
 <20160219002539.GX17997@ZenIV.linux.org.uk>
 <20160219222221.GB17997@ZenIV.linux.org.uk>
 <20160220133651.GG17997@ZenIV.linux.org.uk>
Date: Tue, 23 Feb 2016 16:58:04 -0500
Message-ID:
Subject: Re: Orangefs ABI documentation
From: Mike Marshall
To: Al Viro
Cc: Martin Brandenburg , Linus Torvalds , linux-fsdevel ,
 Stephen Rothwell , Mike Marshall
Content-Type: text/plain; charset=UTF-8
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Ok, I understand these last couple of problems better.

If the client-core crashes (kill -9 in my test cases) in the middle
of a rename or an unlink (and maybe some other operations, these
are the ones I have captured and studied) a couple of things can
happen.

In both cases, you get:

  service_operation
    queue the operation
    wait_for_matching_downcall = -EAGAIN
    queue the operation
    wait_for_matching_downcall = 0
    out:

Sometimes when the operation is first queued, the client-core will
be in the middle of the state machine code and the operation will be
half done when the client-core dies, and the object that was being
operated on will be broken. In other words, it is possible for a
userspace program using the Orangefs native API to corrupt the
filesystem if it crashes in a critical area. Do other userspace
filesystems have this same problem?

Other times, when the operation is first queued, the client-core
gets the operation fully launched, and then dies. Then, when the
operation is queued up again, the operation fails on -ENOENT.
You can't rename a to b if a has already been renamed to b.
You can't unlink a if there is no a.

For the first case I don't see how there's anything that can
be done. The filesystem is corrupted. It's not toast or anything,
but there's a directory somewhere with a broken file in it.

I have made a patch that appears to actually work and cause
no bad side effects for the second case. Al has some colorful
phrases, like "this is too ugly to live" and some others. What do
you think about this patch, Al ...

The d_drop is how I implemented the idea you had at first, Al,
I'm not sure now if it helps or hurts or is a no-op.

# git --no-pager diff
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index b3ae374..6d953b1 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir,
 			   __func__,
 			   dentry->d_name.name);
 		ret = PTR_ERR(inode);
+		d_drop(dentry);
 		goto out;
 	}

@@ -246,12 +247,22 @@ static int orangefs_unlink(struct inode *dir, struct dentry *dentry)
 	op_release(new_op);

-	if (!ret) {
+	/*
+	 * We would never have gotten here if the object didn't
+	 * exist when we started down this path. There's a race
+	 * condition where if a restart of the client-core
+	 * coincides just right with an in-progress unlink a
+	 * file can get deleted on the server and be gone
+	 * when service-operation does the retry...
+	 */
+	if ((!ret) || (ret == -ENOENT)) {
 		drop_nlink(inode);
 		SetMtimeFlag(parent);
 		dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
 		mark_inode_dirty_sync(dir);
+
+		ret = 0;
 	}
 	return ret;
 }
@@ -433,6 +444,17 @@ static int orangefs_rename(struct inode *old_dir,
 		     "orangefs_rename: got downcall status %d\n",
 		     ret);

+	/*
+	 * We would never have gotten here if the object didn't
+	 * exist when we started down this path. There's a race
+	 * condition where if a restart of the client-core
+	 * coincides just right with an in-progress rename a
+	 * file can get renamed on the server and be gone
+	 * when service-operation does the retry...
+	 */
+	if (ret == -ENOENT)
+		ret = 0;
+
 	if (new_dentry->d_inode)
 		new_dentry->d_inode->i_ctime = CURRENT_TIME;

On Mon, Feb 22, 2016 at 4:22 PM, Mike Marshall wrote:
> I did this and the problem seems fixed:
>
> # git diff
> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
> index b3ae374..249bda5 100644
> --- a/fs/orangefs/namei.c
> +++ b/fs/orangefs/namei.c
> @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir,
>  			   __func__,
>  			   dentry->d_name.name);
>  		ret = PTR_ERR(inode);
> +		d_drop(dentry);
>  		goto out;
>  	}
>
> Of course, this has uncovered yet another reproducible problem:
>
> 710055 orangefs_unlink: called on PPTB1E4.TMP
> 710058 service_operation: orangefs_unlink ffff880014828000
>
>             right in here I think the rm is
>             being processed in the server just
>             as the client-core has died.
>
> 710534 wait_for_matching_downcall: operation purged ffff880014828000
> 710538 service_operation: orangefs_unlink ffff880014828000
> 710539 service_operation:client core is NOT in service
>
>             right in here I think stuff starts
>             working again and we're going
>             to unsuccessfully try to process
>             the rm again.
>
> 710646 wait_for_matching_downcall returned 0 for ffff880014828000
>
>             happy, because we got the matching downcall
>
> 710647 service_operation orangefs_unlink returning -2 for ffff880014828000
> 710648 orangefs_unlink: service_operation returned -2
>
>             sad, because we got ENOENT on second rm
>
> 710649 Releasing OP ffff880014828000
>
> so... the userspace process (dbench in this case) thinks
> the rm failed, but it didn't.
>
>
>
> On Mon, Feb 22, 2016 at 11:20 AM, Mike Marshall wrote:
>> > Looks like I'd screwed up checking last time.
>>
>> Probably not that ... my branch did diverge over the course
>> of the few days that we were thrashing around in the kernel trying
>> to fix what I had broken two years ago in userspace.
>>
>> I can relate to why you were motivated to remove the thrashing
>> around from the git history, but your git-foo is much stronger
>> than mine. I wanted to try and get my branch back into line using
>> a methodology that I understand to keep from ending up like
>> this fellow:
>>
>> http://myweb.clemson.edu/~hubcap/harris.jpg
>>
>> I'm glad it worked out... my kernel.org for-next branch is updated now.
>>
>> so, I'll keep working the problem, using your d_drop idea first off...
>> I'll be back with more information, and hopefully even have it fixed, soon...
>>
>> -Mike
>>
>> On Sat, Feb 20, 2016 at 8:36 AM, Al Viro wrote:
>>> On Sat, Feb 20, 2016 at 07:14:26AM -0500, Mike Marshall wrote:
>>>
>>>> Your orangefs-untested branch has 5625087 commits. My "current" branch
>>>> has 5625087 commits. In each all of the commit signatures match, except
>>>> for the most recent 15 commits. The last 15 commits in my "current"
>>>> branch were made from your orangefs-untested branch with "git format-patch"
>>>> and applied to my "current" branch with "git am -s". "git log -p" shows that
>>>> my most recent 15 commits differ from your most recent 15 commits by
>>>> the addition of my "sign off" line.
>>>
>>> *blinks*
>>> *checks*
>>>
>>> OK, ignore what I asked, then. Looks like I'd screwed up checking last time.
>>>
>>>> I will absolutely update my kernel.org for-next branch with the procedure you
>>>> outlined, because you said so.
>>>>
>>>> I wish I understood it better, though... I can only guess at this point that
>>>> the procedure you outlined will do some desirable thing to git metadata...?
>>>
>>> None whatsoever, ignore it.
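
For reference, the second failure mode described at the top of this
message (the operation takes effect on the server, the client-core dies
before replying, and the requeued attempt comes back with -ENOENT) can
be modelled in a few lines of userspace C. This is only a sketch under
stated assumptions: struct sim, attempt_unlink(), service_unlink(),
unlink_tolerating_enoent() and demo() are hypothetical names, not
functions from fs/orangefs, and the retry loop is a simplified stand-in
for service_operation, not a copy of it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for server and client-core state. */
struct sim {
	bool file_exists;	/* does the object still exist on the server? */
	bool crash_pending;	/* will the client-core die during the next attempt? */
};

/*
 * One pass through the queue. The server applies the unlink, but if the
 * client-core dies before replying, the caller only sees -EAGAIN.
 */
int attempt_unlink(struct sim *s)
{
	if (!s->file_exists)
		return -ENOENT;

	s->file_exists = false;		/* the unlink takes effect */

	if (s->crash_pending) {
		s->crash_pending = false;	/* client-core restarts once */
		return -EAGAIN;			/* the reply was lost */
	}
	return 0;
}

/* Requeue the operation for as long as an attempt reports -EAGAIN. */
int service_unlink(struct sim *s)
{
	int ret;

	do {
		ret = attempt_unlink(s);
	} while (ret == -EAGAIN);

	return ret;
}

/* Same mapping as the patch above: -ENOENT is reported as success. */
int unlink_tolerating_enoent(struct sim *s)
{
	int ret = service_unlink(s);

	if (ret == -ENOENT)
		ret = 0;
	return ret;
}

/* Walk through both outcomes for a crash that hits the first attempt. */
void demo(void)
{
	struct sim plain = { .file_exists = true, .crash_pending = true };
	struct sim fixed = { .file_exists = true, .crash_pending = true };

	/* Without the mapping the caller is told the unlink failed. */
	assert(service_unlink(&plain) == -ENOENT);
	assert(!plain.file_exists);

	/* With the mapping the caller is told it succeeded. */
	assert(unlink_tolerating_enoent(&fixed) == 0);
	assert(!fixed.file_exists);
}
```

Calling demo() from a small main() exercises both paths. One trade-off
the model makes visible: the mapping cannot tell a -ENOENT produced by
a retry apart from one that has some other cause, so a variant that
only applies it after at least one -EAGAIN may be worth considering.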