From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Mon, 21 Mar 2016 14:12:43 -0400 (EDT)
Subject: [Cluster-devel] [PATCH 4/4] GFS2: Make deletes use regular inode_lookup with special code path
In-Reply-To: <20160321164906.GC2915@octiron.msp.redhat.com>
References: <1450465350-10060-1-git-send-email-rpeterso@redhat.com>
 <1450465350-10060-5-git-send-email-rpeterso@redhat.com>
 <20160321164906.GC2915@octiron.msp.redhat.com>
Message-ID: <1047807525.41204097.1458583963016.JavaMail.zimbra@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Ben,

----- Original Message -----
> On Fri, Dec 18, 2015 at 01:02:30PM -0600, Bob Peterson wrote:
> > This patch changes the inode reclaiming code, delete_work_func, to use
> > the normal gfs2_inode_lookup, but with a special parameter, for_del,
> > which causes additional checks to be made.
>
> I get what you're doing here, and I can't see anything really wrong with
> it, but I haven't gone through all the delete code as carefully as I
> could. I do have two questions.
>
> 1. I don't see any reason why, even if it's just for paranoia's sake, we
> don't avoid dereferencing the gfs2_inode in verify_unlinked before
> locking and checking it. It seems like a simple fix that avoids any
> chance that there is a problem due to not holding the directory glock
> when we're getting the inode.

I'm not sure I understand the question. Today, function verify_unlinked
dereferences the gfs2_inode, ip, but so does its only caller,
gfs2_inode_lookup. We already know ip is valid at that point, so more
references shouldn't matter. Unless you mean the inode's glock?

> 2. gfs2_inode_lookup calls iget_locked, which creates an inode if one
> doesn't already exist. If there's any chance of racing and calling
> delete_work_func for an inode that has already been removed, won't we
> just end up creating a new one here?
>
> -Ben

Yes, and that's exactly what we want. I found cases in which
delete_work_func races with the evict() code. In those cases, we need the
inode to be recreated so that it can go through another evict() cycle,
one that ends up deleting it where the first evict() didn't.

For example, the shrinker can arbitrarily call evict() and get rid of the
inode while its nlink is still positive; IOW, it's not deleted. But after
that point in time, the system gets a callback from another node
instructing it to delete the inode, because it was unlinked remotely from
that other node. In that case, we still need to transition the block from
unlinked to free, and we need a valid inode in order to do the proper
delete. So we recreate it, and then it gets evicted again, in the name of
deleting it.

Incidentally, my latest version has changed just a bit. I decided to keep
the call to gfs2_ilookup in delete_work_func. The latest version of
delete_work_func() is pasted below.

Regards,

Bob Peterson
Red Hat File Systems
---
static void delete_work_func(struct work_struct *work)
{
	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete);
	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
	struct gfs2_inode *ip;
	struct inode *inode = NULL;
	u64 no_addr = gl->gl_name.ln_number;

	/* If someone's using this glock to create a new dinode, the block must
	   have been freed by another node, then re-used, in which case our
	   iopen callback is too late after the fact. Ignore it.
	*/
	if (test_bit(GLF_INODE_CREATING, &gl->gl_flags))
		goto out;

	ip = gl->gl_object;
	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */

	if (ip)
		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);

	if (inode == NULL || IS_ERR(inode))
		inode = gfs2_inode_lookup(sdp->sd_vfs, DT_UNKNOWN, no_addr, 0, 1);
	if (inode && !IS_ERR(inode)) {
		d_prune_aliases(inode);
		iput(inode);
	}
out:
	gfs2_glock_put(gl);
}
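
To make the new parameter a bit more concrete, here is a rough sketch
(not the actual patch code) of the kind of locking and checking that the
for_del path asks for: before instantiating a brand-new in-core inode
from an iopen callback, take the inode glock and verify that the block on
disk is still an unlinked dinode. The real patch does this work in
verify_unlinked(); the helper name and the use of gfs2_check_blk_type()
below are just my illustration of the idea, and the actual check may
differ.

/* Illustrative sketch only -- not the code from the patch.  The patch
   does its checking in verify_unlinked(); this just shows the shape of
   the extra check the for_del parameter asks gfs2_inode_lookup to make. */
static int sketch_verify_unlinked(struct gfs2_inode *ip, u64 no_addr)
{
	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
	struct gfs2_holder gh;
	int error;

	/* Hold the inode glock so the block state can't change under us. */
	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
	if (error)
		return error;

	/* If another node already freed (and possibly reused) the block,
	   this fails and the caller should refuse to instantiate the inode;
	   otherwise the lookup, and the eventual evict()/delete, proceed. */
	error = gfs2_check_blk_type(sdp, no_addr, GFS2_BLKST_UNLINKED);

	gfs2_glock_dq_uninit(&gh);
	return error;
}

In the sketch, gfs2_inode_lookup() would only make that call from its
I_NEW path when for_del is set, and would fail the lookup if the check
returns an error.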