From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Mon, 21 Mar 2016 14:12:43 -0400 (EDT)
Subject: [Cluster-devel] [PATCH 4/4] GFS2: Make deletes use regular inode_lookup with special code path
In-Reply-To: <20160321164906.GC2915@octiron.msp.redhat.com>
References: <1450465350-10060-1-git-send-email-rpeterso@redhat.com>
 <1450465350-10060-5-git-send-email-rpeterso@redhat.com>
 <20160321164906.GC2915@octiron.msp.redhat.com>
Message-ID: <1047807525.41204097.1458583963016.JavaMail.zimbra@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Ben,

----- Original Message -----
> On Fri, Dec 18, 2015 at 01:02:30PM -0600, Bob Peterson wrote:
> > This patch changes the inode reclaiming code, delete_work_func, to use
> > the normal gfs2_inode_lookup, but with a special parameter, for_del,
> > which causes additional checks to be made.
>
> I get what you're doing here, and I can't see anything really wrong with
> it, but I haven't gone through all the delete code as carefully as I
> could. I do have two questions.
>
> 1. I don't see any reason why, even if it's just for paranoia's sake, we
> don't avoid dereferencing the gfs2_inode in verify_unlinked before
> locking and checking it. It seems like a simple fix that avoids any
> chance that there is a problem due to not holding the directory glock
> when we're getting the inode.

I'm not sure I understand the question. Today, function verify_unlinked
dereferences the gfs2_inode, ip, but so does its only caller,
gfs2_inode_lookup. We already know ip is valid at that point, so more
references shouldn't matter. Unless you mean the inode's glock?

> 2. gfs2_inode_lookup calls iget_locked, which creates an inode if one
> doesn't already exist. If there's any chance of racing and calling
> delete_work_func for an inode that has already been removed, won't we
> just end up creating a new one here?
>
> -Ben

Yes, and that's exactly what we want. I found cases in which
delete_work_func races with the evict() code. In those cases, we need the
inode to be recreated so that it can go through another evict() cycle,
one that ends up deleting it where the first evict() didn't.

For example, the shrinker can arbitrarily call evict() and get rid of the
inode while its nlink is still positive; IOW, it's not deleted. But after
that point in time, the system gets a callback from another node
instructing it to delete the inode, because it was unlinked remotely from
that other node. In that case, we still need to transition the block from
unlinked to free, and we need a valid inode in order to do the proper
delete. So we recreate it, and then it gets evicted again, in the name of
deleting it.

Incidentally, my latest version has changed just a bit. I decided to keep
the call to gfs2_ilookup in delete_work_func. The latest version of
delete_work_func() is pasted below.

Regards,

Bob Peterson
Red Hat File Systems
---
static void delete_work_func(struct work_struct *work)
{
	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete);
	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
	struct gfs2_inode *ip;
	struct inode *inode = NULL;
	u64 no_addr = gl->gl_name.ln_number;

	/* If someone's using this glock to create a new dinode, the block must
	   have been freed by another node, then re-used, in which case our
	   iopen callback is too late after the fact. Ignore it.
	*/
	if (test_bit(GLF_INODE_CREATING, &gl->gl_flags))
		goto out;

	ip = gl->gl_object;
	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */

	if (ip)
		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);

	if (inode == NULL || IS_ERR(inode))
		inode = gfs2_inode_lookup(sdp->sd_vfs, DT_UNKNOWN, no_addr, 0, 1);
	if (inode && !IS_ERR(inode)) {
		d_prune_aliases(inode);
		iput(inode);
	}
out:
	gfs2_glock_put(gl);
}
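
To make the new parameter a bit more concrete, here is a rough sketch
(not the actual patch code) of the kind of locking and checking that the
for_del path asks for: before instantiating a brand-new in-core inode
from an iopen callback, take the inode glock and verify that the block on
disk is still an unlinked dinode. The real patch does this work in
verify_unlinked(); the helper name and the use of gfs2_check_blk_type()
below are just my illustration of the idea, and the actual check may
differ.

/* Illustrative sketch only -- not the code from the patch.  The patch
   does its checking in verify_unlinked(); this just shows the shape of
   the extra check the for_del parameter asks gfs2_inode_lookup to make. */
static int sketch_verify_unlinked(struct gfs2_inode *ip, u64 no_addr)
{
	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
	struct gfs2_holder gh;
	int error;

	/* Hold the inode glock so the block state can't change under us. */
	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
	if (error)
		return error;

	/* If another node already freed (and possibly reused) the block,
	   this fails and the caller should refuse to instantiate the inode;
	   otherwise the lookup, and the eventual evict()/delete, proceed. */
	error = gfs2_check_blk_type(sdp, no_addr, GFS2_BLKST_UNLINKED);

	gfs2_glock_dq_uninit(&gh);
	return error;
}

In the sketch, gfs2_inode_lookup() would only make that call from its
I_NEW path when for_del is set, and would fail the lookup if the check
returns an error.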