From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Thu, 17 Jun 2021 17:21:43 -0400 From: Vivek Goyal Message-ID: <20210617212143.GD1142820@redhat.com> References: <20210609155551.44437-1-mreitz@redhat.com> <20210609155551.44437-8-mreitz@redhat.com> <20210611200459.GB767764@redhat.com> <9cea5642-e5ea-961f-d816-0998e52aad9f@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <9cea5642-e5ea-961f-d816-0998e52aad9f@redhat.com> Subject: Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table List-Id: Development discussions about virtio-fs List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz Cc: virtio-fs@redhat.com, qemu-devel@nongnu.org On Wed, Jun 16, 2021 at 03:38:13PM +0200, Max Reitz wrote: > On 11.06.21 22:04, Vivek Goyal wrote: > > On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote: > > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH > > > FD in lo_inode.fd. Therefore, when the respective inode is unlinked, > > > its inode ID will remain in use until we drop our lo_inode (and > > > lo_inode_put() thus closes the FD). Therefore, lo_find() can safely use > > > the inode ID as an lo_inode key, because any inode with an inode ID we > > > find in lo_data.inodes (on the same filesystem) must be the exact same > > > file. > > > > > > This will change when we start setting lo_inode.fhandle so we do not > > > have to keep an O_PATH FD open. Then, unlinking such an inode will > > > immediately remove it, so its ID can then be reused by newly created > > > files, even while the lo_inode object is still there[1]. > > > > > > So creating a new file can then reuse the old file's inode ID, and > > > looking up the new file would lead to us finding the old file's > > > lo_inode, which is not ideal. > > > > > > Luckily, just as file handles cause this problem, they also solve it: A > > > file handle contains a generation ID, which changes when an inode ID is > > > reused, so the new file can be distinguished from the old one. So all > > > we need to do is to add a second map besides lo_data.inodes that maps > > > file handles to lo_inodes, namely lo_data.inodes_by_handle. For > > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids. > > > > > > Unfortunately, we cannot rely on being able to generate file handles > > > every time. > > Hi Max, > > > > What are the cases where we can not rely being able to generate file > > handles? > > I have no idea, but it’s much easier to claim we can’t than to prove that we > can. I’d rather be resilient. Assuming that we can not genererate file handles all the time and hence mainitaing another inode cache seems little problematic to me. I would rather start with that we can generate file handles and have a single inode cache. > > > > Therefore, we still enter every lo_inode object into > > > inodes_by_ids, but having an entry in inodes_by_handle is optional. A > > > potential inodes_by_handle entry then has precedence, the inodes_by_ids > > > entry is just a fallback. > > If we have to keep inodes_by_ids around, then can we just add fhandle > > to the lo_key. That way we can manage with single hash table and still > > be able to detect if inode ID has been reused. > > We cannot, because I assume we cannot rely on name_to_handle_at() working > every time. I guess either we need concrete information that we can't generate file handle every time or we should assume we can until we are proven wrong. And then fix it accordingly, IMHO. > Therefore, maybe at one point we can generate a file handle, and > at another, we cannot – we should still be able to look up the inode > regardless. > > If the file handle were part of inodes_by_ids, then we can look up inodes > only if we can generate a file handle either every time (for a given inode) > or never. Right. And is there a reason to belive that for the same file we can sometimes generate file handles and other times not. Thanks Vivek > Or, well, I suppose we could always create two entries, one with > the file handles zeroed out, and one with the file handle specified, but I > wouldn’t find that very beautiful. > > Max > > > > Note that we do not generate lo_fhandle objects yet, and so we also do > > > not enter anything into the inodes_by_handle map yet. Also, all lookups > > > skip that map. We might manually create file handles with some code > > > that is immediately removed by the next patch again, but that would > > > break the assumption in lo_find() that every lo_inode with a non-NULL > > > .fhandle must have an entry in inodes_by_handle and vice versa. So we > > > leave actually using the inodes_by_handle map for the next patch. > > > > > > [1] If some application in the guest still has the file open, there is > > > going to be a corresponding FD mapping in lo_data.fd_map. In such a > > > case, the inode will only go away once every application in the guest > > > has closed it. The problem described only applies to cases where the > > > guest does not have the file open, and it is just in the dentry cache, > > > basically. > > > > > > Signed-off-by: Max Reitz > > > Reviewed-by: Connor Kuehl > > > --- > > > tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++------- > > > 1 file changed, 64 insertions(+), 16 deletions(-) > > > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > > index e665575401..793d2c333e 100644 > > > --- a/tools/virtiofsd/passthrough_ll.c > > > +++ b/tools/virtiofsd/passthrough_ll.c > > > @@ -179,7 +179,8 @@ struct lo_data { > > > int announce_submounts; > > > bool use_statx; > > > struct lo_inode root; > > > - GHashTable *inodes; /* protected by lo->mutex */ > > > + GHashTable *inodes_by_ids; /* protected by lo->mutex */ > > > + GHashTable *inodes_by_handle; /* protected by lo->mutex */ > > > struct lo_map ino_map; /* protected by lo->mutex */ > > > struct lo_map dirp_map; /* protected by lo->mutex */ > > > struct lo_map fd_map; /* protected by lo->mutex */ > > > @@ -257,8 +258,9 @@ static struct { > > > /* That we loaded cap-ng in the current thread from the saved */ > > > static __thread bool cap_loaded = 0; > > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st, > > > - uint64_t mnt_id); > > > +static struct lo_inode *lo_find(struct lo_data *lo, > > > + const struct lo_fhandle *fhandle, > > > + struct stat *st, uint64_t mnt_id); > > > static int xattr_map_client(const struct lo_data *lo, const char *client_name, > > > char **out_name); > > > @@ -1032,18 +1034,39 @@ out_err: > > > fuse_reply_err(req, saverr); > > > } > > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st, > > > - uint64_t mnt_id) > > > +static struct lo_inode *lo_find(struct lo_data *lo, > > > + const struct lo_fhandle *fhandle, > > > + struct stat *st, uint64_t mnt_id) > > > { > > > - struct lo_inode *p; > > > - struct lo_key key = { > > > + struct lo_inode *p = NULL; > > > + struct lo_key ids_key = { > > > .ino = st->st_ino, > > > .dev = st->st_dev, > > > .mnt_id = mnt_id, > > > }; > > > pthread_mutex_lock(&lo->mutex); > > > - p = g_hash_table_lookup(lo->inodes, &key); > > > + if (fhandle) { > > > + p = g_hash_table_lookup(lo->inodes_by_handle, fhandle); > > > + } > > > + if (!p) { > > > + p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key); > > > + /* > > > + * When we had to fall back to looking up an inode by its IDs, > > > + * ensure that we hit an entry that does not have a file > > > + * handle. Entries with file handles must also have a handle > > > + * alt key, so if we have not found it by that handle alt key, > > > + * we must have found an entry with a mismatching handle; i.e. > > > + * an entry for a different file, even though it has the same > > > + * inode ID. > > > + * (This can happen when we look up a new file that has reused > > > + * the inode ID of some previously unlinked inode for which we > > > + * still have an lo_inode object.) > > > + */ > > > + if (p && fhandle != NULL && p->fhandle != NULL) { > > > + p = NULL; > > > + } > > > + } > > > if (p) { > > > assert(p->nlookup > 0); > > > p->nlookup++; > > > @@ -1183,7 +1206,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name, > > > e->attr_flags |= FUSE_ATTR_SUBMOUNT; > > > } > > > - inode = lo_find(lo, &e->attr, mnt_id); > > > + inode = lo_find(lo, NULL, &e->attr, mnt_id); > > > if (inode) { > > > close(newfd); > > > } else { > > > @@ -1213,7 +1236,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name, > > > } > > > pthread_mutex_lock(&lo->mutex); > > > inode->fuse_ino = lo_add_inode_mapping(req, inode); > > > - g_hash_table_insert(lo->inodes, &inode->key, inode); > > > + g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode); > > > pthread_mutex_unlock(&lo->mutex); > > > } > > > e->ino = inode->fuse_ino; > > > @@ -1525,7 +1548,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent, > > > return NULL; > > > } > > > - return lo_find(lo, &attr, mnt_id); > > > + return lo_find(lo, NULL, &attr, mnt_id); > > > } > > > static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name) > > > @@ -1688,7 +1711,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n) > > > inode->nlookup -= n; > > > if (!inode->nlookup) { > > > lo_map_remove(&lo->ino_map, inode->fuse_ino); > > > - g_hash_table_remove(lo->inodes, &inode->key); > > > + g_hash_table_remove(lo->inodes_by_ids, &inode->key); > > > if (lo->posix_lock) { > > > if (g_hash_table_size(inode->posix_locks)) { > > > fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n"); > > > @@ -3388,7 +3411,7 @@ static void lo_destroy(void *userdata) > > > GHashTableIter iter; > > > gpointer key, value; > > > - g_hash_table_iter_init(&iter, lo->inodes); > > > + g_hash_table_iter_init(&iter, lo->inodes_by_ids); > > > if (!g_hash_table_iter_next(&iter, &key, &value)) { > > > break; > > > } > > > @@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b) > > > return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id; > > > } > > > +static guint lo_fhandle_hash(gconstpointer key) > > > +{ > > > + const struct lo_fhandle *fh = key; > > > + guint hash; > > > + size_t i; > > > + > > > + /* Basically g_str_hash() */ > > > + hash = 5381; > > > + for (i = 0; i < sizeof(fh->padding); i++) { > > > + hash += hash * 33 + (unsigned char)fh->padding[i]; > > > + } > > > + hash += hash * 33 + fh->mount_id; > > > + > > > + return hash; > > > +} > > > + > > > +static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b) > > > +{ > > > + return !memcmp(a, b, sizeof(struct lo_fhandle)); > > > +} > > > + > > > static void fuse_lo_data_cleanup(struct lo_data *lo) > > > { > > > - if (lo->inodes) { > > > - g_hash_table_destroy(lo->inodes); > > > + if (lo->inodes_by_ids) { > > > + g_hash_table_destroy(lo->inodes_by_ids); > > > + } > > > + if (lo->inodes_by_ids) { > > > + g_hash_table_destroy(lo->inodes_by_handle); > > > } > > > if (lo->root.posix_locks) { > > > @@ -3990,7 +4037,8 @@ int main(int argc, char *argv[]) > > > qemu_init_exec_dir(argv[0]); > > > pthread_mutex_init(&lo.mutex, NULL); > > > - lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal); > > > + lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal); > > > + lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal); > > > lo.root.fd = -1; > > > lo.root.fuse_ino = FUSE_ROOT_ID; > > > lo.cache = CACHE_AUTO; > > > -- > > > 2.31.1 > > > > > > _______________________________________________ > > > Virtio-fs mailing list > > > Virtio-fs@redhat.com > > > https://listman.redhat.com/mailman/listinfo/virtio-fs > > > >