All of lore.kernel.org
 help / color / mirror / Atom feed
* broken hash use in fs/cifs/dfs_cache.c
@ 2021-03-22 17:48 Al Viro
  2021-03-22 18:23 ` Paulo Alcantara
  2021-03-22 18:38 ` Aurélien Aptel
  0 siblings, 2 replies; 4+ messages in thread
From: Al Viro @ 2021-03-22 17:48 UTC (permalink / raw)
  To: linux-cifs; +Cc: Paulo Alcantara, Aurelien Aptel, Steve French, linux-fsdevel

	Found while trying to untangle some... unidiomatic string handling
in cifs:

static struct cache_entry *__lookup_cache_entry(const char *path)
{
        struct cache_entry *ce;
        unsigned int h;
        bool found = false;

        h = cache_entry_hash(path, strlen(path));

        hlist_for_each_entry(ce, &cache_htable[h], hlist) {
                if (!strcasecmp(path, ce->path)) {
                        found = true;
                        dump_ce(ce);
                        break;
                }
        }

        if (!found)
                ce = ERR_PTR(-ENOENT);
        return ce;
}

combined with

static inline unsigned int cache_entry_hash(const void *data, int size)
{
        unsigned int h;

        h = jhash(data, size, 0);
        return h & (CACHE_HTABLE_SIZE - 1);
}

That can't possibly work.  The fundamental requirement for hashes is that
lookups for all keys matching an entry *MUST* lead to searches in the same
hash chain.  Here the test is strcasecmp(), so "foo" and "Foo" are expected
to match the same entry.  But jhash() yields different values on those -
it's a general-purpose hash, and it doesn't give a damn about upper and
lower case letters.  Moreover, even though we look at the value modulo
32, I don't believe that it's going to be case-insensitive.

Either the key comparison or the hash function is wrong here.  *IF* something
external guarantees the full match, we don't need strcasecmp() - strcmp()
would work.  Otherwise, the hash function needs to be changed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: broken hash use in fs/cifs/dfs_cache.c
  2021-03-22 17:48 broken hash use in fs/cifs/dfs_cache.c Al Viro
@ 2021-03-22 18:23 ` Paulo Alcantara
  2021-03-22 18:38 ` Aurélien Aptel
  1 sibling, 0 replies; 4+ messages in thread
From: Paulo Alcantara @ 2021-03-22 18:23 UTC (permalink / raw)
  To: Al Viro, linux-cifs; +Cc: Aurelien Aptel, Steve French, linux-fsdevel

Al Viro <viro@zeniv.linux.org.uk> writes:

> 	Found while trying to untangle some... unidiomatic string handling
> in cifs:
>
> static struct cache_entry *__lookup_cache_entry(const char *path)
> {
>         struct cache_entry *ce;
>         unsigned int h;
>         bool found = false;
>
>         h = cache_entry_hash(path, strlen(path));
>
>         hlist_for_each_entry(ce, &cache_htable[h], hlist) {
>                 if (!strcasecmp(path, ce->path)) {
>                         found = true;
>                         dump_ce(ce);
>                         break;
>                 }
>         }
>
>         if (!found)
>                 ce = ERR_PTR(-ENOENT);
>         return ce;
> }
>
> combined with
>
> static inline unsigned int cache_entry_hash(const void *data, int size)
> {
>         unsigned int h;
>
>         h = jhash(data, size, 0);
>         return h & (CACHE_HTABLE_SIZE - 1);
> }
>
> That can't possibly work.  The fundamental requirement for hashes is that
> lookups for all keys matching an entry *MUST* lead to searches in the same
> hash chain.  Here the test is strcasecmp(), so "foo" and "Foo" are expected
> to match the same entry.  But jhash() yields different values on those -
> it's a general-purpose hash, and it doesn't give a damn about upper and
> lower case letters.  Moreover, even though we look at the value modulo
> 32, I don't believe that it's going to be case-insensitive.

Good catch!  Yes, it is completely broken.

> Either the key comparison or the hash function is wrong here.  *IF* something
> external guarantees the full match, we don't need strcasecmp() - strcmp()
> would work.  Otherwise, the hash function needs to be changed.

Agreed.

I'll look into it.  Thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: broken hash use in fs/cifs/dfs_cache.c
  2021-03-22 17:48 broken hash use in fs/cifs/dfs_cache.c Al Viro
  2021-03-22 18:23 ` Paulo Alcantara
@ 2021-03-22 18:38 ` Aurélien Aptel
  2021-03-22 20:23   ` Al Viro
  1 sibling, 1 reply; 4+ messages in thread
From: Aurélien Aptel @ 2021-03-22 18:38 UTC (permalink / raw)
  To: Al Viro, linux-cifs; +Cc: Paulo Alcantara, Steve French, linux-fsdevel

Al Viro <viro@zeniv.linux.org.uk> writes:
> Either the key comparison or the hash function is wrong here.  *IF* something
> external guarantees the full match, we don't need strcasecmp() - strcmp()
> would work.  Otherwise, the hash function needs to be changed.

I think here we need to make the hash case-insensitive.

Perhaps calling jhash() with lower-cased bytes like so (pseudo-code):

static inline unsigned int cache_entry_hash(const void *data, int size)
{
	unsigned int h = 0;
        u32 c;
        u8 *s = data;

        while (s < data+size) {
        	int len = decode_char(s, &c);
        	if (!len)
        		break;
		c = tolower(c);
                s += len;
                h = jhash(&c, sizeof(c), h);
        }

	return h % CACHE_HTABLE_SIZE;
}

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: broken hash use in fs/cifs/dfs_cache.c
  2021-03-22 18:38 ` Aurélien Aptel
@ 2021-03-22 20:23   ` Al Viro
  0 siblings, 0 replies; 4+ messages in thread
From: Al Viro @ 2021-03-22 20:23 UTC (permalink / raw)
  To: Aurélien Aptel
  Cc: linux-cifs, Paulo Alcantara, Steve French, linux-fsdevel

On Mon, Mar 22, 2021 at 07:38:15PM +0100, Aurélien Aptel wrote:
> Al Viro <viro@zeniv.linux.org.uk> writes:
> > Either the key comparison or the hash function is wrong here.  *IF* something
> > external guarantees the full match, we don't need strcasecmp() - strcmp()
> > would work.  Otherwise, the hash function needs to be changed.
> 
> I think here we need to make the hash case-insensitive.
> 
> Perhaps calling jhash() with lower-cased bytes like so (pseudo-code):

[snip]

Then you really do not want to recalculate it again and again.
Look:

__lookup_cache_entry() calculates it for the key

lookup_cache_entry() contains
        if (cnt < 3) {
                h = cache_entry_hash(path, strlen(path));
                ce = __lookup_cache_entry(path);
                goto out;
        }

... and
                ce = __lookup_cache_entry(npath);
                if (!IS_ERR(ce)) {
                        h = cache_entry_hash(npath, strlen(npath));
                        break;
                }
(in a loop, at that)

Take a look at that the aforementioned loop:
        h = cache_entry_hash(npath, strlen(npath));
        e = npath + strlen(npath) - 1;
        while (e > s) {
                char tmp;

                /* skip separators */
                while (e > s && *e == sep)
                        e--;
                if (e == s)
                        goto out;

                tmp = *(e+1);
                *(e+1) = 0;

                ce = __lookup_cache_entry(npath);
                if (!IS_ERR(ce)) {
                        h = cache_entry_hash(npath, strlen(npath));
                        break;
                }

                *(e+1) = tmp;
                /* backward until separator */
                while (e > s && *e != sep)
                        e--;
        }
We call __lookup_cache_entry() for shorter and shorter prefixes of
npath.  They get NUL-terminated for the duration of __lookup_cache_entry(),
then reverted to the original.  What for?  cache_entry_hash() already
gets length as explicit argument.  And strcasecmp() is trivially
replaced with strncasecmp().

Just have __lookup_cache_entry() take key, hash and length.  Then it
turns into
		len = e + 1 - s;
		hash = cache_entry_hash(path, len);
		ce = __lookup_cache_entry(path, hash, len);
		if (!IS_ERR(ce)) {
			h = hash;
			break;
		}
and we are done.  No need to modify npath contents, undo the modifications
or *have* npath in the first place - the reason the current variant needs to
copy path is precisely that it goes to those contortions.

Incidentally, you also have a problem with trailing separators - anything
with those inserted into hash won't be found by lookup_cache_entry(),
since you trim the trailing separators from the key on searches.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-03-22 20:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-22 17:48 broken hash use in fs/cifs/dfs_cache.c Al Viro
2021-03-22 18:23 ` Paulo Alcantara
2021-03-22 18:38 ` Aurélien Aptel
2021-03-22 20:23   ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.