Re: [PATCH v2 1/3] sparc64: NG4 memset 32 bits overflow

From: Andi Kleen <andi@firstfloor.org>
To: Pasha Tatashin <pasha.tatashin@oracle.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-mm@kvack.org, sparclinux@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] sparc64: NG4 memset 32 bits overflow
Date: Wed, 1 Mar 2017 15:10:25 -0800
Message-ID: <20170301231025.GJ26852@two.firstfloor.org>
In-Reply-To: <1e7db21b-808d-1f47-e78c-7d55c543ae39@oracle.com>

> For example, I am pretty sure that the scale value in most places should
> be changed from a literal value (inode scale = 14, dentry scale = 13,
> etc.) to (PAGE_SHIFT + value): inode scale would become (PAGE_SHIFT +
> 2), dentry scale would become (PAGE_SHIFT + 1), etc. This is because
> we want 1/4 inodes and 1/2 dentries per every page in the system.

This is still far too much for a large system. The algorithm
simply was not designed for TB systems.

It's unlikely to have anywhere near that many small files active, as it's
better to use the memory for something that is actually useful.

Also, even a few hops in the open hash table are normally not a problem
for dentry/inode; it is not as if file lookups are that critical.

For networking the picture may be different, but I suspect GBs worth of
hash tables are still overkill there (Dave et al. may have stronger opinions on this).

I think an upper size limit (with the user override which already exists) is
fine, but if you really don't want to do it then scale the factor down
very aggressively for larger sizes, so that we don't end up with more
than a few tens of MB.
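
To put rough numbers on this, here is a back-of-the-envelope sketch in
Python. It is not the kernel's sizing code. It assumes 4 KiB pages
(PAGE_SHIFT = 12), 8-byte bucket heads, one bucket per 2**scale bytes of
memory, and a hypothetical 32 MiB cap; table_bytes() and CAP_BYTES are
names made up for the illustration.

```python
# Illustrative arithmetic only; all constants below are assumptions.
PAGE_SHIFT = 12          # assumed 4 KiB pages
BUCKET_BYTES = 8         # assumed size of one bucket head (one pointer)
CAP_BYTES = 32 << 20     # hypothetical upper limit of 32 MiB


def table_bytes(mem_bytes, scale, cap_bytes=None):
    """Table size for one bucket per 2**scale bytes of memory."""
    buckets = max(1, mem_bytes >> scale)
    # Round the bucket count up to a power of two.
    buckets = 1 << (buckets - 1).bit_length()
    size = buckets * BUCKET_BYTES
    if cap_bytes is not None:
        size = min(size, cap_bytes)
    return size


if __name__ == "__main__":
    dentry_scale = PAGE_SHIFT + 1   # the proposed "1/2 dentries per page"
    for mem_gib in (16, 128, 1024, 32 * 1024):
        mem = mem_gib << 30
        uncapped = table_bytes(mem, dentry_scale) >> 20
        capped = table_bytes(mem, dentry_scale, CAP_BYTES) >> 20
        print(f"{mem_gib:>6} GiB RAM: uncapped {uncapped} MiB, capped {capped} MiB")
```

Under these assumptions the uncapped table grows linearly with memory
(1 GiB of buckets on a 1 TiB machine), while the capped one stops at the
limit.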

> This is basically a bug, and would not change the theory, but I am
> sure that changing scales without at least some theoretical backup

One dentry per page would only make sense if the files are zero-sized.
If the file even has one byte then it already needs more than 1 page just to
cache the contents (even ignoring inodes and other caches).

With larger files that need multiple pages it makes even less sense.

So clearly the one-dentry-per-page theory is nonsense if the files are
actually used.

There is the "make find / + stat fast" case (where only the entries
and inodes are cached). But even there it is unlikely that the TB system
has a much larger file system with more files than the 100GB system, so
once a reasonable plateau is reached I don't see why you would want
to exceed that.

Also, the reason to make hash tables big is to minimize collisions,
but we have fairly good hash functions, and a few hops in the worst case
are likely not a problem for an already expensive file access
or open.
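
As a rough illustration of "a few hops", the textbook estimate for a
chained hash table with a uniform hash can be sketched in Python. This is
the standard analysis, not a measurement of the dcache; expected_probes()
and the entry and bucket counts are made up for the example.

```python
def expected_probes(entries, buckets):
    """Average chain entries examined per lookup, assuming uniform hashing.

    A miss walks a whole chain (the load factor on average); a hit
    examines about half of the other entries in its chain, plus itself.
    """
    load = entries / buckets
    return {
        "load": load,
        "miss": load,
        "hit": 1 + (entries - 1) / (2 * buckets),
    }


if __name__ == "__main__":
    cached = 10_000_000  # hypothetical number of cached dentries
    # Bucket counts of an 8-byte-per-bucket table of 1 GiB and of 32 MiB.
    for nbuckets in (1 << 27, 1 << 22):
        est = expected_probes(cached, nbuckets)
        print(f"{nbuckets} buckets: hit ~{est['hit']:.2f}, miss ~{est['miss']:.2f}")
```

The point of the sketch is only that shrinking the table by a large factor
lengthens the average chain by the same factor, which stays small as long
as the number of cached entries is not far above the bucket count.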

BTW the other option would be to switch all the large system hashes to a
rhashtable and do the resizing only when it is actually needed. 
But that would be more work than just adding a reasonable upper limit.
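
The resize-on-demand idea, as a toy Python sketch. This is not the
kernel rhashtable API, just a chained table that doubles its bucket array
once the load passes a threshold; GrowingTable and the 0.75 threshold are
assumptions chosen for the illustration.

```python
class GrowingTable:
    """Toy chained hash table that grows only when it actually fills up."""

    def __init__(self, initial_buckets=4, max_load=0.75):
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load  # illustrative growth threshold

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))
        self.count += 1
        if self.count > self.max_load * len(self.buckets):
            self._grow()

    def _grow(self):
        # Double the bucket array and rehash every entry into it.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for k, v in chain:
                self._chain(k).append((k, v))

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None
```

A table like this starts small regardless of how much memory the machine
has, and only a workload that really caches many entries pays for a big
bucket array.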

-Andi
