* tmpfs fails fallocate(more than DRAM)
@ 2019-02-18 13:34 Adam Borowski
  2019-02-18 15:15 ` Matthew Wilcox
  2019-02-18 20:25 ` Adam Borowski
  0 siblings, 2 replies; 5+ messages in thread
From: Adam Borowski @ 2019-02-18 13:34 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel; +Cc: Marcin Ślusarz

Hi!
There's something that looks like a bug in tmpfs' implementation of
fallocate.  If you try to fallocate more than the available DRAM (yet
with plenty of swap space), it will evict everything swappable, then
fail, undoing all the work done so far.

The returned error is ENOMEM rather than the POSIX-mandated ENOSPC (that's
for posix_fallocate(), but our documentation doesn't mention ENOMEM for the
Linux-specific fallocate() either).

Doing the same allocation in multiple calls -- be it via non-overlapping
calls or even with same offset but increasing len -- works as expected.

An example:
Machine has 32GB RAM, minus 4GB memmapped as fake pmem.  No big tasks
(X, some shells, browser, ...).  Run 「while :;do free -m;done」 on another
terminal, then:

# mount -osize=64G -t tmpfs none /mnt/vol1
# chown you /mnt/vol1
$ cd /mnt/vol1
$ fallocate -l 32G foo
fallocate: fallocate failed: Cannot allocate memory
$ fallocate -l 28G foo
fallocate: fallocate failed: Cannot allocate memory
$ fallocate -l 27G foo
fallocate: fallocate failed: Cannot allocate memory
$ fallocate -l 26G foo
$ fallocate -l 52G foo

It takes a few seconds for the allocation to succeed, then a couple for it
to be torn down if it fails -- more if it has to write out the zeroes it
allocated in the previous call.
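
For completeness, here's a minimal userspace sketch of that multi-call
workaround -- everything in it (the hardcoded 52G target, the 1G chunk size,
the file name) is made up for illustration, and on 32-bit you'd also want
-D_FILE_OFFSET_BITS=64:

/* Sketch only: preallocate in 1G steps instead of one huge fallocate().
 * Chunks that were already allocated before a failure stay allocated. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	off_t want = (off_t)52 << 30;	/* total size, as in the example above */
	off_t step = (off_t)1 << 30;	/* 1G chunks */
	int fd = open("foo", O_RDWR | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (off_t done = 0; done < want; done += step) {
		off_t len = want - done < step ? want - done : step;

		if (fallocate(fd, 0, done, len)) {
			fprintf(stderr, "fallocate at %lld: %s\n",
				(long long)done, strerror(errno));
			return 1;	/* earlier chunks are kept */
		}
	}
	close(fd);
	return 0;
}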

This raises multiple questions:
* why would fallocate bother to prefault the memory instead of just
  reserving it?  We want to kill overcommit, but reserving swap is as good
  -- if there's memory pressure, our big allocation will be evicted anyway.
* why does it insist on doing everything in one piece?  The biggest chunk I
  can see being beneficial is 1G (for hugepages).
* when it fails, why does it undo the work done so far?  This can matter
  for other reasons, such as EINTR -- and fallocate isn't expected to be
  atomic anyway.
* if I'm wrong and atomicity+prefaulting are desired, why does fallocate
  force just the delta (pages not yet allocated) to reside in core, rather
  than the entire requested range?

Thus, I believe fallocate on tmpfs should behave consistently with other
filesystems and succeed unless we run into ENOSPC.

Am I missing something?


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?
⠈⠳⣄⠀⠀⠀⠀



* Re: tmpfs fails fallocate(more than DRAM)
  2019-02-18 13:34 tmpfs fails fallocate(more than DRAM) Adam Borowski
@ 2019-02-18 15:15 ` Matthew Wilcox
  2019-02-18 20:25 ` Adam Borowski
  1 sibling, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2019-02-18 15:15 UTC (permalink / raw)
  To: Adam Borowski; +Cc: linux-mm, linux-fsdevel, Marcin Ślusarz

On Mon, Feb 18, 2019 at 02:34:23PM +0100, Adam Borowski wrote:
> The returned error is ENOMEM rather than the POSIX-mandated ENOSPC (that's
> for posix_fallocate(), but our documentation doesn't mention ENOMEM for the
> Linux-specific fallocate() either).

Returning -ENOMEM rather than -ENOSPC in this situation is clearly
wrong, but just about every system call can return -ENOMEM.  It might
not even be due to memory allocation failure ... these days it's just
"I am unusually short on resources, you've done nothing wrong, but I
can't handle it right now".



* Re: tmpfs fails fallocate(more than DRAM)
  2019-02-18 13:34 tmpfs fails fallocate(more than DRAM) Adam Borowski
  2019-02-18 15:15 ` Matthew Wilcox
@ 2019-02-18 20:25 ` Adam Borowski
  2019-02-19  3:35   ` Hugh Dickins
  1 sibling, 1 reply; 5+ messages in thread
From: Adam Borowski @ 2019-02-18 20:25 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, Hugh Dickins; +Cc: Marcin Ślusarz

Hi Hugh, it turns out this problem is caused by your commit
1aac1400319d30786f32b9290e9cc923937b3d57:

On Mon, Feb 18, 2019 at 02:34:23PM +0100, Adam Borowski wrote:
> There's something that looks like a bug in tmpfs' implementation of
> fallocate.  If you try to fallocate more than the available DRAM (yet
> with plenty of swap space), it will evict everything swappable, then
> fail, undoing all the work done so far.
> 
> The returned error is ENOMEM rather than the POSIX-mandated ENOSPC (that's
> for posix_fallocate(), but our documentation doesn't mention ENOMEM for the
> Linux-specific fallocate() either).
> 
> Doing the same allocation in multiple calls -- be it via non-overlapping
> calls or even with same offset but increasing len -- works as expected.

I don't quite understand your logic there -- it seems to be done on purpose?

#   tmpfs: quit when fallocate fills memory
#   
#   As it stands, a large fallocate() on tmpfs is liable to fill memory with
#   pages, freed on failure except when they run into swap, at which point
#   they become fixed into the file despite the failure.  That feels quite
#   wrong, to be consuming resources precisely when they're in short supply.

The page cache is just a cache, and thus running out of DRAM is in no way a
failure (as long as there's enough underlying storage).  Like any other
filesystem, once DRAM is full, tmpfs is supposed to start writeout.  A smart
filesystem can mark zero pages as SWAP_MAP_FALLOC to avoid physically
writing them out but doing so the naive hard way is at least correct.
    
#   Go the other way instead: shmem_fallocate() indicate the range it has
#   fallocated to shmem_writepage(), keeping count of pages it's allocating;
#   shmem_writepage() reactivate instead of swapping out pages fallocated by
#   this syscall (but happily swap out those from earlier occasions), keeping
#   count; shmem_fallocate() compare counts and give up once the reactivated
#   pages have started to coming back to writepage (approximately: some zones
#   would in fact recycle faster than others).

It's a weird inconsistency: why should space allocated in a previous call
act any differently from space we allocate right now?
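
(For anyone reading along, the mechanism described in that commit message
looks roughly like the sketch below -- simplified from my reading of the
v3.5 mm/shmem.c, so names and details may well be off:)

/* Simplified sketch, not the verbatim kernel code. */
struct shmem_falloc {
	pgoff_t start;		/* start of range currently being fallocated */
	pgoff_t next;		/* next page offset to be fallocated */
	pgoff_t nr_falloced;	/* how many new pages have been fallocated */
	pgoff_t nr_unswapped;	/* how often writepage refused to swap one out */
};

/* shmem_writepage(), roughly: pages belonging to the fallocate currently in
 * progress are reactivated rather than swapped out, and the refusal counted. */
	shmem_falloc = inode->i_private;
	if (shmem_falloc &&
	    index >= shmem_falloc->start && index < shmem_falloc->next) {
		shmem_falloc->nr_unswapped++;
		goto redirty;
	}

/* shmem_fallocate()'s allocation loop, roughly: give up once reclaim starts
 * bouncing this call's own pages back at it, i.e. memory is effectively full. */
	if (signal_pending(current))
		error = -EINTR;
	else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
		error = -ENOMEM;	/* the error complained about above */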
    
#   This is a little unusual, but works well: although we could consider the
#   failure to swap as a bug, and fix it later with SWAP_MAP_FALLOC handling
#   added in swapfile.c and memcontrol.c, I doubt that we shall ever want to.

It breaks use of tmpfs as a regular filesystem.  In particular, you don't
know that a program someone uses won't try to create a big file.  For
example, Debian buildds (where I first hit this problem) have setups such
as:
< jcristau> kilobyte: fwiw x86-csail-01.d.o has 75g /srv/buildd tmpfs, 8g ram, 89g swap

Using tmpfs this way is reasonable: traditional filesystems spend a lot of
effort to ensure crash consistency, and even if you disable journaling and
barriers, they will pointlessly write out the files.  Most builds can
succeed in far less than 8GB, not touching the disk even once.

[...]

> This raises multiple questions:
> * why would fallocate bother to prefault the memory instead of just
>   reserving it?  We want to kill overcommit, but reserving swap is as good
>   -- if there's memory pressure, our big allocation will be evicted anyway.

I see that this particular feature is not coded yet for swap.

> * why does it insist on doing everything in one piece?  The biggest chunk I
>   can see being beneficial is 1G (for hugepages).

At the moment, a big fallocate evicts all other swappable pages.  Doing it
piece by piece would at least allow swapping out memory it just allocated
(if we don't yet have a way to mark it up without physically writing
zeroes).

> * when it fails, why does it undo the work done so far?  This can matter
>   for other reasons, such as EINTR -- and fallocate isn't expected to be
>   atomic anyway.

I searched a bit for references suggesting that failed fallocates need to
be undone, and I can't find any.  Neither POSIX nor our man pages say a
word about the semantics of an interrupted fallocate, and neither glibc's
nor FreeBSD's fallback emulation rolls back.
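
(To illustrate: the usual fallback emulation amounts to touching one byte
per block, so a failure partway through simply leaves the earlier blocks
allocated.  A simplified sketch -- not the actual glibc code, which is more
careful to preserve existing data:)

#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of a posix_fallocate()-style fallback: write into every block of
 * the range.  On failure, blocks already written stay allocated -- nothing
 * is rolled back. */
static int emulated_fallocate(int fd, off_t offset, off_t len)
{
	struct stat st;

	if (fstat(fd, &st))
		return errno;
	for (off_t pos = offset; pos < offset + len; pos += st.st_blksize) {
		char zero = 0;

		if (pwrite(fd, &zero, 1, pos) != 1)
			return errno;
	}
	return 0;
}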


But as my understanding seems to go nearly the opposite way from your commit
message, am I getting it wrong?  It's you, not me, who's an mm regular...


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?
⠈⠳⣄⠀⠀⠀⠀



* Re: tmpfs fails fallocate(more than DRAM)
  2019-02-18 20:25 ` Adam Borowski
@ 2019-02-19  3:35   ` Hugh Dickins
  2019-02-19  4:16     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2019-02-19  3:35 UTC (permalink / raw)
  To: Adam Borowski; +Cc: linux-mm, linux-fsdevel, Hugh Dickins, Marcin Slusarz


On Mon, 18 Feb 2019, Adam Borowski wrote:

> Hi Hugh, it turns out this problem is caused by your commit
> 1aac1400319d30786f32b9290e9cc923937b3d57:

Yes, part of the series which first enabled fallocate() on tmpfs.
You probably read most of them already, but if not, please do read
through those v3.5 commit comments on

e2d12e22c59c tmpfs: support fallocate preallocation
1635f6a74152 tmpfs: undo fallocation on failure
1aac1400319d tmpfs: quit when fallocate fills memory

where I said more about the awkward compromises made
than I would be able to bring back to mind today.

> 
> On Mon, Feb 18, 2019 at 02:34:23PM +0100, Adam Borowski wrote:
> > There's something that looks like a bug in tmpfs' implementation of
> > fallocate.  If you try to fallocate more than the available DRAM (yet
> > with plenty of swap space), it will evict everything swappable, then
> > fail, undoing all the work done so far.
> > 
> > The returned error is ENOMEM rather than the POSIX-mandated ENOSPC (that's
> > for posix_fallocate(), but our documentation doesn't mention ENOMEM for the
> > Linux-specific fallocate() either).

I can't speak for UNIX and its other relations, but it's well established
on Linux that the absence of a listed errno from the POSIX manpage or our
own manpage is no guarantee that that errno will not be returned by the
system call in question.  Those lists are really helpful for documenting
a variety of special meanings, but don't expect them to cover everything.

(Though I see that I was relieved to find EINTR given in the manpage.)

And as Matthew already said, ENOMEM is one that can very easily come back
from many system calls.  Though I disagree that it's wrong here: ENOSPC
is the errno you get when your fallocate() reaches the block limit (if
any) of the filesystem, ENOMEM is one you may hit earlier if it's unable
to complete the fallocate() successfully with the memory currently
available.

Fallocate is not the only place where tmpfs has to make that distinction:
ENOSPC for the filesystem constraint, ENOMEM for running out of memory
(itself ambiguous: physical memory available? swap included? memcg limit?
memory overcommit limitation?).

> > 
> > Doing the same allocation in multiple calls -- be it via non-overlapping
> > calls or even with same offset but increasing len -- works as expected.

Its indeterminacy is the worst thing about it, I think. I suppose that
procedure will often work, because of each attempt pushing more out to
swap.  But I certainly agree that it's all an unsatisfactory compromise.

As I remark in one of those commit messages, I very much wish that
fallocate(2) had been defined to return a positive count on success,
to allow for partial success like write(2); but too late to change by
the time I came along.

> 
> I don't quite understand your logic there -- it seems to be done on purpose?
> 
> #   tmpfs: quit when fallocate fills memory
> #   
> #   As it stands, a large fallocate() on tmpfs is liable to fill memory with
> #   pages, freed on failure except when they run into swap, at which point
> #   they become fixed into the file despite the failure.  That feels quite
> #   wrong, to be consuming resources precisely when they're in short supply.
> 
> The page cache is just a cache, and thus running out of DRAM is in no way a
> failure (as long as there's enough underlying storage).  Like any other
> filesystem, once DRAM is full, tmpfs is supposed to start writeout.  A smart
> filesystem can mark zero pages as SWAP_MAP_FALLOC to avoid physically
> writing them out but doing so the naive hard way is at least correct.

I suggest below that we have different perceptions of tmpfs:
I see it as a RAM-based filesystem, with swap overflow; you see it
as a swap-based filesystem, caching in RAM.  I think that if it were
the latter, we'd have spent a lot more time designing its swap layout.

>     
> #   Go the other way instead: shmem_fallocate() indicate the range it has
> #   fallocated to shmem_writepage(), keeping count of pages it's allocating;
> #   shmem_writepage() reactivate instead of swapping out pages fallocated by
> #   this syscall (but happily swap out those from earlier occasions), keeping
> #   count; shmem_fallocate() compare counts and give up once the reactivated
> #   pages have started to coming back to writepage (approximately: some zones
> #   would in fact recycle faster than others).
> 
> It's a weird inconsistency: why should space allocated in a previous call
> act any differently from space we allocate right now?

"weird" I'll agree with (and you're not the first person to use the word
"weird" of tmpfs in the last week!) but "inconsistency", in that context,
no.  Space allocated in a previous call has been guaranteed to the caller,
and that guarantee is likely to be what they wanted fallocate() for in
the first place.  Space allocated right now, before we return success or
failure from the system call, is still revocable.

>     
> #   This is a little unusual, but works well: although we could consider the
> #   failure to swap as a bug, and fix it later with SWAP_MAP_FALLOC handling
> #   added in swapfile.c and memcontrol.c, I doubt that we shall ever want to.
> 
> It breaks use of tmpfs as a regular filesystem.  In particular, you don't
> know that a program someone uses won't try to create a big file.  For
> example, Debian buildds (where I first hit this problem) have setups such
> as:
> < jcristau> kilobyte: fwiw x86-csail-01.d.o has 75g /srv/buildd tmpfs, 8g ram, 89g swap
> 
> Using tmpfs this way is reasonable: traditional filesystems spend a lot of
> effort to ensure crash consistency, and even if you disable journaling and
> barriers, they will pointlessly write out the files.  Most builds can
> succeed in far less than 8GB, not touching the disk even once.

Yes, unsatisfactory: I tried for the best compromise I could imagine.
fallocate() on tmpfs remains useful in most circumstances, but with
this peculiar failure mode once going beyond RAM and well into swap.

With that 8G/89G split, I think you perceive tmpfs as a swap-based
filesystem, whereas I perceive it as a RAM-based filesystem which uses
swap for overflow; so made compromises appropriate to that view.

> 
> [...]
> 
> > This raises multiple questions:
> > * why would fallocate bother to prefault the memory instead of just
> >   reserving it?  We want to kill overcommit, but reserving swap is as good
> >   -- if there's memory pressure, our big allocation will be evicted anyway.

The only way I know of to reserve memory, respecting all the different
limiting mechanisms imposed (memcg limits, filesystem limits, zone
watermarks, ...), is to allocate it (not sure what you mean by prefault).
hugetlbfs does have a reservation system, and its very own pool of memory,
but that's not tmpfs.

> 
> I see that this particular feature is not coded yet for swap.

I expect you're right, but I don't see what you're referring to there:
ah, probably the SWAP_MAP_FALLOC mentioned above, from a comment in
shmem_writepage().  Yes, not implemented: it would handle a rare case
more efficiently, but I don't think it would change the fundamentals
at all.  Or maybe it's too long since I thought through this area,
and it really would make a real difference - dunno.

> 
> > * why does it insist on doing everything in one piece?  The biggest chunk I
> >   can see being beneficial is 1G (for hugepages).

It insists on attempting to do what you ask: if you ask for one big piece,
that's what it tries for.

> 
> At the moment, a big fallocate evicts all other swappable pages.  Doing it
> piece by piece would at least allow swapping out memory it just allocated
> (if we don't yet have a way to mark it up without physically writing
> zeroes).
> 
> > * when it fails, why does it undo the work done so far?  This can matter
> >   for other reasons, such as EINTR -- and fallocate isn't expected to be
> >   atomic anyway.
> 
> I searched a bit for references suggesting that failed fallocates need to
> be undone, and I can't find any.  Neither POSIX nor our man pages say a
> word about the semantics of an interrupted fallocate, and neither glibc's
> nor FreeBSD's fallback emulation rolls back.

To me it was self-evident: with a few awkward exceptions (awkward because
they would have a difficult job to undo, and awkward because they argue
against me!), a system call either succeeds or fails, or reports partial
success.  If fallocate() says it failed (and is not allowed to report
partial success), then it should not have allocated.  Especially in the
case of RAM, when filling it up makes it rather hard to unfill (another
persistent problem with tmpfs is the way it can occupy all of memory,
and the OOM killer go about killing a thousand processes, but none of
them help because the memory is occupied by a tmpfs, not by a process).

Now that you question it (did I not do so at the time? I thought I did),
I try fallocate() on btrfs and ext4 and xfs.  btrfs and xfs behave as I
expect above, failing outright with ENOSPC if it will not fit; whereas
ext4 proceeds to fill up the filesystem, leaving it full when it says
that it failed.  Looks like I had a choice of models to follow: the
ext4 model would have been easier to follow, but risked OOM.

> 
> But as my understanding seems to go nearly the opposite way from your commit
> message, am I getting it wrong?  It's you, not me, who's an mm regular...
> 
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁
> ⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?

Actually, no.  Would s/he have a useful insight to share on fallocate()?

Hugh


* Re: tmpfs fails fallocate(more than DRAM)
  2019-02-19  3:35   ` Hugh Dickins
@ 2019-02-19  4:16     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2019-02-19  4:16 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Adam Borowski, linux-mm, linux-fsdevel, Marcin Slusarz

On Mon, Feb 18, 2019 at 07:35:01PM -0800, Hugh Dickins wrote:
> On Mon, 18 Feb 2019, Adam Borowski wrote:
> > I searched a bit for references suggesting that failed fallocates need to
> > be undone, and I can't find any.  Neither POSIX nor our man pages say a
> > word about the semantics of an interrupted fallocate, and neither glibc's
> > nor FreeBSD's fallback emulation rolls back.
> 
> To me it was self-evident: with a few awkward exceptions (awkward because
> they would have a difficult job to undo, and awkward because they argue
> against me!), a system call either succeeds or fails, or reports partial
> success.  If fallocate() says it failed (and is not allowed to report
> partial success), then it should not have allocated.  Especially in the
> case of RAM, when filling it up makes it rather hard to unfill (another
> persistent problem with tmpfs is the way it can occupy all of memory,
> and the OOM killer go about killing a thousand processes, but none of
> them help because the memory is occupied by a tmpfs, not by a process).
> 
> Now that you question it (did I not do so at the time? I thought I did),
> I try fallocate() on btrfs and ext4 and xfs.  btrfs and xfs behave as I
> expect above, failing outright with ENOSPC if it will not fit;

If only it were that simple. :/

XFS can do partial allocation and fail - it all depends on how many
extent allocations are required before ENOSPC is actually hit. e.g.
if you ask for 10GB and there is only 5GB free, it should fail
straight away. However, if there's 20GB free in 1GB chunks, it will
loop allocating 1GB extents. If something else is allocating at the
same time, the fallocate could get to, say, 8GB allocated and then
hit ENOSPC.

In which case, we'll return the ENOSPC error, but we'll also leave
the 8GB of space already allocated to the file there. i.e. it
doesn't clean up after itself.

The reason for this is that we don't know, after we've performed the
allocations, which regions of the preallocated range were actually
allocated by the preallocation.  I.e. fallocate can be run over a
range that already contains some extents - it simply skips over
regions that are already allocated.  Hence we don't know what we are
supposed to clean up, and so we leave the corpse lying around for
someone else to deal with (e.g. by sparsifying the file again).
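
Roughly speaking, the preallocation is a loop like this (illustrative
pseudocode, not the actual XFS code):

	/* Each pass may allocate a new extent or skip a range that already
	 * had one; nothing records which, so there's nothing precise to
	 * undo when a later pass hits ENOSPC. */
	while (remaining > 0) {
		len = min(remaining, max_extent_len);	/* e.g. up to 8GB on XFS */
		error = allocate_extents(ip, offset, len);
		if (error)
			return error;	/* extents from earlier passes remain */
		offset += len;
		remaining -= len;
	}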

> whereas
> ext4 proceeds to fill up the filesystem, leaving it full when it says
> that it failed.

This is much the same behaviour as XFS - you see it more easily with
ext4 because it has much smaller maximum extent size (128MB) than
XFS (8GB) and so needs to iterate multiple allocations sooner than
XFS or btrfs need to.

I'm not sure what btrfs does.

> Looks like I had a choice of models to follow: the
> ext4 model would have been easier to follow, but risked OOM.

fallocate() gives you the rope to choose what is best for the
filesystem - it doesn't specify behaviour on failure precisely
because it can be very difficult (not to mention complex!) for
filesystems to unwind partial failures....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


