linux-kernel.vger.kernel.org archive mirror
* block allocator issue with ext4+DAX
@ 2016-03-30 22:01 Ross Zwisler
  2016-03-31  8:59 ` Jan Kara
From: Ross Zwisler @ 2016-03-30 22:01 UTC
  To: Jan Kara, Theodore Ts'o; +Cc: linux-ext4, linux-fsdevel, linux-kernel

I've hit an issue in my testing which I believe to be related to the ext4
block allocator when using the DAX mount option.  I originally found this
issue with the generic/102 xfstest, but have reduced it to the minimal
reproducer at the bottom of this email.  I've been able to reproduce this with
both BRD and with PMEM as the underlying block device.

For this test we're running in a very small filesystem, only 512 MiB.  We
fallocate() 400 MiB of that space, unlink the file, then try and rewrite that
400 MiB file one chunk at a time.

What actually happens is that during the rewrite we run out of space and the
DAX call to get_block() in dax_io() fails with -ENOSPC.

Here are the steps to reproduce this issue:

  # fdisk -l /dev/ram0
  Disk /dev/ram0: 1 GiB, 1073741824 bytes, 2097152 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  
  # mkfs.ext4 /dev/ram0 512M
  
  # mount /dev/ram0 /mnt
  
  # gcc -o test test.c
  
  # ./test	# success!
  
  # umount /mnt
  
  # mount -o dax /dev/ram0 /mnt	# requires CONFIG_BLK_DEV_RAM_DAX
  
  # ./test	# failure
  Partial write - only 577536 written
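
The test prints the short write but not the reason for it.  Below is a
minimal sketch of a retrying write helper.  write_all() is a hypothetical
name, not part of test.c, and the sketch assumes that the write() issued
after the short one returns -1 with errno set (presumably ENOSPC in the DAX
case, which this thread does not show directly).

```c
#include <errno.h>
#include <unistd.h>

/*
 * Write exactly len bytes to fd, retrying after short writes.
 * Returns 0 on success, or -1 with errno left as set by the failing
 * write().  write_all() is a hypothetical helper, not part of test.c.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (n == 0) {
			/* no progress; report an error instead of spinning */
			errno = EIO;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

With this helper the body of the write loop in test.c could be reduced to a
single write_all(fd, buffer, MB(1)) call followed by perror("write") on
failure, so the errno behind the partial write gets printed.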

This test succeeds with xfs, ext2, and with ext4 without the DAX mount option.
I've also tried it with O_DIRECT, and that has the same behavior - we succeed
without DAX and fail with DAX.

Another clue is that a sync() call in the middle of the test between the
unlink and the following writes clears up the issue.

Something that might be related is the output in
/proc/fs/ext4/ram0/mb_groups.  Here is that output when we're in a good
state, and the writes will succeed:

#group: free  frags first [ 2^0   2^1   2^2   2^3   2^4   2^5   2^6   2^7   2^8   2^9   2^10  2^11  2^12  2^13  ]
#0    : 30673 1     2095  [ 1     0     0     0     1     0     1     1     1     1     1     0     1     3     ]
#1    : 32735 1     33    [ 1     1     1     1     1     0     1     1     1     1     1     1     1     3     ]
#2    : 28672 1     4096  [ 0     0     0     0     0     0     0     0     0     0     0     0     1     3     ]
#3    : 32735 1     33    [ 1     1     1     1     1     0     1     1     1     1     1     1     1     3     ]
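
One way to read these rows, offered as a sketch rather than a description of
the ext4 code: assuming each 2^n column counts free chunks of 2^n blocks, the
"free" column should equal the sum of count << n over the row.  That holds
for the rows above (for group #0, 1 + 16 + 64 + 128 + 256 + 512 + 1024 + 4096
+ 3 * 8192 = 30673).  The function names below are hypothetical.

```c
#include <stdio.h>

#define MB_ORDERS 14	/* columns 2^0 .. 2^13 in the mb_groups output */

/*
 * Sum one row of buddy counters.  counts[n] is assumed to be the number
 * of free chunks of 2^n blocks, so the row accounts for
 * sum(counts[n] << n) free blocks.
 */
static unsigned long buddy_free_blocks(const unsigned int counts[MB_ORDERS])
{
	unsigned long total = 0;
	int order;

	for (order = 0; order < MB_ORDERS; order++)
		total += (unsigned long)counts[order] << order;
	return total;
}

/*
 * Parse one data row such as "#0    : 30673 1     2095  [ 1 0 ... 3 ]".
 * Returns 0 and fills *free_blocks and counts[] on success, -1 if the
 * line is not a data row (for example the "#group:" header).
 */
static int parse_mb_groups_row(const char *row, unsigned long *free_blocks,
			       unsigned int counts[MB_ORDERS])
{
	unsigned int group, frags, first;
	int order, pos = 0, n;

	/* %n records how far the fixed part of the row extends */
	if (sscanf(row, "#%u : %lu %u %u [%n", &group, free_blocks,
		   &frags, &first, &pos) != 4 || pos == 0)
		return -1;

	for (order = 0; order < MB_ORDERS; order++) {
		n = 0;
		if (sscanf(row + pos, "%u%n", &counts[order], &n) != 1)
			return -1;
		pos += n;
	}
	return 0;
}
```

Feeding each line of /proc/fs/ext4/ram0/mb_groups to parse_mb_groups_row()
and comparing *free_blocks with buddy_free_blocks(counts) gives a quick
consistency check on a listing.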

Here is the output in that file when we're in a bad state, and our writes are
about to fail:

#group: free  frags first [ 2^0   2^1   2^2   2^3   2^4   2^5   2^6   2^7   2^8   2^9   2^10  2^11  2^12  2^13  ]
#0    : 18385 1     14383 [ 1     0     0     0     1     0     1     1     1     1     1     0     0     2     ]
#1    : 2015  1     33    [ 1     1     1     1     1     0     1     1     1     1     1     0     0     0     ]
#2    : 0     0     32768 [ 0     0     0     0     0     0     0     0     0     0     0     0     0     0     ]
#3    : 2015  1     33    [ 1     1     1     1     1     0     1     1     1     1     1     0     0     0     ]

It appears as though we've exhausted group #2.  Interestingly, if I run sync()
at this point it takes us from the bad output to the good, which leads me to
believe the newly unlinked blocks in group #2 are finally being freed back
into that group for reallocation or something. (I've clearly reached the
limits of my ext4-fu. :)  )
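
A small arithmetic check on the two listings, as a sketch: summing the "free"
column gives 124815 blocks in the good state and 22415 in the bad state, a
difference of 102400 blocks.  Assuming 4 KiB filesystem blocks (the block
size is not stated in this thread; groups of 32768 blocks on a 512 MiB
filesystem are consistent with it), that is exactly 400 MiB, the size of the
fallocate()d file.  The helper names below are hypothetical.

```c
#include <stdio.h>

#define NGROUPS 4

/* "free" column of mb_groups, copied from the two listings above */
static const unsigned long good_free[NGROUPS] = { 30673, 32735, 28672, 32735 };
static const unsigned long bad_free[NGROUPS]  = { 18385, 2015, 0, 2015 };

/* Total free blocks over all groups of one listing */
static unsigned long sum_free(const unsigned long *free_blocks, int ngroups)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < ngroups; i++)
		total += free_blocks[i];
	return total;
}

/* Convert a block count to bytes; block_size is an assumption (4096) */
static unsigned long long blocks_to_bytes(unsigned long blocks,
					  unsigned int block_size)
{
	return (unsigned long long)blocks * block_size;
}
```

Calling blocks_to_bytes(sum_free(good_free, NGROUPS) - sum_free(bad_free,
NGROUPS), 4096) yields 419430400 bytes, i.e. 400 MiB, which would fit the
guess that the unlinked file's blocks have not yet been returned to the
per-group counters in the bad state.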

I'm happy to help test proposed fixes.

Thanks,
- Ross

---
#define _GNU_SOURCE
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MB(a) ((a)*1024ULL*1024)

int main(void)
{
	int i, fd, ret;
	ssize_t written;
	void *buffer;

	/* zero-filled 1 MiB buffer, reused for every write() below */
	buffer = calloc(1, MB(1));
	if (!buffer) {
		perror("calloc");
		return 1;
	}

	fd = open("/mnt/file", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* preallocate 400 MiB, then drop the file again */
	ret = fallocate(fd, 0, 0, MB(400));
	if (ret) {
		perror("fallocate");
		return 1;
	}
	close(fd);

	if (unlink("/mnt/file")) {
		perror("unlink");
		return 1;
	}

	/* a sync() call here makes the DAX case of this test pass */
//	sync();

	fd = open("/mnt/file", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* rewrite the same 400 MiB, 1 MiB per write() */
	for (i = 0; i < 400; i++) {
		written = write(fd, buffer, MB(1));

		if (written < 0) {
			perror("write");
			return 1;
		} else if ((size_t)written != MB(1)) {
			fprintf(stderr, "Partial write - only %zd written\n",
					written);
			return 1;
		}
	}

	close(fd);
	free(buffer);
	return 0;
}

* Re: block allocator issue with ext4+DAX
  2016-03-30 22:01 block allocator issue with ext4+DAX Ross Zwisler
@ 2016-03-31  8:59 ` Jan Kara
  2016-03-31 15:13   ` Ross Zwisler
From: Jan Kara @ 2016-03-31  8:59 UTC
  To: Ross Zwisler
  Cc: Jan Kara, Theodore Ts'o, linux-ext4, linux-fsdevel, linux-kernel

On Wed 30-03-16 16:01:29, Ross Zwisler wrote:
> I've hit an issue in my testing which I believe to be related to the ext4
> block allocator when using the DAX mount option.  I originally found this
> issue with the generic/102 xfstest, but have reduced it to the minimal
> reproducer at the bottom of this email.  I've been able to reproduce this with
> both BRD and with PMEM as the underlying block device.
> 
> For this test we're running in a very small filesystem, only 512 MiB.  We
> fallocate() 400 MiB of that space, unlink the file, then try and rewrite that
> 400 MiB file one chunk at a time.
> 
> What actually happens is that during the rewrite we run out of space and the
> DAX call to get_block() in dax_io() fails with -ENOSPC.

Yes, I have already sent a fix for this bug here:

http://www.spinics.net/lists/linux-ext4/msg51649.html

Ted, can you please pick it up? Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: block allocator issue with ext4+DAX
  2016-03-31  8:59 ` Jan Kara
@ 2016-03-31 15:13   ` Ross Zwisler
From: Ross Zwisler @ 2016-03-31 15:13 UTC
  To: Jan Kara
  Cc: Ross Zwisler, Theodore Ts'o, linux-ext4, linux-fsdevel, linux-kernel

On Thu, Mar 31, 2016 at 10:59:25AM +0200, Jan Kara wrote:
> On Wed 30-03-16 16:01:29, Ross Zwisler wrote:
> > I've hit an issue in my testing which I believe to be related to the ext4
> > block allocator when using the DAX mount option.  I originally found this
> > issue with the generic/102 xfstest, but have reduced it to the minimal
> > reproducer at the bottom of this email.  I've been able to reproduce this with
> > both BRD and with PMEM as the underlying block device.
> > 
> > For this test we're running in a very small filesystem, only 512 MiB.  We
> > fallocate() 400 MiB of that space, unlink the file, then try and rewrite that
> > 400 MiB file one chunk at a time.
> > 
> > What actually happens is that during the rewrite we run out of space and the
> > DAX call to get_block() in dax_io() fails with -ENOSPC.
> 
> Yes, I have already sent a fix for this bug here:
> 
> http://www.spinics.net/lists/linux-ext4/msg51649.html
> 
> Ted, can you please pick it up? Thanks!
> 
> 								Honza

Yay!

Ted, you can add my

Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>

to that patch.

Thanks for the fix, Jan!
