linux-kernel.vger.kernel.org archive mirror
* Can't mlock hugetlb in 2.6.15
@ 2006-01-18 19:49 Don Dupuis
  2006-01-21  7:52 ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Don Dupuis @ 2006-01-18 19:49 UTC (permalink / raw)
  To: linux-kernel

I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
This app has 128MB or more of shared memory that is using hugepages via
mmap. When I try this on 2.6.15, I get the error "can't allocate memory".
Is this a kernel bug, or is this not supported anymore?  I want to guarantee
that this memory doesn't get swapped out to a swap device. I made the same
modifications to include/linux/resource.h that I had in 2.6.14, which
set MLOCK_LIMIT to 2GB.

Thanks

Don


* Re: Can't mlock hugetlb in 2.6.15
  2006-01-18 19:49 Can't mlock hugetlb in 2.6.15 Don Dupuis
@ 2006-01-21  7:52 ` Andrew Morton
  2006-01-21 14:12   ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2006-01-21  7:52 UTC (permalink / raw)
  To: Don Dupuis; +Cc: linux-kernel, Nick Piggin, Hugh Dickins

Don Dupuis <dondster@gmail.com> wrote:
>
> I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
> This app has 128MB or more of shared memory that is using hugepages via
> mmap. When I try this, I get the error "can't allocate memory".  Is this a
> kernel bug or is this not supported anymore.  I want to guarantee that
> this memory doesn't get swapped out to a swap device.

hugetlb areas are not pageable and it's very unlikely that they will become
so in the foreseeable future.  So you don't need to do this.

That being said, we shouldn't have broken your application.

I guess a suitable back-compatibility fix would be to check for a hugetlb
vma early on and return "success" for that vma section without actually
doing anything.

But we need to understand why this happened.

> I made the same
> modifications to include/linux/resource.h that was in 2.6.14, which
> set MLOCK_LIMIT to 2GB.
> 

That's rather naughty of you ;) You're supposed to use setrlimit() in a
parent process for this...



* Re: Can't mlock hugetlb in 2.6.15
  2006-01-21  7:52 ` Andrew Morton
@ 2006-01-21 14:12   ` Nick Piggin
  2006-01-23  2:32     ` Don Dupuis
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2006-01-21 14:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Don Dupuis, linux-kernel, Hugh Dickins

Andrew Morton wrote:
> Don Dupuis <dondster@gmail.com> wrote:
> 
>>I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
>>This app has 128MB or more of shared memory that is using hugepages via
>>mmap. When I try this, I get the error "can't allocate memory".  Is this a
>>kernel bug or is this not supported anymore.  I want to guarantee that
>>this memory doesn't get swapped out to a swap device.
> 
> 
> hugetlb areas are not pageable and it's very unlikely that they will become
> so in the foreseeable future.  So you don't need to do this.
> 
> That being said, we shouldn't have broken your application.
> 

Yep, and it does not sound unreasonable to have mlock succeed on hugepage
areas (though I'm not reading any standardese). And you wouldn't expect
mlockall to fail if an app is using hugepages either.

I don't have an idea off the top of my head though. Don, an strace log of
the failing sequence of syscalls could be helpful.

-- 
SUSE Labs, Novell Inc.


* Re: Can't mlock hugetlb in 2.6.15
  2006-01-21 14:12   ` Nick Piggin
@ 2006-01-23  2:32     ` Don Dupuis
  2006-01-23 19:51       ` Hugh Dickins
  0 siblings, 1 reply; 8+ messages in thread
From: Don Dupuis @ 2006-01-23  2:32 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel

On 1/21/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
> > Don Dupuis <dondster@gmail.com> wrote:
> >
> >>I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
> >>This app has 128MB or more of shared memory that is using hugepages via
> >>mmap. When I try this, I get the error "can't allocate memory".  Is this a
> >>kernel bug or is this not supported anymore.  I want to guarantee that
> >>this memory doesn't get swapped out to a swap device.
> >
> >
> > hugetlb areas are not pageable and it's very unlikely that they will become
> > so in the foreseeable future.  So you don't need to do this.
> >
> > That being said, we shouldn't have broken your application.
> >
>
> Yep, and it does not sound unreasonable to have mlock succeed on hugepage
> areas (though I'm not reading any standardese). And you wouldn't expect
> mlockall to fail if an app is using hugepages either.
>
> I don't have an idea off the top of my head though. Don, an strace log of
> the failing sequence of syscalls could be helpful.
>
> --
> SUSE Labs, Novell Inc.
> Send instant messages to your online friends http://au.messenger.yahoo.com
>
>
This first program sets everything up. A hugetlbfs filesystem is
mounted on the directory /pivot3/mem. Here is the strace output of
sducstart:


execve("/pivot3/bin/sducstart", ["/pivot3/bin/sducstart"], [/* 17 vars */]) = 0
uname({sys="Linux", node="DB-FVVQK61", ...}) = 0
brk(0)                                  = 0x804b000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=10819, ...}) = 0
old_mmap(NULL, 10819, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f45000
close(3)                                = 0
open("/lib/tls/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@G\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=105916, ...}) = 0
old_mmap(NULL, 70128, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f33000
old_mmap(0xb7f41000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0xb7f41000
old_mmap(0xb7f43000, 4592, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f43000
close(3)                                = 0
open("/lib/tls/librt.so.1", O_RDONLY)   = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340 \0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=49096, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f32000
old_mmap(NULL, 81912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f1e000
old_mmap(0xb7f26000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xb7f26000
old_mmap(0xb7f28000, 40952, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f28000
close(3)                                = 0
open("/usr/lib/libaio.so.1.0.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0$\4\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=2764, ...}) = 0
old_mmap(NULL, 6120, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f1c000
old_mmap(0xb7f1d000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xb7f1d000
close(3)                                = 0
open("/usr/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\341"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=878185, ...}) = 0
old_mmap(NULL, 264076, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7edb000
old_mmap(0xb7f13000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x38000) = 0xb7f13000
old_mmap(0xb7f1b000, 1932, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f1b000
close(3)                                = 0
open("/usr/lib/liblwres.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 #\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=59620, ...}) = 0
old_mmap(NULL, 62556, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7ecb000
old_mmap(0xb7eda000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0xb7eda000
close(3)                                = 0
open("/lib/libnsl.so.1", O_RDONLY)      = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p:\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=94216, ...}) = 0
old_mmap(NULL, 88288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7eb5000
old_mmap(0xb7ec7000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0xb7ec7000
old_mmap(0xb7ec9000, 6368, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ec9000
close(3)                                = 0
open("/lib/libuuid.so.1", O_RDONLY)     = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \n\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=8232, ...}) = 0
old_mmap(NULL, 11132, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7eb2000
old_mmap(0xb7eb4000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xb7eb4000
close(3)                                = 0
open("/usr/local/lib/libdbxml-2.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360N\7"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=21882033, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7eb1000
old_mmap(NULL, 1548048, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7d37000
old_mmap(0xb7ea9000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0xb7ea9000
close(3)                                = 0
open("/usr/local/lib/libdb_cxx-4.3.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\4~\1\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=922992, ...}) = 0
old_mmap(NULL, 827964, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7c6c000
old_mmap(0xb7d34000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc7000) = 0xb7d34000
close(3)                                = 0
open("/usr/local/lib/libpathan.so.3", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0L!\n\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=3402296, ...}) = 0
old_mmap(NULL, 2614400, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb79ed000
old_mmap(0xb7bcc000, 651264, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1de000) = 0xb7bcc000
old_mmap(0xb7c6b000, 1152, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7c6b000
close(3)                                = 0
open("/usr/local/lib/libxerces-c.so.26", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0H\325\16"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4115828, ...}) = 0
old_mmap(NULL, 3328916, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb76c0000
old_mmap(0xb79bd000, 196608, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2fc000) = 0xb79bd000
close(3)                                = 0
open("/usr/local/lib/libxquery-1.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\330\373"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4217955, ...}) = 0
old_mmap(NULL, 716312, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7611000
old_mmap(0xb76bd000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xab000) = 0xb76bd000
close(3)                                = 0
open("/lib/libcrypt.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\7\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=26940, ...}) = 0
old_mmap(NULL, 184636, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb75e3000
old_mmap(0xb75e8000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0xb75e8000
old_mmap(0xb75ea000, 155964, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb75ea000
close(3)                                = 0
open("/lib/tls/libc.so.6", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260K\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1488740, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb75e2000
old_mmap(NULL, 1195116, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb74be000
old_mmap(0xb75dc000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11d000) = 0xb75dc000
old_mmap(0xb75e0000, 7276, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb75e0000
close(3)                                = 0
open("/usr/lib/libstdc++.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\276\3"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=739700, ...}) = 0
old_mmap(NULL, 759124, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7404000
old_mmap(0xb74b4000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb0000) = 0xb74b4000
old_mmap(0xb74b9000, 17748, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb74b9000
close(3)                                = 0
open("/lib/tls/libm.so.6", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0003\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=212692, ...}) = 0
old_mmap(NULL, 139424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb73e1000
old_mmap(0xb7402000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0xb7402000
close(3)                                = 0
open("/lib/libgcc_s.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\f\25\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=29404, ...}) = 0
old_mmap(NULL, 32216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb73d9000
old_mmap(0xb73e0000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0xb73e0000
close(3)                                = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb73d8000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb73d7000
mprotect(0xb7402000, 4096, PROT_READ)   = 0
mprotect(0xb75dc000, 8192, PROT_READ)   = 0
mprotect(0xb75e8000, 4096, PROT_READ)   = 0
mprotect(0xb7ec7000, 4096, PROT_READ)   = 0
mprotect(0xb7f26000, 4096, PROT_READ)   = 0
mprotect(0xb7f41000, 4096, PROT_READ)   = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb73d7080,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f45000, 10819)               = 0
set_tid_address(0xb73d70c8)             = 10775
rt_sigaction(SIGRTMIN, {0xb7f373a0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbfc59ec8, 35, (nil), 0}) = 0
brk(0)                                  = 0x804b000
brk(0x806c000)                          = 0x806c000
brk(0x808d000)                          = 0x808d000
brk(0x80b2000)                          = 0x80b2000
pipe([3, 4])                            = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb73d70c8) = 10776
close(4)                                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7f47000
read(3, "420\n", 4096)                  = 4
--- SIGCHLD (Child exited) @ 0 (0) ---
close(3)                                = 0
waitpid(10776, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 10776
munmap(0xb7f47000, 4096)                = 0
pipe([3, 4])                            = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb73d70c8) = 10777
close(4)                                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7f47000
read(3, "4096\n", 4096)                 = 5
close(3)                                = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(10777, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 10777
munmap(0xb7f47000, 4096)                = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 64), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon
echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7f47000
write(1, "SDUC_InitGetAvailableHugePages: "..., 73) = 73
open("/pivot3/mem/sduc", O_RDWR|O_CREAT, 0666) = 3
mmap2(NULL, 1761607680, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED,
3, 0) = 0x4e000000
write(1, "SDUC_CreateShareMemObject: creat"..., 51) = 51
munmap(0x4e000000, 1761607680)          = 0
close(3)                                = 0
statfs("/dev/shm/", {f_type=0x1021994, f_bsize=4096, f_blocks=259412,
f_bfree=259412, f_bavail=259412, f_files=223977, f_ffree=223976,
f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
futex(0xb7f27258, FUTEX_WAKE, 2147483647) = 0
open("/dev/shm/__PROFILER_POSIX_SHAREDMEM_OBJECT__",
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, 0666) = 3
fcntl64(3, F_GETFD)                     = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
ftruncate(3, 131072)                    = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xb73b7000
close(3)                                = 0
munmap(0xb7f47000, 4096)                = 0
exit_group(0)                           = ?


This is the strace output of SDUCTest, a test program that accesses
the shared memory that was set up by sducstart:

execve("/pivot3/bin/SDUCTest", ["/pivot3/bin/SDUCTest"], [/* 17 vars */]) = 0
uname({sys="Linux", node="DB-FVVQK61", ...}) = 0
brk(0)                                  = 0x804f000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=10819, ...}) = 0
old_mmap(NULL, 10819, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6e000
close(3)                                = 0
open("/lib/tls/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@G\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=105916, ...}) = 0
old_mmap(NULL, 70128, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f5c000
old_mmap(0xb7f6a000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0xb7f6a000
old_mmap(0xb7f6c000, 4592, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f6c000
close(3)                                = 0
open("/lib/tls/librt.so.1", O_RDONLY)   = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340 \0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=49096, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f5b000
old_mmap(NULL, 81912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f47000
old_mmap(0xb7f4f000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xb7f4f000
old_mmap(0xb7f51000, 40952, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f51000
close(3)                                = 0
open("/usr/lib/libaio.so.1.0.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0$\4\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=2764, ...}) = 0
old_mmap(NULL, 6120, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f45000
old_mmap(0xb7f46000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xb7f46000
close(3)                                = 0
open("/usr/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\341"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=878185, ...}) = 0
old_mmap(NULL, 264076, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f04000
old_mmap(0xb7f3c000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x38000) = 0xb7f3c000
old_mmap(0xb7f44000, 1932, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f44000
close(3)                                = 0
open("/usr/lib/liblwres.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 #\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=59620, ...}) = 0
old_mmap(NULL, 62556, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7ef4000
old_mmap(0xb7f03000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0xb7f03000
close(3)                                = 0
open("/lib/libnsl.so.1", O_RDONLY)      = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p:\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=94216, ...}) = 0
old_mmap(NULL, 88288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7ede000
old_mmap(0xb7ef0000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0xb7ef0000
old_mmap(0xb7ef2000, 6368, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ef2000
close(3)                                = 0
open("/lib/libuuid.so.1", O_RDONLY)     = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \n\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=8232, ...}) = 0
old_mmap(NULL, 11132, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7edb000
old_mmap(0xb7edd000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xb7edd000
close(3)                                = 0
open("/usr/local/lib/libdbxml-2.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360N\7"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=21882033, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7eda000
old_mmap(NULL, 1548048, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7d60000
old_mmap(0xb7ed2000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0xb7ed2000
close(3)                                = 0
open("/usr/local/lib/libdb_cxx-4.3.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\4~\1\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=922992, ...}) = 0
old_mmap(NULL, 827964, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7c95000
old_mmap(0xb7d5d000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc7000) = 0xb7d5d000
close(3)                                = 0
open("/usr/local/lib/libpathan.so.3", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0L!\n\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=3402296, ...}) = 0
old_mmap(NULL, 2614400, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7a16000
old_mmap(0xb7bf5000, 651264, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1de000) = 0xb7bf5000
old_mmap(0xb7c94000, 1152, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7c94000
close(3)                                = 0
open("/usr/local/lib/libxerces-c.so.26", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0H\325\16"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4115828, ...}) = 0
old_mmap(NULL, 3328916, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb76e9000
old_mmap(0xb79e6000, 196608, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2fc000) = 0xb79e6000
close(3)                                = 0
open("/usr/local/lib/libxquery-1.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\330\373"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4217955, ...}) = 0
old_mmap(NULL, 716312, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb763a000
old_mmap(0xb76e6000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xab000) = 0xb76e6000
close(3)                                = 0
open("/lib/libcrypt.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\7\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=26940, ...}) = 0
old_mmap(NULL, 184636, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb760c000
old_mmap(0xb7611000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0xb7611000
old_mmap(0xb7613000, 155964, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7613000
close(3)                                = 0
open("/lib/tls/libc.so.6", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260K\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1488740, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb760b000
old_mmap(NULL, 1195116, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb74e7000
old_mmap(0xb7605000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11d000) = 0xb7605000
old_mmap(0xb7609000, 7276, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7609000
close(3)                                = 0
open("/usr/lib/libstdc++.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\276\3"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=739700, ...}) = 0
old_mmap(NULL, 759124, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb742d000
old_mmap(0xb74dd000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb0000) = 0xb74dd000
old_mmap(0xb74e2000, 17748, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb74e2000
close(3)                                = 0
open("/lib/tls/libm.so.6", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0003\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=212692, ...}) = 0
old_mmap(NULL, 139424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb740a000
old_mmap(0xb742b000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0xb742b000
close(3)                                = 0
open("/lib/libgcc_s.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\f\25\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=29404, ...}) = 0
old_mmap(NULL, 32216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7402000
old_mmap(0xb7409000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0xb7409000
close(3)                                = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7401000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7400000
mprotect(0xb742b000, 4096, PROT_READ)   = 0
mprotect(0xb7605000, 8192, PROT_READ)   = 0
mprotect(0xb7611000, 4096, PROT_READ)   = 0
mprotect(0xb7ef0000, 4096, PROT_READ)   = 0
mprotect(0xb7f4f000, 4096, PROT_READ)   = 0
mprotect(0xb7f6a000, 4096, PROT_READ)   = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7400080,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f6e000, 10819)               = 0
set_tid_address(0xb74000c8)             = 10780
rt_sigaction(SIGRTMIN, {0xb7f603a0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbf885598, 35, (nil), 0}) = 0
brk(0)                                  = 0x804f000
brk(0x8070000)                          = 0x8070000
brk(0x8091000)                          = 0x8091000
brk(0x80b6000)                          = 0x80b6000
open("/pivot3/mem/sduc", O_RDWR)        = 3
mmap2(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 3,
0) = -1 ENOMEM (Cannot allocate memory)
close(3)                                = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 64), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon
echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb6fff000
write(1, "SDUC_MapShareMemObject SDUC shar"..., 57) = 57
unlink("/pivot3/mem/sduc")              = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

Thanks

Don


* Re: Can't mlock hugetlb in 2.6.15
  2006-01-23  2:32     ` Don Dupuis
@ 2006-01-23 19:51       ` Hugh Dickins
  2006-01-23 20:53         ` Don Dupuis
                           ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Hugh Dickins @ 2006-01-23 19:51 UTC (permalink / raw)
  To: Don Dupuis
  Cc: Nick Piggin, Andrew Morton, Adam Litke, William Irwin, linux-kernel

On Sun, 22 Jan 2006, Don Dupuis wrote:
> On 1/21/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > Andrew Morton wrote:
> > > Don Dupuis <dondster@gmail.com> wrote:
> > >>I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
> > > That being said, we shouldn't have broken your application.
> > Don, an strace log of the failing sequence of syscalls could be helpful.
> 
> sducstart:
> open("/pivot3/mem/sduc", O_RDWR|O_CREAT, 0666) = 3
> mmap2(NULL, 1761607680, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED,
> 3, 0) = 0x4e000000
> 
> This is the strace output of sductest that is a test program to access
> the shared memory that was setup by sducstart:
> open("/pivot3/mem/sduc", O_RDWR)        = 3
> mmap2(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 3,
> 0) = -1 ENOMEM (Cannot allocate memory)

Thanks a lot for the strace, that indeed helped to track it down.

This has nothing to do with mlock or MAP_LOCKED - which by the way do
make more sense in 2.6.15, since they provide a way of prefaulting the
hugepage area like in earlier releases (now hugepages are being faulted
in on demand, though never paged out, as Andrew said).

Please try the patch below, and let us know if it works for you - thanks.
Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.


2.6.15's hugepage faulting introduced huge_pages_needed accounting into
hugetlbfs: to count how many pages are already in cache, for spot check
on how far a new mapping may be allowed to extend the file.  But it's
muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE.  Once
pages were already in cache, it would overshoot, wrap its hugepages
count backwards, and so fail a harmless repeat mapping with -ENOMEM.
Fixes the problem found by Don Dupuis.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 fs/hugetlbfs/inode.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

--- 2.6.15/fs/hugetlbfs/inode.c	2006-01-03 03:21:10.000000000 +0000
+++ linux/fs/hugetlbfs/inode.c	2006-01-23 18:39:47.000000000 +0000
@@ -71,8 +71,8 @@ huge_pages_needed(struct address_space *
 	unsigned long start = vma->vm_start;
 	unsigned long end = vma->vm_end;
 	unsigned long hugepages = (end - start) >> HPAGE_SHIFT;
-	pgoff_t next = vma->vm_pgoff;
-	pgoff_t endpg = next + ((end - start) >> PAGE_SHIFT);
+	pgoff_t next = vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
+	pgoff_t endpg = next + hugepages;
 
 	pagevec_init(&pvec, 0);
 	while (next < endpg) {


* Re: Can't mlock hugetlb in 2.6.15
  2006-01-23 19:51       ` Hugh Dickins
@ 2006-01-23 20:53         ` Don Dupuis
  2006-01-23 21:34         ` Adam Litke
  2006-01-23 23:52         ` William Lee Irwin III
  2 siblings, 0 replies; 8+ messages in thread
From: Don Dupuis @ 2006-01-23 20:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Nick Piggin, Andrew Morton, Adam Litke, William Irwin, linux-kernel

On 1/23/06, Hugh Dickins <hugh@veritas.com> wrote:
> On Sun, 22 Jan 2006, Don Dupuis wrote:
> > On 1/21/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > > Andrew Morton wrote:
> > > > Don Dupuis <dondster@gmail.com> wrote:
> > > >>I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
> > > > That being said, we shouldn't have broken your application.
> > > Don, an strace log of the failing sequence of syscalls could be helpful.
> >
> > sducstart:
> > open("/pivot3/mem/sduc", O_RDWR|O_CREAT, 0666) = 3
> > mmap2(NULL, 1761607680, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED,
> > 3, 0) = 0x4e000000
> >
> > This is the strace output of sductest that is a test program to access
> > the shared memory that was setup by sducstart:
> > open("/pivot3/mem/sduc", O_RDWR)        = 3
> > mmap2(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 3,
> > 0) = -1 ENOMEM (Cannot allocate memory)
>
> Thanks a lot for the strace, that indeed helped to track it down.
>
> This has nothing to do with mlock or MAP_LOCKED - which by the way do
> make more sense in 2.6.15, since they provide a way of prefaulting the
> hugepage area like in earlier releases (now hugepages are being faulted
> in on demand, though never paged out, as Andrew said).
>
> Please try the patch below, and let us know if it works for you - thanks.
> Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.
>
>
> 2.6.15's hugepage faulting introduced huge_pages_needed accounting into
> hugetlbfs: to count how many pages are already in cache, for spot check
> on how far a new mapping may be allowed to extend the file.  But it's
> muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE.  Once
> pages were already in cache, it would overshoot, wrap its hugepages
> count backwards, and so fail a harmless repeat mapping with -ENOMEM.
> Fixes the problem found by Don Dupuis.
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
>
>  fs/hugetlbfs/inode.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> --- 2.6.15/fs/hugetlbfs/inode.c 2006-01-03 03:21:10.000000000 +0000
> +++ linux/fs/hugetlbfs/inode.c  2006-01-23 18:39:47.000000000 +0000
> @@ -71,8 +71,8 @@ huge_pages_needed(struct address_space *
>         unsigned long start = vma->vm_start;
>         unsigned long end = vma->vm_end;
>         unsigned long hugepages = (end - start) >> HPAGE_SHIFT;
> -       pgoff_t next = vma->vm_pgoff;
> -       pgoff_t endpg = next + ((end - start) >> PAGE_SHIFT);
> +       pgoff_t next = vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
> +       pgoff_t endpg = next + hugepages;
>
>         pagevec_init(&pvec, 0);
>         while (next < endpg) {
>

This patch fixed my problem.

Thanks very much

Don

* Re: Can't mlock hugetlb in 2.6.15
  2006-01-23 19:51       ` Hugh Dickins
  2006-01-23 20:53         ` Don Dupuis
@ 2006-01-23 21:34         ` Adam Litke
  2006-01-23 23:52         ` William Lee Irwin III
  2 siblings, 0 replies; 8+ messages in thread
From: Adam Litke @ 2006-01-23 21:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Don Dupuis, Nick Piggin, Andrew Morton, William Irwin, linux-kernel

Aye.

On Mon, 2006-01-23 at 19:51 +0000, Hugh Dickins wrote:
> Thanks a lot for the strace, that indeed helped to track it down.
> 
> This has nothing to do with mlock or MAP_LOCKED - which by the way do
> make more sense in 2.6.15, since they provide a way of prefaulting the
> hugepage area like in earlier releases (now hugepages are being faulted
> in on demand, though never paged out, as Andrew said).
> 
> Please try the patch below, and let us know if it works for you - thanks.
> Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.
> 
> 
> 2.6.15's hugepage faulting introduced huge_pages_needed accounting into
> hugetlbfs: to count how many pages are already in cache, for spot check
> on how far a new mapping may be allowed to extend the file.  But it's
> muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE.  Once
> pages were already in cache, it would overshoot, wrap its hugepages
> count backwards, and so fail a harmless repeat mapping with -ENOMEM.
> Fixes the problem found by Don Dupuis.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-By: Adam Litke <agl@us.ibm.com>
> ---
> 
>  fs/hugetlbfs/inode.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- 2.6.15/fs/hugetlbfs/inode.c	2006-01-03 03:21:10.000000000 +0000
> +++ linux/fs/hugetlbfs/inode.c	2006-01-23 18:39:47.000000000 +0000
> @@ -71,8 +71,8 @@ huge_pages_needed(struct address_space *
>  	unsigned long start = vma->vm_start;
>  	unsigned long end = vma->vm_end;
>  	unsigned long hugepages = (end - start) >> HPAGE_SHIFT;
> -	pgoff_t next = vma->vm_pgoff;
> -	pgoff_t endpg = next + ((end - start) >> PAGE_SHIFT);
> +	pgoff_t next = vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
> +	pgoff_t endpg = next + hugepages;
>  
>  	pagevec_init(&pvec, 0);
>  	while (next < endpg) {
> 
> 
-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


* Re: Can't mlock hugetlb in 2.6.15
  2006-01-23 19:51       ` Hugh Dickins
  2006-01-23 20:53         ` Don Dupuis
  2006-01-23 21:34         ` Adam Litke
@ 2006-01-23 23:52         ` William Lee Irwin III
  2 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2006-01-23 23:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Don Dupuis, Nick Piggin, Andrew Morton, Adam Litke, linux-kernel

On Mon, Jan 23, 2006 at 07:51:51PM +0000, Hugh Dickins wrote:
> This has nothing to do with mlock or MAP_LOCKED - which by the way do
> make more sense in 2.6.15, since they provide a way of prefaulting the
> hugepage area like in earlier releases (now hugepages are being faulted
> in on demand, though never paged out, as Andrew said).
> Please try the patch below, and let us know if it works for you - thanks.
> Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.
> 2.6.15's hugepage faulting introduced huge_pages_needed accounting into
> hugetlbfs: to count how many pages are already in cache, for spot check
> on how far a new mapping may be allowed to extend the file.  But it's
> muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE.  Once
> pages were already in cache, it would overshoot, wrap its hugepages
> count backwards, and so fail a harmless repeat mapping with -ENOMEM.
> Fixes the problem found by Don Dupuis.
> Signed-off-by: Hugh Dickins <hugh@veritas.com>

Acked-by: William Irwin <wli@holomorphy.com>

A unit conversion error, as usual. It's difficult to understand why
such a natural decision as to use only one radix tree entry per
hugepage is so difficult to cope with. If only my eyes had been sharp
enough to catch it on its way in.

Excellent detective work as always. Thanks again, Hugh.


-- wli

