* segmentation fault in numa_node_to_cpus_v1
@ 2010-11-01 19:52 Michael Spiegel
  2010-11-01 20:39 ` Scott Lurndal
  2010-11-01 22:59 ` Cliff Wickman
  0 siblings, 2 replies; 5+ messages in thread
From: Michael Spiegel @ 2010-11-01 19:52 UTC
  To: linux-numa

Hi,

I'm trying to run the HotSpot Java VM on an SGI UV 1000 with 4096
cores.  When I enable the NUMA-aware garbage collection algorithm, I
get a segmentation fault as the virtual machine is initializing.  The
sigsegv is occurring at one of the memcpy's in numa_node_to_cpus_v1,
although I'm afraid I can't determine whether libnuma is being called
correctly or incorrectly.  I am testing on a system that has numactl
2.0.5.

Thanks,
--Michael

#6  <signal handler called>
#7  0x00007f4066fb9ad0 in memcpy () from /lib64/libc.so.6
#8  0x00007f40658d4c6a in numa_node_to_cpus_v1 (node=132, buffer=0x40112d40,
   bufferlen=<value optimized out>) at libnuma.c:1203
#9  0x00007f4066a85255 in os::Linux::rebuild_cpu_to_node_map() ()
  from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so
#10 0x00007f4066a8502f in os::Linux::libnuma_init() ()
  from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so
#11 0x00007f4066a86c38 in os::init_2() ()
  from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so
#12 0x00007f4066b81c4d in Threads::create_vm(JavaVMInitArgs*, bool*) ()
  from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so

* Re: segmentation fault in numa_node_to_cpus_v1
  2010-11-01 19:52 segmentation fault in numa_node_to_cpus_v1 Michael Spiegel
@ 2010-11-01 20:39 ` Scott Lurndal
  2010-11-01 22:59 ` Cliff Wickman
  1 sibling, 0 replies; 5+ messages in thread
From: Scott Lurndal @ 2010-11-01 20:39 UTC
  To: Michael Spiegel; +Cc: linux-numa

On Mon, November 1, 2010 11:52 am, Michael Spiegel wrote:
> Hi,
>
> I'm trying to run the HotSpot Java VM on an SGI UV 1000 with 4096
> cores.  When I enable the NUMA-aware garbage collection algorithm, I
> get a segmentation fault as the virtual machine is initializing.  The
> sigsegv is occurring at one of the memcpy's in numa_node_to_cpus_v1,
> although I'm afraid I can't determine whether libnuma is being called
> correctly or incorrectly.  I am testing on a system that has numactl
> 2.0.5.

I ran into this issue with Oracle 11i.  It was linked against a library
with an older API.  Because I couldn't change Oracle, I modified the
libnuma.so we were using as follows:

libnuma.c:

@@ -1240,7 +1246,7 @@
        }
        return err;
 }
-__asm__(".symver numa_node_to_cpus_v1,numa_node_to_cpus@libnuma_1.1");
+__asm__(".symver numa_node_to_cpus_v2,numa_node_to_cpus@libnuma_1.1");

 /*
  * test whether a node has cpus
@@ -1316,7 +1322,7 @@
        }
        return err;
 }
-__asm__(".symver numa_node_to_cpus_v2,numa_node_to_cpus@@libnuma_1.2");
+__asm__(".symver numa_node_to_cpus_v1,numa_node_to_cpus@@libnuma_1.2");

 make_internal_alias(numa_node_to_cpus_v1);
 make_internal_alias(numa_node_to_cpus_v2);

After replacing the libnuma.so that was being used by Oracle with a new
one where I swapped the symbol versions, Oracle started working correctly.

scott

* Re: segmentation fault in numa_node_to_cpus_v1
  2010-11-01 19:52 segmentation fault in numa_node_to_cpus_v1 Michael Spiegel
  2010-11-01 20:39 ` Scott Lurndal
@ 2010-11-01 22:59 ` Cliff Wickman
  2010-11-02  1:10   ` Michael Spiegel
  2010-11-02  1:36   ` Scott Lurndal
  1 sibling, 2 replies; 5+ messages in thread
From: Cliff Wickman @ 2010-11-01 22:59 UTC
  To: Michael Spiegel; +Cc: linux-numa

On Mon, Nov 01, 2010 at 03:52:59PM -0400, Michael Spiegel wrote:
> Hi,
> 
> I'm trying to run the HotSpot Java VM on an SGI UV 1000 with 4096
> cores.  When I enable the NUMA-aware garbage collection algorithm, I
> get a segmentation fault as the virtual machine is initializing.  The
> sigsegv is occurring at one of the memcpy's in numa_node_to_cpus_v1,
> although I'm afraid I can't determine whether libnuma is being called
> correctly or incorrectly.  I am testing on a system that has numactl
> 2.0.5.
> 
> Thanks,
> --Michael

Hi Michael,

  I see that Scott Lurndal gave you a possible fix.
  There were some important corrections added to the latest version, so
  if you could try building numactl/libnuma from numactl-2.0.6-rc3.tar.gz 
  that would be an interesting test.
  (ftp://oss.sgi.com/www/projects/libnuma/download/)
> 
> #6  <signal handler called>
> #7  0x00007f4066fb9ad0 in memcpy () from /lib64/libc.so.6
> #8  0x00007f40658d4c6a in numa_node_to_cpus_v1 (node=132, buffer=0x40112d40,
>    bufferlen=<value optimized out>) at libnuma.c:1203
> #9  0x00007f4066a85255 in os::Linux::rebuild_cpu_to_node_map() ()
>   from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so
> #10 0x00007f4066a8502f in os::Linux::libnuma_init() ()
>   from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so
> #11 0x00007f4066a86c38 in os::init_2() ()
>   from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so
> #12 0x00007f4066b81c4d in Threads::create_vm(JavaVMInitArgs*, bool*) ()
>   from /usr/ue/0/mspiegel/jdk1.6.0_22/jre/lib/amd64/server/libjvm.so

-- 
Cliff Wickman
SGI
cpw@sgi.com
(651) 683-3824

* Re: segmentation fault in numa_node_to_cpus_v1
  2010-11-01 22:59 ` Cliff Wickman
@ 2010-11-02  1:10   ` Michael Spiegel
  2010-11-02  1:36   ` Scott Lurndal
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Spiegel @ 2010-11-02  1:10 UTC
  To: linux-numa

Hi everyone,

I tried the suggestions from Cliff and Scott, with no change in
behavior.  With some primitive debugging I noticed that
NUMA_NUM_NODES was 128, while the node argument to numa_node_to_cpus_v1
was 132.  Changing the definition of NUMA_NUM_NODES in numa.h to
2048 eliminates the segmentation fault, but now I'm getting "mbind:
invalid argument" errors.
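
The out-of-range index described above can be sketched in C.  This is
only an illustration, assuming (not confirmed in this thread) that the
v1 code path keeps a per-node table sized by the compile-time
NUMA_NUM_NODES.  SKETCH_NUM_NODES, node_cache and cache_slot are
hypothetical names, not libnuma identifiers.

```c
#include <stddef.h>

/* The value reported for NUMA_NUM_NODES in numa.h; renamed here so the
 * sketch does not clash with the real header. */
#define SKETCH_NUM_NODES 128

/* Hypothetical stand-in for a per-node table sized at compile time. */
static unsigned long *node_cache[SKETCH_NUM_NODES];

/* Return the table slot for `node`, or NULL when the node id falls
 * outside the table instead of reading or writing past its end. */
static unsigned long **cache_slot(int node)
{
    if (node < 0 || node >= SKETCH_NUM_NODES)
        return NULL;
    return &node_cache[node];
}
```

With a limit of 128, cache_slot(132) returns NULL rather than touching
memory beyond the table, which is the kind of guard a node id of 132
would need on this machine.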

Thanks,
--Michael


* Re: segmentation fault in numa_node_to_cpus_v1
  2010-11-01 22:59 ` Cliff Wickman
  2010-11-02  1:10   ` Michael Spiegel
@ 2010-11-02  1:36   ` Scott Lurndal
  1 sibling, 0 replies; 5+ messages in thread
From: Scott Lurndal @ 2010-11-02  1:36 UTC
  To: Cliff Wickman; +Cc: Michael Spiegel, linux-numa

On Mon, Nov 01, 2010 at 05:59:42PM -0500, Cliff Wickman wrote:
> On Mon, Nov 01, 2010 at 03:52:59PM -0400, Michael Spiegel wrote:
> > Hi,
> > 
> > I'm trying to run the HotSpot Java VM on an SGI UV 1000 with 4096
> > cores.  When I enable the NUMA-aware garbage collection algorithm, I
> > get a segmentation fault as the virtual machine is initializing.  The
> > sigsegv is occurring at one of the memcpy's in numa_node_to_cpus_v1,
> > although I'm afraid I can't determine whether libnuma is being called
> > correctly or incorrectly.  I am testing on a system that has numactl
> > 2.0.5.
> > 
> > Thanks,
> > --Michael
> 
> Hi Michael,
> 
>   I see that Scott Lurndal gave you a possible fix.
>   There were some important corrections added to the latest version, so
>   if you could try building numactl/libnuma from numactl-2.0.6-rc3.tar.gz 
>   that would be an interesting test.
>   (ftp://oss.sgi.com/www/projects/libnuma/download/)

Hi Cliff,

   I suspect that Michael will find the same problem with the newest
numactl release.  The problem is that Oracle (in my case) and the JVM
(in Michael's case) don't use the 'dlvsym' function after dynamically
loading (dlopen) the libnuma library; they just use 'dlsym'.  Thus they get
the library's default version of 'numa_node_to_cpus', which is the wrong
API, instead of the one for the version the JVM was coded for.  We've
seen a seg fault when the wrong version of numa_node_to_cpus is called,
because of the signature change.
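
A minimal sketch of the dlsym/dlvsym difference, assuming glibc (dlvsym
is a GNU extension) and the version names from the .symver lines in the
patch earlier in this thread.  The v1 parameter types are inferred from
the backtrace and are an assumption; load_node_to_cpus_v1 is a
hypothetical helper, not code from the JVM or Oracle.

```c
#define _GNU_SOURCE /* dlvsym() is a GNU extension */
#include <dlfcn.h>
#include <stddef.h>

/* Assumed shape of the old (v1) entry point: node id, caller-supplied
 * buffer, and the length of that buffer. */
typedef int (*node_to_cpus_v1_fn)(int node, unsigned long *buffer,
                                  int bufferlen);

/*
 * Ask for the libnuma_1.1 version explicitly.  Plain
 * dlsym(handle, "numa_node_to_cpus") would instead return whichever
 * version the library marks as its default (@@).
 * Returns NULL if the requested version is not exported.
 */
static node_to_cpus_v1_fn load_node_to_cpus_v1(void *handle)
{
    return (node_to_cpus_v1_fn)dlvsym(handle, "numa_node_to_cpus",
                                      "libnuma_1.1");
}
```

A caller would pass in the handle it got from dlopen() on libnuma and
treat a NULL result as "this library does not export the old
interface".  On older glibc the program also needs to link with -ldl.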

  My patch just changes the symbol versioning to swap which of _v1 and _v2
is the default.  Of course, properly written applications cannot use this
modified library, so it's best to load it with LD_LIBRARY_PATH rather
than replacing the system library.

scott
