* Fwd:  NetBSD xl core-dump not working... Memory fault (core dumped)
       [not found] <52770EED.9090804@gmx.de>
@ 2013-11-04 22:13 ` Mike C.
  2013-11-07 10:29   ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Mike C. @ 2013-11-04 22:13 UTC (permalink / raw)
  To: xen-devel, port-xen


On 31.10.13 04:34, Miguel Clara wrote:

> I was trying to get a core-dump for a domU with xl and got this error:
>
> # xl dump-core 20 test.core
> Memory fault
>
> GDB shows this:
>
> a# gdb xl xl.core
> GNU gdb (GDB) 7.3.1
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later<http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64--netbsd".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/xl...done.
> [New process 1]
> Core was generated by `xl'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> dump_rtn=0x7f7ff700632c<local_file_dump>)
>      at xc_core.c:860
> 860     xc_core.c: No such file or directory.
>          in xc_core.c
>
>
> (gdb) backtrace
> #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> dump_rtn=0x7f7ff700632c<local_file_dump>)
>      at xc_core.c:860
> #1  0x00007f7ff7007fda in xc_domain_dumpcore (xch=0x7f7ff7b0d800,
> domid=20, corename=0x7f7ffffffe78 "test.core") at xc_core.c:983
> #2  0x00007f7ff74117b3 in libxl_domain_core_dump (ctx=0x7f7ff7b03200,
> domid=20, filename=0x7f7ffffffe78 "test.core", ao_how=<optimized out>)
> at libxl.c:808
> #3  0x000000000040f748 in core_dump_domain (filename=0x7f7ffffffe78
> "test.core", domain_spec=<optimized out>) at xl_cmdimpl.c:3301
> #4  main_dump_core (argc=<optimized out>, argv=0x7f7fffffdca0) at
> xl_cmdimpl.c:3642
> #5  0x0000000000407055 in main (argc=3, argv=0x7f7fffffdca0) at xl.c:267
>

I think xen-devel is the right list for this.
It's OK to cross-post to keep the NetBSD people involved, so they can
answer NetBSD-specific questions from the Xen/Citrix people that would
otherwise go unanswered.

Christoph

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-04 22:13 ` Fwd: NetBSD xl core-dump not working... Memory fault (core dumped) Mike C.
@ 2013-11-07 10:29   ` Ian Campbell
  2013-11-07 21:04     ` [Xen-devel] " Miguel C.
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2013-11-07 10:29 UTC (permalink / raw)
  To: Mike C.; +Cc: xen-devel, port-xen

On Mon, 2013-11-04 at 22:13 +0000, Mike C. wrote:
> On 31.10.13 04:34, Miguel Clara wrote:
> 
> > I was trying to get a core-dump for a domU with xl and got this error:
> >
> > # xl dump-core 20 test.core
> > Memory fault
> >
> > GDB shows this:
> >
> > a# gdb xl xl.core
> > GNU gdb (GDB) 7.3.1
> > Copyright (C) 2011 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later<http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64--netbsd".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /usr/sbin/xl...done.
> > [New process 1]
> > Core was generated by `xl'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> > (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> > dump_rtn=0x7f7ff700632c<local_file_dump>)
> >      at xc_core.c:860

We need to know your version of Xen (ideally the changeset id) to make
sense of these line numbers. Line 860 of this file doesn't look
plausible for unstable or 4.3.0. Could be 4.2 I guess?

> > 860     xc_core.c: No such file or directory.
> >          in xc_core.c
> >
> >
> > (gdb) backtrace
> > #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> > (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> > dump_rtn=0x7f7ff700632c<local_file_dump>)
> >      at xc_core.c:860
> > #1  0x00007f7ff7007fda in xc_domain_dumpcore (xch=0x7f7ff7b0d800,
> > domid=20, corename=0x7f7ffffffe78 "test.core") at xc_core.c:983
> > #2  0x00007f7ff74117b3 in libxl_domain_core_dump (ctx=0x7f7ff7b03200,
> > domid=20, filename=0x7f7ffffffe78 "test.core", ao_how=<optimized out>)
> > at libxl.c:808
> > #3  0x000000000040f748 in core_dump_domain (filename=0x7f7ffffffe78
> > "test.core", domain_spec=<optimized out>) at xl_cmdimpl.c:3301
> > #4  main_dump_core (argc=<optimized out>, argv=0x7f7fffffdca0) at
> > xl_cmdimpl.c:3642
> > #5  0x0000000000407055 in main (argc=3, argv=0x7f7fffffdca0) at xl.c:267
> >
> 
> I think, xen-devel is the right list for this.
> It's ok to cross-post to keep NetBSD people involved for answering
> NetBSD specific questions from the Xen/Citrix people that would
> be not answered, otherwise.
> 
> Christoph

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-07 10:29   ` Ian Campbell
@ 2013-11-07 21:04     ` Miguel C.
  2013-11-08 10:29       ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Miguel C. @ 2013-11-07 21:04 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, port-xen

Yes, it's 4.2 from pkgsrc. How can I get the changeset ID?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-07 21:04     ` [Xen-devel] " Miguel C.
@ 2013-11-08 10:29       ` Ian Campbell
  2013-11-08 17:20         ` John Nemeth
  2013-11-12  9:48         ` [Xen-devel] " Roger Pau Monné
  0 siblings, 2 replies; 18+ messages in thread
From: Ian Campbell @ 2013-11-08 10:29 UTC (permalink / raw)
  To: Miguel C.; +Cc: xen-devel, port-xen

On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
> yes its 4.2 from pkgsrc.

Thanks, that might be enough.

>  how can i get the changeset id?

that'd be one for the port-xen folks I think. It might be printed in the
xen dmesg, but that depends on how it was built and 4.2 may be too old
to have such functionality.

> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >On Mon, 2013-11-04 at 22:13 +0000, Mike C. wrote:
> >> On 31.10.13 04:34, Miguel Clara wrote:
> >> 
> >> > I was trying to get a core-dump for a domU with xl and got this
> >error:
> >> >
> >> > # xl dump-core 20 test.core
> >> > Memory fault
> >> >
> >> > GDB shows this:
> >> >
> >> > a# gdb xl xl.core
> >> > GNU gdb (GDB) 7.3.1
> >> > Copyright (C) 2011 Free Software Foundation, Inc.
> >> > License GPLv3+: GNU GPL version 3 or
> >later<http://gnu.org/licenses/gpl.html>
> >> > This is free software: you are free to change and redistribute it.
> >> > There is NO WARRANTY, to the extent permitted by law.  Type "show
> >copying"
> >> > and "show warranty" for details.
> >> > This GDB was configured as "x86_64--netbsd".
> >> > For bug reporting instructions, please see:
> >> > <http://www.gnu.org/software/gdb/bugs/>...
> >> > Reading symbols from /usr/sbin/xl...done.
> >> > [New process 1]
> >> > Core was generated by `xl'.
> >> > Program terminated with signal 11, Segmentation fault.
> >> > #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> >> > (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> >> > dump_rtn=0x7f7ff700632c<local_file_dump>)
> >> >      at xc_core.c:860
> >

In 4.2.0 this corresponds to
 memcpy(dump_mem, vaddr, PAGE_SIZE);
which is a plausible source of a segfault.

xc_core.c has only changed in immaterial ways (although ways which
caused all the line numbers to shift) since 4.2.0 AFAICT so it is likely
that this bug is still present.

Can you tell via gdb what the faulting address was and whether it
corresponds to dump_mem or vaddr? gdb's "info locals" might give you at
least some of that? Also you can use disas to identify the precise
instruction at 0x00007f7ff7007b45, which will show you the registers
which might lead you to the faulting address.
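
Once gdb has yielded a faulting address and the two pointer values, the
comparison asked for above is just a range check. A minimal sketch in
Python, assuming a 4096-byte page; the function name and the sample
addresses are made up for illustration and are not taken from the core
file in this thread:

```python
PAGE_SIZE = 4096  # assumption: x86_64 page size


def classify_fault(fault_addr, dump_mem, vaddr, length=PAGE_SIZE):
    """Report which memcpy operand, if either, contains fault_addr."""
    if dump_mem <= fault_addr < dump_mem + length:
        return "dump_mem"   # destination: ordinary process memory
    if vaddr <= fault_addr < vaddr + length:
        return "vaddr"      # source: the foreign mapping
    return "neither"


# Hypothetical values; the real ones would come from the gdb session.
print(classify_fault(0x7F7FF6E01010,
                     dump_mem=0x7F7FF5000000,
                     vaddr=0x7F7FF6E01000))
```

A result of "vaddr" would point at the foreign mapping rather than the
dump buffer.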

vaddr is certainly not NULL, it's checked right before. It could be
non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD,
but that is surely used elsewhere? I suppose it might have mapped an MFN
which was invalid (or later became invalid, but your bug is
deterministic, right?). IIRC NetBSD's privcmd foreign mappings are
populated lazily and not immediately like on Linux? If that were the
case (and I'm only vaguely aware of how NetBSD operates) then it would
be plausible that xc_map_foreign_range would succeed but that a
subsequent attempt to access the region would fault?

dump_mem isn't NULL, it's a pointer into the dump_mem_start array which
has a check for failure when it is allocated. Since dump_mem is just
normal process memory and vaddr is a magic foreign mapping I'd be
inclined to suspect vaddr was not right in some way...

Does "xl -vvv dump-core" give any useful additional logging?

Unfortunately I don't think anyone has done valgrind support for
debugging processes which use Xen hypercalls for *BSD (if you were very
keen you could probably follow what was done for Linux
http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/
and wire up the BSD privcmd ioctl to the generic Xen hypercall code I
added)

I fear this bug is going to take someone on the ground with a NetBSD
system and the ability to dive into BSD kernel internals to get to the
bottom of...

Ian.

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-08 10:29       ` Ian Campbell
@ 2013-11-08 17:20         ` John Nemeth
  2013-11-12  9:35           ` Ian Campbell
  2013-11-12  9:48         ` [Xen-devel] " Roger Pau Monné
  1 sibling, 1 reply; 18+ messages in thread
From: John Nemeth @ 2013-11-08 17:20 UTC (permalink / raw)
  To: Ian Campbell, Miguel C.; +Cc: xen-devel, port-xen

On Nov 8, 10:29am, Ian Campbell wrote:
} On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
} > yes its 4.2 from pkgsrc.
} 
} Thanks, that might be enough.

     More specifically, it's 4.2.3.

} >  how can i get the changeset id?
} 
} that'd be one for the port-xen folks I think. It might be printed in the
} xen dmesg, but that depends on how it was built and 4.2 may be too old
} to have such functionality.

     xl dmesg says:

(XEN) Latest ChangeSet: unavailable

The package was built using this tarball:

http://bits.xensource.com/oss-xen/release/4.2.3/xen-4.2.3.tar.gz

And, just for reference, this is the info we have on the tarball:

SHA1 (xen-4.2.3.tar.gz) = 7c72e1aa870cc938afdc50bd9f2d879118aa8b99
RMD160 (xen-4.2.3.tar.gz) = da0fbb7bbc0796bd83c223f7d21015ce0d4c8553
Size (xen-4.2.3.tar.gz) = 15613235 bytes
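
For anyone who wants to confirm that a local copy of the tarball matches
the SHA1 and size above, a minimal Python sketch follows. The function
names are illustrative only, and RMD160 is left out because not every
Python build provides it:

```python
import hashlib
import os

# Values copied from the distinfo lines above.
EXPECTED_SHA1 = "7c72e1aa870cc938afdc50bd9f2d879118aa8b99"
EXPECTED_SIZE = 15613235


def sha1_and_size(path, chunk_size=1 << 20):
    """Return (hex SHA1 digest, size in bytes) for the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so a large tarball is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def matches_distinfo(path):
    """True if the file has the SHA1 and size recorded above."""
    return sha1_and_size(path) == (EXPECTED_SHA1, EXPECTED_SIZE)
```

Calling matches_distinfo("xen-4.2.3.tar.gz") on the downloaded file
should return True if it is the same tarball the package was built from.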

* Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-08 17:20         ` John Nemeth
@ 2013-11-12  9:35           ` Ian Campbell
  2013-11-13 21:31             ` James Harper
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2013-11-12  9:35 UTC (permalink / raw)
  To: John Nemeth; +Cc: xen-devel, port-xen, Miguel C.

On Fri, 2013-11-08 at 09:20 -0800, John Nemeth wrote:
> On Nov 8, 10:29am, Ian Campbell wrote:
> } On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
> } > yes its 4.2 from pkgsrc.
> } 
> } Thanks, that might be enough.
> 
>      More specifically, it's 4.2.3.

Thanks. This seems to confirm that it is the memcpy I pointed to below.

I'm afraid that any further progress here is going to require input from
you on the other questions I asked, and perhaps from someone who
understands how the NetBSD kernel (in particular the privcmd driver)
operates.

Ian.

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-08 10:29       ` Ian Campbell
  2013-11-08 17:20         ` John Nemeth
@ 2013-11-12  9:48         ` Roger Pau Monné
  2013-11-12 10:00           ` Ian Campbell
  1 sibling, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2013-11-12  9:48 UTC (permalink / raw)
  To: Ian Campbell, Miguel C.; +Cc: xen-devel, port-xen

On 08/11/13 11:29, Ian Campbell wrote:
> On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
>> yes its 4.2 from pkgsrc.
> 
> Thanks, that might be enough.
> 
>>  how can i get the changeset id?
> 
> that'd be one for the port-xen folks I think. It might be printed in the
> xen dmesg, but that depends on how it was built and 4.2 may be too old
> to have such functionality.
> 
>> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>> On Mon, 2013-11-04 at 22:13 +0000, Mike C. wrote:
>>>> On 31.10.13 04:34, Miguel Clara wrote:
>>>>
>>>>> I was trying to get a core-dump for a domU with xl and got this
>>> error:
>>>>>
>>>>> # xl dump-core 20 test.core
>>>>> Memory fault
>>>>>
>>>>> GDB shows this:
>>>>>
>>>>> a# gdb xl xl.core
>>>>> GNU gdb (GDB) 7.3.1
>>>>> Copyright (C) 2011 Free Software Foundation, Inc.
>>>>> License GPLv3+: GNU GPL version 3 or
>>> later<http://gnu.org/licenses/gpl.html>
>>>>> This is free software: you are free to change and redistribute it.
>>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
>>> copying"
>>>>> and "show warranty" for details.
>>>>> This GDB was configured as "x86_64--netbsd".
>>>>> For bug reporting instructions, please see:
>>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>>> Reading symbols from /usr/sbin/xl...done.
>>>>> [New process 1]
>>>>> Core was generated by `xl'.
>>>>> Program terminated with signal 11, Segmentation fault.
>>>>> #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
>>>>> (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
>>>>> dump_rtn=0x7f7ff700632c<local_file_dump>)
>>>>>      at xc_core.c:860
>>>
> 
> In 4.2.0 this corresponds to
>  memcpy(dump_mem, vaddr, PAGE_SIZE);
> which is a plausible source of a segfault.
> 
> xc_core.c has only changed in immaterial ways (although ways which
> caused all the line numbers to shift) since 4.2.0 AFAICT so it is likely
> that this bug is still present.
> 
> Can you tell via gdb what the faulting address was and whether it
> corresponds to dump_mem or vaddr? gdb's "info locals" might give you at
> least some of that? Also you can use disas to identify the precise
> instruction at 0x00007f7ff7007b45, which will show you the registers
> which might lead you to the faulting address.
> 
> vaddr is certainly not NULL, it's checked right before. It could be
> non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD,
> but that is surely used elsewhere? I suppose it might have mapped an MFN
> which was either invalid (or became invalid, but your bug is
> deterministic, right?. IIRC NetBSD's privcmd foreign mappings are
> populated lazily and not immediately like on Linux? If that were the
> case (and I'm only vaguely aware of how NetBSD operates) then it would
> be plausible that xc_map_foreign_range would succeed but that a
> subsequent attempt to access the region would fault?

Yes, NetBSD privcmd maps the region lazily (it does the actual map in
the page fault handler for that region). I have not tested it, but could
you try the following patch:

http://mail-index.netbsd.org/port-xen/2012/06/27/msg007464.html

It's quite old, but I expect there haven't been many changes in NetBSD
privcmd recently.
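
To make the failure mode concrete: with lazy population, setting up the
mapping can succeed and the error only surfaces on first access. The
Python below is a toy model of that behaviour, not the NetBSD privcmd
code; the class, the backend function and the failing page number are
all invented for illustration:

```python
class LazyMapping:
    """Toy model: creating the mapping does no work; pages fill on first read."""

    def __init__(self, populate_page):
        self._populate_page = populate_page  # stands in for the fault handler
        self._pages = {}

    def read(self, page_no):
        if page_no not in self._pages:
            # First touch: the point where a lazily populated mapping can fail.
            self._pages[page_no] = self._populate_page(page_no)
        return self._pages[page_no]


def flaky_backend(page_no):
    # Hypothetical backend in which page 3 cannot be mapped.
    if page_no == 3:
        raise MemoryError(f"cannot populate page {page_no}")
    return bytes(4096)


mapping = LazyMapping(flaky_backend)  # "map" succeeds without touching any page
copied = []
try:
    for n in range(5):
        copied.append(mapping.read(n))  # analogous to the per-page memcpy
except MemoryError as exc:
    print(f"fault after {len(copied)} pages: {exc}")
```

In the real case the failure would arrive as a signal in the middle of
the memcpy rather than as an exception, which would be consistent with
the backtrace earlier in the thread.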

Roger.

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-12  9:48         ` [Xen-devel] " Roger Pau Monné
@ 2013-11-12 10:00           ` Ian Campbell
  2013-11-12 10:09             ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2013-11-12 10:00 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Miguel C., xen-devel, port-xen

On Tue, 2013-11-12 at 10:48 +0100, Roger Pau Monné wrote:
> > vaddr is certainly not NULL, it's checked right before. It could be
> > non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD,
> > but that is surely used elsewhere? I suppose it might have mapped an MFN
> > which was either invalid (or became invalid, but your bug is
> > deterministic, right?. IIRC NetBSD's privcmd foreign mappings are
> > populated lazily and not immediately like on Linux? If that were the
> > case (and I'm only vaguely aware of how NetBSD operates) then it would
> > be plausible that xc_map_foreign_range would succeed but that a
> > subsequent attempt to access the region would fault?
> 
> Yes, NetBSD privcmd maps the region lazily (it does the actual map on
> the page fault handler for that region).

Thanks for the confirmation. Would it be expected that a message would
be logged to dom0's dmesg if something went wrong here?

>  I have not tested it, but could
> you give a try to the following patch:
> 
> http://mail-index.netbsd.org/port-xen/2012/06/27/msg007464.html
> 
> It's quite old, but I expect there hasn't been many changes in NetBSD
> privcmd recently.
> 
> Roger.
> 

* Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-12 10:00           ` Ian Campbell
@ 2013-11-12 10:09             ` Roger Pau Monné
  2013-11-13 12:36               ` [Xen-devel] " Miguel C.
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2013-11-12 10:09 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, port-xen, Miguel C.

On 12/11/13 11:00, Ian Campbell wrote:
> On Tue, 2013-11-12 at 10:48 +0100, Roger Pau Monné wrote:
>> Yes, NetBSD privcmd maps the region lazily (it does the actual map on
>> the page fault handler for that region).
> 
> Thanks for the confirmation. Would it be expected that a message would
> be logged to dom0's dmesg if something went wrong here?

From a quick look at the current NetBSD privcmd code, I'm not sure a
message is printed in all error cases, so it's possible that it just
fails silently. You might get some messages from the hypervisor if it is
compiled with debug=y, but I have not tried it.


* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-12 10:09             ` Roger Pau Monné
@ 2013-11-13 12:36               ` Miguel C.
  2013-11-13 12:39                 ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Miguel C. @ 2013-11-13 12:36 UTC (permalink / raw)
  To: Roger Pau Monné, Ian Campbell; +Cc: xen-devel, port-xen

I have the xenkernel debug version, but in this case you mean the tools, right?

I will recompile xentools with debug support later today or tomorrow and give some more feedback.

Thanks for following up on this so far.


"Roger Pau Monné" <roger.pau@citrix.com> wrote:
>On 12/11/13 11:00, Ian Campbell wrote:
>> On Tue, 2013-11-12 at 10:48 +0100, Roger Pau Monné wrote:
>>> Yes, NetBSD privcmd maps the region lazily (it does the actual map
>on
>>> the page fault handler for that region).
>> 
>> Thanks for the confirmation. Would it be expected that a message
>would
>> be logged to dom0's dmesg if something went wrong here?
>
>By doing a quick look at current NetBSD privcmd code I'm not sure a
>message is printed on all error cases, so it's possible that it just
>fails silently. You might get some messages from the hypervisor if
>compiled with debug=y, but I have not tried it.

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-13 12:36               ` [Xen-devel] " Miguel C.
@ 2013-11-13 12:39                 ` Roger Pau Monné
  2013-11-13 17:59                   ` Miguel C.
  2013-12-03 18:14                   ` Mike C.
  0 siblings, 2 replies; 18+ messages in thread
From: Roger Pau Monné @ 2013-11-13 12:39 UTC (permalink / raw)
  To: Miguel C., Ian Campbell; +Cc: xen-devel, port-xen

On 13/11/13 13:36, Miguel C. wrote:
> I have the xenkernel debug version but in this case you mean the tool right?
> 
> I recompile xentools again with debug support pater today or tomorrow and give some more feedback.

I mean that you need to compile the hypervisor with debug=y. Have you
tried applying the patch from the link I posted to the NetBSD source
and rebuilding the kernel?

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-13 12:39                 ` Roger Pau Monné
@ 2013-11-13 17:59                   ` Miguel C.
  2013-12-03 18:14                   ` Mike C.
  1 sibling, 0 replies; 18+ messages in thread
From: Miguel C. @ 2013-11-13 17:59 UTC (permalink / raw)
  To: Roger Pau Monné, Ian Campbell; +Cc: xen-devel, port-xen


Not yet, and it seems I won't have time today.

I will try that tomorrow.

thanks


"Roger Pau Monné" <roger.pau@citrix.com> wrote:
>On 13/11/13 13:36, Miguel C. wrote:
>> I have the xenkernel debug version but in this case you mean the tool
>right?
>> 
>> I will recompile xentools with debug support later today or tomorrow
>and give some more feedback.
>
>I mean that you need to compile the hypervisor with debug=y. Have you
>tried to apply the patch on the link that I've posted to NetBSD source
>and rebuild the kernel?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-12  9:35           ` Ian Campbell
@ 2013-11-13 21:31             ` James Harper
  0 siblings, 0 replies; 18+ messages in thread
From: James Harper @ 2013-11-13 21:31 UTC (permalink / raw)
  To: Ian Campbell, John Nemeth; +Cc: xen-devel, Miguel C., port-xen

> >
> >      More specifically, it's 4.2.3.
> 
> Thanks. This seems to confirm that it is the memcpy I pointed to below.
> 
> I'm afraid that any further progress here is going to require input from
> you on the other questions I asked, and perhaps from someone who
> understands how the NetBSD kernel (in particular the privcmd driver)
> operates.
> 

FWIW, the resulting core file appears to be the right size, and has the ELF header etc, but is missing the section strings.

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-11-13 12:39                 ` Roger Pau Monné
  2013-11-13 17:59                   ` Miguel C.
@ 2013-12-03 18:14                   ` Mike C.
  2013-12-10  8:21                     ` James Harper
  1 sibling, 1 reply; 18+ messages in thread
From: Mike C. @ 2013-12-03 18:14 UTC (permalink / raw)
  To: Roger Pau Monné, Ian Campbell; +Cc: xen-devel, port-xen



On 11/13/13 12:39, Roger Pau Monné wrote:
> On 13/11/13 13:36, Miguel C. wrote:
>> I have the xenkernel debug version but in this case you mean the tool right?
>>
>> I will recompile xentools with debug support later today or tomorrow and give some more feedback.
> 
> I mean that you need to compile the hypervisor with debug=y. Have you
> tried to apply the patch on the link that I've posted to NetBSD source
> and rebuild the kernel?
> 

Hi, I've rebuilt with the patch + debug=y and tried the xl core dump
again; I still get the same issue!

It really seems to fail close to the end (at least judging by the size
of the files).

GDB seems to show similar output, not sure if the debug option should
give more info?!

(gdb) run
Starting program: /usr/sbin/xl -vf dump-core w2k12 core.dump

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1]
0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
(xch=0x7f7ff7b0d800, domid=32, args=0x7f7fffffdae0,
dump_rtn=0x7f7ff700632c <local_file_dump>) at xc_core.c:860
860     xc_core.c: No such file or directory.
        in xc_core.c
(gdb) backtrace
#0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
(xch=0x7f7ff7b0d800, domid=32, args=0x7f7fffffdae0,
dump_rtn=0x7f7ff700632c <local_file_dump>) at xc_core.c:860
#1  0x00007f7ff7007fda in xc_domain_dumpcore (xch=0x7f7ff7b0d800,
domid=32, corename=0x7f7ffffffe91 "core.dump") at xc_core.c:983
#2  0x00007f7ff74117b3 in libxl_domain_core_dump (ctx=0x7f7ff7b03200,
domid=32, filename=0x7f7ffffffe91 "core.dump", ao_how=<optimized out>)
at libxl.c:808
#3  0x000000000040f748 in core_dump_domain (filename=0x7f7ffffffe91
"core.dump", domain_spec=<optimized out>) at xl_cmdimpl.c:3301
#4  main_dump_core (argc=<optimized out>, argv=0x7f7fffffdca8) at
xl_cmdimpl.c:3642
#5  0x0000000000407055 in main (argc=3, argv=0x7f7fffffdca8) at xl.c:267
(gdb)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-12-03 18:14                   ` Mike C.
@ 2013-12-10  8:21                     ` James Harper
  2013-12-10  9:27                       ` [Xen-devel] " James Harper
  2013-12-10 10:41                       ` Andrew Cooper
  0 siblings, 2 replies; 18+ messages in thread
From: James Harper @ 2013-12-10  8:21 UTC (permalink / raw)
  To: Mike C., Roger Pau Monné, Ian Campbell; +Cc: xen-devel, port-xen

I've been working with Mike on this today. After he re-applied the patch (something must have gone wrong initially), an ioctl error is repeated constantly instead of SIGSEGV:

xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal error

I dumped out some of the variables though, and:

nr_memory_map = 1
pfn_start = 0, pfn_end = 1048575

this equates to 4GB of pfn's to be dumped on a vm with mem/maxmem = 256MB... is there code that skips empty pages? If not, that seems to be the explanation for the errors.
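
As a rough check of the 4GB figure, the arithmetic can be sketched in C. This assumes 4 KiB pages (a PAGE_SHIFT of 12) and treats pfn_end as inclusive; neither is confirmed in this thread, and pfn_range_bytes is a hypothetical helper, not a libxc function.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Bytes covered by the pfn range [pfn_start, pfn_end], end inclusive.
 * Hypothetical helper for illustration only. */
uint64_t pfn_range_bytes(uint64_t pfn_start, uint64_t pfn_end)
{
    assert(pfn_end >= pfn_start);
    return (pfn_end - pfn_start + 1) << PAGE_SHIFT;
}
```

Under those assumptions, pfn_range_bytes(0, 1048575) is 2^32 bytes, i.e. 4 GiB, whereas a 256MB guest needs only 65536 such pages, so most of the range is presumably unpopulated.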

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-12-10  8:21                     ` James Harper
@ 2013-12-10  9:27                       ` James Harper
  2013-12-10 10:41                       ` Andrew Cooper
  1 sibling, 0 replies; 18+ messages in thread
From: James Harper @ 2013-12-10  9:27 UTC (permalink / raw)
  To: James Harper, Mike C., Roger Pau Monné, Ian Campbell
  Cc: xen-devel, port-xen

> 
> I've been working with Mike on this today. After he re-applied the patch
> (something must have gone wrong initially), an ioctl error is repeated
> constantly instead of SIGSEGV:
> 
> xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal
> error
> 
> I dumped out some of the variables though, and:
> 
> nr_memory_map = 1
> pfn_start = 0, pfn_end = 1048575
> 
> this equates to 4GB of pfn's to be dumped on a vm with mem/maxmem =
> 256MB... is there code that skips empty pages? If not, that seems to be the
> explanation for the errors.
> 

A bit more info with a bit more debugging printf's, and removing the perror in xc_map_foreign_range:

nr_pages = 64472
nr_memory_map = 1
map_idx = 0
 pfn_start = 0, pfn_end = 1048575
xc: info: j (63456) != nr_pages (64472)

The resulting dump file is readable by my xen->windows dump converter, and the windows debugger doesn't complain about the resulting windows dump file, so it seems to be working okay.

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-12-10  8:21                     ` James Harper
  2013-12-10  9:27                       ` [Xen-devel] " James Harper
@ 2013-12-10 10:41                       ` Andrew Cooper
  2013-12-10 10:46                         ` James Harper
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2013-12-10 10:41 UTC (permalink / raw)
  To: James Harper
  Cc: Mike C., Roger Pau Monné, Ian Campbell, xen-devel, port-xen

On 10/12/13 08:21, James Harper wrote:
> I've been working with Mike on this today. After he re-applied the patch (something must have gone wrong initially), an ioctl error is repeated constantly instead of SIGSEGV:
>
> xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal error
>
> I dumped out some of the variables though, and:
>
> nr_memory_map = 1
> pfn_start = 0, pfn_end = 1048575
>
> this equates to 4GB of pfn's to be dumped on a vm with mem/maxmem = 256MB... is there code that skips empty pages? If not, that seems to be the explanation for the errors.
>
> James

xc_map_foreign_range is completely broken as far as errors go.

The privcmd driver ends up doing:

if ( HYPERVISOR_mmu_update(foo,bar) < 0 )
    return -EFAULT;

Your best bet here is intercepting this and finding the real error.

privcmd (and evtchn and gnttab) devices are generally broken as far as
errors go, because it is impossible to distinguish between a kernel
error and a Xen error.


In someone's copious free time (possibly mine if I ever get any), a brand
new set of ioctls on each of the Xen devices would not go amiss.

~Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)
  2013-12-10 10:41                       ` Andrew Cooper
@ 2013-12-10 10:46                         ` James Harper
  0 siblings, 0 replies; 18+ messages in thread
From: James Harper @ 2013-12-10 10:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: port-xen, xen-devel, Ian Campbell, Mike C., Roger Pau Monné

> 
> On 10/12/13 08:21, James Harper wrote:
> > I've been working with Mike on this today. After he re-applied the patch
> (something must have gone wrong initially), an ioctl error is repeated
> constantly instead of SIGSEGV:
> >
> > xc: error: xc_map_foreign_range: ioctl failed (14 = Bad address): Internal
> error
> >
> > I dumped out some of the variables though, and:
> >
> > nr_memory_map = 1
> > pfn_start = 0, pfn_end = 1048575
> >
> > this equates to 4GB of pfn's to be dumped on a vm with mem/maxmem =
> > 256MB... is there code that skips empty pages? If not, that seems to be the
> > explanation for the errors.
> >
> > James
> 
> xc_map_foreign_range is completely broken as far as errors go.
> 
> The privcmd driver ends up doing:
> 
> if ( HYPERVISOR_mmu_update(foo,bar) < 0 )
>     return -EFAULT;
> 
> Your best bet here is intercepting this and finding the real error.
> 
> privcmd (and evtchn and gnttab) devices are generally broken as far as
> errors go, because it is impossible to distinguish between a kernel
> error and a Xen error.
> 
> 
> In someone's copious free time (possibly mine if I ever get any), a brand
> new set of ioctls on each of the Xen devices would not go amiss.
> 

I think that the core dump stuff just iterates over the whole memory range and skips anything that xc_map_foreign_range returns an error on. After applying the patch for the problem that caused access to the resulting vaddr to SIGSEGV, the only problem was that it logged an error when trying to map a page. Removing that perror appears to be sufficient for now, although maybe it should only do so on certain errors...
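
The behaviour described above could be sketched roughly as follows. This is a simplified illustration, not the code in xc_core.c: the mapper is a hypothetical callback standing in for xc_map_foreign_range, the page copy is reduced to a counter, and pfn_end is treated as exclusive here purely for the sake of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapper: returns NULL when a frame cannot be mapped. */
typedef const void *(*map_fn)(uint64_t pfn);

/* Walk [pfn_start, pfn_end), skip frames that fail to map, and return
 * how many pages would have been dumped (at most nr_pages). */
uint64_t count_dumpable(uint64_t pfn_start, uint64_t pfn_end,
                        uint64_t nr_pages, map_fn map)
{
    uint64_t pfn;
    uint64_t j = 0;

    assert(map != NULL);
    for (pfn = pfn_start; pfn < pfn_end && j < nr_pages; pfn++) {
        if (map(pfn) == NULL)
            continue; /* hole in the guest address space: skip it */
        j++;          /* a real dumper would copy the page out here */
    }
    return j;
}

/* Toy mapper for illustration: only even pfns below 16 are populated. */
static const char fake_page[1] = { 0 };

const void *toy_map(uint64_t pfn)
{
    return (pfn < 16 && pfn % 2 == 0) ? fake_page : NULL;
}
```

With the numbers reported earlier in the thread (j = 63456 against nr_pages = 64472), a loop of this shape would finish 1016 pages short, which would be consistent with the "j != nr_pages" info message.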

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-12-10 10:46 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <52770EED.9090804@gmx.de>
2013-11-04 22:13 ` Fwd: NetBSD xl core-dump not working... Memory fault (core dumped) Mike C.
2013-11-07 10:29   ` Ian Campbell
2013-11-07 21:04     ` [Xen-devel] " Miguel C.
2013-11-08 10:29       ` Ian Campbell
2013-11-08 17:20         ` John Nemeth
2013-11-12  9:35           ` Ian Campbell
2013-11-13 21:31             ` James Harper
2013-11-12  9:48         ` [Xen-devel] " Roger Pau Monné
2013-11-12 10:00           ` Ian Campbell
2013-11-12 10:09             ` Roger Pau Monné
2013-11-13 12:36               ` [Xen-devel] " Miguel C.
2013-11-13 12:39                 ` Roger Pau Monné
2013-11-13 17:59                   ` Miguel C.
2013-12-03 18:14                   ` Mike C.
2013-12-10  8:21                     ` James Harper
2013-12-10  9:27                       ` [Xen-devel] " James Harper
2013-12-10 10:41                       ` Andrew Cooper
2013-12-10 10:46                         ` James Harper
