From: Jerin Jacob <jerinjacobk@gmail.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jerin Jacob <jerinj@marvell.com>, dpdk-dev <dev@dpdk.org>,
	 Bruce Richardson <bruce.richardson@intel.com>,
	Ray Kinsella <mdr@ashroe.eu>,
	 Thomas Monjalon <thomas@monjalon.net>,
	David Marchand <david.marchand@redhat.com>,
	 Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>,
	 Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>,
	 "Dmitry Malloy (MESHCHANINOV)" <dmitrym@microsoft.com>,
	Pallavi Kadam <pallavi.kadam@intel.com>,
	 "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	 "Ruifeng Wang (Arm Technology China)" <ruifeng.wang@arm.com>,
	Jan Viktorin <viktorin@rehivetech.com>,
	 David Christensen <drc@linux.vnet.ibm.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
Date: Wed, 18 Aug 2021 15:07:25 +0530
Message-ID: <CALBAE1PJPE7jOQTgBsUXncTqoB5zoBk47rGptsoSj5-=2oEQJw@mail.gmail.com>
In-Reply-To: <20210817085231.16be26c5@hermes.local>

On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 17 Aug 2021 20:57:50 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Tue, 17 Aug 2021 13:08:46 +0530
> > > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > >
> > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > > <stephen@networkplumber.org> wrote:
> > > > >
> > > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > > <jerinj@marvell.com> wrote:
> > > > >
> > > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > > >
> > > > > > > Introduce the oops handling API with the following specification
> > > > > > > and enable a stub implementation for Linux and FreeBSD.
> > > > > > >
> > > > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > > > oops handler for the essential signals.
> > > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > > of signals for which the EAL installed the handler.
> > > > >
> > > > > This is a big change, and many applications already handle these
> > > > > signals themselves. Therefore adding this needs to be opt-in
> > > > > and not enabled by default.
> > > >
> > > > > To avoid every application having to register this signal handler
> > > > > explicitly, and to coexist with application-specific signal handlers,
> > > > > the following design was chosen. (It is mentioned in the commit log;
> > > > > I describe it here for clarity.)
> > > >
> > > > > Case 1:
> > > > > a) The application installs its signal handler prior to rte_eal_init().
> > > > > b) The implementation saves the application-specific handler and
> > > > > installs the EAL oops handler in its place.
> > > > > c) When the application or DPDK hits a segfault, the EAL oops
> > > > > handler is invoked.
> > > > > d) After dumping the EAL-specific message, it calls the
> > > > > application-specific signal handler installed in step a).
> > > > > This avoids breaking any contract with the application,
> > > > > i.e. the behavior is the same as with the current EAL.
> > > > > That is the reason for not using SA_RESETHAND (which would call
> > > > > SIG_DFL after the EAL oops handler instead of the
> > > > > application-specific handler).
> > > >
> > > > > Case 2:
> > > > > a) The application installs its signal handler after rte_eal_init().
> > > > > b) The EAL handler is replaced by the application handler; the
> > > > > application can then call rte_oops_decode() to decode.
> > > > >
> > > > > To cater to the above use cases, rte_oops_signals_enabled() and
> > > > > rte_oops_decode() are provided.
> > > >
> > > > Here we are not breaking any contract with the application.
> > > > Do you have concerns about this design?
> > >
> > > In our application as a service it is important not to do any backtrace
> > > in production. We rely on other infrastructure to process coredumps.
> >
> > Other infrastructure will still work. Take, for example, the standard
> > Linux coredump infrastructure. In the current implementation:
> > - The EAL handler dumps the DPDK oops, kernel-style, to stderr.
> > - The implementation calls SIG_DFL from the EAL oops handler.
> > - The above step creates the coredump or hands off to whatever other
> > infrastructure you are using for coredumps.
> >
> > >
> > > > This should be controlled by a command-line argument.
> >
> > If other coredump infrastructure continues to work as-is, why is an
> > enable/disable option required in EAL?
>
> The addition of DPDK OOPS adds additional steps which make all
> faults be identified as the oops code.

Since we are using SA_ONSTACK, the original segfault info is not
lost.
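
As a rough illustration of the dump-then-SIG_DFL flow described above
(a sketch only, not the code from this series): the handler below runs on
an alternate stack, prints a short note, restores the default disposition
and returns, so the faulting instruction re-executes and the default
action is taken from the original fault site. All demo_* names are
hypothetical, and restoring SIG_DFL by returning from the handler is an
assumption about one way to do the hand-off.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the EAL oops handler (not the patch code). */
static void demo_oops_handler(int sig, siginfo_t *info, void *uctx)
{
	static const char msg[] = "demo oops: fatal signal caught\n";

	(void)info;
	(void)uctx;
	/* write() is async-signal-safe; printf() is not. */
	if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
		/* nothing useful can be done here */
	}
	/* Restore the default action, then return: the faulting instruction
	 * runs again and the kernel applies SIG_DFL at the original site. */
	signal(sig, SIG_DFL);
}

/* Install the handler for SIGSEGV on an alternate signal stack. */
static int demo_oops_install(void)
{
	static char alt_stack[64 * 1024];
	struct sigaction sa;
	stack_t ss;

	ss.ss_sp = alt_stack;
	ss.ss_size = sizeof(alt_stack);
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = demo_oops_handler;
	sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}

/* Fault in a child process and report the signal that terminated it. */
static int demo_run_faulting_child(void)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (demo_oops_install() != 0)
			_exit(2);
		*(volatile int *)0x05 = 0; /* same deliberate fault as the test patch */
		_exit(3); /* not expected to be reached */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling demo_run_faulting_child() from a test program should report that
the child was terminated by SIGSEGV itself rather than exiting from the
handler, which is the property the coredump infrastructure relies on.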

I verified this as follows; the steps are below.

0) Enable the coredump infrastructure on Linux, e.g. using coredumpctl
1) Apply this series
2) Apply the following patch to trigger a segfault from inside the library.
This tests that the segfault is caught by EAL and forwarded to the default
Linux signal handler.

[main]dell[dpdk.org] $ git diff
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3438a96b75..b935c32c98 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv)

        eal_mcfg_complete();

+       /* Generate a segfault */
+       *(volatile int *)0x05 = 0;
        return fctret;

 }
3) Build
meson --buildtype debug build
ninja -C build

4) Run
$ ./build/app/test/dpdk-test --no-huge  -c 0x2

Please find the oops dump [1] and the GDB core dump backtrace [2] below.
The GDB backtrace preserves the original segfault cause and location.

Any other concerns?


[1]
[main]dell[dpdk.org] $ ./build/app/test/dpdk-test --no-huge  -c 0x2
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: WARNING: Main core has no memory on local socket!
Signal info:
------------
PID:           2666512
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[  0x5582acd1e08a]: rte_eal_init()+0xe18
[  0x5582ac086f4e]: main()+0x298
[  0x7f0facf1fb25]: __libc_start_main()+0xd5
[  0x5582ac079c9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000002  R9 : 0x00007ffe9273c590
R10: 0x0000000000000000  R11: 0x0000000000000246
R12: 0x00005582bc3ce7a0  R13: 0x00000000000000ca
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x00005582bc3c75c8
RCX: 0x00007ffe9273c530  RDX: 0x0000000000000000
RBP: 0x00007ffe9273c820  RSP: 0x00007ffe9273c690
RSI: 0x0000000000000008  RDI: 0x00000000000000ca
RIP: 0x00005582acd1e08a  EFL: 0x0000000000010246


[2]

Core was generated by `./build/app/test/dpdk-test --no-huge -c 0x2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342
1342            *(volatile int *)0x05 = 0;
[Current thread is 1 (Thread 0x7f0faca83c00 (LWP 2666512))]
(gdb) bt
#0  rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342
#1  0x00005582ac086f4e in main (argc=4, argv=0x7ffe9273cec8) at ../app/test/test.c:146



