All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Conole <aconole@redhat.com>
To: "Van Haaren\, Harry" <harry.van.haaren@intel.com>
Cc: David Marchand <david.marchand@redhat.com>,  dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] eal/service: fix exit by resetting service lcores
Date: Tue, 10 Mar 2020 15:14:27 -0400	[thread overview]
Message-ID: <f7t8sk858fg.fsf@dhcp-25.97.bos.redhat.com> (raw)
In-Reply-To: <f7th7yw5ckn.fsf@dhcp-25.97.bos.redhat.com> (Aaron Conole's message of "Tue, 10 Mar 2020 13:44:56 -0400")

Aaron Conole <aconole@redhat.com> writes:

> "Van Haaren, Harry" <harry.van.haaren@intel.com> writes:
>
>>> -----Original Message-----
>>> From: David Marchand <david.marchand@redhat.com>
>>> Sent: Tuesday, March 10, 2020 4:31 PM
>>> To: Van Haaren, Harry <harry.van.haaren@intel.com>
>>> Cc: dev <dev@dpdk.org>; Aaron Conole <aconole@redhat.com>
>>> Subject: Re: [PATCH] eal/service: fix exit by resetting service lcores
>>> 
>>> On Tue, Mar 10, 2020 at 2:32 PM Harry van Haaren
>>> <harry.van.haaren@intel.com> wrote:
>>> >
>>> > This commit releases all service cores from thier role,
>>> > returning them to ROLE_RTE on rte_service_finalize().
>>> >
>>> > This may fix an issue relating to the service cores causing
>>> > a race-condition on eal_cleanup(), where the service core
>>> > could still be executing while the main thread has already
>>> > free-d the service memory, leading to a segfault.
>>> 
>>> Adding rte_service_lcore_reset_all() just tells a (remaining) service
>>> lcore to quit its loop, but does not close the race on lcore_states.
>>> 
>>> The backtrace shows the same.
>>> 
>>> (gdb) bt full
>>> #0  rte_service_runner_func (arg=<optimized out>) at
>>> ../lib/librte_eal/common/rte_service.c:455
>>>         service_mask = 1
>>>         i = <optimized out>
>>>         lcore = 1
>>>         cs = 0x1003ea200
>>> #1  0x00007ffff72030ef in eal_thread_loop (arg=<optimized out>) at
>>> ../lib/librte_eal/linux/eal/eal_thread.c:153
>>>         fct_arg = <optimized out>
>>>         c = 0 '\000'
>>>         n = <optimized out>
>>>         ret = <optimized out>
>>>         lcore_id = <optimized out>
>>>         thread_id = 140737203603200
>>>         m2s = 14
>>>         s2m = 22
>>>         cpuset = "1", '\000' <repeats 175 times>,
>>> "\200\000\000\000\000\000\000\000\221\354e\360\377\177", '\000'
>>> <repeats 65 times>
>>>         __func__ = "eal_thread_loop"
>>> #2  0x00007ffff065ddd5 in start_thread () from /lib64/libpthread.so.0
>>> No symbol table info available.
>>> #3  0x00007ffff038702d in clone () from /lib64/libc.so.6
>>> No symbol table info available.
>>> 
>>> 
>>> I added a rte_eal_mp_wait_lcore(), to ensure that each service lcore
>>> _did_ quit its loop.
>>> @@ -123,6 +123,7 @@ rte_service_finalize(void)
>>>                 return;
>>> 
>>>         rte_service_lcore_reset_all();
>>> +       rte_eal_mp_wait_lcore();
>>> 
>>>         rte_free(rte_services);
>>>         rte_free(lcore_states);
>>> 
>>> 
>>> I can't reproduce with this.
>>
>> OK - that's good news, thanks for the quick testing & feedback.
>>
>> Agree with your analysis of the above, indeed waiting for the cores
>> explicitly seems the right solution to remove the race.
>>
>> Will I spin up a v2 patchset with your co-authored-by added and the above
>> change included?
>
> Please spin the v2 - I am currently testing with David's incremental on
> my setup now.

Additionally, with the incremental:

Acked-by: Aaron Conole <aconole@redhat.com>

Please make sure to cc stable@.


  reply	other threads:[~2020-03-10 19:14 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 13:33 [dpdk-dev] [PATCH] eal/service: fix exit by resetting service lcores Harry van Haaren
2020-03-10 16:31 ` David Marchand
2020-03-10 16:38   ` Van Haaren, Harry
2020-03-10 17:44     ` Aaron Conole
2020-03-10 19:14       ` Aaron Conole [this message]
2020-03-11  9:09     ` David Marchand
2020-03-11 14:39 ` [dpdk-dev] [PATCH v2] " Harry van Haaren
     [not found]   ` <CAJFAV8zTcyE8Swm-s8R3jQ3pajLV6+HXPHRmwUhqw6KktCm1MQ@mail.gmail.com>
2020-03-11 17:08     ` Aaron Conole
2020-03-12  9:03       ` David Marchand
     [not found]     ` <MWHPR1101MB21575ECFB47831F8DE513465D7FC0@MWHPR1101MB2157.namprd11.prod.outlook.com>
2020-03-12  8:59       ` David Marchand
2020-03-13 10:04   ` David Marchand
2020-04-06 10:30     ` Burakov, Anatoly
2020-04-14 13:22       ` Aaron Conole

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f7t8sk858fg.fsf@dhcp-25.97.bos.redhat.com \
    --to=aconole@redhat.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=harry.van.haaren@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.