All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jérémie Galarneau" <jeremie.galarneau@efficios.com>
To: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Cc: "lttng-dev@lists.lttng.org" <lttng-dev@lists.lttng.org>,
	Jeremie Galarneau <jgalar@efficios.com>
Subject: Re: [RFC PATCH v2 10/13] Teardown apps_notify_thread before thread_manage_apps
Date: Wed, 13 Dec 2017 20:57:24 -0500	[thread overview]
Message-ID: <CA+jJMxu+jX=+DLXWMwZiLCjnnLgcjmy3N=0qDU4cZ-qyzt=scw__47470.1159903363$1513216674$gmane$org@mail.gmail.com> (raw)
In-Reply-To: <20170918225206.17725-11-jonathan.rajotte-julien@efficios.com>

Sorry, I fat-finger-sent an incomplete reply, disregard it. This e-mail
contains all the comments.

On 18 September 2017 at 18:52, Jonathan Rajotte
<jonathan.rajotte-julien@efficios.com> wrote:
> RFC:
> This is necessary since we use the ust_app_ht_by_notify_sock to
> perform the cleanup operation. Since both ust_app_unregister and
> ust_app_ust_app_notify_unregister perform a remove of the app on the
> ust_ust_app_by_notify_sock but only ust_app_notify_unregister actually

There are some extra ust_ and ust_app_ on the last two lines ;)

> end up performing a close(call_rcu) on the socket.

a close(call_rcu) ?

>
> Other way to do fix this problem?
>

Let me walk through the problem as I understand it and
we'll see if/where I am going off-course.

An ust_app has two sockets:
  - command (through which the sessiond sends commands to the app)
  - notify (through which the application notifies the sessiond of
    various conditions such as new events being available, etc.)

These two sockets are received from applications by the
reg_apps_thread [1] which posts them to the dispatch_thread [2].
Once the dispatch thread is aware of both sockets for a given app,
the two sockets are "dispatched" to their respective management
threads.

* The "command" socket is sent to the "apps_thread" [3]
* The "notify" socket is sent to the "apps_notify_thread" [4]

Let's look at what happens when an application dies.

The "apps_thread" (handles application command sockets):
- wakes up from its poll() and notices an error/hang-up has
  occurred on an application's command socket
  - calls ust_app_unregister() on with that socket's FD
    - retrieves the ust_app from the socket's FD through
      ust_app_ht_by_sock
      - flushes that application's buffers and metadata
      - removes the app from ust_app_ht_by_sock
      - removes the app from ust_app_ht_by_notify_sock
        (it's okay for this to fail)
      - removes the app from ust_app_ht (pid <-> ust_app)
      - enqueues a call_rcu() job to close the command socket

The "apps_notify_thread" (handles application notification sockets):
- wakes up from its poll() and notices an error/hand-up has
  occurred on an application's notification socket
  - calls ust_app_notify_sock_unregister()
    - removes the app from ust_app_ht_by_notify_sock
      (it's okay for this to fail since both threads are trying
       to clean this up)
    - call_rcu() to close the notify socket


Now, provided that I understand the problem correctly
(as stated in my reply to patch #06), you want to clean-up
the command and notify sockets in the same way that they
would be if their associated apps had died. That's fair.

However, as seen above, cleaning up the "apps_thread" will cause
it to empty all the ust_app data structures:
  * ust_app_ht_by_sock
  * ust_app_ht_by_notify_sock
  * ust_app_ht

However, the "apps_notify_thread" needs at least one of those
data structures to be present to iterate on all of the
applications and perform its clean-up.

Hence, you want to make sure that the "apps_notify_thread" can
complete its shutdown before the "apps_thread" starts its own
clean-up. Correct?

I would be okay with reordering the threads' teardown to
provide that guarantee. My only gripe is that it needs to
be documented _extensively_ as it is not obvious at all that
such a dependency exists between those threads.

On the other hand, it seems to me that it would be simpler
to _not_ perform the clean-up you added in the "apps_thread"
and leave that to the sessiond_cleanup(), once all threads
have been torn down.

Not performing the clean-up in the "apps_thread" allows the
clean-up (that you added) to occur in the "apps_notify_thread"
as the data structures (such as ust_app_ht) are still in place.

As a result of the "apps_notify_thread" clean-up, the notify
sockets would eventually be closed by through the call_rcu()'s
during the next grace period. This will then unblock the
applications that are stuck waiting on their notify socket.

Would that solve the problem or am I missing something?

Jérémie

[1] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L2005
[2] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L1749
[3] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L1450
[4] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/ust-thread.c#L33


> Could we simply not remove it on a ust_app_unregister? And always defer
> to the apps_notify_thread for cleanup?
>
> Update the value in the hash table to -1 and emit a close and remove
> from the hash table if the value is -1?
>
> We could also keep a local list of fd in apps_notify_thread and use it for
> cleanup instead of relying on ust_ust_app_by_notify_sock.
>
> I'm not sure what is the best/elegant solution here. I am not a fan of
> the current solution but it working.
>
> Obviously this commit will be reworded and modified accordingly.
>
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
> ---
>  src/bin/lttng-sessiond/main.c | 55 ++++++++++++++++++++++++-------------------
>  1 file changed, 31 insertions(+), 24 deletions(-)
>
> diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c
> index 4a2a661f..216d0da6 100644
> --- a/src/bin/lttng-sessiond/main.c
> +++ b/src/bin/lttng-sessiond/main.c
> @@ -6209,16 +6209,6 @@ int main(int argc, char **argv)
>         }
>         notification_thread_running = true;
>
> -       /* Create thread to manage application notify socket */
> -       ret = pthread_create(&apps_notify_thread, default_pthread_attr(),
> -                       ust_thread_manage_notify, (void *) NULL);
> -       if (ret) {
> -               errno = ret;
> -               PERROR("pthread_create notify");
> -               retval = -1;
> -               stop_threads();
> -               goto exit_apps_notify;
> -       }
>
>         /* Create thread to manage application socket */
>         ret = pthread_create(&apps_thread, default_pthread_attr(),
> @@ -6231,6 +6221,17 @@ int main(int argc, char **argv)
>                 goto exit_apps;
>         }
>
> +       /* Create thread to manage application notify socket */
> +       ret = pthread_create(&apps_notify_thread, default_pthread_attr(),
> +                       ust_thread_manage_notify, (void *) NULL);
> +       if (ret) {
> +               errno = ret;
> +               PERROR("pthread_create notify");
> +               retval = -1;
> +               stop_threads();
> +               goto exit_apps_notify;
> +       }
> +
>         /* Create thread to dispatch registration */
>         ret = pthread_create(&dispatch_thread, default_pthread_attr(),
>                         thread_dispatch_ust_registration, (void *) NULL);
> @@ -6358,20 +6359,6 @@ exit_reg_apps:
>         }
>
>  exit_dispatch:
> -       /* Instruct the apps thread to quit */
> -       ret = notify_thread_pipe(thread_apps_teardown_trigger_pipe[1]);
> -       if (ret < 0) {
> -               ERR("write error on thread quit pipe");
> -       }
> -
> -       ret = pthread_join(apps_thread, &status);
> -       if (ret) {
> -               errno = ret;
> -               PERROR("pthread_join apps");
> -               retval = -1;
> -       }
> -
> -exit_apps:
>         /* Instruct the apps_notify thread to quit */
>         ret = notify_thread_pipe(thread_apps_notify_teardown_trigger_pipe[1]);
>         if (ret < 0) {
> @@ -6386,6 +6373,26 @@ exit_apps:
>         }
>
>  exit_apps_notify:
> +       /*
> +        * The barrier ensure that all previous resources, notify sockets in
> +        * particular, are freed/closed.
> +        */
> +       rcu_barrier();
> +
> +       /* Instruct the apps thread to quit */
> +       ret = notify_thread_pipe(thread_apps_teardown_trigger_pipe[1]);
> +       if (ret < 0) {
> +               ERR("write error on thread quit pipe");
> +       }
> +
> +       ret = pthread_join(apps_thread, &status);
> +       if (ret) {
> +               errno = ret;
> +               PERROR("pthread_join apps");
> +               retval = -1;
> +       }
> +
> +exit_apps:
>  exit_notification:
>  exit_health:
>  exit_init_data:
> --
> 2.11.0
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

  parent reply	other threads:[~2017-12-14  1:57 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20170918225206.17725-1-jonathan.rajotte-julien@efficios.com>
2017-09-18 22:51 ` [RFC PATCH v2 01/13] Extend health thread lifetime Jonathan Rajotte
2017-09-18 22:51 ` [RFC PATCH v2 02/13] Reorder pthread_join for easier ordering later on Jonathan Rajotte
2017-09-18 22:51 ` [RFC PATCH v2 03/13] Terminate dispatch thread after reg_apps_thread is terminated Jonathan Rajotte
2017-09-18 22:51 ` [RFC PATCH v2 04/13] Order termination of thread_manage_apps after dispatch_thread Jonathan Rajotte
2017-09-18 22:51 ` [RFC PATCH v2 05/13] Control thread_apps_notify lifetime with specialized quit pipe Jonathan Rajotte
2017-09-18 22:51 ` [RFC PATCH v2 06/13] Fix: unregister app notify socket on sessiond tear down Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 07/13] Always reply to an inquiring app Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 08/13] Fix: perform lookup then test for condition Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 09/13] Perform ust_app_unregister on thread_manage_apps teardown Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 10/13] Teardown apps_notify_thread before thread_manage_apps Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 11/13] Comments update Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 12/13] Fix: delay termination on consumerd to allow metadata flushing Jonathan Rajotte
2017-09-18 22:52 ` [RFC PATCH v2 13/13] Fix: quit early if instructed to Jonathan Rajotte
     [not found] ` <20170918225206.17725-2-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:54   ` [RFC PATCH v2 01/13] Extend health thread lifetime Jérémie Galarneau
     [not found] ` <20170918225206.17725-3-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:54   ` [RFC PATCH v2 02/13] Reorder pthread_join for easier ordering later on Jérémie Galarneau
     [not found] ` <20170918225206.17725-4-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:55   ` [RFC PATCH v2 03/13] Terminate dispatch thread after reg_apps_thread is terminated Jérémie Galarneau
     [not found] ` <20170918225206.17725-5-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:55   ` [RFC PATCH v2 04/13] Order termination of thread_manage_apps after dispatch_thread Jérémie Galarneau
     [not found] ` <20170918225206.17725-6-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:55   ` [RFC PATCH v2 05/13] Control thread_apps_notify lifetime with specialized quit pipe Jérémie Galarneau
     [not found] ` <20170918225206.17725-7-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:56   ` [RFC PATCH v2 06/13] Fix: unregister app notify socket on sessiond tear down Jérémie Galarneau
     [not found] ` <20170918225206.17725-8-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:56   ` [RFC PATCH v2 07/13] Always reply to an inquiring app Jérémie Galarneau
     [not found] ` <20170918225206.17725-9-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:56   ` [RFC PATCH v2 08/13] Fix: perform lookup then test for condition Jérémie Galarneau
     [not found] ` <20170918225206.17725-10-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:57   ` [RFC PATCH v2 09/13] Perform ust_app_unregister on thread_manage_apps teardown Jérémie Galarneau
     [not found] ` <20170918225206.17725-11-jonathan.rajotte-julien@efficios.com>
2017-12-13 17:58   ` [RFC PATCH v2 10/13] Teardown apps_notify_thread before thread_manage_apps Jérémie Galarneau
2017-12-14  1:57   ` Jérémie Galarneau [this message]
     [not found] ` <20170918225206.17725-12-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:57   ` [RFC PATCH v2 11/13] Comments update Jérémie Galarneau
     [not found] ` <20170918225206.17725-13-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:57   ` [RFC PATCH v2 12/13] Fix: delay termination on consumerd to allow metadata flushing Jérémie Galarneau
     [not found] ` <20170918225206.17725-14-jonathan.rajotte-julien@efficios.com>
2017-12-14  1:58   ` [RFC PATCH v2 13/13] Fix: quit early if instructed to Jérémie Galarneau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+jJMxu+jX=+DLXWMwZiLCjnnLgcjmy3N=0qDU4cZ-qyzt=scw__47470.1159903363$1513216674$gmane$org@mail.gmail.com' \
    --to=jeremie.galarneau@efficios.com \
    --cc=jgalar@efficios.com \
    --cc=jonathan.rajotte-julien@efficios.com \
    --cc=lttng-dev@lists.lttng.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.