From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>, dev@dpdk.org
Cc: konstantin.ananyev@intel.com
Subject: Re: [PATCH v4] eal: add asynchronous request API to DPDK IPC
Date: Fri, 23 Mar 2018 18:21:01 +0000	[thread overview]
Message-ID: <8046b5ca-08d7-50f4-4917-e23950ac5856@intel.com> (raw)
In-Reply-To: <e312e2eb-693b-2b22-028c-f644dfee82d5@intel.com>

On 23-Mar-18 3:38 PM, Tan, Jianfeng wrote:
> Hi Anatoly,
> 
> Two general comments first.
> 
> Q1. As I understand it, taking malloc as an example: when a primary 
> process receives a request in the rte_mp_handle thread, it will 
> allocate memory, broadcast an async request to all secondary 
> processes, and return immediately; responses from each secondary 
> process are then recorded by the rte_mp_async_handle thread; once 
> either the timeout expires or all responses are collected, 
> rte_mp_async_handle will trigger an async action.
> 
> I agree on the necessity of the async request API; without it, each 
> caller with a similar requirement needs lots of code to implement it.
> But can we achieve that without creating another pthread, by 
> leveraging the alarm API? For example, to send an async request:
> step 1. set an alarm;
> step 2. send the request;
> step 3. receive and record responses in the mp thread;
> step 4. if the alarm times out, trigger the async action with a 
> timeout result;
> step 5. if all responses are collected, cancel the alarm, and trigger 
> the async action with a success result.

That would work, but this "alarm" sounds like another thread. The main 
thread is always blocked in recvmsg(), and interrupting that would 
involve a signal, which is a recipe for disaster (you know, global 
overwritable signal handlers, undefined behavior as to who actually gets 
the signal, stuff like that).

That is, unless you were referring to DPDK's own "alarm" API, in which 
case it's a chicken-and-egg problem - we want to use alarm for IPC, but 
can't, because rte_malloc relies on IPC, which would rely on alarm, 
which relies on rte_malloc. So unless we rewrite the alarm API (or add 
an option for it) to not use rte_malloc'd memory, this isn't going to work.

We can of course do it, but really, I would be inclined to leave this 
as is in the interest of merging this earlier, unless there are strong 
objections to it.
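
For reference, a rough sketch of what the alarm-based flow could look 
like if we went that route. rte_eal_alarm_set(), rte_eal_alarm_cancel() 
and rte_mp_sendmsg() are the existing EAL APIs; everything else below 
is made up for illustration, and it glosses over the rte_malloc 
dependency problem described above:

        /* Hypothetical sketch only -- not the patch. */
        #include <stdlib.h>
        #include <rte_alarm.h>
        #include <rte_eal.h>

        struct async_req_state {
                /* user-supplied callback, invoked with whatever was collected */
                int (*clb)(const struct rte_mp_reply *reply);
                struct rte_mp_reply reply; /* responses recorded by the mp thread */
        };

        /* step 4: alarm fired before all responses arrived */
        static void
        async_timeout_cb(void *arg)
        {
                struct async_req_state *st = arg;

                st->clb(&st->reply); /* report the (partial) timeout result */
                free(st->reply.msgs);
                free(st);
        }

        static int
        send_async_request(struct rte_mp_msg *req, uint64_t timeout_us,
                        int (*clb)(const struct rte_mp_reply *reply))
        {
                struct async_req_state *st = calloc(1, sizeof(*st));

                if (st == NULL)
                        return -1;
                st->clb = clb;

                /* step 1: arm the timeout */
                if (rte_eal_alarm_set(timeout_us, async_timeout_cb, st) < 0) {
                        free(st);
                        return -1;
                }
                /* step 2: send the request. Steps 3-5 (recording responses,
                 * cancelling the alarm via rte_eal_alarm_cancel() and
                 * invoking the callback once all responses are in) would
                 * live in the existing mp thread.
                 */
                return rte_mp_sendmsg(req);
        }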

> 
> I don't have a strong objection to adding another thread; it's an 
> internal detail of the IPC system anyway, so we can optimize it later.
> 
> Q2. Do we really need to register actions as the result handlers for 
> async requests? Instead, we could pass the handler as a parameter to 
> rte_mp_request_async(); whenever the request fails or succeeds, we 
> just invoke that function.

That can be done, sure. This would look cleaner on the backend too, so 
I'll probably do that for v5.
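
To illustrate, the parameter-based variant might look something like 
this (a sketch only - the exact signature and the callback typedef name 
are still to be settled in v5):

        /* sketch of the parameter-based variant; names are illustrative */
        typedef int (*rte_mp_async_reply_t)(const struct rte_mp_msg *request,
                        const struct rte_mp_reply *reply);

        int
        rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
                        rte_mp_async_reply_t clb);

The callback would then replace the separate action registration step.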

> 
> Please see some other comments inline.
> 
> On 3/14/2018 1:42 AM, Anatoly Burakov wrote:
>> This API is similar to the blocking API that is already present,
>> but the reply will be received in a separate callback by the caller.
>>
>> Under the hood, we create a separate thread to deal with replies to
>> asynchronous requests, that will just wait to be notified by the
>> main thread, or woken up on a timer (it'll wake itself up every
>> minute regardless of whether it was called, but if there are no
>> requests in the queue, nothing will be done and it'll go to sleep
>> for another minute).
> 
> Waiting for 1 minute seems a little strange to me; shouldn't it wake 
> up at the latest timeout of the pending async requests?

We do. However, we have to wait for *something* if there aren't any 
asynchronous requests pending. There isn't a way to put "wait an 
infinite amount of time" as a time value, so I opted for the next best 
thing - a value large enough not to cause any performance issues. The 
timeout is arbitrary.
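
In other words, the handler thread's idle wait boils down to something 
like this (a simplified sketch, not the patch itself; async_cond and 
the 60-second constant are illustrative, sync_requests.lock is the 
queue lock from the patch):

        /* simplified sketch of the idle wait; names are illustrative */
        #define MAX_ASYNC_WAIT_S 60 /* arbitrary "large enough" timeout */

        static pthread_cond_t async_cond = PTHREAD_COND_INITIALIZER;

        static void
        wait_for_async_work(void)
        {
                struct timespec ts;

                clock_gettime(CLOCK_REALTIME, &ts);
                ts.tv_sec += MAX_ASYNC_WAIT_S;

                pthread_mutex_lock(&sync_requests.lock);
                /* with pending requests, ts would instead be clamped to
                 * the nearest request timeout; with an empty queue we
                 * simply sleep for the full period, do nothing, and loop
                 * around.
                 */
                pthread_cond_timedwait(&async_cond, &sync_requests.lock, &ts);
                pthread_mutex_unlock(&sync_requests.lock);
        }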

>>   /** Double linked list of actions. */
>> @@ -60,13 +65,37 @@ struct mp_msg_internal {
>>       struct rte_mp_msg msg;
>>   };
>> +enum mp_request_type {
>> +    REQUEST_TYPE_SYNC,
>> +    REQUEST_TYPE_ASYNC
>> +};
>> +
>> +struct async_request_shared_param {
>> +    struct rte_mp_reply *user_reply;
>> +    struct timespec *end;
> 
> Why do we have to make these two pointers? Operating on pointers 
> introduces unnecessary complexity.

Because those are shared between different pending requests. Each 
pending request gets its own entry in the queue (because it expects an 
answer from a particular process), but the request data (callback, 
number of requests processed, etc.) is shared between all requests for 
this async operation. We don't have the luxury of storing all of that 
in a local variable like we do with synchronous requests :)
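
Roughly, sending one async request to N secondaries ends up doing 
something like this (a sketch of the intent using the structures from 
the patch; the queue field name is an approximation and allocation 
checks are omitted):

        /* one shared parameter block per async request... */
        struct async_request_shared_param *param = malloc(sizeof(*param));

        /* param->user_reply, param->end and param->n_requests_processed
         * are set up once, here */

        /* ...referenced by one queue entry per destination process */
        for (i = 0; i < n_secondaries; i++) {
                struct sync_request *sr = malloc(sizeof(*sr));

                sr->type = REQUEST_TYPE_ASYNC;
                sr->async.param = param; /* all entries share the same data */
                TAILQ_INSERT_TAIL(&sync_requests.requests, sr, next);
        }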

> 
>> +    int n_requests_processed;
> 
> It sounds like it records how many requests were sent, but it actually 
> means how many responses have been received, right? Wouldn't 
> n_responses sound better?

No, it doesn't. Well, perhaps "n_requests_processed" should be 
"n_responses_processed", but not "n_responses", because this is a value 
indicating how many responses we've already seen (so we know when to 
trigger the callback).

> 
>> +};
>> +
>> +struct async_request_param {
>> +    struct async_request_shared_param *param;
>> +};
>> +
>> +struct sync_request_param {
>> +    pthread_cond_t cond;
>> +};
>> +
>>   struct sync_request {
> 
> I know "sync_" in the original version was for synchronization between 
> the blocked thread (who sends the request) and the mp thread.
> 
> But it indeed makes the code difficult to understand. We may change the 
> name to "pending_request"?

This can be done, sure.

> 
> 
>>       TAILQ_ENTRY(sync_request) next;
>> -    int reply_received;
>> +    enum mp_request_type type;
>>       char dst[PATH_MAX];
>>       struct rte_mp_msg *request;
>> -    struct rte_mp_msg *reply;
>> -    pthread_cond_t cond;
>> +    struct rte_mp_msg *reply_msg;
>> +    int reply_received;
>> +    RTE_STD_C11
>> +    union {
>> +        struct sync_request_param sync;
>> +        struct async_request_param async;
>> +    };
>>   };
> 
> Aren't too many structs being defined? How about just putting it like this:
> 
> struct pending_request {
>          TAILQ_ENTRY(pending_request) next;
>          enum mp_request_type type;
>          char dst[PATH_MAX];
>          struct rte_mp_msg *request;
>          struct rte_mp_msg *reply_msg;
>          int reply_received;
>          RTE_STD_C11
>          union {
>                  /* for sync request */
>                  struct {
>                          /* used by the mp thread to wake up the
>                           * requesting thread */
>                          pthread_cond_t cond;
>                  };
> 
>                  /* for async request */
>                  struct {
>                          struct rte_mp_reply user_reply;
>                          struct timespec end;
>                          /* how many requests are processed */
>                          int n_requests_processed;
>                  };
>          };
> };

That can work, sure. However, I actually think that my approach is 
clearer, because when you're working with autocomplete and a proper IDE, 
it's clear which values are for which case (and I would argue it makes 
the code more readable as well).
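
For example, with named sub-structs the call sites spell out which mode 
they are dealing with (an illustrative snippet, not code from the patch):

        /* with named union members, the request type is explicit at use */
        if (sr->type == REQUEST_TYPE_SYNC)
                pthread_cond_signal(&sr->sync.cond);
        else
                sr->async.param->n_requests_processed++;

With the anonymous-struct variant, these would just be sr->cond and 
sr->n_requests_processed, which gives no hint about which kind of 
request they belong to.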

>> +static void *
>> +async_reply_handle(void *arg __rte_unused)
>> +{
>> +    struct sync_request *sr;
>> +    struct timeval now;
>> +    struct timespec timeout, ts_now;
>> +    do {
> 
> Put while (1) here so that it's clear that this is an infinite loop.

Will do.

>> +    /* we have to lock the request queue here, as we will be adding a 
>> bunch
>> +     * of requests to the queue at once, and some of the replies may 
>> arrive
>> +     * before we add all of the requests to the queue.
>> +     */
>> +    pthread_mutex_lock(&sync_requests.lock);
> 
> Why do we share this lock for both sync and async requests?

Because sync and async requests share the queue.
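
Roughly, as far as the shared state goes (field names approximated, not 
a literal excerpt from the patch):

        /* one list holds both kinds of pending requests */
        static struct {
                struct sync_request_list requests; /* sync *and* async entries */
                pthread_mutex_t lock;              /* guards the single queue */
        } sync_requests;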


>> + * Asynchronous reply function typedef used by other components.
>> + *
>> + * As we create  socket channel for primary/secondary communication, use
> 
> There are two spaces here.

Will fix.



-- 
Thanks,
Anatoly

Thread overview: 40+ messages
2018-02-27 14:59 [PATCH] eal: add asynchronous request API to DPDK IPC Anatoly Burakov
2018-02-28 10:22 ` Burakov, Anatoly
2018-03-02 18:06 ` [PATCH v2] " Anatoly Burakov
2018-03-07 16:57   ` [PATCH v3] " Anatoly Burakov
2018-03-13 17:42     ` [PATCH v4] " Anatoly Burakov
2018-03-23 15:38       ` Tan, Jianfeng
2018-03-23 18:21         ` Burakov, Anatoly [this message]
2018-03-24 13:22           ` Burakov, Anatoly
2018-03-24 12:46       ` [PATCH v5 1/2] eal: rename IPC sync request to pending request Anatoly Burakov
2018-03-26  7:31         ` Tan, Jianfeng
2018-03-27 13:59         ` [PATCH v6 " Anatoly Burakov
2018-03-27 16:27           ` Thomas Monjalon
2018-03-28  9:15             ` Burakov, Anatoly
2018-03-28 10:08               ` Thomas Monjalon
2018-03-28 10:57                 ` Burakov, Anatoly
2018-03-31 17:06           ` [PATCH v7 1/3] " Anatoly Burakov
2018-03-31 17:06           ` [PATCH v7 2/3] eal: rename mp_request to mp_request_sync Anatoly Burakov
2018-04-02  5:09             ` Tan, Jianfeng
2018-03-31 17:06           ` [PATCH v7 3/3] eal: add asynchronous request API to DPDK IPC Anatoly Burakov
2018-04-04 22:15             ` Thomas Monjalon
2018-03-27 13:59         ` [PATCH v6 2/2] " Anatoly Burakov
2018-03-27 16:33           ` Thomas Monjalon
2018-03-28  2:08             ` Tan, Jianfeng
2018-03-28  7:29               ` Thomas Monjalon
2018-03-28  8:22                 ` Van Haaren, Harry
2018-03-28  8:55                   ` Tan, Jianfeng
2018-03-28  9:10                     ` Van Haaren, Harry
2018-03-28  9:21                     ` Burakov, Anatoly
2018-03-28  9:53                       ` Thomas Monjalon
2018-03-28 10:42                         ` Burakov, Anatoly
2018-03-28 11:26                           ` Thomas Monjalon
2018-03-28 12:21                             ` Burakov, Anatoly
2018-03-28  9:11                 ` Bruce Richardson
2018-03-24 12:46       ` [PATCH v5 " Anatoly Burakov
2018-03-26 14:15         ` Tan, Jianfeng
2018-03-26 14:28           ` Burakov, Anatoly
2018-03-02 18:48 ` [PATCH] " Stephen Hemminger
2018-03-03 12:29   ` Burakov, Anatoly
2018-03-02 18:51 ` Stephen Hemminger
2018-03-03 13:44   ` Burakov, Anatoly
