Hi Jonathan :)

It's very unlikely that the race could occur, BUT it can happen.

OK run
1. test_event_tracker starts gen-ust-events
2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
3. gen-ust-event create first event and create AFTER_FIRST_PATH
4. gen-ust-event continue and create 99 events
5. gen-ust-event wait for  BEFORE_LAST_PATH
6. test_event_track start collecting events (lttng start .....)
7. test_event_tracker calls "lttng track ... -pid "Faulty PID"
8. test_event_tracker touches BEFORE_LAST_PATH
9. gen-ust-events creates the last event
10. test_event finishes and since it tracks the faulty pid the result will be 0 events == OK

Faulty run
1. test_event_tracker starts gen-ust-events
2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
3. gen-ust-event create first event and create AFTER_FIRST_PATH
4. gen-ust-event gets rescheduled before it has created 99 events, e.g after 9 events
5. test_event_track start collecting events (lttng start .....)
6. gen-ust-event is rescheduled and starts generating the remaining events. 90 events
7. lttng collects these 90 events since we have not setup "tracking" of PID yet
8. test_event_tracker calls "lttng track ... -pid "Faulty PID"
9. test_event_tracker touches BEFORE_LAST_PATH
10. gen-ust-events creates the last event
11. test_event finishes and since it tracks the faulty pid the result should  be 0 events, but we got 90 == FAULTY

We can solve this by;
A: using NR_ITER=2
or
B: by adding a flag to gen-ust-events to wait before sending the first event
1. test_event_tracker starts gen-ust-events
2. test_event_tracker waits for gen-ust-events waits for BEFORE_FIRST_PATH
3. test_event_track start collecting events (lttng start .....)
4. test_event_tracker calls "lttng track ... -pid "Faulty PID"
5. test_event_track creates BEFORE_FIRST_PATH
6. gen_ust_event creates 100 events
7. test_event_tracker waits for gen_event_ust to end
8. test_event finishes and since it tracked the faulty pid the result will be 0 events == OK

This is in principle how gen-kernel-test-events works (but with different arguments)
I would suggest to use B since that will be bulletproof

Anders Wallin


On Wed, Mar 31, 2021 at 11:33 PM Jonathan Rajotte-Julien <jonathan.rajotte-julien@efficios.com> wrote:
Hi,

On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> Hi Julian,

You can use Jonathan. ;)

>
> Neither mine "sleep 0.1" or your version with "while [! -f ............"
> are race condition free.

I might be missing something here but as far as I understand the race you are
trying to mitigate is that the test script execute/continue before the `backgrounded`
command (background test app) had time to execute, right?

If so at least waiting for the app to create a file is necessary. Now
gen_kernel_test_events does not have this functionality. The PATH_WAIT_FILE is
used to control when the testapp can continue. Hence the script still cannot
know if the app have been scheduled.

Now based on the test case you might need more synchronization for the test
cases.

Note that in the ust cases, the trace_ust_app uses `touch "$BEFORE_LAST_PATH"`
that effectively unblock the app and allows it to perform the last tracepoint
hit and the we `wait` on the background process.

Note: some tests case are a bit clever and uses "trace_"$domain"_app" instead of
calling trace_ust_app directly.

For these tests case it seems that we are only expecting at least a single event
matching the event name under test. Here the last tracepoint hit should satisfy
this criteria.

Am I missing a race?

Cheers


> I suggest that we add an option to gen-ust-events to wait before the first
> event is generated.
> gen_kernel_test_events already have this functionality to wait before the
> first event.
>
> Something like this
> static struct option long_options[] =
> {
> /* These options set a flag. */
> {"iter", required_argument, 0, 'i'},
> {"wait", required_argument, 0, 'w'},
> {"sync-after-first-event", required_argument, 0, 'a'},
> {"sync-before-last-event", required_argument, 0, 'b'},
> {"sync-before-last-event-touch", required_argument, 0, 'c'},
> {"sync-before-exit", required_argument, 0, 'd'},
> {"sync-before-exit-touch", required_argument, 0, 'e'},
> *+ {"sync-before-first-event", required_argument, 0, 'f'},*
>
> {0, 0, 0, 0}
> };
> ....
>
> I will create one or more patches for this tomorrow
>
> Anders Wallin
>
>
> On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> jonathan.rajotte-julien@efficios.com> wrote:
>
> > > #
> > > # SPDX-License-Identifier: GPL-2.0-only
> > >
> > > -TEST_DESC="LTTng - Event traker test"
> > > +TEST_DESC="LTTng - Event tracker test"
> > >
> > > CURDIR=$(dirname "$0")/
> > > TESTDIR="$CURDIR/../../.."
> > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > >
> > >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
> > >       "$BEFORE_LAST_PATH" &
> > >       CHILD_PID=$!
> > > +     # voluntary context switch to start $TESTAPP_BIN
> > > +     sleep 0.1
> >
> > A wait on the $AFTER_FIRST_PATH file would be probably more deterministic
> > than a sleep here.
> >
> >   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> >           sleep 0.1
> >   done
> >
> > I would also expect something similar for the `prepare_kernel_app`
> > function considering the same race is most probably present and simply not
> > triggered by a chance of luck.
> > Seems like gen-kernel-test-events does not expose the same sync
> > capabilities here, please use gen-ust-events as an example of how it is
> > done.
> >
> > Let us know how your testing goes.
> >
> > Thanks
> >

--
Jonathan Rajotte-Julien
EfficiOS