All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] eal/unix: allow creating thread with real-time priority
@ 2023-10-24 12:54 Thomas Monjalon
  2023-10-24 13:55 ` Morten Brørup
                   ` (5 more replies)
  0 siblings, 6 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-24 12:54 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Anatoly Burakov, Stephen Hemminger,
	Narcisa Vasile, Tyler Retzlaff, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko, Morten Brørup

When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 app/test/test_threads.c                         | 9 ---------
 doc/guides/prog_guide/env_abstraction_layer.rst | 4 +++-
 lib/eal/include/rte_thread.h                    | 2 +-
 lib/eal/unix/rte_thread.c                       | 9 ---------
 4 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..4f7d10e074 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -97,21 +97,12 @@ test_thread_priority(void)
 		"Priority set mismatches priority get");
 
 	priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-		"Priority set to critical should fail");
-	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-		"Failed to get thread priority");
-	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-		"Failed set to critical should have retained normal");
-#else
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
 		"Priority set to critical should succeed");
 	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
 		"Failed to get thread priority");
 	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
 		"Priority set mismatches priority get");
-#endif
 
 	priority = RTE_THREAD_PRIORITY_NORMAL;
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+     whose scheduling policies are ``SCHED_FIFO``
+     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..5e624015e4 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -59,7 +59,7 @@ enum rte_thread_priority {
 	RTE_THREAD_PRIORITY_NORMAL            = 0,
 	/**< normal thread priority, the default */
 	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-	/**< highest thread priority allowed */
+	/**< highest thread priority, use with caution */
 };
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..0ff291c6f1 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -153,11 +153,6 @@ rte_thread_create(rte_thread_t *thread_id,
 			goto cleanup;
 		}
 
-		if (thread_attr->priority ==
-				RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
-			ret = ENOTSUP;
-			goto cleanup;
-		}
 		ret = thread_map_priority_to_os_value(thread_attr->priority,
 				&param.sched_priority, &policy);
 		if (ret != 0)
@@ -275,10 +270,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
 	int policy;
 	int ret;
 
-	/* Realtime priority can cause crashes on non-Windows platforms. */
-	if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
-		return ENOTSUP;
-
 	ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
 		&policy);
 	if (ret != 0)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* RE: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
@ 2023-10-24 13:55 ` Morten Brørup
  2023-10-24 16:04   ` Stephen Hemminger
  2023-10-25 15:13 ` [PATCH v2] " Thomas Monjalon
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 54+ messages in thread
From: Morten Brørup @ 2023-10-24 13:55 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: David Marchand, stable, Anatoly Burakov, Stephen Hemminger,
	Narcisa Vasile, Tyler Retzlaff, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, 24 October 2023 14.54
> 
> When adding an API for creating threads,
> the real-time priority has been forbidden on Unix.
> 
> There is a known issue with ring behaviour,
> but it should not be completely forbidden.
> 
> Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
> Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
> Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---

[...]

> @@ -815,7 +815,9 @@ Known Issues
> 
>    4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> it.
> 
> -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> scheduling policies are SCHED_FIFO or SCHED_RR.
> +  5. It MUST not be used by multi-producer/consumer pthreads
> +     whose scheduling policies are ``SCHED_FIFO``
> +     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).

Do the RTS or HTS ring modes make any difference here?

Anyway, I agree that real-time priority should not be forbidden on Unix.

Acked-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-24 13:55 ` Morten Brørup
@ 2023-10-24 16:04   ` Stephen Hemminger
  2023-10-25 13:15     ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-24 16:04 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Narcisa Vasile, Tyler Retzlaff, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

On Tue, 24 Oct 2023 15:55:13 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > 
> >    4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > it.
> > 
> > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > scheduling policies are SCHED_FIFO or SCHED_RR.
> > +  5. It MUST not be used by multi-producer/consumer pthreads
> > +     whose scheduling policies are ``SCHED_FIFO``
> > +     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> 
> Do the RTS or HTS ring modes make any difference here?
> 
> Anyway, I agree that real-time priority should not be forbidden on Unix.
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>

Please add a big warning message in the rte_thread.c and the documentation
to describe the problem. Need to have the "you have been warned" action.

Use of RT priority is incompatible with 100% poll mode as is typically done
in DPDK applications. A real time thread has higher priority than other necessary
kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
system actions such as delayed writes, network packet processing and timer updates
will not happen which makes the system unstable.

Multiple DPDK users have learned this the hard way.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-24 16:04   ` Stephen Hemminger
@ 2023-10-25 13:15     ` Thomas Monjalon
  2023-10-25 13:34       ` Bruce Richardson
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 13:15 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger, Min Zhou
  Cc: stable, dev, David Marchand, stable, Anatoly Burakov,
	Narcisa Vasile, Tyler Retzlaff, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

24/10/2023 18:04, Stephen Hemminger:
> On Tue, 24 Oct 2023 15:55:13 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > 
> > >    4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > > it.
> > > 
> > > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > +  5. It MUST not be used by multi-producer/consumer pthreads
> > > +     whose scheduling policies are ``SCHED_FIFO``
> > > +     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> > 
> > Do the RTS or HTS ring modes make any difference here?
> > 
> > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > 
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> Please add a big warning message in the rte_thread.c and the documentation
> to describe the problem. Need to have the "you have been warned" action.

Yes I can add more warnings.

> Use of RT priority is incompatible with 100% poll mode as is typically done
> in DPDK applications. A real time thread has higher priority than other necessary
> kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
> system actions such as delayed writes, network packet processing and timer updates
> will not happen which makes the system unstable.

Yes, and it is shown by the test on loongarch:
DPDK:fast-tests / threads_autotest        TIMEOUT        80.01s
http://mails.dpdk.org/archives/test-report/2023-October/488760.html

I'll try to pass the test by adding a sleep in the test thread.

> Multiple DPDK users have learned this the hard way.





^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-25 13:15     ` Thomas Monjalon
@ 2023-10-25 13:34       ` Bruce Richardson
  2023-10-25 13:44         ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Bruce Richardson @ 2023-10-25 13:34 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Morten Brørup, Stephen Hemminger, Min Zhou, stable, dev,
	David Marchand, Anatoly Burakov, Narcisa Vasile, Tyler Retzlaff,
	Dmitry Kozlyuk, Konstantin Ananyev, Andrew Rybchenko

On Wed, Oct 25, 2023 at 03:15:49PM +0200, Thomas Monjalon wrote:
> 24/10/2023 18:04, Stephen Hemminger:
> > On Tue, 24 Oct 2023 15:55:13 +0200
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> > 
> > > > 
> > > >    4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > > > it.
> > > > 
> > > > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > > +  5. It MUST not be used by multi-producer/consumer pthreads
> > > > +     whose scheduling policies are ``SCHED_FIFO``
> > > > +     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> > > 
> > > Do the RTS or HTS ring modes make any difference here?
> > > 
> > > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > > 
> > > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> > 
> > Please add a big warning message in the rte_thread.c and the documentation
> > to describe the problem. Need to have the "you have been warned" action.
> 
> Yes I can add more warnings.
> 
> > Use of RT priority is incompatible with 100% poll mode as is typically done
> > in DPDK applications. A real time thread has higher priority than other necessary
> > kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
> > system actions such as delayed writes, network packet processing and timer updates
> > will not happen which makes the system unstable.
> 
> Yes, and it is shown by the test on loongarch:
> DPDK:fast-tests / threads_autotest        TIMEOUT        80.01s
> http://mails.dpdk.org/archives/test-report/2023-October/488760.html
> 
> I'll try to pass the test by adding a sleep in the test thread.
> 

"sched_yield()" rather than sleep perhaps? Might better convey the
intention of the call.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-25 13:34       ` Bruce Richardson
@ 2023-10-25 13:44         ` Thomas Monjalon
  2023-10-25 15:08           ` Stephen Hemminger
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 13:44 UTC (permalink / raw)
  To: Bruce Richardson, Tyler Retzlaff
  Cc: dev, Morten Brørup, Stephen Hemminger, Min Zhou, stable,
	dev, David Marchand, Anatoly Burakov, Narcisa Vasile,
	Tyler Retzlaff, Dmitry Kozlyuk, Konstantin Ananyev,
	Andrew Rybchenko

25/10/2023 15:34, Bruce Richardson:
> On Wed, Oct 25, 2023 at 03:15:49PM +0200, Thomas Monjalon wrote:
> > 24/10/2023 18:04, Stephen Hemminger:
> > > On Tue, 24 Oct 2023 15:55:13 +0200
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > > 
> > > > > 
> > > > >    4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > > > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > > > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > > > > it.
> > > > > 
> > > > > -  5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > > > +  5. It MUST not be used by multi-producer/consumer pthreads
> > > > > +     whose scheduling policies are ``SCHED_FIFO``
> > > > > +     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).  
> > > > 
> > > > Do the RTS or HTS ring modes make any difference here?
> > > > 
> > > > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > > > 
> > > > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> > > 
> > > Please add a big warning message in the rte_thread.c and the documentation
> > > to describe the problem. Need to have the "you have been warned" action.
> > 
> > Yes I can add more warnings.
> > 
> > > Use of RT priority is incompatible with 100% poll mode as is typically done
> > > in DPDK applications. A real time thread has higher priority than other necessary
> > > kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
> > > system actions such as delayed writes, network packet processing and timer updates
> > > will not happen which makes the system unstable.
> > 
> > Yes, and it is shown by the test on loongarch:
> > DPDK:fast-tests / threads_autotest        TIMEOUT        80.01s
> > http://mails.dpdk.org/archives/test-report/2023-October/488760.html
> > 
> > I'll try to pass the test by adding a sleep in the test thread.
> > 
> 
> "sched_yield()" rather than sleep perhaps? Might better convey the
> intention of the call.

Do we have sched_yield on Windows?




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-25 13:44         ` Thomas Monjalon
@ 2023-10-25 15:08           ` Stephen Hemminger
  2023-10-25 15:14             ` Bruce Richardson
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-25 15:08 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Bruce Richardson, Tyler Retzlaff, dev, Morten Brørup,
	Min Zhou, stable, David Marchand, Anatoly Burakov,
	Narcisa Vasile, Dmitry Kozlyuk, Konstantin Ananyev,
	Andrew Rybchenko

On Wed, 25 Oct 2023 15:44:25 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> > > 
> > > I'll try to pass the test by adding a sleep in the test thread.
> > >   
> > 
> > "sched_yield()" rather than sleep perhaps? Might better convey the
> > intention of the call.  
> 
> Do we have sched_yield on Windows?

Windows has an equivalent but sched_yield() won't work here.
Since the DPDK thread is still higher priority than the kernel thread,
the scheduler will reschedule the DPDK thread. You need to sleep
to let kthread run.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
  2023-10-24 13:55 ` Morten Brørup
@ 2023-10-25 15:13 ` Thomas Monjalon
  2023-10-25 15:37   ` Stephen Hemminger
  2023-10-25 16:31 ` [PATCH v3 0/2] " Thomas Monjalon
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 15:13 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Stephen Hemminger, Tyler Retzlaff,
	Andrew Rybchenko, Konstantin Ananyev

When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

Real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a pause is added in the test thread.
This pause is a new API function rte_thread_yield(),
compatible with both Unix and Windows.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_threads.c                       | 11 +---------
 .../prog_guide/env_abstraction_layer.rst      |  4 +++-
 lib/eal/include/rte_thread.h                  | 13 ++++++++++--
 lib/eal/unix/rte_thread.c                     | 21 +++++++++++--------
 lib/eal/version.map                           |  3 +++
 lib/eal/windows/rte_thread.c                  |  6 ++++++
 6 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..9a449ba9c5 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -22,7 +22,7 @@ thread_main(void *arg)
 	__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
 	while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-		;
+		rte_thread_yield(); /* required in case of real-time priority */
 
 	return 0;
 }
@@ -97,21 +97,12 @@ test_thread_priority(void)
 		"Priority set mismatches priority get");
 
 	priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-		"Priority set to critical should fail");
-	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-		"Failed to get thread priority");
-	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-		"Failed set to critical should have retained normal");
-#else
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
 		"Priority set to critical should succeed");
 	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
 		"Failed to get thread priority");
 	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
 		"Priority set mismatches priority get");
-#endif
 
 	priority = RTE_THREAD_PRIORITY_NORMAL;
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+     whose scheduling policies are ``SCHED_FIFO``
+     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..eeccc40532 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,11 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+	/** Normal thread priority, the default. */
 	RTE_THREAD_PRIORITY_NORMAL            = 0,
-	/**< normal thread priority, the default */
+	/** Highest thread priority, use with caution.
+	 *  WARNING: System may be unstable because of a real-time busy loop. */
 	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-	/**< highest thread priority allowed */
 };
 
 /**
@@ -183,6 +184,14 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
  */
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Allow another thread to run on the same CPU core.
+ *
+ * Especially useful in real-time thread priority.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+void rte_thread_yield(void);
+
 /**
  * Get the id of the calling thread.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..399acf2fa0 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,6 +5,7 @@
 
 #include <errno.h>
 #include <pthread.h>
+#include <sched.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <string.h>
@@ -49,6 +50,11 @@ thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 			sched_get_priority_max(SCHED_OTHER)) / 2;
 		break;
 	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+		/*
+		 * WARNING: Real-time busy loop takes priority on kernel threads,
+		 *          making the system unstable.
+		 *          There is also a known issue when using rte_ring.
+		 */
 		*pol = SCHED_RR;
 		*os_pri = sched_get_priority_max(SCHED_RR);
 		break;
@@ -153,11 +159,6 @@ rte_thread_create(rte_thread_t *thread_id,
 			goto cleanup;
 		}
 
-		if (thread_attr->priority ==
-				RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
-			ret = ENOTSUP;
-			goto cleanup;
-		}
 		ret = thread_map_priority_to_os_value(thread_attr->priority,
 				&param.sched_priority, &policy);
 		if (ret != 0)
@@ -227,6 +228,12 @@ rte_thread_detach(rte_thread_t thread_id)
 	return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+void
+rte_thread_yield(void)
+{
+	sched_yield();
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
@@ -275,10 +282,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
 	int policy;
 	int ret;
 
-	/* Realtime priority can cause crashes on non-Windows platforms. */
-	if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
-		return ENOTSUP;
-
 	ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
 		&policy);
 	if (ret != 0)
diff --git a/lib/eal/version.map b/lib/eal/version.map
index e00a844805..0b4a503c5f 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -413,6 +413,9 @@ EXPERIMENTAL {
 	# added in 23.07
 	rte_memzone_max_get;
 	rte_memzone_max_set;
+
+	# added in 23.11
+	rte_thread_yield;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf648456c..b0373b1a55 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -304,6 +304,12 @@ rte_thread_detach(rte_thread_t thread_id)
 	return 0;
 }
 
+void
+rte_thread_yield(void)
+{
+	Sleep(0);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-25 15:08           ` Stephen Hemminger
@ 2023-10-25 15:14             ` Bruce Richardson
  2023-10-25 15:18               ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Bruce Richardson @ 2023-10-25 15:14 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Monjalon, Tyler Retzlaff, dev, Morten Brørup,
	Min Zhou, stable, David Marchand, Anatoly Burakov,
	Narcisa Vasile, Dmitry Kozlyuk, Konstantin Ananyev,
	Andrew Rybchenko

On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> On Wed, 25 Oct 2023 15:44:25 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > > > 
> > > > I'll try to pass the test by adding a sleep in the test thread.
> > > >   
> > > 
> > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > intention of the call.  
> > 
> > Do we have sched_yield on Windows?
> 
> Windows has an equivalent but sched_yield() won't work here.
> Since the DPDK thread is still higher priority than the kernel thread,
> the scheduler will reschedule the DPDK thread. You need to sleep
> to let kthread run.

Interesting. Thanks for clarifying the situation.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-25 15:14             ` Bruce Richardson
@ 2023-10-25 15:18               ` Thomas Monjalon
  2023-10-25 15:32                 ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 15:18 UTC (permalink / raw)
  To: Stephen Hemminger, Bruce Richardson
  Cc: Tyler Retzlaff, dev, Morten Brørup, Min Zhou, stable,
	David Marchand, Anatoly Burakov, Narcisa Vasile, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

25/10/2023 17:14, Bruce Richardson:
> On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> > On Wed, 25 Oct 2023 15:44:25 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> > 
> > > > > 
> > > > > I'll try to pass the test by adding a sleep in the test thread.
> > > > >   
> > > > 
> > > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > > intention of the call.  
> > > 
> > > Do we have sched_yield on Windows?
> > 
> > Windows has an equivalent but sched_yield() won't work here.
> > Since the DPDK thread is still higher priority than the kernel thread,
> > the scheduler will reschedule the DPDK thread. You need to sleep
> > to let kthread run.
> 
> Interesting. Thanks for clarifying the situation.

Indeed interesting.
I've just sent a v2 before reading this.

So I should try a v3 with a sleep.
But then I need to find a better name than rte_thread_yield.
Ideas?



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] eal/unix: allow creating thread with real-time priority
  2023-10-25 15:18               ` Thomas Monjalon
@ 2023-10-25 15:32                 ` Thomas Monjalon
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 15:32 UTC (permalink / raw)
  To: Stephen Hemminger, Bruce Richardson, Tyler Retzlaff
  Cc: dev, dev, Morten Brørup, Min Zhou, stable, David Marchand,
	Anatoly Burakov, Narcisa Vasile, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

25/10/2023 17:18, Thomas Monjalon:
> 25/10/2023 17:14, Bruce Richardson:
> > On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> > > On Wed, 25 Oct 2023 15:44:25 +0200
> > > Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 
> > > > > > 
> > > > > > I'll try to pass the test by adding a sleep in the test thread.
> > > > > >   
> > > > > 
> > > > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > > > intention of the call.  
> > > > 
> > > > Do we have sched_yield on Windows?
> > > 
> > > Windows has an equivalent but sched_yield() won't work here.
> > > Since the DPDK thread is still higher priority than the kernel thread,
> > > the scheduler will reschedule the DPDK thread. You need to sleep
> > > to let kthread run.
> > 
> > Interesting. Thanks for clarifying the situation.
> 
> Indeed interesting.
> I've just sent a v2 before reading this.
> 
> So I should try a v3 with a sleep.
> But then I need to find a better name than rte_thread_yield.
> Ideas?

I will go with rte_thread_yield_realtime().
Any sleep will suffice on Linux? What about a nanosleep?
I suppose Sleep(0) is OK on Windows?




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-25 15:13 ` [PATCH v2] " Thomas Monjalon
@ 2023-10-25 15:37   ` Stephen Hemminger
  2023-10-25 16:46     ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-25 15:37 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, Andrew Rybchenko,
	Konstantin Ananyev

On Wed, 25 Oct 2023 17:13:14 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

>  	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> +		/*
> +		 * WARNING: Real-time busy loop takes priority on kernel threads,
> +		 *          making the system unstable.
> +		 *          There is also a known issue when using rte_ring.
> +		 */

I was thinking something like:

	static bool warned;
	if (!warned) {
		RTE_LOG(NOTICE, EAL, "Real time priority is unstable when thread is polling without sleep\n");
		warned = true;
	}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
  2023-10-24 13:55 ` Morten Brørup
  2023-10-25 15:13 ` [PATCH v2] " Thomas Monjalon
@ 2023-10-25 16:31 ` Thomas Monjalon
  2023-10-25 16:31   ` [PATCH v3 1/2] eal: add thread yield functions Thomas Monjalon
                     ` (2 more replies)
  2023-10-26 13:37 ` [PATCH v4 " Thomas Monjalon
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 16:31 UTC (permalink / raw)
  To: dev; +Cc: David Marchand

Real-time thread priority was been forbidden on Unix
because of problems they can cause.
Warnings and helpers are added to avoid deadlocks,
so real-time can be allowed on all systems.

Thomas Monjalon (2):
  eal: add thread yield functions
  eal/unix: allow creating thread with real-time priority

 app/test/test_threads.c                       | 11 +------
 .../prog_guide/env_abstraction_layer.rst      |  4 ++-
 lib/eal/include/rte_thread.h                  | 29 ++++++++++++++++--
 lib/eal/unix/rte_thread.c                     | 30 +++++++++++++------
 lib/eal/version.map                           |  4 +++
 lib/eal/windows/rte_thread.c                  | 15 ++++++++++
 6 files changed, 71 insertions(+), 22 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 1/2] eal: add thread yield functions
  2023-10-25 16:31 ` [PATCH v3 0/2] " Thomas Monjalon
@ 2023-10-25 16:31   ` Thomas Monjalon
  2023-10-25 16:40     ` Bruce Richardson
  2023-10-25 18:07     ` Morten Brørup
  2023-10-25 16:31   ` [PATCH v3 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
  2023-10-26 13:36   ` [PATCH v3 0/2] " Thomas Monjalon
  2 siblings, 2 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 16:31 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

When running real-time threads, we may need to force scheduling
kernel threads or other real-time threads.
New functions are added to address these cases.

The yield functions should not have any interest for normal threads.
Note: other purposes may be addressed with rte_pause() or rte_delay_*().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/eal/include/rte_thread.h | 22 ++++++++++++++++++++++
 lib/eal/unix/rte_thread.c    | 16 ++++++++++++++++
 lib/eal/version.map          |  4 ++++
 lib/eal/windows/rte_thread.c | 15 +++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..139cafac96 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
  */
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Allow another thread to run on the same CPU core.
+ *
+ * Lower priority threads may not be scheduled.
+ *
+ * Especially useful in real-time thread priority
+ * to schedule other real-time threads.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield(void);
+
+/**
+ * Unblock a CPU core running busy in a real-time thread.
+ *
+ * Especially useful in real-time thread priority
+ * to avoid a busy loop blocking vital threads on a core.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield_realtime(void);
+
 /**
  * Get the id of the calling thread.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..92b4e53adb 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,9 +5,11 @@
 
 #include <errno.h>
 #include <pthread.h>
+#include <sched.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <string.h>
+#include <time.h>
 
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
 	return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+void
+rte_thread_yield(void)
+{
+	sched_yield();
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+	/* A simple yield may not be enough to schedule kernel threads. */
+	struct timespec wait = {.tv_nsec = 1};
+	nanosleep(&wait, NULL);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
diff --git a/lib/eal/version.map b/lib/eal/version.map
index e00a844805..b81ac3e3af 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -413,6 +413,10 @@ EXPERIMENTAL {
 	# added in 23.07
 	rte_memzone_max_get;
 	rte_memzone_max_set;
+
+	# added in 23.11
+	rte_thread_yield;
+	rte_thread_yield_realtime;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf648456c..1e031eca40 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -304,6 +304,21 @@ rte_thread_detach(rte_thread_t thread_id)
 	return 0;
 }
 
+void
+rte_thread_yield(void)
+{
+	Sleep(0);
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+	/* Real-time threads are not causing problems on Windows.
+	 * A normal yield should be fine.
+	 */
+	Sleep(0);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 2/2] eal/unix: allow creating thread with real-time priority
  2023-10-25 16:31 ` [PATCH v3 0/2] " Thomas Monjalon
  2023-10-25 16:31   ` [PATCH v3 1/2] eal: add thread yield functions Thomas Monjalon
@ 2023-10-25 16:31   ` Thomas Monjalon
  2023-10-26 13:36   ` [PATCH v3 0/2] " Thomas Monjalon
  2 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 16:31 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Stephen Hemminger, Tyler Retzlaff, Narcisa Vasile,
	Dmitry Kozlyuk, Andrew Rybchenko, Konstantin Ananyev

When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

Real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a pause is added in the test thread.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_threads.c                         | 11 +----------
 doc/guides/prog_guide/env_abstraction_layer.rst |  4 +++-
 lib/eal/include/rte_thread.h                    |  7 +++++--
 lib/eal/unix/rte_thread.c                       | 14 +++++---------
 4 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..c14d39fc83 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -22,7 +22,7 @@ thread_main(void *arg)
 	__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
 	while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-		;
+		rte_thread_yield_realtime(); /* required for RT priority */
 
 	return 0;
 }
@@ -97,21 +97,12 @@ test_thread_priority(void)
 		"Priority set mismatches priority get");
 
 	priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-		"Priority set to critical should fail");
-	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-		"Failed to get thread priority");
-	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-		"Failed set to critical should have retained normal");
-#else
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
 		"Priority set to critical should succeed");
 	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
 		"Failed to get thread priority");
 	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
 		"Priority set mismatches priority get");
-#endif
 
 	priority = RTE_THREAD_PRIORITY_NORMAL;
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+     whose scheduling policies are ``SCHED_FIFO``
+     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 139cafac96..1952a10155 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,13 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+	/** Normal thread priority, the default. */
 	RTE_THREAD_PRIORITY_NORMAL            = 0,
-	/**< normal thread priority, the default */
+	/**
+	 * Highest thread priority, use with caution.
+	 * WARNING: System may be unstable because of a real-time busy loop.
+	 */
 	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-	/**< highest thread priority allowed */
 };
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 92b4e53adb..87ddf25f1c 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -51,6 +51,11 @@ thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 			sched_get_priority_max(SCHED_OTHER)) / 2;
 		break;
 	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+		/*
+		 * WARNING: Real-time busy loop takes priority on kernel threads,
+		 *          making the system unstable.
+		 *          There is also a known issue when using rte_ring.
+		 */
 		*pol = SCHED_RR;
 		*os_pri = sched_get_priority_max(SCHED_RR);
 		break;
@@ -155,11 +160,6 @@ rte_thread_create(rte_thread_t *thread_id,
 			goto cleanup;
 		}
 
-		if (thread_attr->priority ==
-				RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
-			ret = ENOTSUP;
-			goto cleanup;
-		}
 		ret = thread_map_priority_to_os_value(thread_attr->priority,
 				&param.sched_priority, &policy);
 		if (ret != 0)
@@ -291,10 +291,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
 	int policy;
 	int ret;
 
-	/* Realtime priority can cause crashes on non-Windows platforms. */
-	if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
-		return ENOTSUP;
-
 	ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
 		&policy);
 	if (ret != 0)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 1/2] eal: add thread yield functions
  2023-10-25 16:31   ` [PATCH v3 1/2] eal: add thread yield functions Thomas Monjalon
@ 2023-10-25 16:40     ` Bruce Richardson
  2023-10-25 17:06       ` Thomas Monjalon
  2023-10-25 18:07     ` Morten Brørup
  1 sibling, 1 reply; 54+ messages in thread
From: Bruce Richardson @ 2023-10-25 16:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

On Wed, Oct 25, 2023 at 06:31:10PM +0200, Thomas Monjalon wrote:
> When running real-time threads, we may need to force scheduling
> kernel threads or other real-time threads.
> New functions are added to address these cases.
> 
> The yield functions should not have any interest for normal threads.
> Note: other purposes may be addressed with rte_pause() or rte_delay_*().
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  lib/eal/include/rte_thread.h | 22 ++++++++++++++++++++++
>  lib/eal/unix/rte_thread.c    | 16 ++++++++++++++++
>  lib/eal/version.map          |  4 ++++
>  lib/eal/windows/rte_thread.c | 15 +++++++++++++++
>  4 files changed, 57 insertions(+)
> 
> diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> index 8da9d4d3fb..139cafac96 100644
> --- a/lib/eal/include/rte_thread.h
> +++ b/lib/eal/include/rte_thread.h
> @@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
>   */
>  int rte_thread_detach(rte_thread_t thread_id);
>  
> +/**
> + * Allow another thread to run on the same CPU core.
> + *
> + * Lower priority threads may not be scheduled.
> + *
> + * Especially useful in real-time thread priority
> + * to schedule other real-time threads.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield(void);
> +
> +/**
> + * Unblock a CPU core running busy in a real-time thread.
> + *
> + * Especially useful in real-time thread priority
> + * to avoid a busy loop blocking vital threads on a core.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield_realtime(void);
> +
>  /**
>   * Get the id of the calling thread.
>   *
> diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
> index 36a21ab2f9..92b4e53adb 100644
> --- a/lib/eal/unix/rte_thread.c
> +++ b/lib/eal/unix/rte_thread.c
> @@ -5,9 +5,11 @@
>  
>  #include <errno.h>
>  #include <pthread.h>
> +#include <sched.h>
>  #include <stdbool.h>
>  #include <stdlib.h>
>  #include <string.h>
> +#include <time.h>
>  
>  #include <rte_errno.h>
>  #include <rte_log.h>
> @@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
>  	return pthread_detach((pthread_t)thread_id.opaque_id);
>  }
>  
> +void
> +rte_thread_yield(void)
> +{
> +	sched_yield();
> +}
> +
> +void
> +rte_thread_yield_realtime(void)
> +{
> +	/* A simple yield may not be enough to schedule kernel threads. */
> +	struct timespec wait = {.tv_nsec = 1};
> +	nanosleep(&wait, NULL);
> +}
> +
While I realise we discussed this earlier, and I also was the original
suggester of using sched_yield, I think just having just one function using
sleep is probably best after all.

/Bruce

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-25 15:37   ` Stephen Hemminger
@ 2023-10-25 16:46     ` Thomas Monjalon
  2023-10-25 17:54       ` Morten Brørup
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 16:46 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, Andrew Rybchenko,
	Konstantin Ananyev

25/10/2023 17:37, Stephen Hemminger:
> On Wed, 25 Oct 2023 17:13:14 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> >  	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> > +		/*
> > +		 * WARNING: Real-time busy loop takes priority on kernel threads,
> > +		 *          making the system unstable.
> > +		 *          There is also a known issue when using rte_ring.
> > +		 */
> 
> I was thinking something like:
> 
> 	static bool warned;
> 	if (!warned) {
> 		RTE_LOG(NOTICE, EAL, "Real time priority is unstable when thread is polling without sleep\n");
> 		warned = true;
> 	}

I'm not sure about bothering users.
They can fear something is wrong even if the developer took care of it.
I think doc warnings for developers are more appropriate.
I've added notes in the API.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 1/2] eal: add thread yield functions
  2023-10-25 16:40     ` Bruce Richardson
@ 2023-10-25 17:06       ` Thomas Monjalon
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-25 17:06 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

25/10/2023 18:40, Bruce Richardson:
> On Wed, Oct 25, 2023 at 06:31:10PM +0200, Thomas Monjalon wrote:
> > When running real-time threads, we may need to force scheduling
> > kernel threads or other real-time threads.
> > New functions are added to address these cases.
> > 
> > The yield functions should not have any interest for normal threads.
> > Note: other purposes may be addressed with rte_pause() or rte_delay_*().
> > 
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> >  lib/eal/include/rte_thread.h | 22 ++++++++++++++++++++++
> >  lib/eal/unix/rte_thread.c    | 16 ++++++++++++++++
> >  lib/eal/version.map          |  4 ++++
> >  lib/eal/windows/rte_thread.c | 15 +++++++++++++++
> >  4 files changed, 57 insertions(+)
> > 
> > diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> > index 8da9d4d3fb..139cafac96 100644
> > --- a/lib/eal/include/rte_thread.h
> > +++ b/lib/eal/include/rte_thread.h
> > @@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
> >   */
> >  int rte_thread_detach(rte_thread_t thread_id);
> >  
> > +/**
> > + * Allow another thread to run on the same CPU core.
> > + *
> > + * Lower priority threads may not be scheduled.
> > + *
> > + * Especially useful in real-time thread priority
> > + * to schedule other real-time threads.
> > + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> > + */
> > +__rte_experimental
> > +void rte_thread_yield(void);
> > +
> > +/**
> > + * Unblock a CPU core running busy in a real-time thread.
> > + *
> > + * Especially useful in real-time thread priority
> > + * to avoid a busy loop blocking vital threads on a core.
> > + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> > + */
> > +__rte_experimental
> > +void rte_thread_yield_realtime(void);
> > +
> >  /**
> >   * Get the id of the calling thread.
> >   *
> > diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
> > index 36a21ab2f9..92b4e53adb 100644
> > --- a/lib/eal/unix/rte_thread.c
> > +++ b/lib/eal/unix/rte_thread.c
> > @@ -5,9 +5,11 @@
> >  
> >  #include <errno.h>
> >  #include <pthread.h>
> > +#include <sched.h>
> >  #include <stdbool.h>
> >  #include <stdlib.h>
> >  #include <string.h>
> > +#include <time.h>
> >  
> >  #include <rte_errno.h>
> >  #include <rte_log.h>
> > @@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
> >  	return pthread_detach((pthread_t)thread_id.opaque_id);
> >  }
> >  
> > +void
> > +rte_thread_yield(void)
> > +{
> > +	sched_yield();
> > +}
> > +
> > +void
> > +rte_thread_yield_realtime(void)
> > +{
> > +	/* A simple yield may not be enough to schedule kernel threads. */
> > +	struct timespec wait = {.tv_nsec = 1};
> > +	nanosleep(&wait, NULL);
> > +}
> > +
> While I realise we discussed this earlier, and I also was the original
> suggester of using sched_yield, I think just having just one function using
> sleep is probably best after all.

I think there is a value to have a simple yield function
for scheduling between multiple real-time threads
without sleep overhead (not sure about the overhead).
If there is not much overhead, then a single function is OK.

Note I'm preparing a new version with a simple yield implemented
with the lighter SwitchToThread() on Windows.




^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-25 16:46     ` Thomas Monjalon
@ 2023-10-25 17:54       ` Morten Brørup
  2023-10-25 21:33         ` Stephen Hemminger
  2023-10-26  0:00         ` Stephen Hemminger
  0 siblings, 2 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-25 17:54 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger
  Cc: dev, David Marchand, stable, Anatoly Burakov, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy, Pallavi Kadam,
	Tyler Retzlaff, Andrew Rybchenko, Konstantin Ananyev

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, 25 October 2023 18.46
> 
> 25/10/2023 17:37, Stephen Hemminger:
> > On Wed, 25 Oct 2023 17:13:14 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > >  	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> > > +		/*
> > > +		 * WARNING: Real-time busy loop takes priority on kernel
> threads,
> > > +		 *          making the system unstable.
> > > +		 *          There is also a known issue when using
> rte_ring.
> > > +		 */
> >
> > I was thinking something like:
> >
> > 	static bool warned;
> > 	if (!warned) {
> > 		RTE_LOG(NOTICE, EAL, "Real time priority is unstable when
> thread is polling without sleep\n");
> > 		warned = true;
> > 	}
> 
> I'm not sure about bothering users.
> They can fear something is wrong even if the developer took care of it.
> I think doc warnings for developers are more appropriate.
> I've added notes in the API.

I agree with Thomas on this.

If you want the log message, please degrade it to INFO or DEBUG level. It is only relevant when chasing problems, not for normal production - and thus NOTICE is too high.


Someone might build a kernel with options to keep non-dataplane threads off some dedicated CPU cores, so they can be used for guaranteed low-latency dataplane threads. We do. We don't use real-time priority, though.

For reference, we did some experiments (using this custom built kernel) with a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and registering the delta. Although the overwhelming majority is ca. CPU 80 cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel threads steal some cycles from this thread, regardless of our customizations. We haven't bothered analyzing and optimizing it further.

I think our experiment supports the need to allow kernel threads to run, e.g. by calling sleep() or similar, when an EAL thread has real-time priority.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 1/2] eal: add thread yield functions
  2023-10-25 16:31   ` [PATCH v3 1/2] eal: add thread yield functions Thomas Monjalon
  2023-10-25 16:40     ` Bruce Richardson
@ 2023-10-25 18:07     ` Morten Brørup
  1 sibling, 0 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-25 18:07 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam, Bruce Richardson

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, 25 October 2023 18.31
> 
> When running real-time threads, we may need to force scheduling
> kernel threads or other real-time threads.
> New functions are added to address these cases.
> 
> The yield functions should not have any interest for normal threads.
> Note: other purposes may be addressed with rte_pause() or rte_delay_*().
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  lib/eal/include/rte_thread.h | 22 ++++++++++++++++++++++
>  lib/eal/unix/rte_thread.c    | 16 ++++++++++++++++
>  lib/eal/version.map          |  4 ++++
>  lib/eal/windows/rte_thread.c | 15 +++++++++++++++
>  4 files changed, 57 insertions(+)
> 
> diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
> index 8da9d4d3fb..139cafac96 100644
> --- a/lib/eal/include/rte_thread.h
> +++ b/lib/eal/include/rte_thread.h
> @@ -183,6 +183,28 @@ int rte_thread_join(rte_thread_t thread_id,
> uint32_t *value_ptr);
>   */
>  int rte_thread_detach(rte_thread_t thread_id);
> 
> +/**
> + * Allow another thread to run on the same CPU core.
> + *
> + * Lower priority threads may not be scheduled.
> + *
> + * Especially useful in real-time thread priority
> + * to schedule other real-time threads.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield(void);
> +
> +/**
> + * Unblock a CPU core running busy in a real-time thread.
> + *
> + * Especially useful in real-time thread priority
> + * to avoid a busy loop blocking vital threads on a core.
> + * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
> + */
> +__rte_experimental
> +void rte_thread_yield_realtime(void);
> +

If an application really needs to use real-time priority, the behavior of any DPDK yield functions must be documented in much more detail than this - especially in regard to expected latency.

[...]

> diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
> index acf648456c..1e031eca40 100644
> --- a/lib/eal/windows/rte_thread.c
> +++ b/lib/eal/windows/rte_thread.c
> @@ -304,6 +304,21 @@ rte_thread_detach(rte_thread_t thread_id)
>  	return 0;
>  }
> 
> +void
> +rte_thread_yield(void)
> +{
> +	Sleep(0);
> +}
> +
> +void
> +rte_thread_yield_realtime(void)
> +{
> +	/* Real-time threads are not causing problems on Windows.
> +	 * A normal yield should be fine.

Back in the days, the Windows API had a Yield() function; make sure your comment can't be misunderstood as referring to that.

> +	 */
> +	Sleep(0);
> +}


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-25 17:54       ` Morten Brørup
@ 2023-10-25 21:33         ` Stephen Hemminger
  2023-10-26  7:33           ` Morten Brørup
  2023-10-26  0:00         ` Stephen Hemminger
  1 sibling, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-25 21:33 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, Andrew Rybchenko,
	Konstantin Ananyev

On Wed, 25 Oct 2023 19:54:06 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> I agree with Thomas on this.
> 
> If you want the log message, please degrade it to INFO or DEBUG level. It is only relevant when chasing problems, not for normal production - and thus NOTICE is too high.

I don't want the message to be hidden.
If we get any bug reports want to be able to say "read the log, don't do that".

> Someone might build a kernel with options to keep non-dataplane threads off some dedicated CPU cores, so they can be used for guaranteed low-latency dataplane threads. We do. We don't use real-time priority, though.

This is really, hard to do. Isolated CPU's are not isolated from interrupts and other sources which end up scheduling work as kernel threads. Plus there is the behavior where kernel decides to turn a soft irq into a kernel thread, then starve
itself. Under starvation, disk corruption is likely if interrupts never get processed :-(

> For reference, we did some experiments (using this custom built kernel) with a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and registering the delta. Although the overwhelming majority is ca. CPU 80 cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel threads steal some cycles from this thread, regardless of our customizations. We haven't bothered analyzing and optimizing it further.

Was this on isolated CPU?
Did you check that that CPU was excluded from the smp_affinty mask on all devices?
Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?
Same thing for RCU, need to adjust parameters?

Also, on many systems there can be SMI BIOS hidden execution that will cause big outliers.

Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of places.

> I think our experiment supports the need to allow kernel threads to run, e.g. by calling sleep() or similar, when an EAL thread has real-time priority.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-25 17:54       ` Morten Brørup
  2023-10-25 21:33         ` Stephen Hemminger
@ 2023-10-26  0:00         ` Stephen Hemminger
  1 sibling, 0 replies; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-26  0:00 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, Andrew Rybchenko,
	Konstantin Ananyev

On Wed, 25 Oct 2023 19:54:06 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> Someone might build a kernel with options to keep non-dataplane threads off some dedicated CPU cores, so they can be used for guaranteed low-latency dataplane threads. We do. We don't use real-time priority, though.
> 
> For reference, we did some experiments (using this custom built kernel) with a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and registering the delta. Although the overwhelming majority is ca. CPU 80 cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel threads steal some cycles from this thread, regardless of our customizations. We haven't bothered analyzing and optimizing it further.
> 
> I think our experiment supports the need to allow kernel threads to run, e.g. by calling sleep() or similar, when an EAL thread has real-time priority.

First. We need to dispel the myth that on Linux real time is faster.
It isn't, you can ask the RT kernel developers if you need more support.
The purpose of RT is to have user process respond in a deterministic fashion
to a kernel event (ie. usually HW interrupt or IPC). In most cases, this is
not how DPDK applications are written. It is possible to use hardware interrupts
in DPDK, but getting it right is hard, and the latency of HW to kernel to DPDK
in userspace is a long time. RT will make it more consistent, but it won't
remove the overhead; i.e less long tail with RT, but average latency will
still be too long for most network applications.

Users say "but my polling thread is getting latency", then you need to make sure
your application is running on dedicated cores and the system is configured to
not use those cores for HW events.  Doing RT won't fix that.

Then some user say "but I want to run multiple polling threads on a single core".
That plain won't work no matter what the priority.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-25 21:33         ` Stephen Hemminger
@ 2023-10-26  7:33           ` Morten Brørup
  2023-10-26 16:32             ` Stephen Hemminger
  0 siblings, 1 reply; 54+ messages in thread
From: Morten Brørup @ 2023-10-26  7:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, Andrew Rybchenko,
	Konstantin Ananyev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 25 October 2023 23.33
> 
> On Wed, 25 Oct 2023 19:54:06 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > I agree with Thomas on this.
> >
> > If you want the log message, please degrade it to INFO or DEBUG level. It is
> only relevant when chasing problems, not for normal production - and thus
> NOTICE is too high.
> 
> I don't want the message to be hidden.
> If we get any bug reports want to be able to say "read the log, don't do
> that".

Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.

It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems).
As Stephen points out, it is a good starting point to check for bug reports possibly related to this. And, I suppose the experienced users who really understands it will not be seriously confused by such a NOTICE message in the log.

> 
> > Someone might build a kernel with options to keep non-dataplane threads off
> some dedicated CPU cores, so they can be used for guaranteed low-latency
> dataplane threads. We do. We don't use real-time priority, though.
> 
> This is really, hard to do.

As my kids would say: This is really, really, really, really, really hard to do!

We have not been able to find an authoritative source of documentation describing how to do it. :-(

And our experiment shows that we didn't 100 % succeed doing it. But we got close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz CPU corresponds to max 3 microseconds of added worst-case latency.

It would be great for latency-sensitive applications if the DPDK documentation went more into detail on this topic. However, if the DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if running on top of a custom built Linux Kernel, it essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)

> Isolated CPU's are not isolated from interrupts
> and other sources which end up scheduling work as kernel threads. Plus there
> is the behavior where kernel decides to turn a soft irq into a kernel thread,
> then starve itself.

We have configured the kernel to put all of this on CPU 0. (Details further below.)

> Under starvation, disk corruption is likely if interrupts never get
> processed :-(
> 
> > For reference, we did some experiments (using this custom built kernel) with
> a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> registering the delta. Although the overwhelming majority is ca. CPU 80
> cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> threads steal some cycles from this thread, regardless of our customizations.
> We haven't bothered analyzing and optimizing it further.
> 
> Was this on isolated CPU?

Yes. We isolate all CPUs but CPU 0.

> Did you check that that CPU was excluded from the smp_affinty mask on all
> devices?

Not sure how to do that?

NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?

> Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?

Yes:
# Timers subsystem
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y

CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"

> Same thing for RCU, need to adjust parameters?

Yes:
# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y

> 
> Also, on many systems there can be SMI BIOS hidden execution that will cause
> big outliers.

Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.

> 
> Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> places.

Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.

> 
> > I think our experiment supports the need to allow kernel threads to run,
> e.g. by calling sleep() or similar, when an EAL thread has real-time priority.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-25 16:31 ` [PATCH v3 0/2] " Thomas Monjalon
  2023-10-25 16:31   ` [PATCH v3 1/2] eal: add thread yield functions Thomas Monjalon
  2023-10-25 16:31   ` [PATCH v3 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
@ 2023-10-26 13:36   ` Thomas Monjalon
  2023-10-26 13:44     ` Morten Brørup
  2 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 13:36 UTC (permalink / raw)
  To: Morten Brørup, stephen, bruce.richardson; +Cc: dev, David Marchand

25/10/2023 18:31, Thomas Monjalon:
> Real-time thread priority was been forbidden on Unix
> because of problems they can cause.
> Warnings and helpers are added to avoid deadlocks,
> so real-time can be allowed on all systems.

Unit test is failing:
DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s

It is seen in only 1 target (maybe the failure occurence is random):
  Debian 11 (Buster) (ARM) | PASS
  Fedora 37 (ARM)          | PASS
  CentOS Stream 9 (ARM)    | FAIL
  Fedora 38 (ARM)          | PASS
  Fedora 38 (ARM Clang)    | PASS
  Ubuntu 20.04 (ARM)       | PASS

I need to send a v4 with new implementation and better comments.
The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a difference.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 0/2] allow creating thread with real-time priority
  2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
                   ` (2 preceding siblings ...)
  2023-10-25 16:31 ` [PATCH v3 0/2] " Thomas Monjalon
@ 2023-10-26 13:37 ` Thomas Monjalon
  2023-10-26 13:37   ` [PATCH v4 1/2] eal: add thread yield functions Thomas Monjalon
  2023-10-26 13:37   ` [PATCH v4 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
  2023-10-26 14:19 ` [PATCH v5 0/2] " Thomas Monjalon
  2023-10-27  8:08 ` [PATCH v6 1/1] eal/unix: " Thomas Monjalon
  5 siblings, 2 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 13:37 UTC (permalink / raw)
  To: dev; +Cc: David Marchand

Real-time thread priority was been forbidden on Unix
because of problems they can cause.
Warnings and helpers are added to avoid deadlocks,
so real-time can be allowed on all systems.

Thomas Monjalon (2):
  eal: add thread yield functions
  eal/unix: allow creating thread with real-time priority

v1: no yield at all
v2: more comments, sched_yield() and Sleep(0) on Windows
v3: 2 yield functions with sleep in realtime version
v4: runtime warning, longer sleep on Unix and lighter yield on Windows

 app/test/test_threads.c                       | 11 +-----
 .../prog_guide/env_abstraction_layer.rst      |  4 +-
 lib/eal/include/rte_thread.h                  | 33 +++++++++++++++-
 lib/eal/unix/rte_thread.c                     | 38 ++++++++++++++-----
 lib/eal/version.map                           |  4 ++
 lib/eal/windows/rte_thread.c                  | 15 ++++++++
 6 files changed, 83 insertions(+), 22 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 1/2] eal: add thread yield functions
  2023-10-26 13:37 ` [PATCH v4 " Thomas Monjalon
@ 2023-10-26 13:37   ` Thomas Monjalon
  2023-10-26 13:37   ` [PATCH v4 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 13:37 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

When running real-time threads, we may need to force scheduling
kernel threads or other real-time threads.
New functions are added to address these cases.

The yield functions should not have any interest for normal threads.
Note: other purposes may be addressed with rte_pause() or rte_delay_*().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/eal/include/rte_thread.h | 25 +++++++++++++++++++++++++
 lib/eal/unix/rte_thread.c    | 17 +++++++++++++++++
 lib/eal/version.map          |  4 ++++
 lib/eal/windows/rte_thread.c | 15 +++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..f2581fe152 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -183,6 +183,31 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
  */
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Allow another thread to run on the same CPU core.
+ *
+ * This is a scheduler request, with minimum latency.
+ * Lower priority threads may not be scheduled.
+ *
+ * Especially useful in real-time thread priority
+ * to schedule other real-time threads.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield(void);
+
+/**
+ * Unblock a CPU core running busy in a real-time thread.
+ *
+ * This is a sleep call, giving priority to all other threads.
+ *
+ * Especially useful in real-time thread priority
+ * to avoid a busy loop blocking vital threads on a core.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield_realtime(void);
+
 /**
  * Get the id of the calling thread.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..d0758f23bb 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,9 +5,11 @@
 
 #include <errno.h>
 #include <pthread.h>
+#include <sched.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <string.h>
+#include <time.h>
 
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -227,6 +229,21 @@ rte_thread_detach(rte_thread_t thread_id)
 	return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+void
+rte_thread_yield(void)
+{
+	sched_yield();
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+	/* A simple yield may not be enough to schedule kernel threads. */
+	struct timespec wait = {.tv_nsec = 1000};
+	nanosleep(&wait, NULL);
+	usleep
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
diff --git a/lib/eal/version.map b/lib/eal/version.map
index e00a844805..b81ac3e3af 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -413,6 +413,10 @@ EXPERIMENTAL {
 	# added in 23.07
 	rte_memzone_max_get;
 	rte_memzone_max_set;
+
+	# added in 23.11
+	rte_thread_yield;
+	rte_thread_yield_realtime;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf648456c..ea17b5de3d 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -304,6 +304,21 @@ rte_thread_detach(rte_thread_t thread_id)
 	return 0;
 }
 
+void
+rte_thread_yield(void)
+{
+	SwitchToThread();
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+	/* Real-time threads are not causing problems on Windows.
+	 * A simple sleep should be more than enough.
+	 */
+	Sleep(0);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 2/2] eal/unix: allow creating thread with real-time priority
  2023-10-26 13:37 ` [PATCH v4 " Thomas Monjalon
  2023-10-26 13:37   ` [PATCH v4 1/2] eal: add thread yield functions Thomas Monjalon
@ 2023-10-26 13:37   ` Thomas Monjalon
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 13:37 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Stephen Hemminger, Narcisa Vasile, Tyler Retzlaff,
	Dmitry Kozlyuk, Andrew Rybchenko, Konstantin Ananyev

When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

Real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a pause is added in the test thread.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_threads.c                       | 11 +---------
 .../prog_guide/env_abstraction_layer.rst      |  4 +++-
 lib/eal/include/rte_thread.h                  |  8 +++++--
 lib/eal/unix/rte_thread.c                     | 21 +++++++++++--------
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..c14d39fc83 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -22,7 +22,7 @@ thread_main(void *arg)
 	__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
 	while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-		;
+		rte_thread_yield_realtime(); /* required for RT priority */
 
 	return 0;
 }
@@ -97,21 +97,12 @@ test_thread_priority(void)
 		"Priority set mismatches priority get");
 
 	priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-		"Priority set to critical should fail");
-	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-		"Failed to get thread priority");
-	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-		"Failed set to critical should have retained normal");
-#else
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
 		"Priority set to critical should succeed");
 	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
 		"Failed to get thread priority");
 	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
 		"Priority set mismatches priority get");
-#endif
 
 	priority = RTE_THREAD_PRIORITY_NORMAL;
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+     whose scheduling policies are ``SCHED_FIFO``
+     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index f2581fe152..7ff031e1b2 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,14 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+	/** Normal thread priority, the default. */
 	RTE_THREAD_PRIORITY_NORMAL            = 0,
-	/**< normal thread priority, the default */
+	/**
+	 * Highest thread priority, use with caution.
+	 * WARNING: System may be unstable because of a real-time busy loop.
+	 *          @see rte_thread_yield_realtime().
+	 */
 	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-	/**< highest thread priority allowed */
 };
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index d0758f23bb..e9967e7e41 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -51,6 +51,18 @@ thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 			sched_get_priority_max(SCHED_OTHER)) / 2;
 		break;
 	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+		/*
+		 * WARNING: Real-time busy loop takes priority on kernel threads,
+		 *          making the system unstable.
+		 *          There is also a known issue when using rte_ring.
+		 */
+		static bool warned;
+		if (!warned) {
+			RTE_LOG(NOTICE, EAL,
+					"Real-time thread is unstable if polling without sleep.\n");
+			warned = true;
+		}
+
 		*pol = SCHED_RR;
 		*os_pri = sched_get_priority_max(SCHED_RR);
 		break;
@@ -155,11 +167,6 @@ rte_thread_create(rte_thread_t *thread_id,
 			goto cleanup;
 		}
 
-		if (thread_attr->priority ==
-				RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
-			ret = ENOTSUP;
-			goto cleanup;
-		}
 		ret = thread_map_priority_to_os_value(thread_attr->priority,
 				&param.sched_priority, &policy);
 		if (ret != 0)
@@ -292,10 +299,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
 	int policy;
 	int ret;
 
-	/* Realtime priority can cause crashes on non-Windows platforms. */
-	if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
-		return ENOTSUP;
-
 	ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
 		&policy);
 	if (ret != 0)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 13:36   ` [PATCH v3 0/2] " Thomas Monjalon
@ 2023-10-26 13:44     ` Morten Brørup
  2023-10-26 13:57       ` Morten Brørup
  0 siblings, 1 reply; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 13:44 UTC (permalink / raw)
  To: Thomas Monjalon, stephen, bruce.richardson; +Cc: dev, David Marchand

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 15.37
> 
> 25/10/2023 18:31, Thomas Monjalon:
> > Real-time thread priority was been forbidden on Unix
> > because of problems they can cause.
> > Warnings and helpers are added to avoid deadlocks,
> > so real-time can be allowed on all systems.
> 
> Unit test is failing:
> DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> 
> It is seen in only 1 target (maybe the failure occurence is random):
>   Debian 11 (Buster) (ARM) | PASS
>   Fedora 37 (ARM)          | PASS
>   CentOS Stream 9 (ARM)    | FAIL
>   Fedora 38 (ARM)          | PASS
>   Fedora 38 (ARM Clang)    | PASS
>   Ubuntu 20.04 (ARM)       | PASS
> 
> I need to send a v4 with new implementation and better comments.
> The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> difference.

It will not make a difference. The kernel will go through the sleeping steps, then wake up again and see the real-time thread is ready to run, and then immediately schedule it.

For testing purposes, consider sleeping 10 milliseconds or something significant like that.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 13:44     ` Morten Brørup
@ 2023-10-26 13:57       ` Morten Brørup
  2023-10-26 14:04         ` Thomas Monjalon
  2023-10-26 14:10         ` David Marchand
  0 siblings, 2 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 13:57 UTC (permalink / raw)
  To: Thomas Monjalon, stephen, bruce.richardson; +Cc: dev, David Marchand

> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Thursday, 26 October 2023 15.45
> 
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 15.37
> >
> > 25/10/2023 18:31, Thomas Monjalon:
> > > Real-time thread priority was been forbidden on Unix
> > > because of problems they can cause.
> > > Warnings and helpers are added to avoid deadlocks,
> > > so real-time can be allowed on all systems.
> >
> > Unit test is failing:
> > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> >
> > It is seen in only 1 target (maybe the failure occurence is random):
> >   Debian 11 (Buster) (ARM) | PASS
> >   Fedora 37 (ARM)          | PASS
> >   CentOS Stream 9 (ARM)    | FAIL
> >   Fedora 38 (ARM)          | PASS
> >   Fedora 38 (ARM Clang)    | PASS
> >   Ubuntu 20.04 (ARM)       | PASS
> >
> > I need to send a v4 with new implementation and better comments.
> > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > difference.
> 
> It will not make a difference. The kernel will go through the sleeping steps,
> then wake up again and see the real-time thread is ready to run, and then
> immediately schedule it.
> 
> For testing purposes, consider sleeping 10 milliseconds or something
> significant like that.

A bit more details...

In our recent tests, nanosleep() itself took around 50 us. So you need to sleep longer than that for your thread not to be runnable when the nanosleep() wakes up again, because 50 us has already passed in "nanosleep overhead".
10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on a 1000 Hz kernel. (I don't know if it makes any difference for the kernel scheduler if the timer crosses a jiffy border or not.)



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 13:57       ` Morten Brørup
@ 2023-10-26 14:04         ` Thomas Monjalon
  2023-10-26 14:08           ` Morten Brørup
  2023-10-26 14:10         ` David Marchand
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 14:04 UTC (permalink / raw)
  To: Morten Brørup; +Cc: stephen, bruce.richardson, dev, David Marchand

26/10/2023 15:57, Morten Brørup:
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 26 October 2023 15.45
> > 
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 15.37
> > >
> > > 25/10/2023 18:31, Thomas Monjalon:
> > > > Real-time thread priority was been forbidden on Unix
> > > > because of problems they can cause.
> > > > Warnings and helpers are added to avoid deadlocks,
> > > > so real-time can be allowed on all systems.
> > >
> > > Unit test is failing:
> > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > >
> > > It is seen in only 1 target (maybe the failure occurence is random):
> > >   Debian 11 (Buster) (ARM) | PASS
> > >   Fedora 37 (ARM)          | PASS
> > >   CentOS Stream 9 (ARM)    | FAIL
> > >   Fedora 38 (ARM)          | PASS
> > >   Fedora 38 (ARM Clang)    | PASS
> > >   Ubuntu 20.04 (ARM)       | PASS
> > >
> > > I need to send a v4 with new implementation and better comments.
> > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > difference.
> > 
> > It will not make a difference. The kernel will go through the sleeping steps,
> > then wake up again and see the real-time thread is ready to run, and then
> > immediately schedule it.
> > 
> > For testing purposes, consider sleeping 10 milliseconds or something
> > significant like that.
> 
> A bit more details...
> 
> In our recent tests, nanosleep() itself took around 50 us. So you need to sleep longer than that for your thread not to be runnable when the nanosleep() wakes up again, because 50 us has already passed in "nanosleep overhead".
> 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on a 1000 Hz kernel. (I don't know if it makes any difference for the kernel scheduler if the timer crosses a jiffy border or not.)

10 ms looks like an eternity.
I will try.
(Anyway I did a mistake when sending v4)




^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 14:04         ` Thomas Monjalon
@ 2023-10-26 14:08           ` Morten Brørup
  2023-10-26 14:30             ` Thomas Monjalon
  2023-10-26 16:28             ` Stephen Hemminger
  0 siblings, 2 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 14:08 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: stephen, bruce.richardson, dev, David Marchand

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 16.05
> 
> 26/10/2023 15:57, Morten Brørup:
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Thursday, 26 October 2023 15.45
> > >
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 26 October 2023 15.37
> > > >
> > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > Real-time thread priority was been forbidden on Unix
> > > > > because of problems they can cause.
> > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > so real-time can be allowed on all systems.
> > > >
> > > > Unit test is failing:
> > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > >
> > > > It is seen in only 1 target (maybe the failure occurence is random):
> > > >   Debian 11 (Buster) (ARM) | PASS
> > > >   Fedora 37 (ARM)          | PASS
> > > >   CentOS Stream 9 (ARM)    | FAIL
> > > >   Fedora 38 (ARM)          | PASS
> > > >   Fedora 38 (ARM Clang)    | PASS
> > > >   Ubuntu 20.04 (ARM)       | PASS
> > > >
> > > > I need to send a v4 with new implementation and better comments.
> > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > difference.
> > >
> > > It will not make a difference. The kernel will go through the sleeping
> steps,
> > > then wake up again and see the real-time thread is ready to run, and then
> > > immediately schedule it.
> > >
> > > For testing purposes, consider sleeping 10 milliseconds or something
> > > significant like that.
> >
> > A bit more details...
> >
> > In our recent tests, nanosleep() itself took around 50 us. So you need to
> sleep longer than that for your thread not to be runnable when the nanosleep()
> wakes up again, because 50 us has already passed in "nanosleep overhead".
> > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> scheduler if the timer crosses a jiffy border or not.)
> 
> 10 ms looks like an eternity.

Agree. It is only for functional testing, not for production!

> I will try.
> (Anyway I did a mistake when sending v4)
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 13:57       ` Morten Brørup
  2023-10-26 14:04         ` Thomas Monjalon
@ 2023-10-26 14:10         ` David Marchand
  1 sibling, 0 replies; 54+ messages in thread
From: David Marchand @ 2023-10-26 14:10 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Thomas Monjalon, stephen, bruce.richardson, dev

On Thu, Oct 26, 2023 at 3:57 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 26 October 2023 15.45
> >
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 15.37
> > >
> > > 25/10/2023 18:31, Thomas Monjalon:
> > > > Real-time thread priority was been forbidden on Unix
> > > > because of problems they can cause.
> > > > Warnings and helpers are added to avoid deadlocks,
> > > > so real-time can be allowed on all systems.
> > >
> > > Unit test is failing:
> > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > >
> > > It is seen in only 1 target (maybe the failure occurence is random):
> > >   Debian 11 (Buster) (ARM) | PASS
> > >   Fedora 37 (ARM)          | PASS
> > >   CentOS Stream 9 (ARM)    | FAIL
> > >   Fedora 38 (ARM)          | PASS
> > >   Fedora 38 (ARM Clang)    | PASS
> > >   Ubuntu 20.04 (ARM)       | PASS
> > >
> > > I need to send a v4 with new implementation and better comments.
> > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > difference.
> >
> > It will not make a difference. The kernel will go through the sleeping steps,
> > then wake up again and see the real-time thread is ready to run, and then
> > immediately schedule it.
> >
> > For testing purposes, consider sleeping 10 milliseconds or something
> > significant like that.
>
> A bit more details...
>
> In our recent tests, nanosleep() itself took around 50 us. So you need to sleep longer than that for your thread not to be runnable when the nanosleep() wakes up again, because 50 us has already passed in "nanosleep overhead".

You may want to read manual for prctl and look for PR_SET_TIMERSLACK.
https://github.com/openvswitch/ovs/commit/f62629a55894546ff043e8a116c3c57aff73c285

> 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on a 1000 Hz kernel. (I don't know if it makes any difference for the kernel scheduler if the timer crosses a jiffy border or not.)



-- 
David Marchand


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v5 0/2] allow creating thread with real-time priority
  2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
                   ` (3 preceding siblings ...)
  2023-10-26 13:37 ` [PATCH v4 " Thomas Monjalon
@ 2023-10-26 14:19 ` Thomas Monjalon
  2023-10-26 14:19   ` [PATCH v5 1/2] eal: add thread yield functions Thomas Monjalon
                     ` (2 more replies)
  2023-10-27  8:08 ` [PATCH v6 1/1] eal/unix: " Thomas Monjalon
  5 siblings, 3 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 14:19 UTC (permalink / raw)
  To: dev; +Cc: David Marchand

Real-time thread priority was been forbidden on Unix
because of problems they can cause.
Warnings and helpers are added to avoid deadlocks,
so real-time can be allowed on all systems.

Thomas Monjalon (2):
  eal: add thread yield functions
  eal/unix: allow creating thread with real-time priority

v1: no yield at all
v2: more comments, sched_yield() and Sleep(0) on Windows
v3: 2 yield functions with sleep in realtime version
v4: runtime warning, longer sleep on Unix and lighter yield on Windows
v5: fix build and increase Unix sleep to 1 ms

Thomas Monjalon (2):
  eal: add thread yield functions
  eal/unix: allow creating thread with real-time priority

 app/test/test_threads.c                       | 11 +-----
 .../prog_guide/env_abstraction_layer.rst      |  4 +-
 lib/eal/include/rte_thread.h                  | 33 +++++++++++++++-
 lib/eal/unix/rte_thread.c                     | 38 ++++++++++++++-----
 lib/eal/version.map                           |  4 ++
 lib/eal/windows/rte_thread.c                  | 15 ++++++++
 6 files changed, 83 insertions(+), 22 deletions(-)

-- 
2.42.0


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v5 1/2] eal: add thread yield functions
  2023-10-26 14:19 ` [PATCH v5 0/2] " Thomas Monjalon
@ 2023-10-26 14:19   ` Thomas Monjalon
  2023-10-26 14:19   ` [PATCH v5 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
  2023-10-26 20:00   ` [PATCH v5 0/2] " Thomas Monjalon
  2 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 14:19 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

When running real-time threads, we may need to force scheduling
kernel threads or other real-time threads.
New functions are added to address these cases.

The yield functions should not have any interest for normal threads.
Note: other purposes may be addressed with rte_pause() or rte_delay_*().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/eal/include/rte_thread.h | 25 +++++++++++++++++++++++++
 lib/eal/unix/rte_thread.c    | 16 ++++++++++++++++
 lib/eal/version.map          |  4 ++++
 lib/eal/windows/rte_thread.c | 15 +++++++++++++++
 4 files changed, 60 insertions(+)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8da9d4d3fb..f2581fe152 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -183,6 +183,31 @@ int rte_thread_join(rte_thread_t thread_id, uint32_t *value_ptr);
  */
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Allow another thread to run on the same CPU core.
+ *
+ * This is a scheduler request, with minimum latency.
+ * Lower priority threads may not be scheduled.
+ *
+ * Especially useful in real-time thread priority
+ * to schedule other real-time threads.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield(void);
+
+/**
+ * Unblock a CPU core running busy in a real-time thread.
+ *
+ * This is a sleep call, giving priority to all other threads.
+ *
+ * Especially useful in real-time thread priority
+ * to avoid a busy loop blocking vital threads on a core.
+ * @see RTE_THREAD_PRIORITY_REALTIME_CRITICAL
+ */
+__rte_experimental
+void rte_thread_yield_realtime(void);
+
 /**
  * Get the id of the calling thread.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 36a21ab2f9..278d8d342d 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,9 +5,11 @@
 
 #include <errno.h>
 #include <pthread.h>
+#include <sched.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <string.h>
+#include <time.h>
 
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -227,6 +229,20 @@ rte_thread_detach(rte_thread_t thread_id)
 	return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+void
+rte_thread_yield(void)
+{
+	sched_yield();
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+	/* A simple yield may not be enough to schedule kernel threads. */
+	struct timespec wait = {.tv_nsec = 1000000}; /* 1 ms */
+	nanosleep(&wait, NULL);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
diff --git a/lib/eal/version.map b/lib/eal/version.map
index e00a844805..b81ac3e3af 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -413,6 +413,10 @@ EXPERIMENTAL {
 	# added in 23.07
 	rte_memzone_max_get;
 	rte_memzone_max_set;
+
+	# added in 23.11
+	rte_thread_yield;
+	rte_thread_yield_realtime;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf648456c..ea17b5de3d 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -304,6 +304,21 @@ rte_thread_detach(rte_thread_t thread_id)
 	return 0;
 }
 
+void
+rte_thread_yield(void)
+{
+	SwitchToThread();
+}
+
+void
+rte_thread_yield_realtime(void)
+{
+	/* Real-time threads are not causing problems on Windows.
+	 * A simple sleep should be more than enough.
+	 */
+	Sleep(0);
+}
+
 int
 rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
 {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 2/2] eal/unix: allow creating thread with real-time priority
  2023-10-26 14:19 ` [PATCH v5 0/2] " Thomas Monjalon
  2023-10-26 14:19   ` [PATCH v5 1/2] eal: add thread yield functions Thomas Monjalon
@ 2023-10-26 14:19   ` Thomas Monjalon
  2023-10-26 20:00   ` [PATCH v5 0/2] " Thomas Monjalon
  2 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 14:19 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Narcisa Vasile, Tyler Retzlaff, Stephen Hemminger,
	Dmitry Kozlyuk, Konstantin Ananyev, Andrew Rybchenko

When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

Real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a sleep is added in the test thread,
and a warning is logged when using real-time priority.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_threads.c                       | 11 +---------
 .../prog_guide/env_abstraction_layer.rst      |  4 +++-
 lib/eal/include/rte_thread.h                  |  8 +++++--
 lib/eal/unix/rte_thread.c                     | 22 +++++++++++--------
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..c14d39fc83 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -22,7 +22,7 @@ thread_main(void *arg)
 	__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
 	while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-		;
+		rte_thread_yield_realtime(); /* required for RT priority */
 
 	return 0;
 }
@@ -97,21 +97,12 @@ test_thread_priority(void)
 		"Priority set mismatches priority get");
 
 	priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-		"Priority set to critical should fail");
-	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-		"Failed to get thread priority");
-	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-		"Failed set to critical should have retained normal");
-#else
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
 		"Priority set to critical should succeed");
 	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
 		"Failed to get thread priority");
 	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
 		"Priority set mismatches priority get");
-#endif
 
 	priority = RTE_THREAD_PRIORITY_NORMAL;
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+     whose scheduling policies are ``SCHED_FIFO``
+     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index f2581fe152..7ff031e1b2 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,14 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+	/** Normal thread priority, the default. */
 	RTE_THREAD_PRIORITY_NORMAL            = 0,
-	/**< normal thread priority, the default */
+	/**
+	 * Highest thread priority, use with caution.
+	 * WARNING: System may be unstable because of a real-time busy loop.
+	 *          @see rte_thread_yield_realtime().
+	 */
 	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-	/**< highest thread priority allowed */
 };
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 278d8d342d..17ffb86c17 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -33,6 +33,8 @@ static int
 thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 	int *pol)
 {
+	static bool warned;
+
 	/* Clear the output parameters. */
 	*os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
 	*pol = -1;
@@ -51,6 +53,17 @@ thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 			sched_get_priority_max(SCHED_OTHER)) / 2;
 		break;
 	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+		/*
+		 * WARNING: Real-time busy loop takes priority on kernel threads,
+		 *          making the system unstable.
+		 *          There is also a known issue when using rte_ring.
+		 */
+		if (!warned) {
+			RTE_LOG(NOTICE, EAL,
+					"Real-time thread is unstable if polling without sleep.\n");
+			warned = true;
+		}
+
 		*pol = SCHED_RR;
 		*os_pri = sched_get_priority_max(SCHED_RR);
 		break;
@@ -155,11 +168,6 @@ rte_thread_create(rte_thread_t *thread_id,
 			goto cleanup;
 		}
 
-		if (thread_attr->priority ==
-				RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
-			ret = ENOTSUP;
-			goto cleanup;
-		}
 		ret = thread_map_priority_to_os_value(thread_attr->priority,
 				&param.sched_priority, &policy);
 		if (ret != 0)
@@ -291,10 +299,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
 	int policy;
 	int ret;
 
-	/* Realtime priority can cause crashes on non-Windows platforms. */
-	if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
-		return ENOTSUP;
-
 	ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
 		&policy);
 	if (ret != 0)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 14:08           ` Morten Brørup
@ 2023-10-26 14:30             ` Thomas Monjalon
  2023-10-26 14:50               ` Morten Brørup
  2023-10-26 16:28             ` Stephen Hemminger
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 14:30 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, stephen, bruce.richardson, David Marchand

26/10/2023 16:08, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 16.05
> > 
> > 26/10/2023 15:57, Morten Brørup:
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Thursday, 26 October 2023 15.45
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 15.37
> > > > >
> > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > because of problems they can cause.
> > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > so real-time can be allowed on all systems.
> > > > >
> > > > > Unit test is failing:
> > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > > >
> > > > > It is seen in only 1 target (maybe the failure occurence is random):
> > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > >   Fedora 37 (ARM)          | PASS
> > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > >   Fedora 38 (ARM)          | PASS
> > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > >
> > > > > I need to send a v4 with new implementation and better comments.
> > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > > difference.
> > > >
> > > > It will not make a difference. The kernel will go through the sleeping
> > steps,
> > > > then wake up again and see the real-time thread is ready to run, and then
> > > > immediately schedule it.
> > > >
> > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > significant like that.
> > >
> > > A bit more details...
> > >
> > > In our recent tests, nanosleep() itself took around 50 us. So you need to
> > sleep longer than that for your thread not to be runnable when the nanosleep()
> > wakes up again, because 50 us has already passed in "nanosleep overhead".
> > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > scheduler if the timer crosses a jiffy border or not.)
> > 
> > 10 ms looks like an eternity.
> 
> Agree. It is only for functional testing, not for production!

Realtime thread won't make any sense if we have to insert a long sleep.
We need to find a good sleep value or give up with real-time threads.
(note I'm not sure how much it is useful)

> > I will try.
> > (Anyway I did a mistake when sending v4)

I've sent a trial with 1 ms.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 14:30             ` Thomas Monjalon
@ 2023-10-26 14:50               ` Morten Brørup
  2023-10-26 14:59                 ` Morten Brørup
  0 siblings, 1 reply; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 14:50 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand; +Cc: dev, stephen, bruce.richardson

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 16.31
> 
> 26/10/2023 16:08, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 16.05
> > >
> > > 26/10/2023 15:57, Morten Brørup:
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Thursday, 26 October 2023 15.45
> > > > >
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > >
> > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > because of problems they can cause.
> > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > so real-time can be allowed on all systems.
> > > > > >
> > > > > > Unit test is failing:
> > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > > > >
> > > > > > It is seen in only 1 target (maybe the failure occurence is random):
> > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > >   Fedora 37 (ARM)          | PASS
> > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > >   Fedora 38 (ARM)          | PASS
> > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > >
> > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > > > difference.
> > > > >
> > > > > It will not make a difference. The kernel will go through the sleeping
> > > steps,
> > > > > then wake up again and see the real-time thread is ready to run, and
> then
> > > > > immediately schedule it.
> > > > >
> > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > significant like that.
> > > >
> > > > A bit more details...
> > > >
> > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> to
> > > sleep longer than that for your thread not to be runnable when the
> nanosleep()
> > > wakes up again, because 50 us has already passed in "nanosleep overhead".
> > > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies
> on
> > > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > > scheduler if the timer crosses a jiffy border or not.)
> > >
> > > 10 ms looks like an eternity.
> >
> > Agree. It is only for functional testing, not for production!
> 
> Realtime thread won't make any sense if we have to insert a long sleep.

It seems David came to our rescue here!

I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1 ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.

The timeout parameter to epoll_wait() is in milliseconds, which is useless for low-latency.
Perhaps real-time threads can be used with epoll() combined with timerfd for nanosecond resolution timeout.

> We need to find a good sleep value or give up with real-time threads.
> (note I'm not sure how much it is useful)

Only the application developer knows how much delay is acceptable. Which is why I mentioned that the new yield functions should document the delay.

> 
> > > I will try.
> > > (Anyway I did a mistake when sending v4)
> 
> I've sent a trial with 1 ms.
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 14:50               ` Morten Brørup
@ 2023-10-26 14:59                 ` Morten Brørup
  2023-10-26 15:54                   ` Bruce Richardson
  0 siblings, 1 reply; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 14:59 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand; +Cc: dev, stephen, bruce.richardson

> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Thursday, 26 October 2023 16.50
> 
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 16.31
> >
> > 26/10/2023 16:08, Morten Brørup:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 26 October 2023 16.05
> > > >
> > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > >
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > >
> > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > because of problems they can cause.
> > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > so real-time can be allowed on all systems.
> > > > > > >
> > > > > > > Unit test is failing:
> > > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > > > > >
> > > > > > > It is seen in only 1 target (maybe the failure occurence is
> random):
> > > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > > >   Fedora 37 (ARM)          | PASS
> > > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > > >   Fedora 38 (ARM)          | PASS
> > > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > > >
> > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes
> a
> > > > > > > difference.
> > > > > >
> > > > > > It will not make a difference. The kernel will go through the
> sleeping
> > > > steps,
> > > > > > then wake up again and see the real-time thread is ready to run, and
> > then
> > > > > > immediately schedule it.
> > > > > >
> > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > significant like that.
> > > > >
> > > > > A bit more details...
> > > > >
> > > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> > to
> > > > sleep longer than that for your thread not to be runnable when the
> > nanosleep()
> > > > wakes up again, because 50 us has already passed in "nanosleep
> overhead".
> > > > > 10 milliseconds provides plenty of margin, and corresponds to 10
> jiffies
> > on
> > > > a 1000 Hz kernel. (I don't know if it makes any difference for the
> kernel
> > > > scheduler if the timer crosses a jiffy border or not.)
> > > >
> > > > 10 ms looks like an eternity.
> > >
> > > Agree. It is only for functional testing, not for production!
> >
> > Realtime thread won't make any sense if we have to insert a long sleep.
> 
> It seems David came to our rescue here!
> 
> I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1
> ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
> 
> The timeout parameter to epoll_wait() is in milliseconds, which is useless for
> low-latency.
> Perhaps real-time threads can be used with epoll() combined with timerfd for
> nanosecond resolution timeout.

Or epoll_pwait2(), which has nanosecond resolution timeout.

Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we cannot change its timeout parameter from milliseconds to micro- or nanoseconds. We would have to introduce a new API for this.

> 
> > We need to find a good sleep value or give up with real-time threads.
> > (note I'm not sure how much it is useful)
> 
> Only the application developer knows how much delay is acceptable. Which is
> why I mentioned that the new yield functions should document the delay.
> 
> >
> > > > I will try.
> > > > (Anyway I did a mistake when sending v4)
> >
> > I've sent a trial with 1 ms.
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 14:59                 ` Morten Brørup
@ 2023-10-26 15:54                   ` Bruce Richardson
  2023-10-26 16:07                     ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Bruce Richardson @ 2023-10-26 15:54 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Thomas Monjalon, David Marchand, dev, stephen

On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 26 October 2023 16.50
> > 
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 16.31
> > >
> > > 26/10/2023 16:08, Morten Brørup:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 16.05
> > > > >
> > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > >
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > >
> > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > because of problems they can cause.
> > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > >
> > > > > > > > Unit test is failing:
> > > > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > > > > > >
> > > > > > > > It is seen in only 1 target (maybe the failure occurence is
> > random):
> > > > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > > > >   Fedora 37 (ARM)          | PASS
> > > > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > > > >   Fedora 38 (ARM)          | PASS
> > > > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > > > >
> > > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes
> > a
> > > > > > > > difference.
> > > > > > >
> > > > > > > It will not make a difference. The kernel will go through the
> > sleeping
> > > > > steps,
> > > > > > > then wake up again and see the real-time thread is ready to run, and
> > > then
> > > > > > > immediately schedule it.
> > > > > > >
> > > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > > significant like that.
> > > > > >
> > > > > > A bit more details...
> > > > > >
> > > > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> > > to
> > > > > sleep longer than that for your thread not to be runnable when the
> > > nanosleep()
> > > > > wakes up again, because 50 us has already passed in "nanosleep
> > overhead".
> > > > > > 10 milliseconds provides plenty of margin, and corresponds to 10
> > jiffies
> > > on
> > > > > a 1000 Hz kernel. (I don't know if it makes any difference for the
> > kernel
> > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > >
> > > > > 10 ms looks like an eternity.
> > > >
> > > > Agree. It is only for functional testing, not for production!
> > >
> > > Realtime thread won't make any sense if we have to insert a long sleep.
> > 
> > It seems David came to our rescue here!
> > 
> > I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1
> > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
> > 
> > The timeout parameter to epoll_wait() is in milliseconds, which is useless for
> > low-latency.
> > Perhaps real-time threads can be used with epoll() combined with timerfd for
> > nanosecond resolution timeout.
> 
> Or epoll_pwait2(), which has nanosecond resolution timeout.
> 
> Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we cannot change its timeout parameter from milliseconds to micro- or nanoseconds. We would have to introduce a new API for this.
> 

Just an idea - can we change the timeout parameter to float rather than int,
and then use function versioning for backward compatibility for any
binaries passing int?
That way the actual meaning of the parameter doesn't change, but it still
allows sub-millisecond values (all-be-it with some loss of accuracy due to
float).

/Bruce


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 15:54                   ` Bruce Richardson
@ 2023-10-26 16:07                     ` Thomas Monjalon
  2023-10-26 16:50                       ` Morten Brørup
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 16:07 UTC (permalink / raw)
  To: Morten Brørup, Bruce Richardson; +Cc: David Marchand, dev, stephen

26/10/2023 17:54, Bruce Richardson:
> On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Thursday, 26 October 2023 16.50
> > > 
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 26 October 2023 16.31
> > > >
> > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > >
> > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > >
> > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > >
> > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > > because of problems they can cause.
> > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > >
> > > > > > > > > Unit test is failing:
> > > > > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > > > > > > >
> > > > > > > > > It is seen in only 1 target (maybe the failure occurence is
> > > random):
> > > > > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > > > > >   Fedora 37 (ARM)          | PASS
> > > > > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > > > > >   Fedora 38 (ARM)          | PASS
> > > > > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > > > > >
> > > > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes
> > > a
> > > > > > > > > difference.
> > > > > > > >
> > > > > > > > It will not make a difference. The kernel will go through the
> > > sleeping
> > > > > > steps,
> > > > > > > > then wake up again and see the real-time thread is ready to run, and
> > > > then
> > > > > > > > immediately schedule it.
> > > > > > > >
> > > > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > > > significant like that.
> > > > > > >
> > > > > > > A bit more details...
> > > > > > >
> > > > > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> > > > to
> > > > > > sleep longer than that for your thread not to be runnable when the
> > > > nanosleep()
> > > > > > wakes up again, because 50 us has already passed in "nanosleep
> > > overhead".
> > > > > > > 10 milliseconds provides plenty of margin, and corresponds to 10
> > > jiffies
> > > > on
> > > > > > a 1000 Hz kernel. (I don't know if it makes any difference for the
> > > kernel
> > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > >
> > > > > > 10 ms looks like an eternity.
> > > > >
> > > > > Agree. It is only for functional testing, not for production!
> > > >
> > > > Realtime thread won't make any sense if we have to insert a long sleep.
> > > 
> > > It seems David came to our rescue here!
> > > 
> > > I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1
> > > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
> > > 
> > > The timeout parameter to epoll_wait() is in milliseconds, which is useless for
> > > low-latency.
> > > Perhaps real-time threads can be used with epoll() combined with timerfd for
> > > nanosecond resolution timeout.
> > 
> > Or epoll_pwait2(), which has nanosecond resolution timeout.
> > 
> > Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we cannot change its timeout parameter from milliseconds to micro- or nanoseconds. We would have to introduce a new API for this.
> > 
> 
> Just an idea - can we change the timeout parameter to float rather than int,
> and then use function versioning for backward compatibility for any
> binaries passing int?
> That way the actual meaning of the parameter doesn't change, but it still
> allows sub-millisecond values (all-be-it with some loss of accuracy due to
> float).

Sorry I'm not following why you want to use rte_epoll_wait()?

If the realtime thread has some blocking system calls,
no sleep is needed I think.
For average realtime thread, I suggest the API rte_thread_yield_realtime()
which could wait for 1 ms or less by default.
For smaller sleep, we can use PR_SET_TIMERSLACK and rte_delay_us_sleep().
If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
of rte_thread_yield_realtime() dynamically after calling prctl().



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 14:08           ` Morten Brørup
  2023-10-26 14:30             ` Thomas Monjalon
@ 2023-10-26 16:28             ` Stephen Hemminger
  2023-10-26 19:55               ` Thomas Monjalon
  1 sibling, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-26 16:28 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Thomas Monjalon, bruce.richardson, dev, David Marchand

On Thu, 26 Oct 2023 16:08:02 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > > In our recent tests, nanosleep() itself took around 50 us. So you need to  
> > sleep longer than that for your thread not to be runnable when the nanosleep()
> > wakes up again, because 50 us has already passed in "nanosleep overhead".  
> > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on  
> > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > scheduler if the timer crosses a jiffy border or not.)
> > 
> > 10 ms looks like an eternity.  
> 
> Agree. It is only for functional testing, not for production!

To be safe the sleep has to be longer than the system clock tick.
Most systems are built today with HZ=250 but really should be using HZ=1000
on modern CPU's.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-26  7:33           ` Morten Brørup
@ 2023-10-26 16:32             ` Stephen Hemminger
  2023-10-26 17:07               ` Morten Brørup
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-26 16:32 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, Andrew Rybchenko,
	Konstantin Ananyev

On Thu, 26 Oct 2023 09:33:42 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 25 October 2023 23.33
> > 
> > On Wed, 25 Oct 2023 19:54:06 +0200
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >   
> > > I agree with Thomas on this.
> > >
> > > If you want the log message, please degrade it to INFO or DEBUG level. It is  
> > only relevant when chasing problems, not for normal production - and thus
> > NOTICE is too high.
> > 
> > I don't want the message to be hidden.
> > If we get any bug reports want to be able to say "read the log, don't do
> > that".  
> 
> Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.
> 
> It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems).
> As Stephen points out, it is a good starting point to check for bug reports possibly related to this. And, I suppose the experienced users who really understands it will not be seriously confused by such a NOTICE message in the log.
> 
> >   
> > > Someone might build a kernel with options to keep non-dataplane threads off  
> > some dedicated CPU cores, so they can be used for guaranteed low-latency
> > dataplane threads. We do. We don't use real-time priority, though.
> > 
> > This is really, hard to do.  
> 
> As my kids would say: This is really, really, really, really, really hard to do!
> 
> We have not been able to find an authoritative source of documentation describing how to do it. :-(
> 
> And our experiment shows that we didn't 100 % succeed doing it. But we got close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz CPU corresponds to max 3 microseconds of added worst-case latency.
> 
> It would be great for latency-sensitive applications if the DPDK documentation went more into detail on this topic. However, if the DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if running on top of a custom built Linux Kernel, it essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)
> 
> > Isolated CPU's are not isolated from interrupts
> > and other sources which end up scheduling work as kernel threads. Plus there
> > is the behavior where kernel decides to turn a soft irq into a kernel thread,
> > then starve itself.  
> 
> We have configured the kernel to put all of this on CPU 0. (Details further below.)
> 
> > Under starvation, disk corruption is likely if interrupts never get
> > processed :-(
> >   
> > > For reference, we did some experiments (using this custom built kernel) with  
> > a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> > registering the delta. Although the overwhelming majority is ca. CPU 80
> > cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> > magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> > threads steal some cycles from this thread, regardless of our customizations.
> > We haven't bothered analyzing and optimizing it further.
> > 
> > Was this on isolated CPU?  
> 
> Yes. We isolate all CPUs but CPU 0.
> 
> > Did you check that that CPU was excluded from the smp_affinty mask on all
> > devices?  
> 
> Not sure how to do that?
> 
> NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?
> 
> > Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?  
> 
> Yes:
> # Timers subsystem
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y
> 
> CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"
> 
> > Same thing for RCU, need to adjust parameters?  
> 
> Yes:
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> CONFIG_SRCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_CONTEXT_TRACKING=y
> CONFIG_RCU_NOCB_CPU=y
> CONFIG_RCU_NOCB_CPU_ALL=y
> 
> > 
> > Also, on many systems there can be SMI BIOS hidden execution that will cause
> > big outliers.  
> 
> Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.
> 
> > 
> > Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> > places.  
> 
> Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.
> 
> >   
> > > I think our experiment supports the need to allow kernel threads to run,  
> > e.g. by calling sleep() or similar, when an EAL thread has real-time priority.  
> 

One benefit of doing real-time thread is that kernel will be more precise in
any calls to sleep. If you do small sleep in normal thread, the kernel will round
up the timer to try and avoid reprogramming timer chip and to save power (less wakeups from idle).
With RT thread it will do "you wanted 21us, ok for you will do 21us"

The project that was originally Vyatta, has a script that tries to isolate interrupts etc.
I started it but they have worked on it since then.

   https://github.com/danos/vyatta-cpu-shield

It adjust kernel workers, softirq, cgroups etc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 16:07                     ` Thomas Monjalon
@ 2023-10-26 16:50                       ` Morten Brørup
  2023-10-26 19:51                         ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 16:50 UTC (permalink / raw)
  To: Thomas Monjalon, Bruce Richardson; +Cc: David Marchand, dev, stephen

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 18.07
> 
> 26/10/2023 17:54, Bruce Richardson:
> > On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Thursday, 26 October 2023 16.50
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 16.31
> > > > >
> > > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > > >
> > > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > > >
> > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > > >
> > > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > > > because of problems they can cause.
> > > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > > >
> > > > > > > > > > Unit test is failing:
> > > > > > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01
> s
> > > > > > > > > >
> > > > > > > > > > It is seen in only 1 target (maybe the failure
> occurence is
> > > > random):
> > > > > > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > > > > > >   Fedora 37 (ARM)          | PASS
> > > > > > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > > > > > >   Fedora 38 (ARM)          | PASS
> > > > > > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > > > > > >
> > > > > > > > > > I need to send a v4 with new implementation and better
> comments.
> > > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in
> case it makes
> > > > a
> > > > > > > > > > difference.
> > > > > > > > >
> > > > > > > > > It will not make a difference. The kernel will go
> through the
> > > > sleeping
> > > > > > > steps,
> > > > > > > > > then wake up again and see the real-time thread is ready
> to run, and
> > > > > then
> > > > > > > > > immediately schedule it.
> > > > > > > > >
> > > > > > > > > For testing purposes, consider sleeping 10 milliseconds
> or something
> > > > > > > > > significant like that.
> > > > > > > >
> > > > > > > > A bit more details...
> > > > > > > >
> > > > > > > > In our recent tests, nanosleep() itself took around 50 us.
> So you need
> > > > > to
> > > > > > > sleep longer than that for your thread not to be runnable
> when the
> > > > > nanosleep()
> > > > > > > wakes up again, because 50 us has already passed in
> "nanosleep
> > > > overhead".
> > > > > > > > 10 milliseconds provides plenty of margin, and corresponds
> to 10
> > > > jiffies
> > > > > on
> > > > > > > a 1000 Hz kernel. (I don't know if it makes any difference
> for the
> > > > kernel
> > > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > > >
> > > > > > > 10 ms looks like an eternity.
> > > > > >
> > > > > > Agree. It is only for functional testing, not for production!
> > > > >
> > > > > Realtime thread won't make any sense if we have to insert a long
> sleep.
> > > >
> > > > It seems David came to our rescue here!
> > > >
> > > > I have just tried running our test again with
> prctl(PR_SET_TIMERSLACK) of 1
> > > > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca.
> 2.5 us.
> > > >
> > > > The timeout parameter to epoll_wait() is in milliseconds, which is
> useless for
> > > > low-latency.
> > > > Perhaps real-time threads can be used with epoll() combined with
> timerfd for
> > > > nanosecond resolution timeout.
> > >
> > > Or epoll_pwait2(), which has nanosecond resolution timeout.
> > >
> > > Unfortunately, rte_epoll_wait() is not an experimental API anymore,
> so we cannot change its timeout parameter from milliseconds to micro- or
> nanoseconds. We would have to introduce a new API for this.
> > >
> >
> > Just an idea - can we change the timeout parameter to float rather
> than int,
> > and then use function versioning for backward compatibility for any
> > binaries passing int?
> > That way the actual meaning of the parameter doesn't change, but it
> still
> > allows sub-millisecond values (all-be-it with some loss of accuracy
> due to
> > float).

Too exotic for my taste. I would rather introduce rte_epoll_wait_ns() with timeout in nanoseconds than pass a float.

> 
> Sorry I'm not following why you want to use rte_epoll_wait()?

I don't have experience with it yet, but it seems to be the official DPDK API for blocking I/O system call.

> 
> If the realtime thread has some blocking system calls,
> no sleep is needed I think.

Correct.

> For average realtime thread, I suggest the API
> rte_thread_yield_realtime()
> which could wait for 1 ms or less by default.

If we introduce a yield() function, it should yield according to the O/S scheduling policy, e.g. the rest of the time slice allocated to the thread by the O/S scheduler (although that might not be available for real-time prioritized threads in Linux). I don't think we can make it O/S agnostic.

I don't think it should wait a fixed amount of time - we already have rte_delay_us_sleep() for that.

In my experiments with power saving, I ended up with a varying sleep duration, depending on traffic load. The l3fwd-power example also uses a varying sleep duration depending on traffic load.

> For smaller sleep, we can use PR_SET_TIMERSLACK and
> rte_delay_us_sleep().

Agree.

> If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
> of rte_thread_yield_realtime() dynamically after calling prctl().
> 

I'm not sure exposing an API for PR_SET_TIMERSLACK is the right solution.

I would rather have the EAL set the timer slack to minimum (1 ns) at EAL initialization. An EAL command line parameter could be added to change the default from 1 ns.

Also, something similar needs to be done for Windows.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2] eal/unix: allow creating thread with real-time priority
  2023-10-26 16:32             ` Stephen Hemminger
@ 2023-10-26 17:07               ` Morten Brørup
  0 siblings, 0 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-26 17:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam, Tyler Retzlaff, bruce.richardson,
	Andrew Rybchenko, Konstantin Ananyev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, 26 October 2023 18.32
> 
> On Thu, 26 Oct 2023 09:33:42 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Wednesday, 25 October 2023 23.33
> > >
> > > On Wed, 25 Oct 2023 19:54:06 +0200
> > > Morten Brørup <mb@smartsharesystems.com> wrote:

[...]

> > > > Someone might build a kernel with options to keep non-dataplane
> threads off
> > > some dedicated CPU cores, so they can be used for guaranteed low-
> latency
> > > dataplane threads. We do. We don't use real-time priority, though.
> > >
> > > This is really, hard to do.
> >
> > As my kids would say: This is really, really, really, really, really
> hard to do!
> >
> > We have not been able to find an authoritative source of
> documentation describing how to do it. :-(

[...]

> One benefit of doing real-time thread is that kernel will be more
> precise in
> any calls to sleep. If you do small sleep in normal thread, the kernel
> will round
> up the timer to try and avoid reprogramming timer chip and to save
> power (less wakeups from idle).
> With RT thread it will do "you wanted 21us, ok for you will do 21us"

So we don't need PR_SET_TIMERSLACK with RT threads?

> 
> The project that was originally Vyatta, has a script that tries to
> isolate interrupts etc.
> I started it but they have worked on it since then.
> 
>    https://github.com/danos/vyatta-cpu-shield
> 
> It adjust kernel workers, softirq, cgroups etc

This script looks very interesting. Thank you, Stephen!


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 16:50                       ` Morten Brørup
@ 2023-10-26 19:51                         ` Thomas Monjalon
  2023-10-27  7:19                           ` Morten Brørup
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 19:51 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Bruce Richardson, dev, David Marchand, dev, stephen

26/10/2023 18:50, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 18.07
> > 
> > 26/10/2023 17:54, Bruce Richardson:
> > > On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Thursday, 26 October 2023 16.50
> > > > >
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > Sent: Thursday, 26 October 2023 16.31
> > > > > >
> > > > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > > > >
> > > > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > > > >
> > > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > > > >
> > > > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > > > > because of problems they can cause.
> > > > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > > > >
> > > > > > > > > > > Unit test is failing:
> > > > > > > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01
> > s
> > > > > > > > > > >
> > > > > > > > > > > It is seen in only 1 target (maybe the failure
> > occurence is
> > > > > random):
> > > > > > > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > > > > > > >   Fedora 37 (ARM)          | PASS
> > > > > > > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > > > > > > >   Fedora 38 (ARM)          | PASS
> > > > > > > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > > > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > > > > > > >
> > > > > > > > > > > I need to send a v4 with new implementation and better
> > comments.
> > > > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in
> > case it makes
> > > > > a
> > > > > > > > > > > difference.
> > > > > > > > > >
> > > > > > > > > > It will not make a difference. The kernel will go
> > through the
> > > > > sleeping
> > > > > > > > steps,
> > > > > > > > > > then wake up again and see the real-time thread is ready
> > to run, and
> > > > > > then
> > > > > > > > > > immediately schedule it.
> > > > > > > > > >
> > > > > > > > > > For testing purposes, consider sleeping 10 milliseconds
> > or something
> > > > > > > > > > significant like that.
> > > > > > > > >
> > > > > > > > > A bit more details...
> > > > > > > > >
> > > > > > > > > In our recent tests, nanosleep() itself took around 50 us.
> > So you need
> > > > > > to
> > > > > > > > sleep longer than that for your thread not to be runnable
> > when the
> > > > > > nanosleep()
> > > > > > > > wakes up again, because 50 us has already passed in
> > "nanosleep
> > > > > overhead".
> > > > > > > > > 10 milliseconds provides plenty of margin, and corresponds
> > to 10
> > > > > jiffies
> > > > > > on
> > > > > > > > a 1000 Hz kernel. (I don't know if it makes any difference
> > for the
> > > > > kernel
> > > > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > > > >
> > > > > > > > 10 ms looks like an eternity.
> > > > > > >
> > > > > > > Agree. It is only for functional testing, not for production!
> > > > > >
> > > > > > Realtime thread won't make any sense if we have to insert a long
> > sleep.
> > > > >
> > > > > It seems David came to our rescue here!
> > > > >
> > > > > I have just tried running our test again with
> > prctl(PR_SET_TIMERSLACK) of 1
> > > > > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca.
> > 2.5 us.
> > > > >
> > > > > The timeout parameter to epoll_wait() is in milliseconds, which is
> > useless for
> > > > > low-latency.
> > > > > Perhaps real-time threads can be used with epoll() combined with
> > timerfd for
> > > > > nanosecond resolution timeout.
> > > >
> > > > Or epoll_pwait2(), which has nanosecond resolution timeout.
> > > >
> > > > Unfortunately, rte_epoll_wait() is not an experimental API anymore,
> > so we cannot change its timeout parameter from milliseconds to micro- or
> > nanoseconds. We would have to introduce a new API for this.
> > > >
> > >
> > > Just an idea - can we change the timeout parameter to float rather
> > than int,
> > > and then use function versioning for backward compatibility for any
> > > binaries passing int?
> > > That way the actual meaning of the parameter doesn't change, but it
> > still
> > > allows sub-millisecond values (all-be-it with some loss of accuracy
> > due to
> > > float).
> 
> Too exotic for my taste. I would rather introduce rte_epoll_wait_ns() with timeout in nanoseconds than pass a float.
> 
> > 
> > Sorry I'm not following why you want to use rte_epoll_wait()?
> 
> I don't have experience with it yet, but it seems to be the official DPDK API for blocking I/O system call.
> 
> > 
> > If the realtime thread has some blocking system calls,
> > no sleep is needed I think.
> 
> Correct.
> 
> > For average realtime thread, I suggest the API
> > rte_thread_yield_realtime()
> > which could wait for 1 ms or less by default.
> 
> If we introduce a yield() function, it should yield according to the O/S scheduling policy, e.g. the rest of the time slice allocated to the thread by the O/S scheduler (although that might not be available for real-time prioritized threads in Linux). I don't think we can make it O/S agnostic.
> 
> I don't think it should wait a fixed amount of time - we already have rte_delay_us_sleep() for that.
> 
> In my experiments with power saving, I ended up with a varying sleep duration, depending on traffic load. The l3fwd-power example also uses a varying sleep duration depending on traffic load.

I feel you lost the goal of this: schedule other threads (especially kernel threads).
So we don't care about the sleep duration at all,
except we want it minimal while allowing scheduling.

> > For smaller sleep, we can use PR_SET_TIMERSLACK and
> > rte_delay_us_sleep().
> 
> Agree.
> 
> > If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
> > of rte_thread_yield_realtime() dynamically after calling prctl().
> > 
> 
> I'm not sure exposing an API for PR_SET_TIMERSLACK is the right solution.
> 
> I would rather have the EAL set the timer slack to minimum (1 ns) at EAL initialization. An EAL command line parameter could be added to change the default from 1 ns.
> 
> Also, something similar needs to be done for Windows.

Windows should be fine with Sleep(0).



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 16:28             ` Stephen Hemminger
@ 2023-10-26 19:55               ` Thomas Monjalon
  2023-10-26 21:10                 ` Stephen Hemminger
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 19:55 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Morten Brørup, dev, bruce.richardson, dev, David Marchand

26/10/2023 18:28, Stephen Hemminger:
> On Thu, 26 Oct 2023 16:08:02 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > > In our recent tests, nanosleep() itself took around 50 us. So you need to  
> > > sleep longer than that for your thread not to be runnable when the nanosleep()
> > > wakes up again, because 50 us has already passed in "nanosleep overhead".  
> > > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on  
> > > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > > scheduler if the timer crosses a jiffy border or not.)
> > > 
> > > 10 ms looks like an eternity.  
> > 
> > Agree. It is only for functional testing, not for production!
> 
> To be safe the sleep has to be longer than the system clock tick.
> Most systems are built today with HZ=250 but really should be using HZ=1000
> on modern CPU's.

If it has to be more than 1 ms,
we should mention it is a slow call
which may be skipped if the thread is already blocking on something else.




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/2] allow creating thread with real-time priority
  2023-10-26 14:19 ` [PATCH v5 0/2] " Thomas Monjalon
  2023-10-26 14:19   ` [PATCH v5 1/2] eal: add thread yield functions Thomas Monjalon
  2023-10-26 14:19   ` [PATCH v5 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
@ 2023-10-26 20:00   ` Thomas Monjalon
  2023-10-26 23:35     ` Tyler Retzlaff
  2 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-26 20:00 UTC (permalink / raw)
  To: stephen, Morten Brørup, Tyler Retzlaff; +Cc: dev, David Marchand

26/10/2023 16:19, Thomas Monjalon:
> Real-time thread priority was been forbidden on Unix
> because of problems they can cause.
> Warnings and helpers are added to avoid deadlocks,
> so real-time can be allowed on all systems.
> 
> Thomas Monjalon (2):
>   eal: add thread yield functions
>   eal/unix: allow creating thread with real-time priority
> 
> v1: no yield at all
> v2: more comments, sched_yield() and Sleep(0) on Windows
> v3: 2 yield functions with sleep in realtime version
> v4: runtime warning, longer sleep on Unix and lighter yield on Windows
> v5: fix build and increase Unix sleep to 1 ms
> 
> Thomas Monjalon (2):
>   eal: add thread yield functions
>   eal/unix: allow creating thread with real-time priority

Now there is a test failing on Windows:
http://mails.dpdk.org/archives/test-report/2023-October/491475.html




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 19:55               ` Thomas Monjalon
@ 2023-10-26 21:10                 ` Stephen Hemminger
  0 siblings, 0 replies; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-26 21:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Morten Brørup, dev, bruce.richardson, David Marchand

On Thu, 26 Oct 2023 21:55:24 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> > To be safe the sleep has to be longer than the system clock tick.
> > Most systems are built today with HZ=250 but really should be using HZ=1000
> > on modern CPU's.  
> 
> If it has to be more than 1 ms,
> we should mention it is a slow call
> which may be skipped if the thread is already blocking on something else.

A thread that is real time is not subject to timer slack.
Therefore even 100ns would probably work fine.


https://man7.org/linux/man-pages/man7/time.7.html

  High-resolution timers
       Before Linux 2.6.21, the accuracy of timer and sleep system calls
       (see below) was also limited by the size of the jiffy.

       Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
       optionally configurable via CONFIG_HIGH_RES_TIMERS.  On a system
       that supports HRTs, the accuracy of sleep and timer system calls
       is no longer constrained by the jiffy, but instead can be as
       accurate as the hardware allows (microsecond accuracy is typical
       of modern hardware).  You can determine whether high-resolution
       timers are supported by checking the resolution returned by a
       call to clock_getres(2) or looking at the "resolution" entries in
       /proc/timer_list.

       HRTs are not supported on all hardware architectures.  (Support
       is provided on x86, ARM, and PowerPC, among others.)
...
   Timer slack
       Since Linux 2.6.28, it is possible to control the "timer slack"
       value for a thread.  The timer slack is the length of time by
       which the kernel may delay the wake-up of certain system calls
       that block with a timeout.  Permitting this delay allows the
       kernel to coalesce wake-up events, thus possibly reducing the
       number of system wake-ups and saving power.  For more details,
       see the description of PR_SET_TIMERSLACK in prctl(2).

...
https://man7.org/linux/man-pages/man2/prctl.2.html


       PR_SET_TIMERSLACK (since Linux 2.6.28)
              Each thread has two associated timer slack values: a
              "default" value, and a "current" value.  This operation
              sets the "current" timer slack value for the calling
              thread.  arg2 is an unsigned long value, then maximum
              "current" value is ULONG_MAX and the minimum "current"
              value is 1.  If the nanosecond value supplied in arg2 is
              greater than zero, then the "current" value is set to this
              value.  If arg2 is equal to zero, the "current" timer
              slack is reset to the thread's "default" timer slack
              value.

              The "current" timer slack is used by the kernel to group
              timer expirations for the calling thread that are close to
              one another; as a consequence, timer expirations for the
              thread may be up to the specified number of nanoseconds
              late (but will never expire early).  Grouping timer
              expirations can help reduce system power consumption by
              minimizing CPU wake-ups.

              The timer expirations affected by timer slack are those
              set by select(2), pselect(2), poll(2), ppoll(2),
              epoll_wait(2), epoll_pwait(2), clock_nanosleep(2),
              nanosleep(2), and futex(2) (and thus the library functions
              implemented via futexes, including
              pthread_cond_timedwait(3), pthread_mutex_timedlock(3),
              pthread_rwlock_timedrdlock(3),
              pthread_rwlock_timedwrlock(3), and sem_timedwait(3)).

              Timer slack is not applied to threads that are scheduled
              under a real-time scheduling policy (see
              sched_setscheduler(2)).

              When a new thread is created, the two timer slack values
              are made the same as the "current" value of the creating
              thread.  Thereafter, a thread can adjust its "current"
              timer slack value via PR_SET_TIMERSLACK.  The "default"
              value can't be changed.  The timer slack values of init
              (PID 1), the ancestor of all processes, are 50,000
              nanoseconds (50 microseconds).  The timer slack value is
              inherited by a child created via fork(2), and is preserved
              across execve(2).

              Since Linux 4.6, the "current" timer slack value of any
              process can be examined and changed via the file
              /proc/pid/timerslack_ns.  See proc(5).


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/2] allow creating thread with real-time priority
  2023-10-26 20:00   ` [PATCH v5 0/2] " Thomas Monjalon
@ 2023-10-26 23:35     ` Tyler Retzlaff
  0 siblings, 0 replies; 54+ messages in thread
From: Tyler Retzlaff @ 2023-10-26 23:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: stephen, Morten Brørup, dev, David Marchand

On Thu, Oct 26, 2023 at 10:00:33PM +0200, Thomas Monjalon wrote:
> 26/10/2023 16:19, Thomas Monjalon:
> > Real-time thread priority was been forbidden on Unix
> > because of problems they can cause.
> > Warnings and helpers are added to avoid deadlocks,
> > so real-time can be allowed on all systems.
> > 
> > Thomas Monjalon (2):
> >   eal: add thread yield functions
> >   eal/unix: allow creating thread with real-time priority
> > 
> > v1: no yield at all
> > v2: more comments, sched_yield() and Sleep(0) on Windows
> > v3: 2 yield functions with sleep in realtime version
> > v4: runtime warning, longer sleep on Unix and lighter yield on Windows
> > v5: fix build and increase Unix sleep to 1 ms
> > 
> > Thomas Monjalon (2):
> >   eal: add thread yield functions
> >   eal/unix: allow creating thread with real-time priority
> 
> Now there is a test failing on Windows:
> http://mails.dpdk.org/archives/test-report/2023-October/491475.html

40/84 DPDK:fast-tests / lcores_autotest                TIMEOUT
626.38s

i don't think this test is reliable. if the failure is being reported
due to the lcores_autotest timeout.

i also see this test fail on linux for me occasionally when run in my
local lab.

ty

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v3 0/2] allow creating thread with real-time priority
  2023-10-26 19:51                         ` Thomas Monjalon
@ 2023-10-27  7:19                           ` Morten Brørup
  0 siblings, 0 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-27  7:19 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Bruce Richardson, dev, David Marchand, stephen, Tyler Retzlaff

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 21.51
> 
> 26/10/2023 18:50, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 18.07

[...]

> > > For average realtime thread, I suggest the API
> > > rte_thread_yield_realtime()
> > > which could wait for 1 ms or less by default.
> >
> > If we introduce a yield() function, it should yield according to the
> O/S scheduling policy, e.g. the rest of the time slice allocated to the
> thread by the O/S scheduler (although that might not be available for
> real-time prioritized threads in Linux). I don't think we can make it
> O/S agnostic.
> >
> > I don't think it should wait a fixed amount of time - we already have
> rte_delay_us_sleep() for that.
> >
> > In my experiments with power saving, I ended up with a varying sleep
> duration, depending on traffic load. The l3fwd-power example also uses
> a varying sleep duration depending on traffic load.
> 
> I feel you lost the goal of this: schedule other threads (especially
> kernel threads).
> So we don't care about the sleep duration at all,
> except we want it minimal while allowing scheduling.

I do understand the goal: We want to relinquish the CPU core briefly, so the RT-priority thread gives way for other threads to run on the same CPU core.

But I don't think it is possible for an RT-priority thread to yield without waiting in a blocking call or waiting for an explicit timeout. This is the reason why Stephen wants the big fat warning sign when assigning RT-priority to a thread.

This will happen if we don't provide something for the RT-priority thread to wait for:

The application thread calls "yield()", e.g. sleep(0), to pass control to the kernel scheduler.
The kernel scheduler looks at all tasks available. The application thread has RT-priority and is ready to run, so the kernel scheduler will pick this thread for running. Only if other RT-priority threads are ready to run, perhaps one of those will be picked. No other threads will be picked, not even kernel threads (because RT-priority is higher than the priority of the kernel threads).

[...]

> Windows should be fine with Sleep(0).

Reading the Win32 API documentation [1] carefully, it think it will generally behave like the Linux scheduler. It says:

"A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution. This behavior changed starting with Windows Server 2003."

How I interpret this is: The Windows scheduler will not relinquish the CPU core to lower priority threads when calling Sleep(0). If the application thread has RT-priority, the scheduler will immediately pick the same thread. Only if other RT-priority threads are ready to run, one of those will be picked.

[1]: https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleep


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v6 1/1] eal/unix: allow creating thread with real-time priority
  2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
                   ` (4 preceding siblings ...)
  2023-10-26 14:19 ` [PATCH v5 0/2] " Thomas Monjalon
@ 2023-10-27  8:08 ` Thomas Monjalon
  2023-10-27  8:45   ` Morten Brørup
  5 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-27  8:08 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Morten Brørup, Anatoly Burakov,
	Tyler Retzlaff, Narcisa Vasile, Stephen Hemminger,
	Dmitry Kozlyuk, Konstantin Ananyev, Andrew Rybchenko

When adding an API for creating threads,
the real-time priority has been forbidden on Unix.

There is a known issue with ring behaviour,
but it should not be completely forbidden.

Real-time thread can block some kernel threads on the same core,
making the system unstable.
That's why a sleep is added in the test thread,
and a warning is logged when using real-time priority.

Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---

v1: no yield at all
v2: more comments, sched_yield() and Sleep(0) on Windows
v3: 2 yield functions with sleep in realtime version
v4: runtime warning, longer sleep on Unix and lighter yield on Windows
v5: fix build and increase Unix sleep to 1 ms
v6: no yield helper, use a big sleep for testing purpose

It is not clear how to implement helpers which would work
on all systems (different configuration, arch and OS).
So the first patch adding yield functions is dropped for now.

---
 app/test/test_threads.c                       | 12 ++--------
 .../prog_guide/env_abstraction_layer.rst      |  4 +++-
 lib/eal/include/rte_thread.h                  |  8 +++++--
 lib/eal/unix/rte_thread.c                     | 22 +++++++++++--------
 4 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 4ac3f2671a..a863214137 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -5,6 +5,7 @@
 #include <string.h>
 
 #include <rte_thread.h>
+#include <rte_cycles.h>
 #include <rte_debug.h>
 
 #include "test.h"
@@ -22,7 +23,7 @@ thread_main(void *arg)
 	__atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
 
 	while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 1)
-		;
+		rte_delay_us_sleep(1000); /* required for RT priority */
 
 	return 0;
 }
@@ -97,21 +98,12 @@ test_thread_priority(void)
 		"Priority set mismatches priority get");
 
 	priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
-	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
-		"Priority set to critical should fail");
-	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
-		"Failed to get thread priority");
-	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
-		"Failed set to critical should have retained normal");
-#else
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
 		"Priority set to critical should succeed");
 	RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
 		"Failed to get thread priority");
 	RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
 		"Priority set mismatches priority get");
-#endif
 
 	priority = RTE_THREAD_PRIORITY_NORMAL;
 	RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..d1f7cae7cd 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -815,7 +815,9 @@ Known Issues
 
   4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
 
-  5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+  5. It MUST not be used by multi-producer/consumer pthreads
+     whose scheduling policies are ``SCHED_FIFO``
+     or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
 
   Alternatively, applications can use the lock-free stack mempool handler. When
   considering this handler, note that:
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index f2581fe152..7ff031e1b2 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,10 +56,14 @@ typedef uint32_t (*rte_thread_func) (void *arg);
  * Thread priority values.
  */
 enum rte_thread_priority {
+	/** Normal thread priority, the default. */
 	RTE_THREAD_PRIORITY_NORMAL            = 0,
-	/**< normal thread priority, the default */
+	/**
+	 * Highest thread priority, use with caution.
+	 * WARNING: System may be unstable because of a real-time busy loop.
+	 *          @see rte_thread_yield_realtime().
+	 */
 	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
-	/**< highest thread priority allowed */
 };
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 278d8d342d..17ffb86c17 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -33,6 +33,8 @@ static int
 thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 	int *pol)
 {
+	static bool warned;
+
 	/* Clear the output parameters. */
 	*os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
 	*pol = -1;
@@ -51,6 +53,17 @@ thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri,
 			sched_get_priority_max(SCHED_OTHER)) / 2;
 		break;
 	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+		/*
+		 * WARNING: Real-time busy loop takes priority on kernel threads,
+		 *          making the system unstable.
+		 *          There is also a known issue when using rte_ring.
+		 */
+		if (!warned) {
+			RTE_LOG(NOTICE, EAL,
+					"Real-time thread is unstable if polling without sleep.\n");
+			warned = true;
+		}
+
 		*pol = SCHED_RR;
 		*os_pri = sched_get_priority_max(SCHED_RR);
 		break;
@@ -155,11 +168,6 @@ rte_thread_create(rte_thread_t *thread_id,
 			goto cleanup;
 		}
 
-		if (thread_attr->priority ==
-				RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
-			ret = ENOTSUP;
-			goto cleanup;
-		}
 		ret = thread_map_priority_to_os_value(thread_attr->priority,
 				&param.sched_priority, &policy);
 		if (ret != 0)
@@ -291,10 +299,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
 	int policy;
 	int ret;
 
-	/* Realtime priority can cause crashes on non-Windows platforms. */
-	if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
-		return ENOTSUP;
-
 	ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
 		&policy);
 	if (ret != 0)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* RE: [PATCH v6 1/1] eal/unix: allow creating thread with real-time priority
  2023-10-27  8:08 ` [PATCH v6 1/1] eal/unix: " Thomas Monjalon
@ 2023-10-27  8:45   ` Morten Brørup
  2023-10-27  9:11     ` Thomas Monjalon
  2023-10-27 18:15     ` Stephen Hemminger
  0 siblings, 2 replies; 54+ messages in thread
From: Morten Brørup @ 2023-10-27  8:45 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: David Marchand, stable, Anatoly Burakov, Tyler Retzlaff,
	Narcisa Vasile, Stephen Hemminger, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Friday, 27 October 2023 10.09
> 
> When adding an API for creating threads,
> the real-time priority has been forbidden on Unix.
> 
> There is a known issue with ring behaviour,
> but it should not be completely forbidden.
> 
> Real-time thread can block some kernel threads on the same core,
> making the system unstable.
> That's why a sleep is added in the test thread,
> and a warning is logged when using real-time priority.
> 
> Fixes: ca04c78b6262 ("eal: get/set thread priority per thread
> identifier")
> Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
> Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---

[...]

>  enum rte_thread_priority {
> +	/** Normal thread priority, the default. */
>  	RTE_THREAD_PRIORITY_NORMAL            = 0,
> -	/**< normal thread priority, the default */
> +	/**
> +	 * Highest thread priority, use with caution.
> +	 * WARNING: System may be unstable because of a real-time busy
> loop.
> +	 *          @see rte_thread_yield_realtime().

Please remove the reference to the now non-existing function.

Also, I'd prefer to move the warning comments (about real-time threads having priority over kernel threads, and issues with rte_ring) up here, so it goes into the public API documentation.

> +	 */
>  	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
> -	/**< highest thread priority allowed */
>  };
> 
>  /**
> diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
> index 278d8d342d..17ffb86c17 100644
> --- a/lib/eal/unix/rte_thread.c
> +++ b/lib/eal/unix/rte_thread.c
> @@ -33,6 +33,8 @@ static int
>  thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int
> *os_pri,
>  	int *pol)
>  {
> +	static bool warned;
> +
>  	/* Clear the output parameters. */
>  	*os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
>  	*pol = -1;
> @@ -51,6 +53,17 @@ thread_map_priority_to_os_value(enum
> rte_thread_priority eal_pri, int *os_pri,
>  			sched_get_priority_max(SCHED_OTHER)) / 2;
>  		break;
>  	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> +		/*
> +		 * WARNING: Real-time busy loop takes priority on kernel
> threads,
> +		 *          making the system unstable.
> +		 *          There is also a known issue when using
> rte_ring.
> +		 */
> +		if (!warned) {
> +			RTE_LOG(NOTICE, EAL,
> +					"Real-time thread is unstable if polling
> without sleep.\n");
> +			warned = true;
> +		}

Is it 100 % certain that the system becomes unstable if not sleeping or using blocking system calls from a real-time thread?
And technically, it's not the thread itself that becomes unstable.

How about:
"System may be unstable unless real-time thread uses blocking system calls or sleeps."
or:
"Real-time thread usually requires the use of blocking system calls or sleeps."
or something else.

My ACK is still valid.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v6 1/1] eal/unix: allow creating thread with real-time priority
  2023-10-27  8:45   ` Morten Brørup
@ 2023-10-27  9:11     ` Thomas Monjalon
  2023-10-27 18:15     ` Stephen Hemminger
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-10-27  9:11 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, David Marchand, stable, Anatoly Burakov, Tyler Retzlaff,
	Narcisa Vasile, Stephen Hemminger, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

27/10/2023 10:45, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Friday, 27 October 2023 10.09
> > 
> > When adding an API for creating threads,
> > the real-time priority has been forbidden on Unix.
> > 
> > There is a known issue with ring behaviour,
> > but it should not be completely forbidden.
> > 
> > Real-time thread can block some kernel threads on the same core,
> > making the system unstable.
> > That's why a sleep is added in the test thread,
> > and a warning is logged when using real-time priority.
> > 
> > Fixes: ca04c78b6262 ("eal: get/set thread priority per thread
> > identifier")
> > Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
> > Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> 
> [...]
> 
> >  enum rte_thread_priority {
> > +	/** Normal thread priority, the default. */
> >  	RTE_THREAD_PRIORITY_NORMAL            = 0,
> > -	/**< normal thread priority, the default */
> > +	/**
> > +	 * Highest thread priority, use with caution.
> > +	 * WARNING: System may be unstable because of a real-time busy
> > loop.
> > +	 *          @see rte_thread_yield_realtime().
> 
> Please remove the reference to the now non-existing function.
> 
> Also, I'd prefer to move the warning comments (about real-time threads having priority over kernel threads, and issues with rte_ring) up here, so it goes into the public API documentation.

Yes OK, thanks for the careful review.

> 
> > +	 */
> >  	RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
> > -	/**< highest thread priority allowed */
> >  };
> > 
> >  /**
> > diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
> > index 278d8d342d..17ffb86c17 100644
> > --- a/lib/eal/unix/rte_thread.c
> > +++ b/lib/eal/unix/rte_thread.c
> > @@ -33,6 +33,8 @@ static int
> >  thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int
> > *os_pri,
> >  	int *pol)
> >  {
> > +	static bool warned;
> > +
> >  	/* Clear the output parameters. */
> >  	*os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
> >  	*pol = -1;
> > @@ -51,6 +53,17 @@ thread_map_priority_to_os_value(enum
> > rte_thread_priority eal_pri, int *os_pri,
> >  			sched_get_priority_max(SCHED_OTHER)) / 2;
> >  		break;
> >  	case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
> > +		/*
> > +		 * WARNING: Real-time busy loop takes priority on kernel
> > threads,
> > +		 *          making the system unstable.
> > +		 *          There is also a known issue when using
> > rte_ring.
> > +		 */
> > +		if (!warned) {
> > +			RTE_LOG(NOTICE, EAL,
> > +					"Real-time thread is unstable if polling
> > without sleep.\n");
> > +			warned = true;
> > +		}
> 
> Is it 100 % certain that the system becomes unstable if not sleeping or using blocking system calls from a real-time thread?
> And technically, it's not the thread itself that becomes unstable.
> 
> How about:
> "System may be unstable unless real-time thread uses blocking system calls or sleeps."
> or:
> "Real-time thread usually requires the use of blocking system calls or sleeps."
> or something else.

Yes something like that looks better.
I will try to find a short sentence.

> 
> My ACK is still valid.
> 
> 






^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v6 1/1] eal/unix: allow creating thread with real-time priority
  2023-10-27  8:45   ` Morten Brørup
  2023-10-27  9:11     ` Thomas Monjalon
@ 2023-10-27 18:15     ` Stephen Hemminger
  1 sibling, 0 replies; 54+ messages in thread
From: Stephen Hemminger @ 2023-10-27 18:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, dev, David Marchand, stable, Anatoly Burakov,
	Tyler Retzlaff, Narcisa Vasile, Dmitry Kozlyuk,
	Konstantin Ananyev, Andrew Rybchenko

On Fri, 27 Oct 2023 10:45:03 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> Is it 100 % certain that the system becomes unstable if not sleeping or using blocking system calls from a real-time thread?
> And technically, it's not the thread itself that becomes unstable.

My experience is that the goal of real time threads is "do not let kernel have higher priority than this thread".
That means that if/when a kernel worker is needed on this core, instability happens.
It is very hard but possible to ensure that a kernel worker thread never runs on that core.
But if you are able to do configure that, then there is no point in using RT threads.
The best design is to avoid the kernel needing to run, and if it does then let it run.

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2023-10-27 18:16 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-24 12:54 [PATCH] eal/unix: allow creating thread with real-time priority Thomas Monjalon
2023-10-24 13:55 ` Morten Brørup
2023-10-24 16:04   ` Stephen Hemminger
2023-10-25 13:15     ` Thomas Monjalon
2023-10-25 13:34       ` Bruce Richardson
2023-10-25 13:44         ` Thomas Monjalon
2023-10-25 15:08           ` Stephen Hemminger
2023-10-25 15:14             ` Bruce Richardson
2023-10-25 15:18               ` Thomas Monjalon
2023-10-25 15:32                 ` Thomas Monjalon
2023-10-25 15:13 ` [PATCH v2] " Thomas Monjalon
2023-10-25 15:37   ` Stephen Hemminger
2023-10-25 16:46     ` Thomas Monjalon
2023-10-25 17:54       ` Morten Brørup
2023-10-25 21:33         ` Stephen Hemminger
2023-10-26  7:33           ` Morten Brørup
2023-10-26 16:32             ` Stephen Hemminger
2023-10-26 17:07               ` Morten Brørup
2023-10-26  0:00         ` Stephen Hemminger
2023-10-25 16:31 ` [PATCH v3 0/2] " Thomas Monjalon
2023-10-25 16:31   ` [PATCH v3 1/2] eal: add thread yield functions Thomas Monjalon
2023-10-25 16:40     ` Bruce Richardson
2023-10-25 17:06       ` Thomas Monjalon
2023-10-25 18:07     ` Morten Brørup
2023-10-25 16:31   ` [PATCH v3 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
2023-10-26 13:36   ` [PATCH v3 0/2] " Thomas Monjalon
2023-10-26 13:44     ` Morten Brørup
2023-10-26 13:57       ` Morten Brørup
2023-10-26 14:04         ` Thomas Monjalon
2023-10-26 14:08           ` Morten Brørup
2023-10-26 14:30             ` Thomas Monjalon
2023-10-26 14:50               ` Morten Brørup
2023-10-26 14:59                 ` Morten Brørup
2023-10-26 15:54                   ` Bruce Richardson
2023-10-26 16:07                     ` Thomas Monjalon
2023-10-26 16:50                       ` Morten Brørup
2023-10-26 19:51                         ` Thomas Monjalon
2023-10-27  7:19                           ` Morten Brørup
2023-10-26 16:28             ` Stephen Hemminger
2023-10-26 19:55               ` Thomas Monjalon
2023-10-26 21:10                 ` Stephen Hemminger
2023-10-26 14:10         ` David Marchand
2023-10-26 13:37 ` [PATCH v4 " Thomas Monjalon
2023-10-26 13:37   ` [PATCH v4 1/2] eal: add thread yield functions Thomas Monjalon
2023-10-26 13:37   ` [PATCH v4 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
2023-10-26 14:19 ` [PATCH v5 0/2] " Thomas Monjalon
2023-10-26 14:19   ` [PATCH v5 1/2] eal: add thread yield functions Thomas Monjalon
2023-10-26 14:19   ` [PATCH v5 2/2] eal/unix: allow creating thread with real-time priority Thomas Monjalon
2023-10-26 20:00   ` [PATCH v5 0/2] " Thomas Monjalon
2023-10-26 23:35     ` Tyler Retzlaff
2023-10-27  8:08 ` [PATCH v6 1/1] eal/unix: " Thomas Monjalon
2023-10-27  8:45   ` Morten Brørup
2023-10-27  9:11     ` Thomas Monjalon
2023-10-27 18:15     ` Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.