* [PATCH 0/3] rteval: Offline NUMA node bugfix
@ 2022-04-19 16:14 Valentin Schneider
  2022-04-19 16:14 ` [PATCH 1/3] rteval: systopology: Fix offline NUMA node parsing Valentin Schneider
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Valentin Schneider @ 2022-04-19 16:14 UTC (permalink / raw)
  To: linux-rt-users; +Cc: John Kacur, Clark Williams

Hi folks,

I wanted to try out rteval on a NUMA system with one node offlined and
got a few exceptions; here's my take on them. There's also a small
cleanup as an added bonus.

Cheers,
Valentin

Valentin Schneider (3):
  rteval: systopology: Fix offline NUMA node parsing
  rteval: kcompile: Fix offline node handling
  rteval: systopology: Slight CpuList.__expand_cpulist() cleanup

 rteval/modules/loads/kcompile.py |  5 ++++-
 rteval/systopology.py            | 15 +++++++++------
 2 files changed, 13 insertions(+), 7 deletions(-)

--
2.27.0



* [PATCH 1/3] rteval: systopology: Fix offline NUMA node parsing
  2022-04-19 16:14 [PATCH 0/3] rteval: Offline NUMA node bugfix Valentin Schneider
@ 2022-04-19 16:14 ` Valentin Schneider
  2022-04-29 19:54   ` John Kacur
  2022-04-19 16:14 ` [PATCH 2/3] rteval: kcompile: Fix offline node handling Valentin Schneider
  2022-04-19 16:14 ` [PATCH 3/3] rteval: systopology: Slight CpuList.__expand_cpulist() cleanup Valentin Schneider
  2 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2022-04-19 16:14 UTC (permalink / raw)
  To: linux-rt-users; +Cc: John Kacur, Clark Williams

An offline NUMA node will report in its cpulist an empty
string. Unfortunately, "".split(sep=x) with x != None returns a list
containing an empty string rather than an empty list, which causes
CpuList._expand_cpulist() to try to run int(''), which ends up in the
following exception:

  ValueError: invalid literal for int() with base 10: ''

Prevent this by adding an early empty-string check.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 rteval/systopology.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/rteval/systopology.py b/rteval/systopology.py
index bf794ce..b2da7bb 100644
--- a/rteval/systopology.py
+++ b/rteval/systopology.py
@@ -103,6 +103,10 @@ class CpuList:
         don't error check against online cpus
         """
         result = []
+
+        if not cpulist:
+            return result
+
         for part in cpulist.split(','):
             if '-' in part:
                 a, b = part.split('-')
-- 
2.27.0
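
The split() behaviour described in the commit message above can be reproduced outside rteval. The following is only a minimal sketch: expand_cpulist() is a standalone stand-in written for illustration, not the actual CpuList method, and it assumes the input is either empty or a well-formed "a-b,c" string.

```python
def expand_cpulist(cpulist):
    """Expand a range string such as "0-3,8" into a list of CPU numbers.

    Illustrative stand-in only; it mirrors the shape of the patched code.
    """
    result = []

    # Early return: "".split(',') yields [''] and int('') would raise
    # ValueError, so the empty string must be handled before splitting.
    if not cpulist:
        return result

    for part in cpulist.split(','):
        if '-' in part:
            a, b = part.split('-')
            result.extend(range(int(a), int(b) + 1))
        else:
            result.append(int(part))
    return result


# With an explicit separator, splitting an empty string gives [''].
with_separator = "".split(',')
# With the default separator (whitespace), it gives [].
without_separator = "".split()
```

Calling expand_cpulist("") returns an empty list instead of raising, which is the case an offline node's empty cpulist would hit.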



* [PATCH 2/3] rteval: kcompile: Fix offline node handling
  2022-04-19 16:14 [PATCH 0/3] rteval: Offline NUMA node bugfix Valentin Schneider
  2022-04-19 16:14 ` [PATCH 1/3] rteval: systopology: Fix offline NUMA node parsing Valentin Schneider
@ 2022-04-19 16:14 ` Valentin Schneider
  2022-04-29 20:21   ` John Kacur
  2022-04-19 16:14 ` [PATCH 3/3] rteval: systopology: Slight CpuList.__expand_cpulist() cleanup Valentin Schneider
  2 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2022-04-19 16:14 UTC (permalink / raw)
  To: linux-rt-users; +Cc: John Kacur, Clark Williams

Having an empty NumaNode but with CPUs attached to it (IOW they are all
offline) causes kcompile.py to raise the following exception:

  calc_jobs_per_cpu():
      ratio = float(mem) / float(len(self.node))
  ZeroDivisionError: float division by zero

Remove nodes that do have CPUs but none of which are online.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 rteval/modules/loads/kcompile.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/rteval/modules/loads/kcompile.py b/rteval/modules/loads/kcompile.py
index 367f8dc..ac99964 100644
--- a/rteval/modules/loads/kcompile.py
+++ b/rteval/modules/loads/kcompile.py
@@ -211,7 +211,10 @@ class Kcompile(CommandLineLoad):
 
         # remove nodes with no cpus available for running
         for node, cpus in self.cpus.items():
-            if not cpus:
+            # If the intersection between the node CPUs and the cpulist is empty
+            # then either the cpulist excludes that node, or the CPUs allowed by
+            # the cpulist are actually offline
+            if not set(self.topology.nodes[node].cpus.cpulist) & set(cpus):
                 self.nodes.remove(node)
                 self._log(Log.DEBUG, "node %s has no available cpus, removing" % node)
 
-- 
2.27.0



* [PATCH 3/3] rteval: systopology: Slight CpuList.__expand_cpulist() cleanup
  2022-04-19 16:14 [PATCH 0/3] rteval: Offline NUMA node bugfix Valentin Schneider
  2022-04-19 16:14 ` [PATCH 1/3] rteval: systopology: Fix offline NUMA node parsing Valentin Schneider
  2022-04-19 16:14 ` [PATCH 2/3] rteval: kcompile: Fix offline node handling Valentin Schneider
@ 2022-04-19 16:14 ` Valentin Schneider
  2022-04-29 20:53   ` John Kacur
  2 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2022-04-19 16:14 UTC (permalink / raw)
  To: linux-rt-users; +Cc: John Kacur, Clark Williams

This method currently aggregates CPUs into a list, then converts this to
set and then back to list. The aggregation can instead be done directly
into a set.

(as an aside, it would make more sense for CpuList to have its storage be
a set in the first place as duplicate CPU ids don't make sense for it, but
that's a separate discussion :-))

The integer conversion of the "a-b" pattern can also be condensed into a
single map() expression.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 rteval/systopology.py | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/rteval/systopology.py b/rteval/systopology.py
index b2da7bb..2a28f9c 100644
--- a/rteval/systopology.py
+++ b/rteval/systopology.py
@@ -102,20 +102,19 @@ class CpuList:
         """ expand a range string into an array of cpu numbers
         don't error check against online cpus
         """
-        result = []
-
         if not cpulist:
-            return result
+            return []
+
+        result = set()
 
         for part in cpulist.split(','):
             if '-' in part:
-                a, b = part.split('-')
-                a, b = int(a), int(b)
-                result.extend(list(range(a, b + 1)))
+                a, b = map(int, part.split('-'))
+                result |= set(range(a, b + 1))
             else:
                 a = int(part)
-                result.append(a)
-        return [int(i) for i in list(set(result))]
+                result |= {a}
+        return list(result)
 
     def getcpulist(self):
         """ return the list of cpus tracked """
-- 
2.27.0
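
To check that the cleanup above does not change behaviour, the two versions can be compared side by side. This is a sketch under assumptions: expand_old() and expand_new() are free-function copies of the before and after bodies from the diff (with `result |= {a}` written as `result.add(a)`), and the results are compared after sorting because neither version promises a particular ordering once a set is involved.

```python
def expand_old(cpulist):
    """Pre-patch logic: aggregate into a list, then de-duplicate via set()."""
    result = []
    if not cpulist:
        return result
    for part in cpulist.split(','):
        if '-' in part:
            a, b = part.split('-')
            a, b = int(a), int(b)
            result.extend(list(range(a, b + 1)))
        else:
            result.append(int(part))
    return [int(i) for i in list(set(result))]


def expand_new(cpulist):
    """Post-patch logic: aggregate directly into a set."""
    if not cpulist:
        return []
    result = set()
    for part in cpulist.split(','):
        if '-' in part:
            a, b = map(int, part.split('-'))
            result |= set(range(a, b + 1))
        else:
            result.add(int(part))
    return list(result)


def same_cpus(cpulist):
    """True if both versions expand cpulist to the same set of CPUs."""
    return sorted(expand_old(cpulist)) == sorted(expand_new(cpulist))
```

Inputs with overlapping ranges, such as "0-3,2-5,4", are the interesting case, since that is where de-duplication matters.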



* Re: [PATCH 1/3] rteval: systopology: Fix offline NUMA node parsing
  2022-04-19 16:14 ` [PATCH 1/3] rteval: systopology: Fix offline NUMA node parsing Valentin Schneider
@ 2022-04-29 19:54   ` John Kacur
  0 siblings, 0 replies; 8+ messages in thread
From: John Kacur @ 2022-04-29 19:54 UTC (permalink / raw)
  To: Valentin Schneider; +Cc: linux-rt-users, Clark Williams



On Tue, 19 Apr 2022, Valentin Schneider wrote:

> An offline NUMA node will report in its cpulist an empty
> string. Unfortunately, "".split(sep=x) with x != None returns a list
> containing an empty string rather than an empty list, which causes
> CpuList._expand_cpulist() to try to run int(''), which ends up in the
> following exception:
> 
>   ValueError: invalid literal for int() with base 10: ''
> 
> Prevent this by adding an early empty-string check.
> 
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  rteval/systopology.py | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/rteval/systopology.py b/rteval/systopology.py
> index bf794ce..b2da7bb 100644
> --- a/rteval/systopology.py
> +++ b/rteval/systopology.py
> @@ -103,6 +103,10 @@ class CpuList:
>          don't error check against online cpus
>          """
>          result = []
> +
> +        if not cpulist:
> +            return result
> +
>          for part in cpulist.split(','):
>              if '-' in part:
>                  a, b = part.split('-')
> -- 
> 2.27.0
> 
> 
Signed-off-by: John Kacur <jkacur@redhat.com>



* Re: [PATCH 2/3] rteval: kcompile: Fix offline node handling
  2022-04-19 16:14 ` [PATCH 2/3] rteval: kcompile: Fix offline node handling Valentin Schneider
@ 2022-04-29 20:21   ` John Kacur
  0 siblings, 0 replies; 8+ messages in thread
From: John Kacur @ 2022-04-29 20:21 UTC (permalink / raw)
  To: Valentin Schneider; +Cc: linux-rt-users, Clark Williams



On Tue, 19 Apr 2022, Valentin Schneider wrote:

> Having an empty NumaNode but with CPUs attached to it (IOW they are all
> offline) causes kcompile.py to raise the following exception:
> 
>   calc_jobs_per_cpu():
>       ratio = float(mem) / float(len(self.node))
>   ZeroDivisionError: float division by zero
> 
> Remove nodes that do have CPUs but none of which are online.
> 
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  rteval/modules/loads/kcompile.py | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/rteval/modules/loads/kcompile.py b/rteval/modules/loads/kcompile.py
> index 367f8dc..ac99964 100644
> --- a/rteval/modules/loads/kcompile.py
> +++ b/rteval/modules/loads/kcompile.py
> @@ -211,7 +211,10 @@ class Kcompile(CommandLineLoad):
>  
>          # remove nodes with no cpus available for running
>          for node, cpus in self.cpus.items():
> -            if not cpus:
> +            # If the intersection between the node CPUs and the cpulist is empty
> +            # then either the cpulist excludes that node, or the CPUs allowed by
> +            # the cpulist are actually offline
> +            if not set(self.topology.nodes[node].cpus.cpulist) & set(cpus):
>                  self.nodes.remove(node)
>                  self._log(Log.DEBUG, "node %s has no available cpus, removing" % node)
>  
> -- 
> 2.27.0
> 
> 

Sorry, this isn't quite right.

The cpulist in kcompile is the list of cpus where the load modules will 
run. The user can specify it like this
--loads-cpulist=LIST

If the user does not specify a list (because they want it to run 
everywhere) then the cpulist is empty. Your patch was working for you 
because the cpulist was empty, but that has nothing to do with whether the 
cpu is online or not.

systopology will fetch a list of cpus and consider whether they are online 
or not. So, I think the solution is to delete the method in kcompile and 
just use the one in systopology.

Sending another mail with the patch.

Thanks

John Kacur
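
The distinction drawn in this reply, between the user-supplied load cpulist and the set of online CPUs, can be sketched as follows. Every name here is hypothetical and chosen for illustration; this is not rteval's API and not the follow-up patch mentioned above. The assumption is that an empty or missing cpulist means "no restriction", as described in the reply.

```python
def runnable_cpus(node_cpus, online_cpus, loads_cpulist=None):
    """Return the CPUs of one node that a load may run on.

    node_cpus     -- all CPUs attached to the node (hypothetical input)
    online_cpus   -- CPUs currently online on the system (hypothetical input)
    loads_cpulist -- user restriction, e.g. from --loads-cpulist=LIST;
                     empty or None means "run everywhere"
    """
    # Online status is decided independently of what the user asked for.
    candidates = set(node_cpus) & set(online_cpus)

    # Only narrow further when the user actually supplied a list.
    if loads_cpulist:
        candidates &= set(loads_cpulist)

    return sorted(candidates)


def usable_nodes(nodes, online_cpus, loads_cpulist=None):
    """Keep only the nodes that have at least one runnable CPU."""
    return [
        node for node, cpus in nodes.items()
        if runnable_cpus(cpus, online_cpus, loads_cpulist)
    ]
```

With nodes = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]} and only CPUs 0-3 online, node 1 is dropped whether or not a cpulist was given, so a later division by the node's CPU count is never attempted for it.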



* Re: [PATCH 3/3] rteval: systopology: Slight CpuList.__expand_cpulist() cleanup
  2022-04-19 16:14 ` [PATCH 3/3] rteval: systopology: Slight CpuList.__expand_cpulist() cleanup Valentin Schneider
@ 2022-04-29 20:53   ` John Kacur
  2022-05-03 10:26     ` Valentin Schneider
  0 siblings, 1 reply; 8+ messages in thread
From: John Kacur @ 2022-04-29 20:53 UTC (permalink / raw)
  To: Valentin Schneider; +Cc: linux-rt-users, Clark Williams



On Tue, 19 Apr 2022, Valentin Schneider wrote:

> This method currently aggregates CPUs into a list, then converts this to
> set and then back to list. The aggregation can instead be done directly
> into a set.
> 
> (as an aside, it would make more sense for CpuList to have its storage be
> a set in the first place as duplicate CPU ids don't make sense for it, but
> that's a separate discussion :-))
> 
> The integer conversion of the "a-b" pattern can also be condensed into a
> single map() expression.
> 
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  rteval/systopology.py | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/rteval/systopology.py b/rteval/systopology.py
> index b2da7bb..2a28f9c 100644
> --- a/rteval/systopology.py
> +++ b/rteval/systopology.py
> @@ -102,20 +102,19 @@ class CpuList:
>          """ expand a range string into an array of cpu numbers
>          don't error check against online cpus
>          """
> -        result = []
> -
>          if not cpulist:
> -            return result
> +            return []
> +
> +        result = set()
>  
>          for part in cpulist.split(','):
>              if '-' in part:
> -                a, b = part.split('-')
> -                a, b = int(a), int(b)
> -                result.extend(list(range(a, b + 1)))
> +                a, b = map(int, part.split('-'))
> +                result |= set(range(a, b + 1))
>              else:
>                  a = int(part)
> -                result.append(a)
> -        return [int(i) for i in list(set(result))]
> +                result |= {a}
> +        return list(result)
>  
>      def getcpulist(self):
>          """ return the list of cpus tracked """
> -- 
> 2.27.0
> 
> 

I'm guessing that the reason for the "set" was to remove any potential 
duplicates. Duplicates are not normally a problem in rteval, but if you 
want to ensure you handle any user input correctly, it makes sense to do 
that. I think your code is correct, but I'm not sure if it buys us 
anything.

John



* Re: [PATCH 3/3] rteval: systopology: Slight CpuList.__expand_cpulist() cleanup
  2022-04-29 20:53   ` John Kacur
@ 2022-05-03 10:26     ` Valentin Schneider
  0 siblings, 0 replies; 8+ messages in thread
From: Valentin Schneider @ 2022-05-03 10:26 UTC (permalink / raw)
  To: John Kacur; +Cc: linux-rt-users, Clark Williams

On 29/04/22 16:53, John Kacur wrote:
> On Tue, 19 Apr 2022, Valentin Schneider wrote:
>
>> This method currently aggregates CPUs into a list, then converts this to
>> set and then back to list. The aggregation can instead be done directly
>> into a set.
>>
>> (as an aside, it would make more sense for CpuList to have its storage be
>> a set in the first place as duplicate CPU ids don't make sense for it, but
>> that's a separate discussion :-))
>>
>> The integer conversion of the "a-b" pattern can also be condensed into a
>> single map() expression.
>>
>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>> ---
>>  rteval/systopology.py | 15 +++++++--------
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/rteval/systopology.py b/rteval/systopology.py
>> index b2da7bb..2a28f9c 100644
>> --- a/rteval/systopology.py
>> +++ b/rteval/systopology.py
>> @@ -102,20 +102,19 @@ class CpuList:
>>          """ expand a range string into an array of cpu numbers
>>          don't error check against online cpus
>>          """
>> -        result = []
>> -
>>          if not cpulist:
>> -            return result
>> +            return []
>> +
>> +        result = set()
>>
>>          for part in cpulist.split(','):
>>              if '-' in part:
>> -                a, b = part.split('-')
>> -                a, b = int(a), int(b)
>> -                result.extend(list(range(a, b + 1)))
>> +                a, b = map(int, part.split('-'))
>> +                result |= set(range(a, b + 1))
>>              else:
>>                  a = int(part)
>> -                result.append(a)
>> -        return [int(i) for i in list(set(result))]
>> +                result |= {a}
>> +        return list(result)
>>
>>      def getcpulist(self):
>>          """ return the list of cpus tracked """
>> --
>> 2.27.0
>>
>>
>
> I'm guessing that the reason for the "set" was to remove any potential
> duplicates. Duplicates are not normally a problem in rteval, but if you
> want to ensure you handle any user input correctly, it makes sense to do
> that. I think your code is correct, but I'm not sure if it buys us
> anything.
>

It doesn't really; it should be functionally identical to the existing code
- it's just "drive-by cleanup" really :)

> John

