bpf.vger.kernel.org archive mirror
* [PATCH] perf lock contention: Do not use BPF task local storage
@ 2022-11-18 19:01 Namhyung Kim
  2022-11-21 17:32 ` Martin KaFai Lau
  0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2022-11-18 19:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, bpf, Blake Jones, Chris Li

Using BPF task local storage caused trouble when a lock inside
kmalloc is contended, because task local storage itself allocates
memory using kmalloc.  That creates a recursion, and it even
crashed my system.

There are a couple of possible workarounds, but I think the
simplest one is to use a pre-allocated hash map.  We could fix
task local storage to use the safe BPF allocator, but that takes
time, so let's make this change until that actually happens.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_lock_contention.c         |  1 +
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 34 ++++++++++++-------
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index 0deec1178778..4db9ad3d50c4 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -39,6 +39,7 @@ int lock_contention_prepare(struct lock_contention *con)
 	bpf_map__set_value_size(skel->maps.stacks, con->max_stack * sizeof(u64));
 	bpf_map__set_max_entries(skel->maps.stacks, con->map_nr_entries);
 	bpf_map__set_max_entries(skel->maps.lock_stat, con->map_nr_entries);
+	bpf_map__set_max_entries(skel->maps.tstamp, con->map_nr_entries);
 
 	if (target__has_cpu(target))
 		ncpus = perf_cpu_map__nr(evlist->core.user_requested_cpus);
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 1bb8628e7c9f..9681cb59b0df 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -40,10 +40,10 @@ struct {
 
 /* maintain timestamp at the beginning of contention */
 struct {
-	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
-	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__uint(type, BPF_MAP_TYPE_HASH);
 	__type(key, int);
 	__type(value, struct tstamp_data);
+	__uint(max_entries, MAX_ENTRIES);
 } tstamp SEC(".maps");
 
 /* actual lock contention statistics */
@@ -103,18 +103,28 @@ static inline int can_record(void)
 SEC("tp_btf/contention_begin")
 int contention_begin(u64 *ctx)
 {
-	struct task_struct *curr;
+	__u32 pid;
 	struct tstamp_data *pelem;
 
 	if (!enabled || !can_record())
 		return 0;
 
-	curr = bpf_get_current_task_btf();
-	pelem = bpf_task_storage_get(&tstamp, curr, NULL,
-				     BPF_LOCAL_STORAGE_GET_F_CREATE);
-	if (!pelem || pelem->lock)
+	pid = bpf_get_current_pid_tgid();
+	pelem = bpf_map_lookup_elem(&tstamp, &pid);
+	if (pelem && pelem->lock)
 		return 0;
 
+	if (pelem == NULL) {
+		struct tstamp_data zero = {};
+
+		bpf_map_update_elem(&tstamp, &pid, &zero, BPF_ANY);
+		pelem = bpf_map_lookup_elem(&tstamp, &pid);
+		if (pelem == NULL) {
+			lost++;
+			return 0;
+		}
+	}
+
 	pelem->timestamp = bpf_ktime_get_ns();
 	pelem->lock = (__u64)ctx[0];
 	pelem->flags = (__u32)ctx[1];
@@ -128,7 +138,7 @@ int contention_begin(u64 *ctx)
 SEC("tp_btf/contention_end")
 int contention_end(u64 *ctx)
 {
-	struct task_struct *curr;
+	__u32 pid;
 	struct tstamp_data *pelem;
 	struct contention_key key;
 	struct contention_data *data;
@@ -137,8 +147,8 @@ int contention_end(u64 *ctx)
 	if (!enabled)
 		return 0;
 
-	curr = bpf_get_current_task_btf();
-	pelem = bpf_task_storage_get(&tstamp, curr, NULL, 0);
+	pid = bpf_get_current_pid_tgid();
+	pelem = bpf_map_lookup_elem(&tstamp, &pid);
 	if (!pelem || pelem->lock != ctx[0])
 		return 0;
 
@@ -156,7 +166,7 @@ int contention_end(u64 *ctx)
 		};
 
 		bpf_map_update_elem(&lock_stat, &key, &first, BPF_NOEXIST);
-		pelem->lock = 0;
+		bpf_map_delete_elem(&tstamp, &pid);
 		return 0;
 	}
 
@@ -169,7 +179,7 @@ int contention_end(u64 *ctx)
 	if (data->min_time > duration)
 		data->min_time = duration;
 
-	pelem->lock = 0;
+	bpf_map_delete_elem(&tstamp, &pid);
 	return 0;
 }
 
-- 
2.38.1.584.g0f3c55d4c2-goog
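
For readers who want to try the lookup-or-create pattern outside the kernel, here is a minimal userspace sketch in plain C. It is not the BPF program from the patch. A small linear-scan table stands in for the pre-allocated BPF_MAP_TYPE_HASH, and the helpers map_lookup(), map_update() and map_delete(), struct slot, the extra function parameters, the return values and the tiny MAX_ENTRIES value are all assumptions made for illustration. The truncation of the 64-bit pid_tgid value to its low 32 bits mirrors what the `__u32 pid` assignment in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 4	/* hypothetical, tiny on purpose */

struct tstamp_data {
	uint64_t timestamp;
	uint64_t lock;
	uint32_t flags;
};

/* Every slot exists up front, so nothing is allocated at event time. */
struct slot {
	int used;
	uint32_t key;
	struct tstamp_data val;
};

static struct slot tstamp[MAX_ENTRIES];
static int lost;

static struct tstamp_data *map_lookup(uint32_t key)
{
	for (int i = 0; i < MAX_ENTRIES; i++)
		if (tstamp[i].used && tstamp[i].key == key)
			return &tstamp[i].val;
	return NULL;
}

/* Returns 0 on success, -1 when every slot is already taken. */
static int map_update(uint32_t key, const struct tstamp_data *val)
{
	struct tstamp_data *old = map_lookup(key);

	assert(val != NULL);
	if (old) {
		*old = *val;
		return 0;
	}
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (!tstamp[i].used) {
			tstamp[i].used = 1;
			tstamp[i].key = key;
			tstamp[i].val = *val;
			return 0;
		}
	}
	return -1;
}

static void map_delete(uint32_t key)
{
	for (int i = 0; i < MAX_ENTRIES; i++)
		if (tstamp[i].used && tstamp[i].key == key)
			tstamp[i].used = 0;
}

/* Returns 1 if the contention was recorded, 0 if it was skipped or lost. */
static int contention_begin(uint64_t pid_tgid, uint64_t lock,
			    uint32_t flags, uint64_t now)
{
	uint32_t pid = (uint32_t)pid_tgid;	/* keep the low 32 bits (thread id) */
	struct tstamp_data *pelem = map_lookup(pid);

	if (pelem && pelem->lock)
		return 0;	/* already tracking a contention for this thread */

	if (pelem == NULL) {
		struct tstamp_data zero = {0};

		map_update(pid, &zero);
		pelem = map_lookup(pid);
		if (pelem == NULL) {
			lost++;	/* table full: count the drop, never allocate */
			return 0;
		}
	}

	pelem->timestamp = now;
	pelem->lock = lock;
	pelem->flags = flags;
	return 1;
}

/* Returns the wait time, or 0 when there is no matching begin event. */
static uint64_t contention_end(uint64_t pid_tgid, uint64_t lock, uint64_t now)
{
	uint32_t pid = (uint32_t)pid_tgid;
	struct tstamp_data *pelem = map_lookup(pid);
	uint64_t duration;

	if (!pelem || pelem->lock != lock)
		return 0;

	duration = now - pelem->timestamp;
	map_delete(pid);	/* free the slot instead of zeroing pelem->lock */
	return duration;
}
```

Deleting the entry at the end, as the patch does, matters more with a shared table than it did with per-task storage: a slot that is only zeroed would stay occupied for a thread that may never contend again, and a full table shows up as a growing lost count.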



* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2022-11-18 19:01 [PATCH] perf lock contention: Do not use BPF task local storage Namhyung Kim
@ 2022-11-21 17:32 ` Martin KaFai Lau
  2022-11-23 13:49   ` Arnaldo Carvalho de Melo
  2023-01-09 20:56   ` Namhyung Kim
  0 siblings, 2 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2022-11-21 17:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, bpf, Blake Jones, Chris Li,
	Arnaldo Carvalho de Melo, Jiri Olsa

On 11/18/22 11:01 AM, Namhyung Kim wrote:
> It caused some troubles when a lock inside kmalloc is contended
> because task local storage would allocate memory using kmalloc.
> It'd create a recursion and even crash in my system.
> 
> There could be a couple of workarounds but I think the simplest
> one is to use a pre-allocated hash map.

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>

> We could fix the task local storage to use the safe BPF allocator,
> but it takes time so let's change this until it happens actually.

I also got another report on the kfree_rcu path.  I am also looking into
using the BPF allocator for this.


* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2022-11-21 17:32 ` Martin KaFai Lau
@ 2022-11-23 13:49   ` Arnaldo Carvalho de Melo
  2023-01-09 20:56   ` Namhyung Kim
  1 sibling, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-11-23 13:49 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, bpf, Blake Jones,
	Chris Li, Jiri Olsa

Em Mon, Nov 21, 2022 at 09:32:56AM -0800, Martin KaFai Lau escreveu:
> On 11/18/22 11:01 AM, Namhyung Kim wrote:
> > It caused some troubles when a lock inside kmalloc is contended
> > because task local storage would allocate memory using kmalloc.
> > It'd create a recursion and even crash in my system.
> > 
> > There could be a couple of workarounds but I think the simplest
> > one is to use a pre-allocated hash map.
> 
> Acked-by: Martin KaFai Lau <martin.lau@kernel.org>

Thanks, applied.

- Arnaldo

 
> > We could fix the task local storage to use the safe BPF allocator,
> > but it takes time so let's change this until it happens actually.
> 
> I also got another report on the kfree_rcu path.  I am also looking into
> this direction on using the BPF allocator.

-- 

- Arnaldo


* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2022-11-21 17:32 ` Martin KaFai Lau
  2022-11-23 13:49   ` Arnaldo Carvalho de Melo
@ 2023-01-09 20:56   ` Namhyung Kim
  2023-01-09 21:22     ` Martin KaFai Lau
  1 sibling, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2023-01-09 20:56 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, bpf, Blake Jones, Chris Li,
	Arnaldo Carvalho de Melo, Jiri Olsa

Hello,

On Mon, Nov 21, 2022 at 9:33 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/18/22 11:01 AM, Namhyung Kim wrote:
> > We could fix the task local storage to use the safe BPF allocator,
> > but it takes time so let's change this until it happens actually.
>
> I also got another report on the kfree_rcu path.  I am also looking into this
> direction on using the BPF allocator.

Any progress on this?  Are there any concerns about the change?

Thanks,
Namhyung


* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2023-01-09 20:56   ` Namhyung Kim
@ 2023-01-09 21:22     ` Martin KaFai Lau
  2023-01-09 22:25       ` Namhyung Kim
  2023-01-10  3:29       ` Hou Tao
  0 siblings, 2 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2023-01-09 21:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, bpf, Blake Jones, Chris Li,
	Arnaldo Carvalho de Melo, Jiri Olsa

On 1/9/23 12:56 PM, Namhyung Kim wrote:
> Hello,
> 
> On Mon, Nov 21, 2022 at 9:33 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 11/18/22 11:01 AM, Namhyung Kim wrote:
>>> We could fix the task local storage to use the safe BPF allocator,
>>> but it takes time so let's change this until it happens actually.
>>
>> I also got another report on the kfree_rcu path.  I am also looking into this
>> direction on using the BPF allocator.
> 
> Any progress on this?  Are there any concerns about the change?

Yep, I am working on it. It is not a direct replacement of kzalloc with
bpf_mem_cache_alloc, e.g. some changes in the bpf mem allocator are needed to
ensure the free list cannot be reused before the rcu grace period. There is a
similar RFC patchset going in this direction that I am trying out.



* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2023-01-09 21:22     ` Martin KaFai Lau
@ 2023-01-09 22:25       ` Namhyung Kim
  2023-01-10  3:29       ` Hou Tao
  1 sibling, 0 replies; 8+ messages in thread
From: Namhyung Kim @ 2023-01-09 22:25 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, bpf, Blake Jones, Chris Li,
	Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Jan 9, 2023 at 1:23 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/9/23 12:56 PM, Namhyung Kim wrote:
> > Hello,
> >
> > On Mon, Nov 21, 2022 at 9:33 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 11/18/22 11:01 AM, Namhyung Kim wrote:
> >>> We could fix the task local storage to use the safe BPF allocator,
> >>> but it takes time so let's change this until it happens actually.
> >>
> >> I also got another report on the kfree_rcu path.  I am also looking into this
> >> direction on using the BPF allocator.
> >
> > Any progress on this?  Are there any concerns about the change?
>
> Yep, I am working on it. It is not a direct replacement from kzalloc to
> bpf_mem_cache_alloc. eg. Some changes in the bpf mem allocator is needed to
> ensure the free list cannot be reused before the rcu grace period. There is a
> similar RFC patchset going into this direction that I am trying with.

I see.  Thanks for the update!

Namhyung


* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2023-01-09 21:22     ` Martin KaFai Lau
  2023-01-09 22:25       ` Namhyung Kim
@ 2023-01-10  3:29       ` Hou Tao
  2023-01-10  6:29         ` Martin KaFai Lau
  1 sibling, 1 reply; 8+ messages in thread
From: Hou Tao @ 2023-01-10  3:29 UTC (permalink / raw)
  To: Martin KaFai Lau, Alexei Starovoitov
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, bpf, Blake Jones, Chris Li,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim

Hi Martin,

On 1/10/2023 5:22 AM, Martin KaFai Lau wrote:
> On 1/9/23 12:56 PM, Namhyung Kim wrote:
>> Hello,
>>
>> On Mon, Nov 21, 2022 at 9:33 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 11/18/22 11:01 AM, Namhyung Kim wrote:
>>>> We could fix the task local storage to use the safe BPF allocator,
>>>> but it takes time so let's change this until it happens actually.
>>>
>>> I also got another report on the kfree_rcu path.  I am also looking into this
>>> direction on using the BPF allocator.
>>
>> Any progress on this?  Are there any concerns about the change?
>
> Yep, I am working on it. It is not a direct replacement from kzalloc to
> bpf_mem_cache_alloc. eg. Some changes in the bpf mem allocator is needed to
> ensure the free list cannot be reused before the rcu grace period. There is a
> similar RFC patchset going into this direction that I am trying with.
>
> .
You mean "[RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc"
[0], right? The main concern [1] with the proposal is that the possibility of
OOM increases when the RCU tasks trace grace period is slow, because immediate
reuse is disabled and reuse is only possible after one RCU tasks trace grace
period. Using a memory cgroup and setting a hard limit on the cgroup may reduce
the impact of the OOM problem, but it is not good enough. So do you have
other ways to mitigate the potential OOM problem?

[0]: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/
[1]: https://lore.kernel.org/bpf/CAADnVQ+z-Y6Yv2i-icAUy=Uyh9yiN4S1AOrLd=K8mu32TXORkw@mail.gmail.com/
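
To make the trade-off concrete, here is a toy userspace model in plain C. It is only a sketch, not the kernel's BPF memory allocator or the RFC's code. A fixed pool stands in for finite memory, a counter stands in for the RCU tasks trace grace period, and every name (pool_alloc(), pool_free(), grace_period_end(), POOL_SIZE) is hypothetical. It shows that once freed elements have to wait for a grace period before reuse, a slow grace period can exhaust the pool even though nothing is in use.

```c
#include <stddef.h>

#define POOL_SIZE 8	/* hypothetical pool size */

enum elem_state { ELEM_FREE, ELEM_IN_USE, ELEM_WAITING_GP };

struct elem {
	enum elem_state state;
	unsigned long freed_in_gp;	/* grace period in which it was freed */
};

static struct elem pool[POOL_SIZE];
static unsigned long current_gp;	/* stand-in for the grace period counter */

/* Hands out only elements that are safe to reuse; NULL when none are left. */
static struct elem *pool_alloc(void)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (pool[i].state == ELEM_FREE) {
			pool[i].state = ELEM_IN_USE;
			return &pool[i];
		}
	}
	return NULL;
}

/* A freed element is parked: it cannot be reused immediately. */
static void pool_free(struct elem *e)
{
	e->state = ELEM_WAITING_GP;
	e->freed_in_gp = current_gp;
}

/* One grace period ends: elements freed before it become reusable. */
static void grace_period_end(void)
{
	current_gp++;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (pool[i].state == ELEM_WAITING_GP &&
		    pool[i].freed_in_gp < current_gp)
			pool[i].state = ELEM_FREE;
	}
}
```

In this model, allocating and freeing one element POOL_SIZE times in a row without calling grace_period_end() leaves every element parked, so the next pool_alloc() returns NULL. With immediate reuse the same loop would keep recycling a single element.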


* Re: [PATCH] perf lock contention: Do not use BPF task local storage
  2023-01-10  3:29       ` Hou Tao
@ 2023-01-10  6:29         ` Martin KaFai Lau
  0 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2023-01-10  6:29 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Song Liu, bpf,
	Blake Jones, Chris Li, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim

On 1/9/23 7:29 PM, Hou Tao wrote:
> Hi Martin,
> 
> On 1/10/2023 5:22 AM, Martin KaFai Lau wrote:
>> On 1/9/23 12:56 PM, Namhyung Kim wrote:
>>> Hello,
>>>
>>> On Mon, Nov 21, 2022 at 9:33 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> On 11/18/22 11:01 AM, Namhyung Kim wrote:
>>>>> We could fix the task local storage to use the safe BPF allocator,
>>>>> but it takes time so let's change this until it happens actually.
>>>>
>>>> I also got another report on the kfree_rcu path.  I am also looking into this
>>>> direction on using the BPF allocator.
>>>
>>> Any progress on this?  Are there any concerns about the change?
>>
>> Yep, I am working on it. It is not a direct replacement from kzalloc to
>> bpf_mem_cache_alloc. eg. Some changes in the bpf mem allocator is needed to
>> ensure the free list cannot be reused before the rcu grace period. There is a
>> similar RFC patchset going into this direction that I am trying with.
>>
>> .
> Do you mean "[RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc"
> [0], right ? 
Yes, that is the RFC patch I was referring to :). I was planning to comment after
looking at the patch in detail. I have shared some of my quick thoughts in that
thread on the local storage use cases.

> The main concern [1] for the proposal is the possibility of OOM
> will increase when RCU tasks trace grace period is slow, because the immediate
> reuse is disabled and the reuse is only possible after one RCU tasks trace grace
> period. Using a memory cgroup and setting a hard-limit on the cgroup may reduce
> the influence of the OOM problem, but it is not good enough. So do you have
> other ways to mitigate the potential OOM problem ?
> 
> [0]: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/
> [1]:
> https://lore.kernel.org/bpf/CAADnVQ+z-Y6Yv2i-icAUy=Uyh9yiN4S1AOrLd=K8mu32TXORkw@mail.gmail.com/


