netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hou Tao <houtao@huaweicloud.com>
To: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Hao Luo <haoluo@google.com>, Waiman Long <longman@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	netdev@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"houtao1@huawei.com" <houtao1@huawei.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [net-next] bpf: avoid hashtab deadlock with try_lock
Date: Thu, 1 Dec 2022 10:53:37 +0800	[thread overview]
Message-ID: <20b8ad93-7a90-dc8c-581b-491d543423a5@huaweicloud.com> (raw)
In-Reply-To: <CAMDZJNWv0Qv6LJ=APOj644vKfJttcQ4WHXFizxn_2hCeVzQcXA@mail.gmail.com>

Hi,

On 11/30/2022 1:55 PM, Tonghao Zhang wrote:
> On Wed, Nov 30, 2022 at 12:13 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 11/30/2022 10:47 AM, Tonghao Zhang wrote:
>>> On Wed, Nov 30, 2022 at 9:50 AM Hou Tao <houtao@huaweicloud.com> wrote:
>>>> Hi Hao,
>>>>
>>>> On 11/30/2022 3:36 AM, Hao Luo wrote:
>>>>> On Tue, Nov 29, 2022 at 9:32 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>>>>>> Just to be clear, I meant to refactor htab_lock_bucket() into a try
>>>>>> lock pattern. Also after a second thought, the below suggestion doesn't
>>>>>> work. I think the proper way is to make htab_lock_bucket() as a
>>>>>> raw_spin_trylock_irqsave().
>>>>>>
>>>>>> Regards,
>>>>>> Boqun
>>>>>>
>>>>> The potential deadlock happens when the lock is contended from the
>>>>> same cpu. When the lock is contended from a remote cpu, we would like
>>>>> the remote cpu to spin and wait, instead of giving up immediately. As
>>>>> this gives better throughput. So replacing the current
>>>>> raw_spin_lock_irqsave() with trylock sacrifices this performance gain.
>>>>>
>>>>> I suspect the source of the problem is the 'hash' that we used in
>>>>> htab_lock_bucket(). The 'hash' is derived from the 'key', I wonder
>>>>> whether we should use a hash derived from 'bucket' rather than from
>>>>> 'key'. For example, from the memory address of the 'bucket'. Because,
>>>>> different keys may fall into the same bucket, but yield different
>>>>> hashes. If the same bucket can never have two different 'hashes' here,
>>>>> the map_locked check should behave as intended. Also because
>>>>> ->map_locked is per-cpu, execution flows from two different cpus can
>>>>> both pass.
>>>> The warning from lockdep is due to the reason the bucket lock A is used in a
>>>> no-NMI context firstly, then the same bucke lock is used a NMI context, so
>>> Yes, I tested lockdep too, we can't use the lock in NMI(but only
>>> try_lock work fine) context if we use them no-NMI context. otherwise
>>> the lockdep prints the warning.
>>> * for the dead-lock case: we can use the
>>> 1. hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1)
>>> 2. or hash bucket address.
>> Use the computed hash will be better than hash bucket address, because the hash
>> buckets are allocated sequentially.
>>> * for lockdep warning, we should use in_nmi check with map_locked.
>>>
>>> BTW, the patch doesn't work, so we can remove the lock_key
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50eb518e262fa06bd334e6eec172eaf5d7a5bd9
>>>
>>> static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>>                                    struct bucket *b, u32 hash,
>>>                                    unsigned long *pflags)
>>> {
>>>         unsigned long flags;
>>>
>>>         hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1);
>>>
>>>         preempt_disable();
>>>         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>>                 __this_cpu_dec(*(htab->map_locked[hash]));
>>>                 preempt_enable();
>>>                 return -EBUSY;
>>>         }
>>>
>>>         if (in_nmi()) {
>>>                 if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
>>>                         return -EBUSY;
>> The only purpose of trylock here is to make lockdep happy and it may lead to
>> unnecessary -EBUSY error for htab operations in NMI context. I still prefer add
>> a virtual lock-class for map_locked to fix the lockdep warning. So could you use
> Hi, what is virtual lock-class ? Can you give me an example of what you mean?
If LOCKDEP is enabled, raw_spinlock will add dep_map in the definition and it
also calls lock_acquire() and lock_release() to assist the deadlock check. Now
map_locked is not a lock but it acts like a raw_spin_trylock, so we need to add
dep_map to it manually, and then also call lock_acquire(trylock=1) and
lock_release() before increasing and decreasing map_locked. You can reference
the implementation of raw_spin_trylock and raw_spin_unlock for more details.
>> separated patches to fix the potential dead-lock and the lockdep warning ? It
>> will be better you can also add a bpf selftests for deadlock problem as said before.
>>
>> Thanks,
>> Tao
>>>         } else {
>>>                 raw_spin_lock_irqsave(&b->raw_lock, flags);
>>>         }
>>>
>>>         *pflags = flags;
>>>         return 0;
>>> }
>>>
>>>
>>>> lockdep deduces that may be a dead-lock. I have already tried to use the same
>>>> map_locked for keys with the same bucket, the dead-lock is gone, but still got
>>>> lockdep warning.
>>>>> Hao
>>>>> .
>


  reply	other threads:[~2022-12-01  2:53 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-21 10:05 [net-next] bpf: avoid the multi checking xiangxia.m.yue
2022-11-21 10:05 ` [net-next] bpf: avoid hashtab deadlock with try_lock xiangxia.m.yue
2022-11-21 20:19   ` Jakub Kicinski
2022-11-22  1:15   ` Hou Tao
2022-11-22  3:12     ` Tonghao Zhang
2022-11-22  4:01       ` Hou Tao
2022-11-22  4:06         ` Hou Tao
2022-11-24 12:57           ` Tonghao Zhang
2022-11-24 14:13             ` Hou Tao
2022-11-28  3:15               ` Tonghao Zhang
2022-11-28 21:55                 ` Hao Luo
2022-11-29  4:32                   ` Hou Tao
2022-11-29  6:06                     ` Tonghao Zhang
2022-11-29  7:56                       ` Hou Tao
2022-11-29 12:45                       ` Hou Tao
2022-11-29 16:06                         ` Waiman Long
2022-11-29 17:23                           ` Boqun Feng
2022-11-29 17:32                             ` Boqun Feng
2022-11-29 19:36                               ` Hao Luo
2022-11-29 21:13                                 ` Waiman Long
2022-11-30  1:50                                 ` Hou Tao
2022-11-30  2:47                                   ` Tonghao Zhang
2022-11-30  3:06                                     ` Waiman Long
2022-11-30  3:32                                       ` Tonghao Zhang
2022-11-30  4:07                                         ` Waiman Long
2022-11-30  4:13                                     ` Hou Tao
2022-11-30  5:02                                       ` Hao Luo
2022-11-30  5:56                                         ` Tonghao Zhang
2022-11-30  5:55                                       ` Tonghao Zhang
2022-12-01  2:53                                         ` Hou Tao [this message]
2022-11-30  1:37                             ` Hou Tao
2022-11-22 22:16 ` [net-next] bpf: avoid the multi checking Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20b8ad93-7a90-dc8c-581b-491d543423a5@huaweicloud.com \
    --to=houtao@huaweicloud.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=martin.lau@linux.dev \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=will@kernel.org \
    --cc=xiangxia.m.yue@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).