Kajetan Puchalski wrote:
> On Fri, Jul 01, 2022 at 10:01:10PM +0200, Florian Westphal wrote:
> > Kajetan Puchalski wrote:
> > > While running the udp-flood test from stress-ng on Ampere Altra (Mt.
> > > Jade platform) I encountered a kernel panic caused by NULL pointer
> > > dereference within nf_conntrack.
> > >
> > > The issue is present in the latest mainline (5.19-rc4), latest stable
> > > (5.18.8), as well as multiple older stable versions. The last working
> > > stable version I found was 5.15.40.
> >
> > Do I need a special setup for conntrack?
>
> I don't think there was any special setup involved, the config I started
> from was a generic distribution config and I didn't change any
> networking-specific options. In case that's helpful here's the .config I
> used.
>
> https://pastebin.com/Bb2wttdx
>
> >
> > No crashes after more than one hour of stress-ng on
> > 1. 4 core amd64 Fedora 5.17 kernel
> > 2. 16 core amd64, linux stable 5.17.15
> > 3. 12 core intel, Fedora 5.18 kernel
> > 4. 3 core aarch64 vm, 5.18.7-200.fc36.aarch64
> >
>
> That would make sense, from further experiments I ran it somehow seems
> to be related to the number of workers being spawned by stress-ng along
> with the CPUs/cores involved.
>
> For instance, running the test with <=25 workers (--udp-flood 25 etc.)
> results in the test running fine for at least 15 minutes.

Ok. I will let it run for longer on the machines I have access to.

In the meantime, you could test the attached patch, it's a simple
s/refcount_/atomic_/ in nf_conntrack.

If mainline (patch vs. HEAD 69cb6c6556ad89620547318439) crashes for you
but works with the attached patch, someone who understands aarch64 memory
ordering would have to look more closely at the refcount_XXX functions to
see where they might differ from the atomic_ ones.

If it still crashes, please try the hunk below in addition, although I
don't see how it would make a difference.
This is the one spot where the original conversion replaced atomic_inc()
with refcount_set(). This is on allocation; the refcount is expected to
be 0, so refcount_inc() triggers a warning hinting at a use-after-free.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1776,7 +1776,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 		__nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
 
 	/* Now it is going to be associated with an sk_buff, set refcount to 1. */
-	atomic_set(&ct->ct_general.use, 1);
+	atomic_inc(&ct->ct_general.use);
 
 	if (exp) {
 		if (exp->expectfn)
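
For readers unfamiliar with why refcount_inc() complains on a fresh object, here is a rough userspace sketch of the difference. It is not kernel code: the model_* names are hypothetical, C11 atomics stand in for atomic_t, and only the "increment from zero" check is modelled (as far as I understand, the kernel also saturates the counter in that case, which is left out here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for refcount_t; not the kernel type. */
struct model_refcount {
	atomic_int refs;
	bool warned;	/* set where the kernel would emit a WARN */
};

/* Unconditional store: fine on a freshly allocated object (count 0). */
static void model_refcount_set(struct model_refcount *r, int n)
{
	atomic_store(&r->refs, n);
}

/* Checked increment: going from 0 to 1 looks like a use-after-free. */
static void model_refcount_inc(struct model_refcount *r)
{
	int old = atomic_fetch_add(&r->refs, 1);

	if (old == 0)
		r->warned = true;
}

/* Plain atomic increment: 0 -> 1 is accepted without any check. */
static void model_atomic_inc(atomic_int *v)
{
	atomic_fetch_add(v, 1);
}

/* Run the three variants on an object whose count starts at 0. */
static void model_demo(void)
{
	struct model_refcount a = { 0, false };
	struct model_refcount b = { 0, false };
	atomic_int c = 0;

	model_refcount_set(&a, 1);	/* what the refcount conversion uses */
	assert(atomic_load(&a.refs) == 1 && !a.warned);

	model_refcount_inc(&b);		/* would warn in the kernel */
	assert(atomic_load(&b.refs) == 1 && b.warned);

	model_atomic_inc(&c);		/* what the hunk above switches to */
	assert(atomic_load(&c) == 1);
}
```

Calling model_demo() from a test program runs all three cases. All three variants end up with a count of 1 in this model, which is why the hunk is not expected to change behaviour; only the checked increment flags the 0 -> 1 transition.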