From: "Darrick J. Wong" <djwong@kernel.org>
To: Baokun Li <libaokun1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Yi Zhang <yi.zhang@redhat.com>, Ming Lei <ming.lei@redhat.com>,
	mark.rutland@arm.com, Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	Changhui Zhong <czhong@redhat.com>,
	yangerkun <yangerkun@huawei.com>,
	"zhangyi (F)" <yi.zhang@huawei.com>,
	Kees Cook <keescook@chromium.org>,
	chengzhihao <chengzhihao1@huawei.com>
Subject: Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
Date: Mon, 18 Sep 2023 11:42:00 -0700
Message-ID: <20230918184200.GA347993@frogsfrogsfrogs>
In-Reply-To: <a6b10684-39ee-960a-10ab-663746800f85@huawei.com>

On Mon, Sep 18, 2023 at 09:52:28AM +0800, Baokun Li wrote:
> On 2023/9/17 17:26, Peter Zijlstra wrote:
> > On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote:
> > > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> > > > On 2023/9/13 16:59, Yi Zhang wrote:
> > > > > > The issue can still be reproduced on the latest linux tree[2].
> > > > > > To reproduce it I need to run blktests block/001 about 1000 times.
> > > > > > Bisect points at commit[1], but since the issue does not reproduce
> > > > > > 100% of the time, I'm not sure whether it's the culprit.
> > > > > 
> > > > > 
> > > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> > > > Hello, everyone!
> > > > 
> > > > We have confirmed that the merge-in of this patch caused hlist_bl_lock
> > > > (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
> > > > [root@localhost ~]# insmod mymod.ko
> > > > [   37.994787][  T621] >>> a = 725, b = 724
> > > > [   37.995313][  T621] ------------[ cut here ]------------
> > > > [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
> > > > > [   37.996461][  T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> > > > [   37.997420][  T621] Modules linked in: mymod(E)
> > > > [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
> > > > G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
> > > > [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
> > > > [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
> > > > BTYPE=--)
> > > > [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.001416][  T621] sp : ffff800008b4be40
> > > > [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
> > > > 0000000000000000
> > > > [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
> > > > 0000000000000000
> > > > [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
> > > > 0000000000000001
> > > > [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
> > > > 0000000000000000
> > > > [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
> > > > ffffffffffffffff
> > > > [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
> > > > ffffd99332175b80
> > > > [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
> > > > ffffd9933022a9d8
> > > > [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
> > > > ffffd993320b5b40
> > > > [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
> > > > 0000000000000000
> > > > [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> > > > 0000000000000015
> > > > [   38.009709][  T621] Call trace:
> > > > [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.010539][  T621]  kthread+0xdc/0xf0
> > > > [   38.010927][  T621]  ret_from_fork+0x10/0x20
> > > > [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
> > > > [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
> > > Is this arm64 or something? You seem to have forgotten to mention what
> > > platform you're using.
> > Is that an LSE or LLSC arm64 ?
> 
> I'm not sure how to distinguish if it's LSE or LLSC, here's some info on the
> cpu:
> 
> $ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
> 0x00000000481fd010
> 
> $ lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):           2
> NUMA node(s):        4
> Vendor ID:           HiSilicon
> BIOS Vendor ID:      HiSilicon
> Model:               0
> Model name:          Kunpeng-920
> BIOS Model name:     Kunpeng 920-4826
> Stepping:            0x1
> BogoMIPS:            200.00
> L1d cache:           64K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            49152K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
> 
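For reference, the midr_el1 value quoted above can be split into its fields using the MIDR_EL1 layout from the Arm architecture (implementer in bits 31:24, variant in 23:20, architecture in 19:16, part number in 15:4, revision in 3:0). A minimal sketch; decode_midr is a hypothetical helper name, not an existing tool:

```python
def decode_midr(midr: int) -> dict:
    """Split a MIDR_EL1 value into its architectural fields."""
    midr &= 0xFFFFFFFF  # sysfs prints 64 bits; only the low 32 carry fields
    return {
        "implementer": (midr >> 24) & 0xFF,
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "partnum": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }


# Value copied from the midr_el1 sysfs file quoted above.
fields = decode_midr(int("0x00000000481fd010", 16))
print({name: hex(value) for name, value in fields.items()})
```

The variant field comes out as 0x1, which lines up with the "Stepping: 0x1" that lscpu reports for this machine.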
> > Anyway, it seems that ARM64 shouldn't be using the fallback as it does
> > everything itself.
> > 
> > Mark, can you have a look please? At first glance the
> > atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
> > seems in order..
> > 
> We also suspect some implicit mechanism change in
> raw_atomic64_fetch_or_acquire. The module above reproduces the problem,
> which should make it easier to locate.
> I can help reproduce it and grab some information if you can't reproduce
> it on your end.

FWIW this looks a lot like the crash I reported last week:
https://lore.kernel.org/linux-fsdevel/ZQep0OR0uMmR%2Fwg3@dread.disaster.area/T/#t

Also arm64, but virtualized.  I /think/ the host is some Ampere box,
though I have no idea what kind since it's just some Oracle Cloud A1
instance.  The internet claims "Ampere Altra" processors[1].

# lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r3p1
    BogoMIPS:           50.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
NUMA:                   
  NUMA node(s):         1
  NUMA node0 CPU(s):    0,1
Vulnerabilities:        
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; CSV2, but not BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected
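
On the LSE-or-LLSC question: both lscpu outputs in this message list "atomics" in Flags, which as far as I know is the arm64 hwcap name for the ARMv8.1 LSE atomic instructions. A minimal sketch for checking a Flags (or /proc/cpuinfo Features) line; has_lse_atomics is a hypothetical helper name. It only tells you the CPU advertises LSE; whether the kernel actually uses LSE also depends on how it was built (CONFIG_ARM64_LSE_ATOMICS) and on its runtime patching.

```python
def has_lse_atomics(flags_line: str) -> bool:
    """Return True if the 'atomics' hwcap appears in a Flags/Features line."""
    head, sep, tail = flags_line.partition(":")
    values = tail if sep else head  # accept lines with or without a "Flags:" label
    return "atomics" in values.split()


# Flags line copied from the Kunpeng-920 lscpu output quoted above.
kunpeng_flags = (
    "Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm"
)
print(has_lse_atomics(kunpeng_flags))
```

Splitting on whitespace rather than doing a substring search keeps a flag whose name merely contains "atomics" from matching.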

[1] https://www.oracle.com/cloud/compute/arm/ 

--D

> -- 
> With Best Regards,
> Baokun Li
> .
