regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch
@ 2021-07-02  8:24 Naresh Kamboju
  2021-07-02 12:38 ` Naresh Kamboju
  2021-07-02 18:49 ` Linus Torvalds
  0 siblings, 2 replies; 4+ messages in thread
From: Naresh Kamboju @ 2021-07-02  8:24 UTC (permalink / raw)
  To: Linux ARM, open list, regressions, Greg Kroah-Hartman, linux-ext4
  Cc: Ritesh Harjani, Harshad Shirwadkar, Theodore Tso, lkft-triage,
	linux-fsdevel, Linus Torvalds, Al Viro, Andreas Dilger,
	Jens Axboe, Zhang Yi

Results from Linaro’s test farm.
Regression found on arm64 on Linux mainline tree.

The following kernel crash was noticed while running LTP fs_fill test case on
arm64 devices Linus ' mainline tree (this is not yet tagged / released).

This regression  / crash is easy to reproduce.

fs_fill.c:53: TINFO: Unlinking mntpoint/thread6/file2
fs_fill.c:87: TPASS: Got 6 ENOSPC runtime 3847ms
[ 1140.055715] Unable to handle kernel paging request at virtual
address ffff76a8a6b59000
[ 1140.058338] Mem abort info:
[ 1140.059216]   ESR = 0x96000004
[ 1140.060366]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 1140.062013]   SET = 0, FnV = 0
[ 1140.062959]   EA = 0, S1PTW = 0
[ 1140.064001]   FSC = 0x04: level 0 translation fault
[ 1140.065496] Data abort info:
[ 1140.066389]   ISV = 0, ISS = 0x00000004
[ 1140.067556]   CM = 0, WnR = 0
[ 1140.068600] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041e18000
[ 1140.070627] [ffff76a8a6b59000] pgd=0000000000000000, p4d=0000000000000000
[ 1140.072794] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1140.074473] Modules linked in: btrfs blake2b_generic libcrc32c xor
xor_neon zstd_compress raid6_pq tun rfkill crct10dif_ce fuse
[ 1140.077
93Br9o]a dCcPaUs: 0 PID: 8478t  mCeosmsma:g ef sf_rfoim sylslt eNot
taintedm d5-.j1o3u.r0n a#l1d@
juno[  (1F1r4i0 .20082110-8057]- 0H2a r0d6w:5a2r:e3 6n aUmTeC:) :linux,
dumm
y-kveirrnte l([D2T4)7]
: [ 1[1 4101.4007.20789349]0 2I]n tpesrtnaatle :e r8r0o0r0:0 00O5o
p(sN:z c9v6 0d0a0i0f0 4- P[A#N1 ]- UPAROE E-MTPCTO  SBMTPYPE=--
)
[ 1140.087819] pc : percpu_counter_add_batch+0x34/0x140
[ 1140.089436] lr : percpu_counter_add_batch+0x2c/0x140
[ 1140.091033] sp : ffff8000114ebaf0
[ 1140.092122] x29: ffff8000114ebaf0 x28: ffff1f3342f8ac40 x27: 0000000000000000
[ 1140.094330] x26: 0000000000000000 x25: 0000000000000002 x24: 0000000000000001
[ 1140.096579] x23: ffff1f3342f8ac40 x22: ffff1f334391ec50 x21: 0000000000000020
[ 1140.098772] x20: ffffffffffffffff x19: ffff1f334391eb80 x18: 0000000000000000
[ 1140.101012] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 1140.103222] x14: 0000000000000004 x13: 0000000000000000 x12: ffff1f3344e6e630
[ 1140.105431] x11: ffff1f334054dda8 x10: 0000000000000a50 x9 : ffffa88ad740491c
[ 1140.107653] x8 : ffff1f3342f8b6f0 x7 : 0000000000000000 x6 : 0000000000000001
[ 1140.109869] x5 : ffffa88ad93bf260 x4 : ffffa88ad93bf000 x3 : 0000000000000000
[ 1140.112108] x2 : 0000000000000002 x1 : ffff76a8a6b59000 x0 : 0000000000000000
[ 1140.114298] Call trace:
[ 1140.115107]  percpu_counter_add_batch+0x34/0x140
[ 1140.116578]  __jbd2_journal_remove_checkpoint+0x84/0x20c
[ 1140.118237]  jbd2_log_do_checkpoint+0xbc/0x424
[ 1140.119659]  jbd2_journal_destroy+0x11c/0x2dc
[ 1140.121043]  ext4_put_super+0x8c/0x3c0
[ 1140.122226]  generic_shutdown_super+0x80/0x110
[ 1140.123651]  kill_block_super+0x2c/0x74
[ 1140.124882]  deactivate_locked_super+0x58/0xdc
[ 1140.126274]  deactivate_super+0x80/0xa0
[ 1140.127516]  cleanup_mnt+0xe4/0x180
[ 1140.128637]  __cleanup_mnt+0x20/0x2c
[ 1140.129766]  task_work_run+0x90/0x190
[ 1140.130920]  do_notify_resume+0x258/0x1350
[ 1140.132267]  work_pending+0xc/0x43c
[ 1140.133381] Code: f9001bf7 97eb400b d538d081 f9401260 (b8616816)
[ 1140.135285] ---[ end trace c36233c500dfe4b8 ]---
[ 1140.136872] note: fs_fill[8478] exited with preempt_count 2

Full test log:
--------------
https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v5.13-7637-g3dbdb38e2869/testrun/5034937/suite/ltp-fs-tests/test/fs_fill/log

metadata:
  git branch: master
  git repo: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline
  git commit: 3dbdb38e286903ec220aaf1fb29a8d94297da246
  git describe: v5.13-7637-g3dbdb38e2869
  kernel-config: https://builds.tuxbuild.com/1ukJXvBf1z1X0lUo2IV4O0Q5WkQ/config
  System.map: https://builds.tuxbuild.com/1ukJXvBf1z1X0lUo2IV4O0Q5WkQ/System.map
  vmlinux: https://builds.tuxbuild.com/1ukJXvBf1z1X0lUo2IV4O0Q5WkQ/vmlinux.xz

step to reproduce:
1) use the kernel Image from above link or build your kernel with below command

# TuxMake is a command line tool and Python library that provides
# portable and repeatable Linux kernel builds across a variety of
# architectures, toolchains, kernel configurations, and make targets.
#
# TuxMake supports the concept of runtimes.
# See https://docs.tuxmake.org/runtimes/, for that to work it requires
# that you install podman or docker on your system.
#
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.


tuxmake --runtime podman --target-arch arm64 --toolchain gcc-11
--kconfig defconfig --kconfig-add
https://builds.tuxbuild.com/1ukJXvBf1z1X0lUo2IV4O0Q5WkQ/config

2) Boot the qemu with below command
/usr/bin/qemu-system-aarch64 -cpu host -machine virt,accel=kvm
-nographic -net nic,model=virtio,macaddr=BC:DD:AD:CC:09:02 -net tap -m
4096 -monitor none -kernel Image.gz --append "console=ttyAMA0
root=/dev/vda rw" -hda
rpb-console-image-lkft-juno-20210525221209.rootfs.ext4 -smp 4


3) run ltp fs_fill test case
   # cd /opt/ltp
   # ./runltp -s fs_fill
   # you notice the reported crash

The git log short summary between good and bad.

$ git log --oneline dbe69e433722..3dbdb38e2869 -- fs/
c288d9cd7104 Merge tag 'for-5.14/io_uring-2021-06-30' of
git://git.kernel.dk/linux-block
911a2997a5b7 Merge tag 'fs_for_v5.14-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
a6ecc2a491e3 Merge tag 'ext4_for_linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
e149bd742b2d io_uring: code clean for kiocb_done()
915b3dde9b72 io_uring: spin in iopoll() only when reqs are in a single queue
99ebe4efbd38 io_uring: pre-initialise some of req fields
5182ed2e332e io_uring: refactor io_submit_flush_completions
4cfb25bf8877 io_uring: optimise hot path restricted checks
e5dc480d4ed9 io_uring: remove not needed PF_EXITING check
dd432ea5204e io_uring: mainstream sqpoll task_work running
b2d9c3da7711 io_uring: refactor io_arm_poll_handler()
59b735aeeb0f io_uring: reduce latency by reissueing the operation
22634bc5620d io_uring: add IOPOLL and reserved field checks to
IORING_OP_UNLINKAT
ed7eb2592286 io_uring: add IOPOLL and reserved field checks to
IORING_OP_RENAMEAT
12dcb58ac785 io_uring: refactor io_openat2()
16340eab61a3 io_uring: update sqe layout build checks
fe7e32575029 io_uring: fix code style problems
1a924a808208 io_uring: refactor io_sq_thread()
948e19479cb6 io_uring: don't change sqpoll creds if not needed
16aa4c9a1fbe jbd2: export jbd2_journal_[un]register_shrinker()
d578b99443fd ext4: notify sysfs on errors_count value change
8b0ed8443ae6 writeback: fix obtain a reference to a freeing memcg css
acc6100d3ffa fs: remove bdev_try_to_free_page callback
3b672e3aedff ext4: remove bdev_try_to_free_page() callback
dbf2bab7935b jbd2: simplify journal_clean_one_cp_list()
4ba3fcdde7e3 jbd2,ext4: add a shrinker to release checkpointed buffers
214eb5a4d8a2 jbd2: remove redundant buffer io error checks
235d68069cbd jbd2: don't abort the journal when freeing buffers
fcf37549ae19 jbd2: ensure abort the journal if detect IO error when
writing original buffer back
1866cba84243 jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
0caaefbaf2a4 ext4: no need to verify new add extent block
d07621d9b9b8 jbd2: clean up misleading comments for jbd2_fc_release_bufs
b1489186cc83 ext4: add check to prevent attempting to resize an fs
with sparse_super2
e9f9f61d0cdc ext4: consolidate checks for resize of bigalloc into
ext4_resize_begin
310c097c2bdb ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
ee00d6b3c7aa ext4: fsmap: fix the block/inode bitmap comment
6d2424a84533 ext4: fix comment for s_hash_unsigned
4ce8ad95f0af io_uring: Create define to modify a SQPOLL parameter
997135017716 io_uring: Fix race condition when sqp thread goes to sleep
f9505c72b2ee ext4: use local variable ei instead of EXT4_I() macro
c89849cc0259 ext4: fix avefreec in find_group_orlov
4fb7c70a889e ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
e5e7010e5444 ext4: remove check for zero nr_to_scan in ext4_es_scan()
b2d2e7573548 ext4: remove set but rewrite variables
351a0a3fbc35 ext4: add ioctl EXT4_IOC_CHECKPOINT
01d5d96542fd ext4: add discard/zeroout flags to journal flush
ce1b06c5f5e7 quota: remove unnecessary oom message
7a778f9dc32d io_uring: improve in tctx_task_work() resubmission
16f72070386f io_uring: don't resched with empty task_list
c6538be9e488 io_uring: refactor tctx task_work list splicing
ebd0df2e6342 io_uring: optimise task_work submit flushing
3f18407dc6f2 io_uring: inline __tctx_task_work()
a3dbdf54da80 io_uring: refactor io_get_sequence()
c854357bc1b9 io_uring: clean all flags in io_clean_op() at once
1dacb4df4ebe io_uring: simplify iovec freeing in io_clean_op()
b8e64b530011 io_uring: track request creds with a flag
c10d1f986b4e io_uring: move creds from io-wq work to io_kiocb
2a2758f26df5 io_uring: refactor io_submit_flush_completions()
e6ab8991c5d0 io_uring: fix false WARN_ONCE
fe76421d1da1 io_uring: allow user configurable IO thread CPU affinity
0e03496d1967 io-wq: use private CPU mask
e8d46b384129 isofs: remove redundant continue statement
8f6840c4fd1e ext4: return error code when ext4_fill_flex_info() fails
b9a037b7f3c4 ext4: cleanup in-core orphan list if ext4_truncate()
failed to get a transaction handle
ce3aba43599f ext4: fix kernel infoleak via ext4_extent_header
618f003199c6 ext4: fix memory leak in ext4_fill_super
1fc57ca5a2cd ext4: remove redundant assignment to error
5c680150d7f4 ext4: remove redundant check buffer_uptodate()
d0b040f5f255 ext4: fix overflow in ext4_iomap_alloc()
ec16d35b6c9d io-wq: remove header files not needed anymore
236daeae3616 io_uring: Add to traces the req pointer when available
2335f6f5ddf2 io_uring: optimise io_commit_cqring()
3c19966d3710 io_uring: shove more drain bits out of hot path
10c669040e9b io_uring: switch !DRAIN fast path when possible
27f6b318dea2 io_uring: fix min types mismatch in table alloc
dd9ae8a0b298 io_uring: Fix comment of io_get_sqe
441b8a7803bf io_uring: optimise non-drain path
76cc33d79175 io_uring: refactor io_req_defer()
0499e582aaff io_uring: move uring_lock location
311997b3fcdd io_uring: wait heads renaming
5ed7a37d21b3 io_uring: clean up check_overflow flag
5e159204d7ed io_uring: small io_submit_sqe() optimisation
f18ee4cf0a27 io_uring: optimise completion timeout flushing
15641e427070 io_uring: don't cache number of dropped SQEs
17d3aeb33cda io_uring: refactor io_get_sqe()
7f1129d227ea io_uring: shuffle more fields into SQ ctx section
b52ecf8cb5b5 io_uring: move ctx->flags from SQ cacheline
c7af47cf0fab io_uring: keep SQ pointers in a single cacheline
b1b2fc3574a6 io-wq: remove redundant initialization of variable ret
fdd1dc316e89 io_uring: Fix incorrect sizeof operator for copy_from_user call
aeab9506ef50 io_uring: inline io_iter_do_read()
78cc687be9c5 io_uring: unify SQPOLL and user task cancellations
09899b19155a io_uring: cache task struct refs
2d091d62b110 io_uring: don't vmalloc rsrc tags
9123c8ffce16 io_uring: add helpers for 2 level table alloc
157d257f99c1 io_uring: remove rsrc put work irq save/restore
d878c81610e1 io_uring: hide rsrc tag copy into generic helpers
e587227b680f io-wq: simplify worker exiting
769e68371521 io-wq: don't repeat IO_WQ_BIT_EXIT check by worker
eef51daa72f7 io_uring: rename function *task_file
cb3d8972c78a io_uring: refactor io_iopoll_req_issued
382cb030469d io-wq: remove unused io-wq refcounting
c7f405d6fa36 io-wq: embed wqe ptr array into struct io_wq
976517f162a0 io_uring: fix blocking inline submission
40dad765c045 io_uring: enable shmem/memfd memory registration
d0acdee296d4 io_uring: don't bounce submit_state cachelines
d068b5068d43 io_uring: rename io_get_cqring
8f6ed49a4443 io_uring: kill cached_cq_overflow
ea5ab3b57983 io_uring: deduce cq_mask from cq_entries
a566c5562d41 io_uring: remove dependency on ring->sq/cq_entries
b13a8918d395 io_uring: better locality for rsrc fields
b986af7e2df4 io_uring: shuffle rarely used ctx fields
93d2bcd2cbfe io_uring: make fail flag not link specific
3dd0c97a9e01 io_uring: get rid of files in exit cancel
acfb381d9d71 io_uring: simplify waking sqo_sq_wait
21f2fc080f86 io_uring: remove unused park_task_work
aaa9f0f48172 io_uring: improve sq_thread waiting check
e4b6d902a9e3 io_uring: improve sqpoll event/state handling
64c2c2c62f92 quota: Change quotactl_path() systcall to an fd-based one
21e4e15a846f reiserfs: Remove unneed check in reiserfs_write_full_page()
fa236c2b2d44 udf: Fix NULL pointer dereference in udf_symlink function
a149127be52f reiserfs: add check for invalid 1st journal block

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch
  2021-07-02  8:24 [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch Naresh Kamboju
@ 2021-07-02 12:38 ` Naresh Kamboju
  2021-07-02 13:29   ` Zhang Yi
  2021-07-02 18:49 ` Linus Torvalds
  1 sibling, 1 reply; 4+ messages in thread
From: Naresh Kamboju @ 2021-07-02 12:38 UTC (permalink / raw)
  To: Linux ARM, open list, regressions, Greg Kroah-Hartman, linux-ext4
  Cc: Ritesh Harjani, Harshad Shirwadkar, Theodore Tso, lkft-triage,
	linux-fsdevel, Linus Torvalds, Al Viro, Andreas Dilger,
	Jens Axboe, Zhang Yi

On Fri, 2 Jul 2021 at 13:54, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> Results from Linaro’s test farm.
> Regression found on arm64 on Linux mainline tree.

Regression found on arm64, arm and i386 on Linux mainline tree.
But x86_64 tests PASS.

>
> The following kernel crash was noticed while running LTP fs_fill test case on
> arm64 devices Linus ' mainline tree (this is not yet tagged / released).
>
> This regression  / crash is easy to reproduce.
>
> fs_fill.c:53: TINFO: Unlinking mntpoint/thread6/file2
> fs_fill.c:87: TPASS: Got 6 ENOSPC runtime 3847ms
> [ 1140.055715] Unable to handle kernel paging request at virtual
> address ffff76a8a6b59000

ref;
https://lore.kernel.org/regressions/CA+G9fYuBvh-H8Vqp58j-coXUD8p1A6h2it_aZdRiYcN2soGNdg@mail.gmail.com/T/#u

- Naresh

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch
  2021-07-02 12:38 ` Naresh Kamboju
@ 2021-07-02 13:29   ` Zhang Yi
  0 siblings, 0 replies; 4+ messages in thread
From: Zhang Yi @ 2021-07-02 13:29 UTC (permalink / raw)
  To: Naresh Kamboju, Linux ARM, open list, regressions,
	Greg Kroah-Hartman, linux-ext4
  Cc: Ritesh Harjani, Harshad Shirwadkar, Theodore Tso, lkft-triage,
	linux-fsdevel, Linus Torvalds, Al Viro, Andreas Dilger,
	Jens Axboe

On 2021/7/2 20:38, Naresh Kamboju wrote:
> On Fri, 2 Jul 2021 at 13:54, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>
>> Results from Linaro’s test farm.
>> Regression found on arm64 on Linux mainline tree.
> 
> Regression found on arm64, arm and i386 on Linux mainline tree.
> But x86_64 tests PASS.
> 
>>
>> The following kernel crash was noticed while running LTP fs_fill test case on
>> arm64 devices Linus ' mainline tree (this is not yet tagged / released).
>>
>> This regression  / crash is easy to reproduce.
>>
>> fs_fill.c:53: TINFO: Unlinking mntpoint/thread6/file2
>> fs_fill.c:87: TPASS: Got 6 ENOSPC runtime 3847ms
>> [ 1140.055715] Unable to handle kernel paging request at virtual
>> address ffff76a8a6b59000
> 
> ref;
> https://lore.kernel.org/regressions/CA+G9fYuBvh-H8Vqp58j-coXUD8p1A6h2it_aZdRiYcN2soGNdg@mail.gmail.com/T/#u
> 
> - Naresh
> .
> 

Thanks for the report, I'm not test this patch on arm64 and powerpc,
and sorry about not catching this problem, I will fix it soon.

Yi.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch
  2021-07-02  8:24 [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch Naresh Kamboju
  2021-07-02 12:38 ` Naresh Kamboju
@ 2021-07-02 18:49 ` Linus Torvalds
  1 sibling, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2021-07-02 18:49 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Linux ARM, open list, regressions, Greg Kroah-Hartman,
	linux-ext4, Ritesh Harjani, Harshad Shirwadkar, Theodore Tso,
	lkft-triage, linux-fsdevel, Al Viro, Andreas Dilger, Jens Axboe,
	Zhang Yi

On Fri, Jul 2, 2021 at 1:24 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> The git log short summary between good and bad.

Ity would have been really good to have a full bisect.

But from another report:

    https://lore.kernel.org/lkml/87ade24e-08f2-5fb1-7616-e6032af399a3@nvidia.com/

 it seems to be commit 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to
release checkpointed buffers"):

Ted, when there is a fix for this, it would be interesting to see what
made this all work fine on x86-64 but fail elsewhere...

         Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-07-02 18:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-02  8:24 [mainline] [arm64] Internal error: Oops - percpu_counter_add_batch Naresh Kamboju
2021-07-02 12:38 ` Naresh Kamboju
2021-07-02 13:29   ` Zhang Yi
2021-07-02 18:49 ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).