All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126!
@ 2020-11-13  9:16 bugzilla-daemon
  2020-11-13 20:10 ` [Bug 210185] " bugzilla-daemon
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-13  9:16 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

            Bug ID: 210185
           Summary: kernel BUG at fs/ext4/page-io.c:126!
           Product: File System
           Version: 2.5
    Kernel Version: 5.9.8
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: emchroma@gmail.com
        Regression: No

We get the following error on several machines with kernel 5.9.8 (the same with
5.9.1).
The setup on all machines is identical. The error happens after 10-15 minutes
of
moderate I/O with a threaded application.

[ 2443.940844] kernel BUG at fs/ext4/page-io.c:126!
[ 2443.941045] invalid opcode: 0000 [#1] SMP PTI
[ 2443.941070] CPU: 8 PID: 982 Comm: kworker/u64:1 Not tainted 5.9.8-kd-cluster
#1
[ 2443.941111] Hardware name: Supermicro SYS-1028GQ-TR/X10DGQ, BIOS 1.0a
08/14/2015
[ 2443.941183] Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]
[ 2443.941234] RIP: 0010:ext4_finish_bio+0x25e/0x260 [ext4]
[ 2443.941262] Code: da 49 c1 e6 06 4c 03 30 e9 11 fe ff ff 48 83 c4 30 5b 5d
41 5c 41 5d 41 5e 41 5f c3 4d 8b 7e 28 4c 89 74 24 28 e9 2b fe ff ff <0f> 0b 0f
1f 44 00 00 48 8b 07 48 39 c7 0f 85 cf 00 00 00 f6 47 28
[ 2443.941365] RSP: 0018:ffffb1c68731bd98 EFLAGS: 00010246
[ 2443.941391] RAX: 0000000000000081 RBX: ffff9e662470f000 RCX:
0000000000000001
[ 2443.941413] RDX: 0000000000001000 RSI: 0000000000000001 RDI:
0000000000001000
[ 2443.941436] RBP: 0000000000000000 R08: ffff9e6834e68800 R09:
0000000000000000
[ 2443.941457] R10: ffffb1c68731bd00 R11: 0000000000000000 R12:
ffff9e662470f000
[ 2443.941480] R13: ffff9e65f71fcc18 R14: ffffe6bb792cb0c0 R15:
ffff9e66b626e2d8
[ 2443.941502] FS:  0000000000000000(0000) GS:ffff9e783fa00000(0000)
knlGS:0000000000000000
[ 2443.941527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2443.941545] CR2: 0000564319c27638 CR3: 0000000fce3ae002 CR4:
00000000001706e0
[ 2443.941568] Call Trace:
[ 2443.941593]  ext4_release_io_end+0x48/0xf0 [ext4]
[ 2443.941618]  ext4_end_io_rsv_work+0xbd/0x180 [ext4]
[ 2443.941637]  process_one_work+0x199/0x380
[ 2443.941653]  ? pwq_activate_delayed_work+0x3b/0xa0
[ 2443.941671]  worker_thread+0x4f/0x3b0
[ 2443.941685]  ? rescuer_thread+0x360/0x360
[ 2443.941700]  kthread+0xfc/0x130
[ 2443.941712]  ? kthread_associate_blkcg+0x90/0x90
[ 2443.941729]  ret_from_fork+0x22/0x30
[ 2443.942216] Modules linked in: nfsv3 nfs_acl xt_conntrack xt_MASQUERADE
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter
bridge bpfilter 8021q garp mrp stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace nfs_ssc fscache overlay intel_rapl_msr iTCO_wdt
intel_pmc_bxt iTCO_vendor_support intel_rapl_common sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd
cryptd glue_helper rapl intel_cstate intel_uncore pcspkr ast drm_vram_helper
drm_ttm_helper ttm drm_kms_helper cec joydev sg drm snd_hda_codec_hdmi lpc_ich
snd_hda_intel snd_intel_dspcfg snd_hda_codec mei_me mei snd_hda_core snd_hwdep
snd_pcm snd_timer snd soundcore ioatdma acpi_ipmi ipmi_si ipmi_devintf
ipmi_msghandler evdev acpi_power_meter acpi_pad tiny_power_button button
ecryptfs cbc encrypted_keys parport_pc ppdev
[ 2443.942244]  lp parport sunrpc ip_tables x_tables autofs4 hid_generic usbhid
hid ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
async_pq async_xor async_tx xor sd_mod t10_pi raid6_pq libcrc32c crc32c_generic
raid1 raid0 multipath linear md_mod ahci libahci ehci_pci ehci_hcd libata igb
i2c_i801 i2c_algo_bit crc32c_intel scsi_mod i2c_smbus usbcore dca wmi
[ 2443.948743] ---[ end trace f72fe3c4ac9cf471 ]---
[ 2443.952835] RIP: 0010:ext4_finish_bio+0x25e/0x260 [ext4]
[ 2443.953707] Code: da 49 c1 e6 06 4c 03 30 e9 11 fe ff ff 48 83 c4 30 5b 5d
41 5c 41 5d 41 5e 41 5f c3 4d 8b 7e 28 4c 89 74 24 28 e9 2b fe ff ff <0f> 0b 0f
1f 44 00 00 48 8b 07 48 39 c7 0f 85 cf 00 00 00 f6 47 28
[ 2443.955098] RSP: 0018:ffffb1c68731bd98 EFLAGS: 00010246
[ 2443.955755] RAX: 0000000000000081 RBX: ffff9e662470f000 RCX:
0000000000000001
[ 2443.956572] RDX: 0000000000001000 RSI: 0000000000000001 RDI:
0000000000001000
[ 2443.957394] RBP: 0000000000000000 R08: ffff9e6834e68800 R09:
0000000000000000
[ 2443.958071] R10: ffffb1c68731bd00 R11: 0000000000000000 R12:
ffff9e662470f000
[ 2443.958790] R13: ffff9e65f71fcc18 R14: ffffe6bb792cb0c0 R15:
ffff9e66b626e2d8
[ 2443.959509] FS:  0000000000000000(0000) GS:ffff9e783fa00000(0000)
knlGS:0000000000000000
[ 2443.960272] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2443.961139] CR2: 0000564319c27638 CR3: 0000000fce3ae002 CR4:
00000000001706e0

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
@ 2020-11-13 20:10 ` bugzilla-daemon
  2020-11-13 22:03 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-13 20:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

Amy (enbyamy@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enbyamy@gmail.com

--- Comment #1 from Amy (enbyamy@gmail.com) ---
Hey there!

Traced back that file. That line runs page_buffers(), which launches a bug if
PagePrivate is false. 

#define page_buffers(page)                                      \
        ({                                                      \
                BUG_ON(!PagePrivate(page));                     \
                ((struct buffer_head *)page_private(page));     \
        })

What application were you running, if I may ask?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
  2020-11-13 20:10 ` [Bug 210185] " bugzilla-daemon
@ 2020-11-13 22:03 ` bugzilla-daemon
  2020-11-17 19:07 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-13 22:03 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #2 from emchroma@gmail.com ---
(In reply to Amy from comment #1)

> What application were you running, if I may ask?

Hi Amy,

it happens with AutoCtk (https://github.com/ksettaluri6/AutoCkt), so some
Python code. Everything runs fine with kernel 5.7.2, last week we went to
kernel 5.9.1 and this bug appeared. Today I've tried with it kernel 5.9.8, but
no luck.

The machine is unstable after this bug happens. The load goues up but we can't
kill the python processes, they're all in state D. Also, a clean reboot is not
possible, only a hard reset helps.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
  2020-11-13 20:10 ` [Bug 210185] " bugzilla-daemon
  2020-11-13 22:03 ` bugzilla-daemon
@ 2020-11-17 19:07 ` bugzilla-daemon
  2020-11-18 10:30 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-17 19:07 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

Theodore Tso (tytso@mit.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #3 from Theodore Tso (tytso@mit.edu) ---
How easily can you reproduce the problem?  

Can you give us instructions for a reliable repro (e.g., download AutoCtk, run
it with these options and this input file, and it will crash in N minutes)?

Thanks!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-11-17 19:07 ` bugzilla-daemon
@ 2020-11-18 10:30 ` bugzilla-daemon
  2020-11-18 10:31 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-18 10:30 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #4 from emchroma@gmail.com ---
(In reply to Theodore Tso from comment #3)

Hi Theodore,

> How easily can you reproduce the problem?  

on each invocation.

> Can you give us instructions for a reliable repro (e.g., download AutoCtk,
> run it with these options and this input file, and it will crash in N
> minutes)?

You'll need Anaconda, Ngspice and AutoCkt. I'm on debian and I've apt-get
ngspice. I'll try a step by step procedure for Anaconda and AutoCkt

We're using https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
Download and install Anaconda

# activate base environment
$ . .bashrc

# upgrade pip
$ pip install --upgrade pip

# download AutoCkt
$ git clone https://github.com/ksettaluri6/AutoCkt.git

# create autockt_updated environment, use the attached autockt_updated.yml,
# not the environment provided by AutoCkt
$ conda env create -f autockt_updated.yml

# activate environment autockt_upgraded
$ conda activate autockt_updated

# copy the attached parameter file to AutoCkt/autockt
$ cp val_autobag_ray_hyperparameter_tuning.py AutoCkt/autockt

# run AutoCkt
$ cd AutoCkt
$ python autockt/gen_specs.py --num_specs 600
$ ipython
$ run autockt/val_autobag_ray_hyperparameter_tuning.py # this is in the IPython
shell

AutoCkt creates a lot of files in /tmp/ckt_da and /tmp/ray. tmp is on the root
partition and
our root is rather small (100G), therefore we start a python script to delete
files/directories
older than 10 minutes in /tmp/ckt_da from a second shell

# cleanup /tmp/ckt_da
$ python auto_delete_old_tmp_ckt_da_files.py

AutoCkt is very verbose and you'll probably see some warnings, but it should
run. On our machines
it usually takes 10-15 minutes, sometimes up to 30 minutes until the bug
appears.

Not sure whether it matters, but we're using software raid1 for the root
partition.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-11-18 10:30 ` bugzilla-daemon
@ 2020-11-18 10:31 ` bugzilla-daemon
  2020-11-18 10:33 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-18 10:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #5 from emchroma@gmail.com ---
Created attachment 293715
  --> https://bugzilla.kernel.org/attachment.cgi?id=293715&action=edit
AutoCkt Environment

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-11-18 10:31 ` bugzilla-daemon
@ 2020-11-18 10:33 ` bugzilla-daemon
  2020-11-18 10:33 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-18 10:33 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #6 from emchroma@gmail.com ---
Created attachment 293717
  --> https://bugzilla.kernel.org/attachment.cgi?id=293717&action=edit
Training parameter

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-11-18 10:33 ` bugzilla-daemon
@ 2020-11-18 10:33 ` bugzilla-daemon
  2020-11-18 13:09 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-18 10:33 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #7 from emchroma@gmail.com ---
Created attachment 293719
  --> https://bugzilla.kernel.org/attachment.cgi?id=293719&action=edit
Cleanup Script

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-11-18 10:33 ` bugzilla-daemon
@ 2020-11-18 13:09 ` bugzilla-daemon
  2020-11-18 13:09 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-18 13:09 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #8 from emchroma@gmail.com ---
Sorry, I forgot that we modified AutoCkt to our needs. Before your run
AutoCkt, please apply the attached patch and modify the .include in
AutoCkt/eval_engines/ngspice/ngspice_inputs/netlist/ACDAI.cir and
AutoCkt/eval_engines/ngspice/ngspice_inputs/netlist/two_stage_opamp.cir
to match your path

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-11-18 13:09 ` bugzilla-daemon
@ 2020-11-18 13:09 ` bugzilla-daemon
  2020-11-20  7:49 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-18 13:09 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #9 from emchroma@gmail.com ---
Created attachment 293725
  --> https://bugzilla.kernel.org/attachment.cgi?id=293725&action=edit
AutoCkt patch

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-11-18 13:09 ` bugzilla-daemon
@ 2020-11-20  7:49 ` bugzilla-daemon
  2021-02-02 11:00 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2020-11-20  7:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #10 from emchroma@gmail.com ---
I've made some more tests with different kernels and it looks like the the bug
was introduced with kernel 5.9.0. Kernel 5.8.18 and below (5.8.0 and 5.8.16)
runs
fine, kernel 5.9.0 shows the bug. Today I've tried kernel 5.9.9, which also
crashes.

I've also made the test on a different machine with different hardware and
different setup - there, all kernels seems to run fine. At least program ran
for a few hours before I stopped it.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (9 preceding siblings ...)
  2020-11-20  7:49 ` bugzilla-daemon
@ 2021-02-02 11:00 ` bugzilla-daemon
  2021-02-02 16:23 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2021-02-02 11:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

gpo (gernot.poerner@web.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gernot.poerner@web.de

--- Comment #11 from gpo (gernot.poerner@web.de) ---
Hi, any updates here lately? 

Hitting the same bug on a completely different workload and setup. 5.9.x
crashes, 5.8.x works, this is all on Debian 10 Buster running as a VM

Currently we downgrade to 5.8.x because of this. I can provide further
information if interested.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (10 preceding siblings ...)
  2021-02-02 11:00 ` bugzilla-daemon
@ 2021-02-02 16:23 ` bugzilla-daemon
  2021-02-02 17:06 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2021-02-02 16:23 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #12 from Amy (enbyamy@gmail.com) ---
On Tue, Feb 2, 2021 at 3:00 AM <bugzilla-daemon@bugzilla.kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=210185
>
> gpo (gernot.poerner@web.de) changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |gernot.poerner@web.de
>
> --- Comment #11 from gpo (gernot.poerner@web.de) ---
> Hi, any updates here lately?

Nope, forgot about this thread.

>
> Hitting the same bug on a completely different workload and setup. 5.9.x
> crashes, 5.8.x works, this is all on Debian 10 Buster running as a VM
>

Do 5.10.x or 5.11-rcx work?

> Currently we downgrade to 5.8.x because of this. I can provide further
> information if interested.

Yes, that'd be great. You said you tried 5.9.1 - does it break on 5.9.0?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (11 preceding siblings ...)
  2021-02-02 16:23 ` bugzilla-daemon
@ 2021-02-02 17:06 ` bugzilla-daemon
  2021-02-03  8:19 ` bugzilla-daemon
  2021-02-18 12:57 ` bugzilla-daemon
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2021-02-02 17:06 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #13 from Theodore Tso (tytso@mit.edu) ---
How many CPU's and memory do you have on your servers?   And what distribution
and distro version are you using, in case that matters?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (12 preceding siblings ...)
  2021-02-02 17:06 ` bugzilla-daemon
@ 2021-02-03  8:19 ` bugzilla-daemon
  2021-02-18 12:57 ` bugzilla-daemon
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2021-02-03  8:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #14 from gpo (gernot.poerner@web.de) ---
This is on Debian Buster running 5.x kernels from backports. These are VMs (on
kvm) with 4 cores/32G Ram each. They are running as kubernetes workers.

When we set those up with Debian which in our image comes with kernel 5.9 they
constantly crashed after a while with:

[58832.748868] ------------[ cut here ]------------

[58832.749391] kernel BUG at fs/ext4/page-io.c:126!

[58832.749797] invalid opcode: 0000 [#1] SMP PTI

[58832.750166] CPU: 3 PID: 4233 Comm: kworker/u8:4 Not tainted
5.9.0-0.bpo.2-amd64 #1 Debian 5.9.6-1~bpo10+1

[58832.750903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS

[58832.751468] Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]

[58832.752001] RIP: 0010:ext4_finish_bio+0x248/0x250 [ext4]

[58832.752434] Code: c7 a0 d0 66 c0 e8 d8 a0 03 f3 85 c0 0f 84 4c ff ff ff e9
f8 37 02 00 49 8b 44 24 28 4c 89 64 24 28 48 89 04 24 e9 51 fe ff ff <0f> 0b 66
0f 1f 44 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55

[58832.753829] RSP: 0018:ffffb7b603637dc0 EFLAGS: 00010246

[58832.754264] RAX: 0000000000000081 RBX: ffff8c042706ae40 RCX:
0000000000000001

[58832.754822] RDX: 0000000000000000 RSI: 0000000000001000 RDI:
ffff8c042706ae40

[58832.755378] RBP: 0000000000001000 R08: 0000000000000000 R09:
ffffffffc0682b50

[58832.755943] R10: ffff8c0262c7f9d8 R11: 0000000000000001 R12:
ffffefbd1c982cc0

[58832.756510] R13: ffff8c042706ae40 R14: 0000000000000001 R15:
0000000000000000

[58832.757055] FS:  0000000000000000(0000) GS:ffff8c045fd80000(0000)
knlGS:0000000000000000

[58832.757677] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[58832.758138] CR2: 00007fd4543d2d80 CR3: 0000000609c48004 CR4:
00000000000606e0

[58832.758703] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000

[58832.759252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400

[58832.759801] Call Trace:

[58832.760048]  ext4_release_io_end+0x48/0xf0 [ext4]

[58832.760422]  ext4_end_io_rsv_work+0x92/0x180 [ext4]

[58832.760824]  process_one_work+0x1ad/0x370

[58832.761191]  worker_thread+0x30/0x390

[58832.761505]  ? create_worker+0x1a0/0x1a0

[58832.761843]  kthread+0x116/0x130

[58832.762115]  ? kthread_park+0x80/0x80

[58832.762426]  ret_from_fork+0x22/0x30

[58832.762719] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace nfs_ssc fscache iptable_filter xt_CT xt_multiport
xt_nat xt_tcpudp veth dm_mod xt_set ip_set_hash_ipport ip_set_bitmap_port
ip_set_hash_ipportnet ip_set_hash_ipportip ip_set dummy ip_vs_sh ip_vs_wrr
ip_vs_rr ip_vs xt_comment xt_mark xt_conntrack xt_MASQUERADE
nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
nf_tables nfnetlink br_netfilter bridge stp llc overlay intel_rapl_msr
intel_rapl_common crc32_pclmul ghash_clmulni_intel aesni_intel libaes
crypto_simd cryptd glue_helper hid_generic usbhid hid virtio_balloon joydev
evdev pcspkr serio_raw qemu_fw_cfg button sunrpc tcp_bbr ip_tables x_tables
autofs4 ext4 crc16 mbcache jbd2 crc32c_generic ata_generic virtio_net
net_failover virtio_blk failover uhci_hcd ata_piix ehci_hcd libata usbcore
scsi_mod crct10dif_pclmul crct10dif_common psmouse usb_common

[58832.762748]  crc32c_intel virtio_pci virtio_ring virtio i2c_piix4

[58832.769326] ---[ end trace 22c803ae565a2f8c ]---

[58832.769746] RIP: 0010:ext4_finish_bio+0x248/0x250 [ext4]

[58832.770183] Code: c7 a0 d0 66 c0 e8 d8 a0 03 f3 85 c0 0f 84 4c ff ff ff e9
f8 37 02 00 49 8b 44 24 28 4c 89 64 24 28 48 89 04 24 e9 51 fe ff ff <0f> 0b 66
0f 1f 44 00 00 66 66 66 66 90 41 57 41 56 41 55 41 54 55

[58832.771522] RSP: 0018:ffffb7b603637dc0 EFLAGS: 00010246

[58832.771927] RAX: 0000000000000081 RBX: ffff8c042706ae40 RCX:
0000000000000001

[58832.772454] RDX: 0000000000000000 RSI: 0000000000001000 RDI:
ffff8c042706ae40

[58832.772984] RBP: 0000000000001000 R08: 0000000000000000 R09:
ffffffffc0682b50

[58832.773521] R10: ffff8c0262c7f9d8 R11: 0000000000000001 R12:
ffffefbd1c982cc0

[58832.774045] R13: ffff8c042706ae40 R14: 0000000000000001 R15:
0000000000000000

[58832.774566] FS:  0000000000000000(0000) GS:ffff8c045fd80000(0000)
knlGS:0000000000000000

[58832.775157] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[58832.775604] CR2: 00007fd4543d2d80 CR3: 0000000609c48004 CR4:
00000000000606e0

[58832.776122] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000

[58832.776636] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400

[58832.777195] Kernel panic - not syncing: Fatal exception

[58832.777840] Kernel Offset: 0x32200000 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffffbfffffff)

[58832.778610] Rebooting in 1 seconds..

Loading Linux 5.9.0-0.bpo.2-amd64 ...

Loading initial ramdisk ...

[    0.000000] Linux version 5.9.0-0.bpo.2-amd64
(debian-kernel@lists.debian.org) (gcc-8 (Debian 8.3.0-6) 8.3.0, GNU ld (GNU
Binutils for Debian) 2.31.1) #1 SMP Debian 5.9.6-1~bpo10+1 (2020-11-19)

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.9.0-0.bpo.2-amd64
root=UUID=b722221a-e6a3-4609-a545-19454bd4f53c ro console=tty0
console=ttyS0,115200n8 noplymouth elevator=noop nomodeset net.ifnames=0
biosdevname=0

[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point
registers'

[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'

[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256

[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes,
using 'standard' format.

[    0.000000] BIOS-provided physical RAM map:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable

[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved

[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdbfff] usable

[    0.000000] BIOS-e820: [mem 0x00000000bffdc000-0x00000000bfffffff] reserved

[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved

[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable

This is also how I came upon this bug report.

Downgrading to the latest 5.8 (5.8.0-0.bpo.2-amd64 #1 SMP Debian
5.8.10-1~bpo10+1 (2020-09-26) x86_64 GNU/Linux) stopped the constant crashing.

The problem here is that we could not identify for sure what really causes the
crash, these nodes are running differing workloads since they are kubernetes.

I would go ahead and test a later kernel (5.10) on one of these to see if the
problem is fixed in upstream already.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 210185] kernel BUG at fs/ext4/page-io.c:126!
  2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
                   ` (13 preceding siblings ...)
  2021-02-03  8:19 ` bugzilla-daemon
@ 2021-02-18 12:57 ` bugzilla-daemon
  14 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2021-02-18 12:57 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=210185

--- Comment #15 from gpo (gernot.poerner@web.de) ---
Ok, quick update here, we built a vanilla 5.10.16 and ran that for a couple of
days and it didn't occur again, so it seems to be fixed, at least for our
problem.

We will test with the official Debian 5.10 which now also was released but I'm
very optimistic it's fixed.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-02-18 16:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-13  9:16 [Bug 210185] New: kernel BUG at fs/ext4/page-io.c:126! bugzilla-daemon
2020-11-13 20:10 ` [Bug 210185] " bugzilla-daemon
2020-11-13 22:03 ` bugzilla-daemon
2020-11-17 19:07 ` bugzilla-daemon
2020-11-18 10:30 ` bugzilla-daemon
2020-11-18 10:31 ` bugzilla-daemon
2020-11-18 10:33 ` bugzilla-daemon
2020-11-18 10:33 ` bugzilla-daemon
2020-11-18 13:09 ` bugzilla-daemon
2020-11-18 13:09 ` bugzilla-daemon
2020-11-20  7:49 ` bugzilla-daemon
2021-02-02 11:00 ` bugzilla-daemon
2021-02-02 16:23 ` bugzilla-daemon
2021-02-02 17:06 ` bugzilla-daemon
2021-02-03  8:19 ` bugzilla-daemon
2021-02-18 12:57 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.