From: Yi Zhang <yi.zhang@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ralph Campbell <rcampbell@nvidia.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: regression from 5.10.0-rc3: BUG: Bad page state in process kworker/41:0 pfn:891066 during fio on devdax
Date: Wed, 18 Nov 2020 22:02:00 +0800
Message-ID: <51e938d1-aff7-0fa4-1a79-f77ac8bb2f8b@redhat.com>
In-Reply-To: <ef5aca5c-6d32-8d01-81d6-ac65558115fa@redhat.com>

Ping. This issue can still be reproduced on 5.10.0-rc4; the splats are below.
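(For reference, a fio run of the sort named in the subject might look like the sketch below. The original job file is not included in this thread, so the device path and all job parameters here are assumptions, not the actual reproducer.)

  # Minimal sketch of a fio job against a device-dax node (all values assumed).
  # fio's dev-dax ioengine requires libpmem and a namespace in devdax mode.
  fio --name=devdax-rw \
      --filename=/dev/dax0.0 \
      --ioengine=dev-dax \
      --rw=randrw \
      --bs=4k \
      --size=4g \
      --numjobs=4 \
      --time_based \
      --runtime=300 \
      --group_reporting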

[ 1914.356562] BUG: Bad page state in process kworker/58:0  pfn:1fadf5
[ 1914.390159] page:00000000fee4d2a1 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x1fadf5
[ 1914.436292] flags: 0x17ffffc0000000()
[ 1914.452792] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 1914.488322] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
[ 1914.523625] page dumped because: nonzero mapcount
[ 1914.544972] Modules linked in: dm_log_writes loop ext4 mbcache jbd2 rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul i2c_algo_bit drm_kms_helper syscopyarea crc32_pclmul ghash_clmulni_intel iTCO_wdt sysfillrect sysimgblt rapl fb_sys_fops intel_cstate iTCO_vendor_support drm dax_pmem_compat ipmi_ssif device_dax intel_uncore pcspkr dax_pmem_core i2c_i801 lpc_ich acpi_ipmi ipmi_si joydev ipmi_devintf acpi_tad ipmi_msghandler hpilo hpwdt i2c_smbus ioatdma acpi_power_meter dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x nfit libahci libata tg3 libnvdimm hpsa mdio libcrc32c scsi_transport_sas crc32c_intel wmi dm_mirror dm_region_hash dm_log dm_mod
[ 1914.862181] CPU: 58 PID: 14617 Comm: kworker/58:0 Tainted: G S  B             5.10.0-rc4 #1
[ 1914.903469] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 1914.945189] Workqueue: mm_percpu_wq vmstat_update
[ 1914.966350] Call Trace:
[ 1914.977331]  dump_stack+0x57/0x6a
[ 1914.992193]  bad_page.cold.114+0x9b/0xa0
[ 1915.009908]  free_pcppages_bulk+0x538/0x760
[ 1915.029226]  drain_zone_pages+0x1f/0x30
[ 1915.046526]  refresh_cpu_vm_stats+0x1ea/0x2b0
[ 1915.066113]  vmstat_update+0xf/0x50
[ 1915.081784]  process_one_work+0x1a4/0x340
[ 1915.099858]  ? process_one_work+0x340/0x340
[ 1915.118741]  worker_thread+0x30/0x370
[ 1915.135268]  ? process_one_work+0x340/0x340
[ 1915.154211]  kthread+0x116/0x130
[ 1915.168771]  ? kthread_park+0x80/0x80
[ 1915.185635]  ret_from_fork+0x22/0x30
[ 1972.063440] restraintd[2377]: *** Current Time: Mon Nov 16 00:56:57 2020  Localwatchdog at: Mon Nov 16 02:55:57 2020
[ 1976.501706] BUG: Bad page state in process kworker/4:0  pfn:a24692
[ 1976.532586] page:00000000f000e4ba refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0xa24692
[ 1976.581869] flags: 0x57ffffc0000000()
[ 1976.599064] raw: 0057ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 1976.635786] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
[ 1976.671862] page dumped because: nonzero mapcount
[ 1976.694287] Modules linked in: dm_log_writes loop ext4 mbcache jbd2 rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul i2c_algo_bit drm_kms_helper syscopyarea crc32_pclmul ghash_clmulni_intel iTCO_wdt sysfillrect sysimgblt rapl fb_sys_fops intel_cstate iTCO_vendor_support drm dax_pmem_compat ipmi_ssif device_dax intel_uncore pcspkr dax_pmem_core i2c_i801 lpc_ich acpi_ipmi ipmi_si joydev ipmi_devintf acpi_tad ipmi_msghandler hpilo hpwdt i2c_smbus ioatdma acpi_power_meter dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x nfit libahci libata tg3 libnvdimm hpsa mdio libcrc32c scsi_transport_sas crc32c_intel wmi dm_mirror dm_region_hash dm_log dm_mod
[ 1977.024006] CPU: 4 PID: 23471 Comm: kworker/4:0 Tainted: G S  B             5.10.0-rc4 #1
[ 1977.067069] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 1977.106156] Workqueue: mm_percpu_wq vmstat_update
[ 1977.128645] Call Trace:
[ 1977.140263]  dump_stack+0x57/0x6a
[ 1977.155844]  bad_page.cold.114+0x9b/0xa0
[ 1977.174451]  free_pcppages_bulk+0x538/0x760
[ 1977.194417]  drain_zone_pages+0x1f/0x30
[ 1977.212748]  refresh_cpu_vm_stats+0x1ea/0x2b0
[ 1977.233450]  vmstat_update+0xf/0x50
[ 1977.249779]  process_one_work+0x1a4/0x340
[ 1977.268797]  ? process_one_work+0x340/0x340
[ 1977.288564]  worker_thread+0x30/0x370
[ 1977.306138]  ? process_one_work+0x340/0x340
[ 1977.326017]  kthread+0x116/0x130
[ 1977.341274]  ? kthread_park+0x80/0x80
[ 1977.358649]  ret_from_fork+0x22/0x30
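The "page dumped because: nonzero mapcount" line means the free path found a _mapcount value other than the -1 "no mapping" sentinel on a page being returned to the allocator. A quick sanity check, sketched below, is to confirm that the reported pfn really falls inside the persistent-memory / device-dax range (this assumes 4K pages, and the /proc/iomem labels vary by configuration, so the grep pattern is only a guess):

  # Convert the pfn from the first splat above to a physical address
  # (4K pages assumed) and compare it against the pmem/dax ranges in /proc/iomem.
  printf 'phys addr: 0x%x\n' $(( 0x1fadf5 << 12 ))
  # Run as root, otherwise /proc/iomem hides the actual addresses.
  grep -i -e 'persistent' -e 'dax' -e 'namespace' /proc/iomem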

On 11/11/20 11:44 AM, Yi Zhang wrote:
> Add Ralph
>
>>>
>> Hi Dan/Jason
>>
>> It turns out that this was introduced by the patch below [1], which fixed
>> the "static key devmap_managed_key" issue but introduced the bad-page splat in [2].
>> I also found that it is not 100% reproducible; sorry for my earlier mistake.
>>
>> [1]
>> commit 46b1ee38b2ba1a9524c8e886ad078bd3ca40de2a (HEAD)
>> Author: Ralph Campbell <rcampbell@nvidia.com>
>> Date:   Sun Nov 1 17:07:23 2020 -0800
>>
>>     mm/mremap_pages: fix static key devmap_managed_key updates
>>
>> [2]
>> [ 1129.792673] memmap_init_zone_device initialised 2063872 pages in 34ms
>> [ 1129.865469] memmap_init_zone_device initialised 2063872 pages in 34ms
>> [ 1129.924080] memmap_init_zone_device initialised 2063872 pages in 24ms
>> [ 1129.987160] memmap_init_zone_device initialised 2063872 pages in 25ms
>> [ 1170.785114] BUG: Bad page state in process kworker/67:2 pfn:189e3e
>> [ 1170.815859] page:000000002f5fe047 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x189e3e
>> [ 1170.864772] flags: 0x17ffffc0000000()
>> [ 1170.883291] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
>> [ 1170.920537] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
>> [ 1170.957627] page dumped because: nonzero mapcount
>> [ 1170.980101] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass mgag200 crct10dif_pclmul iTCO_wdt i2c_algo_bit crc32_pclmul iTCO_vendor_support drm_kms_helper syscopyarea acpi_ipmi ghash_clmulni_intel sysfillrect ipmi_si rapl sysimgblt fb_sys_fops i2c_i801 ipmi_devintf drm ipmi_msghandler intel_cstate intel_uncore dax_pmem_compat device_dax ioatdma i2c_smbus acpi_tad joydev dax_pmem_core pcspkr hpwdt lpc_ich acpi_power_meter hpilo dca ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci bnx2x libahci nfit libata tg3 libnvdimm hpsa mdio scsi_transport_sas libcrc32c wmi crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
>> [ 1171.332281] CPU: 67 PID: 2700 Comm: kworker/67:2 Tainted: G S                5.10.0-rc2.46b1ee38b2ba+ #4
>> [ 1171.378334] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
>> [ 1171.419774] Workqueue: mm_percpu_wq vmstat_update
>> [ 1171.442726] Call Trace:
>> [ 1171.454481]  dump_stack+0x57/0x6a
>> [ 1171.470597]  bad_page.cold.114+0x9b/0xa0
>> [ 1171.489841]  free_pcppages_bulk+0x538/0x760
>> [ 1171.509124]  drain_zone_pages+0x1f/0x30
>> [ 1171.527649]  refresh_cpu_vm_stats+0x1ea/0x2b0
>> [ 1171.548935]  vmstat_update+0xf/0x50
>> [ 1171.565961]  process_one_work+0x1a4/0x340
>> [ 1171.585142]  ? process_one_work+0x340/0x340
>> [ 1171.605147]  worker_thread+0x30/0x370
>> [ 1171.622603]  ? process_one_work+0x340/0x340
>> [ 1171.642355]  kthread+0x116/0x130
>> [ 1171.657519]  ? kthread_park+0x80/0x80
>> [ 1171.674713]  ret_from_fork+0x22/0x30
>> [ 1171.691291] Disabling lock debugging due to kernel taint
>>
>>>> How confident are you in the bisection?
>>>>
>>>> Jason
>>>>

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
