nvdimm.lists.linux.dev archive mirror
From: Yi Zhang <yi.zhang@redhat.com>
To: dan.j.williams@intel.com, jgg@ziepe.ca
Cc: linux-nvdimm@lists.01.org
Subject: Re: regression from 5.10.0-rc3: BUG: Bad page state in process kworker/41:0 pfn:891066 during fio on devdax
Date: Mon, 9 Nov 2020 20:11:03 +0800
Message-ID: <4ed7ea52-20be-68fe-f920-238ba358395c@redhat.com>
In-Reply-To: <1687234809.1086398.1604889506963.JavaMail.zimbra@redhat.com>

Hi Dan

Bisecting shows that this issue was introduced by the patch below:

commit f8f6ae5d077a9bdaf5cbf2ac960a5d1a04b47482
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Sun Nov 1 17:08:00 2020 -0800

     mm: always have io_remap_pfn_range() set pgprot_decrypted()

     The purpose of io_remap_pfn_range() is to map IO memory, such as a
     memory mapped IO exposed through a PCI BAR.  IO devices do not
     understand encryption, so this memory must always be decrypted.
     Automatically call pgprot_decrypted() as part of the generic
     implementation.

     This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
     using io_remap_pfn_range() to expose BAR pages to user space to fail.
     The CPU will encrypt access to those BAR pages instead of passing
     unencrypted IO directly to the device.

     Places not mapping IO should use remap_pfn_range().
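
To make the behaviour described in the commit message concrete, here is a
toy model in Python. It is only a sketch: the real code is kernel C, the
bit position ENC_BIT is a made-up placeholder, and the two functions are
stand-ins that return the protection value they would map with. It does
not attempt to explain why this change leads to the bad page state below.

```python
# Toy model only: the real pgprot handling lives in the kernel's C code.
ENC_BIT = 1 << 47  # hypothetical position of the memory-encryption bit


def pgprot_decrypted(prot: int) -> int:
    """Clear the (hypothetical) encryption bit from a protection value."""
    return prot & ~ENC_BIT


def remap_pfn_range(prot: int) -> int:
    """Stand-in for remap_pfn_range(): uses the protection bits as given."""
    return prot


def io_remap_pfn_range(prot: int) -> int:
    """Stand-in for the generic io_remap_pfn_range() after the commit:
    the encryption bit is always stripped before mapping."""
    return remap_pfn_range(pgprot_decrypted(prot))


# Example value: encryption bit plus some arbitrary low protection bits.
prot = ENC_BIT | 0x7
print(hex(remap_pfn_range(prot)))     # encryption bit still set
print(hex(io_remap_pfn_range(prot)))  # encryption bit cleared
```

In this model, a caller that maps ordinary memory through
io_remap_pfn_range() now gets different protection bits than it asked for,
which is the distinction the last line of the commit message draws.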


On 11/9/20 10:38 AM, Yi Zhang wrote:
> Hello
>
> I found this regression while running a devdax fio test on 5.10.0-rc3. Could anyone help check it? Thanks.
>
> [  303.441089] memmap_init_zone_device initialised 2063872 pages in 34ms
> [  303.501085] memmap_init_zone_device initialised 2063872 pages in 34ms
> [  303.556891] memmap_init_zone_device initialised 2063872 pages in 24ms
> [  303.612790] memmap_init_zone_device initialised 2063872 pages in 24ms
> [  326.779920] perf: interrupt took too long (2714 > 2500), lowering kernel.perf_event_max_sample_rate to 73000
> [  334.857133] perf: interrupt took too long (3737 > 3392), lowering kernel.perf_event_max_sample_rate to 53000
> [  366.202597] memmap_init_zone_device initialised 1835008 pages in 21ms
> [  366.255031] memmap_init_zone_device initialised 1835008 pages in 22ms
> [  366.317048] memmap_init_zone_device initialised 1835008 pages in 31ms
> [  366.377970] memmap_init_zone_device initialised 1835008 pages in 32ms
> [  368.785285] BUG: Bad page state in process kworker/41:0  pfn:891066
> [  368.818471] page:00000000581ab220 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x891066
> [  368.865117] flags: 0x57ffffc0000000()
> [  368.882138] raw: 0057ffffc0000000 dead000000000100 dead000000000122 0000000000000000
> [  368.917429] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
> [  368.952788] page dumped because: nonzero mapcount
> [  368.974190] Modules linked in: rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 ipmi_ssif i2c_algo_bit kvm_intel drm_kms_helper syscopyarea acpi_ipmi sysfillrect kvm sysimgblt ipmi_si fb_sys_fops iTCO_wdt iTCO_vendor_support ipmi_devintf drm irqbypass crct10dif_pclmul ipmi_msghandler crc32_pclmul i2c_i801 ghash_clmulni_intel dax_pmem_compat rapl device_dax i2c_smbus intel_cstate ioatdma intel_uncore joydev hpilo dax_pmem_core pcspkr acpi_tad hpwdt lpc_ich dca acpi_power_meter ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci nfit bnx2x libahci libata tg3 libnvdimm hpsa mdio libcrc32c scsi_transport_sas wmi crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
> [  369.281195] CPU: 41 PID: 3258 Comm: kworker/41:0 Tainted: G S                5.10.0-rc3 #1
> [  369.321037] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> [  369.363640] Workqueue: mm_percpu_wq vmstat_update
> [  369.385044] Call Trace:
> [  369.388275] perf: interrupt took too long (5477 > 4671), lowering kernel.perf_event_max_sample_rate to 36000
> [  369.396225]  dump_stack+0x57/0x6a
> [  369.411391]  bad_page.cold.114+0x9b/0xa0
> [  369.429316]  free_pcppages_bulk+0x538/0x760
> [  369.448465]  drain_zone_pages+0x1f/0x30
> [  369.466027]  refresh_cpu_vm_stats+0x1ea/0x2b0
> [  369.485972]  vmstat_update+0xf/0x50
> [  369.502064]  process_one_work+0x1a4/0x340
> [  369.520412]  ? process_one_work+0x340/0x340
> [  369.539510]  worker_thread+0x30/0x370
> [  369.555744]  ? process_one_work+0x340/0x340
> [  369.574765]  kthread+0x116/0x130
> [  369.589612]  ? kthread_park+0x80/0x80
> [  369.606231]  ret_from_fork+0x22/0x30
> [  369.622910] Disabling lock debugging due to kernel taint
> [  393.619285] perf: interrupt took too long (6874 > 6846), lowering kernel.perf_event_max_sample_rate to 29000
> [  397.904036] BUG: Bad page state in process kworker/57:1  pfn:189525
> [  397.936971] page:00000000be782875 refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x189525
> [  397.984722] flags: 0x17ffffc0000000()
> [  398.002324] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
> [  398.039032] raw: 0000000000000000 0000000000000000 00000000fffffbff 0000000000000000
> [  398.075804] page dumped because: nonzero mapcount
> [  398.098130] Modules linked in: rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 ipmi_ssif i2c_algo_bit kvm_intel drm_kms_helper syscopyarea acpi_ipmi sysfillrect kvm sysimgblt ipmi_si fb_sys_fops iTCO_wdt iTCO_vendor_support ipmi_devintf drm irqbypass crct10dif_pclmul ipmi_msghandler crc32_pclmul i2c_i801 ghash_clmulni_intel dax_pmem_compat rapl device_dax i2c_smbus intel_cstate ioatdma intel_uncore joydev hpilo dax_pmem_core pcspkr acpi_tad hpwdt lpc_ich dca acpi_power_meter ip_tables xfs sr_mod cdrom sd_mod t10_pi sg nd_pmem nd_btt ahci nfit bnx2x libahci libata tg3 libnvdimm hpsa mdio libcrc32c scsi_transport_sas wmi crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
> [  398.413042] CPU: 57 PID: 587 Comm: kworker/57:1 Tainted: G S  B             5.10.0-rc3 #1
> [  398.455914] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> [  398.496657] Workqueue: mm_percpu_wq vmstat_update
> [  398.518938] Call Trace:
> [  398.530673]  dump_stack+0x57/0x6a
> [  398.546463]  bad_page.cold.114+0x9b/0xa0
> [  398.564977]  free_pcppages_bulk+0x538/0x760
> [  398.584697]  drain_zone_pages+0x1f/0x30
> [  398.602907]  refresh_cpu_vm_stats+0x1ea/0x2b0
> [  398.623681]  vmstat_update+0xf/0x50
> [  398.640415]  process_one_work+0x1a4/0x340
> [  398.659517]  ? process_one_work+0x340/0x340
> [  398.678659]  worker_thread+0x30/0x370
> [  398.695506]  ? process_one_work+0x340/0x340
> [  398.715204]  kthread+0x116/0x130
> [  398.730572]  ? kthread_park+0x80/0x80
> [  398.747761]  ret_from_fork+0x22/0x30
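
As a cross-check of the two reports above, the "mapcount:-1024" in the
page line can be recovered from the raw dump. The sketch below assumes
that the third value on the second "raw:" line packs the stored mapcount
field in its low 32 bits and the refcount in its high 32 bits, and that
the printed mapcount is the stored value plus one; both are assumptions
about the struct page layout in this kernel, not something stated in the
log itself.

```python
import struct

# Third value on the second "raw:" line of both reports.
RAW_WORD = 0x00000000FFFFFBFF


def split_word(word: int):
    """Split a 64-bit word into (low, high) signed 32-bit halves,
    assuming a little-endian layout."""
    low, high = struct.unpack("<ii", struct.pack("<Q", word))
    return low, high


low, high = split_word(RAW_WORD)

# Assumption: the reported mapcount is the stored field plus one.
mapcount = low + 1

# Bits cleared relative to an all-ones (-1) 32-bit field.
cleared = 0xFFFFFFFF ^ (RAW_WORD & 0xFFFFFFFF)

print(low, high, mapcount, hex(cleared))
```

Under those assumptions the stored field differs from -1 in a single bit,
and the high half agrees with the "refcount:0" printed in the page line.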
>
>
>
>
> Best Regards,
>    Yi Zhang
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

Thread overview: 11+ messages
     [not found] <1934921834.1085815.1604889035798.JavaMail.zimbra@redhat.com>
2020-11-09  2:38 ` regression from 5.10.0-rc3: BUG: Bad page state in process kworker/41:0 pfn:891066 during fio on devdax Yi Zhang
2020-11-09  3:00   ` Dan Williams
2020-11-09  3:13     ` Yi Zhang
2020-11-09 12:11   ` Yi Zhang [this message]
     [not found]     ` <20201109141216.GD244516@ziepe.ca>
2020-11-09 17:26       ` Dan Williams
     [not found]         ` <20201109175442.GE244516@ziepe.ca>
2020-11-10  0:36           ` Jason Gunthorpe
2020-11-10  7:36             ` Yi Zhang
2020-11-10 16:51               ` Yi Zhang
2020-11-11  3:44                 ` Yi Zhang
2020-11-18 14:02                   ` Yi Zhang
2020-12-01  1:36                 ` Dan Williams
