From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"jmorris@namei.org" <jmorris@namei.org>,
"tiwai@suse.de" <tiwai@suse.de>,
"sashal@kernel.org" <sashal@kernel.org>,
"pasha.tatashin@soleen.com" <pasha.tatashin@soleen.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"david@redhat.com" <david@redhat.com>, "bp@suse.de" <bp@suse.de>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"jglisse@redhat.com" <jglisse@redhat.com>,
"zwisler@kernel.org" <zwisler@kernel.org>,
"mhocko@suse.com" <mhocko@suse.com>,
"Jiang, Dave" <dave.jiang@intel.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"Busch, Keith" <keith.busch@intel.com>,
"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
"Huang, Ying" <ying.huang@intel.com>,
"Wu, Fengguang" <fengguang.wu@intel.com>,
"baiyaowei@cmss.chinamobile.com" <baiyaowei@cmss.chinamobile.com>
Subject: Re: [v5 0/3] "Hotremove" persistent memory
Date: Thu, 2 May 2019 20:50:30 +0000
Message-ID: <76dfe7943f2a0ceaca73f5fd23e944dfdc0309d1.camel@intel.com>
In-Reply-To: <20190502184337.20538-1-pasha.tatashin@soleen.com>
On Thu, 2019-05-02 at 14:43 -0400, Pavel Tatashin wrote:
> The series of operations look like this:
>
> 1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
> and free ramdisk.
> 2. Convert raw pmem0 to devdax
> ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
> 3. Hotadd to System RAM
> echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> echo online_movable > /sys/devices/system/memoryXXX/state
> 4. Before reboot hotremove device-dax memory from System RAM
> echo offline > /sys/devices/system/memoryXXX/state
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
Hi Pavel,
I am working on adding this sort of workflow to a new daxctl command
(daxctl-reconfigure-device). It will allow changing the 'mode' of a
dax device to kmem and onlining the resulting memory and, with your
patches, also offlining the memory and changing back to device-dax.
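To make the offline-and-unbind half concrete, here is a rough sketch of the ordering from step 4 of your list. It is only an illustration, not the daxctl implementation: the names plan_hotremove and apply_writes are made up for this mail, the memory block directories are passed in by the caller rather than discovered, and the kmem unbind path is copied from your step 4.

```python
from pathlib import Path
from typing import List, Tuple

# Path copied from step 4 of the quoted workflow.
KMEM_UNBIND = Path("/sys/bus/dax/drivers/kmem/unbind")


def plan_hotremove(dax_dev: str, memory_block_dirs: List[str]) -> List[Tuple[Path, str]]:
    """Return the ordered (sysfs file, value) writes for step 4.

    Hypothetical helper: every memory block is offlined first, and only
    then is the dax device unbound from the kmem driver.
    """
    writes = [(Path(d) / "state", "offline") for d in memory_block_dirs]
    writes.append((KMEM_UNBIND, dax_dev))
    return writes


def apply_writes(writes: List[Tuple[Path, str]]) -> None:
    """Perform the writes in order; an OSError from a failed offline
    stops the sequence before the unbind is attempted."""
    for path, value in writes:
        path.write_text(value)
```

Keeping the plan separate from the writes makes it easy to print the sequence for a dry run before touching sysfs.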
While running with these patches and testing the offlining part, I ran
into the lockdep splat below.
This is with just these three patches on top of 5.1-rc7.
[ +0.004886] ======================================================
[ +0.001576] WARNING: possible circular locking dependency detected
[ +0.001506] 5.1.0-rc7+ #13 Tainted: G O
[ +0.000929] ------------------------------------------------------
[ +0.000708] daxctl/22950 is trying to acquire lock:
[ +0.000548] 00000000f4d397f7 (kn->count#424){++++}, at: kernfs_remove_by_name_ns+0x40/0x80
[ +0.000922]
but task is already holding lock:
[ +0.000657] 000000002aa52a9f (mem_sysfs_mutex){+.+.}, at: unregister_memory_section+0x22/0xa0
[ +0.000960]
which lock already depends on the new lock.
[ +0.001001]
the existing dependency chain (in reverse order) is:
[ +0.000837]
-> #3 (mem_sysfs_mutex){+.+.}:
[ +0.000631] __mutex_lock+0x82/0x9a0
[ +0.000477] unregister_memory_section+0x22/0xa0
[ +0.000582] __remove_pages+0xe9/0x520
[ +0.000489] arch_remove_memory+0x81/0xc0
[ +0.000510] devm_memremap_pages_release+0x180/0x270
[ +0.000633] release_nodes+0x234/0x280
[ +0.000483] device_release_driver_internal+0xf4/0x1d0
[ +0.000701] bus_remove_device+0xfc/0x170
[ +0.000529] device_del+0x16a/0x380
[ +0.000459] unregister_dev_dax+0x23/0x50
[ +0.000526] release_nodes+0x234/0x280
[ +0.000487] device_release_driver_internal+0xf4/0x1d0
[ +0.000646] unbind_store+0x9b/0x130
[ +0.000467] kernfs_fop_write+0xf0/0x1a0
[ +0.000510] vfs_write+0xba/0x1c0
[ +0.000438] ksys_write+0x5a/0xe0
[ +0.000521] do_syscall_64+0x60/0x210
[ +0.000489] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ +0.000637]
-> #2 (mem_hotplug_lock.rw_sem){++++}:
[ +0.000717] get_online_mems+0x3e/0x80
[ +0.000491] kmem_cache_create_usercopy+0x2e/0x270
[ +0.000609] kmem_cache_create+0x12/0x20
[ +0.000507] ptlock_cache_init+0x20/0x28
[ +0.000506] start_kernel+0x240/0x4d0
[ +0.000480] secondary_startup_64+0xa4/0xb0
[ +0.000539]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
[ +0.000784] cpus_read_lock+0x3e/0x80
[ +0.000511] online_pages+0x37/0x310
[ +0.000469] memory_subsys_online+0x34/0x60
[ +0.000611] device_online+0x60/0x80
[ +0.000611] state_store+0x66/0xd0
[ +0.000552] kernfs_fop_write+0xf0/0x1a0
[ +0.000649] vfs_write+0xba/0x1c0
[ +0.000487] ksys_write+0x5a/0xe0
[ +0.000459] do_syscall_64+0x60/0x210
[ +0.000482] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ +0.000646]
-> #0 (kn->count#424){++++}:
[ +0.000669] lock_acquire+0x9e/0x180
[ +0.000471] __kernfs_remove+0x26a/0x310
[ +0.000518] kernfs_remove_by_name_ns+0x40/0x80
[ +0.000583] remove_files.isra.1+0x30/0x70
[ +0.000555] sysfs_remove_group+0x3d/0x80
[ +0.000524] sysfs_remove_groups+0x29/0x40
[ +0.000532] device_remove_attrs+0x42/0x80
[ +0.000522] device_del+0x162/0x380
[ +0.000464] device_unregister+0x16/0x60
[ +0.000505] unregister_memory_section+0x6e/0xa0
[ +0.000591] __remove_pages+0xe9/0x520
[ +0.000492] arch_remove_memory+0x81/0xc0
[ +0.000568] try_remove_memory+0xba/0xd0
[ +0.000510] remove_memory+0x23/0x40
[ +0.000483] dev_dax_kmem_remove+0x29/0x57 [kmem]
[ +0.000608] device_release_driver_internal+0xe4/0x1d0
[ +0.000637] unbind_store+0x9b/0x130
[ +0.000464] kernfs_fop_write+0xf0/0x1a0
[ +0.000685] vfs_write+0xba/0x1c0
[ +0.000594] ksys_write+0x5a/0xe0
[ +0.000449] do_syscall_64+0x60/0x210
[ +0.000481] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ +0.000619]
other info that might help us debug this:
[ +0.000889] Chain exists of:
kn->count#424 --> mem_hotplug_lock.rw_sem --> mem_sysfs_mutex
[ +0.001269] Possible unsafe locking scenario:
[ +0.000652] CPU0 CPU1
[ +0.000505] ---- ----
[ +0.000523] lock(mem_sysfs_mutex);
[ +0.000422] lock(mem_hotplug_lock.rw_sem);
[ +0.000905] lock(mem_sysfs_mutex);
[ +0.000793] lock(kn->count#424);
[ +0.000394]
*** DEADLOCK ***
[ +0.000665] 7 locks held by daxctl/22950:
[ +0.000458] #0: 000000005f6d3c13 (sb_writers#4){.+.+}, at: vfs_write+0x159/0x1c0
[ +0.000943] #1: 00000000e468825d (&of->mutex){+.+.}, at: kernfs_fop_write+0xbd/0x1a0
[ +0.000895] #2: 00000000caa17dbb (&dev->mutex){....}, at: device_release_driver_internal+0x1a/0x1d0
[ +0.001019] #3: 000000002119b22c (device_hotplug_lock){+.+.}, at: remove_memory+0x16/0x40
[ +0.000942] #4: 00000000150c8efe (cpu_hotplug_lock.rw_sem){++++}, at: try_remove_memory+0x2e/0xd0
[ +0.001019] #5: 000000003d6b2a0f (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x25/0x120
[ +0.001118] #6: 000000002aa52a9f (mem_sysfs_mutex){+.+.}, at: unregister_memory_section+0x22/0xa0
[ +0.001033]
stack backtrace:
[ +0.000507] CPU: 5 PID: 22950 Comm: daxctl Tainted: G O 5.1.0-rc7+ #13
[ +0.000896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[ +0.001360] Call Trace:
[ +0.000293] dump_stack+0x85/0xc0
[ +0.000390] print_circular_bug.isra.41.cold.60+0x15c/0x195
[ +0.000651] check_prev_add.constprop.50+0x5fd/0xbe0
[ +0.000563] ? call_rcu_zapped+0x80/0x80
[ +0.000449] __lock_acquire+0xcee/0xfd0
[ +0.000437] lock_acquire+0x9e/0x180
[ +0.000428] ? kernfs_remove_by_name_ns+0x40/0x80
[ +0.000531] __kernfs_remove+0x26a/0x310
[ +0.000451] ? kernfs_remove_by_name_ns+0x40/0x80
[ +0.000529] ? kernfs_name_hash+0x12/0x80
[ +0.000462] kernfs_remove_by_name_ns+0x40/0x80
[ +0.000513] remove_files.isra.1+0x30/0x70
[ +0.000483] sysfs_remove_group+0x3d/0x80
[ +0.000458] sysfs_remove_groups+0x29/0x40
[ +0.000477] device_remove_attrs+0x42/0x80
[ +0.000461] device_del+0x162/0x380
[ +0.000399] device_unregister+0x16/0x60
[ +0.000442] unregister_memory_section+0x6e/0xa0
[ +0.001232] __remove_pages+0xe9/0x520
[ +0.000443] arch_remove_memory+0x81/0xc0
[ +0.000459] try_remove_memory+0xba/0xd0
[ +0.000460] remove_memory+0x23/0x40
[ +0.000461] dev_dax_kmem_remove+0x29/0x57 [kmem]
[ +0.000603] device_release_driver_internal+0xe4/0x1d0
[ +0.000590] unbind_store+0x9b/0x130
[ +0.000409] kernfs_fop_write+0xf0/0x1a0
[ +0.000448] vfs_write+0xba/0x1c0
[ +0.000395] ksys_write+0x5a/0xe0
[ +0.000382] do_syscall_64+0x60/0x210
[ +0.000418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ +0.000573] RIP: 0033:0x7fd1f7442fa8
[ +0.000407] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 75 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ +0.002119] RSP: 002b:00007ffd48f58e28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ +0.000833] RAX: ffffffffffffffda RBX: 000000000210c817 RCX: 00007fd1f7442fa8
[ +0.000795] RDX: 0000000000000007 RSI: 000000000210c817 RDI: 0000000000000003
[ +0.000816] RBP: 0000000000000007 R08: 000000000210c7d0 R09: 00007fd1f74d4e80
[ +0.000808] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[ +0.000819] R13: 00007fd1f72b9ce8 R14: 0000000000000000 R15: 00007ffd48f58e70
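For anyone skimming the splat, the ordering problem can be restated as a cycle in a "held, then acquired" graph. The sketch below is only a toy model under my reading of chains #0 through #3 above; it is not how lockdep itself works (lockdep validates lock classes with far more state), and find_cycle is a name invented for this mail.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one lock-order cycle as a list of lock names, or None.

    graph[a] lists the locks that were acquired while a was held.
    """
    path: List[str] = []  # locks on the current depth-first search path
    done = set()          # locks already fully explored without a cycle

    def visit(lock: str) -> Optional[List[str]]:
        if lock in path:
            # Back edge: the cycle is the path from the earlier visit to here.
            return path[path.index(lock):] + [lock]
        if lock in done:
            return None
        path.append(lock)
        for nxt in graph.get(lock, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(lock)
        return None

    for lock in graph:
        cycle = visit(lock)
        if cycle:
            return cycle
    return None


# Existing ordering, as read from chains #1..#3 in the report.
existing = {
    "kn->count": ["cpu_hotplug_lock"],
    "cpu_hotplug_lock": ["mem_hotplug_lock"],
    "mem_hotplug_lock": ["mem_sysfs_mutex"],
}

# Chain #0: kn->count is acquired while mem_sysfs_mutex is held.
with_new_edge = {**existing, "mem_sysfs_mutex": ["kn->count"]}

print(find_cycle(existing))
print(find_cycle(with_new_edge))
```

The existing edges alone are acyclic; adding the kn->count acquisition under mem_sysfs_mutex from the kmem unbind path closes the loop, which is what the "Possible unsafe locking scenario" table is warning about.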