linux-fsdevel.vger.kernel.org archive mirror
* KASAN vs ZONE_DEVICE (was: Re: [PATCH v2 2/7] dax: change bdev_dax_supported()...)
@ 2018-06-05  4:22 Dan Williams
  2018-06-05  7:50 ` Dmitry Vyukov
  2018-06-05 14:01 ` Andrey Ryabinin
From: Dan Williams @ 2018-06-05  4:22 UTC
  To: Dave Chinner
  Cc: Darrick J. Wong, Mike Snitzer, linux-nvdimm,
	Linux Kernel Mailing List, linux-xfs, device-mapper development,
	linux-fsdevel, Dmitry Vyukov, Alexander Potapenko,
	Andrey Ryabinin

[-- Attachment #1: Type: text/plain, Size: 4895 bytes --]

On Mon, Jun 4, 2018 at 8:32 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> [ adding KASAN devs...]
>
> On Mon, Jun 4, 2018 at 4:40 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Sun, Jun 3, 2018 at 6:48 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Sun, Jun 3, 2018 at 5:25 PM, Dave Chinner <david@fromorbit.com> wrote:
>>>> On Mon, Jun 04, 2018 at 08:20:38AM +1000, Dave Chinner wrote:
>>>>> On Thu, May 31, 2018 at 09:02:52PM -0700, Dan Williams wrote:
>>>>> > On Thu, May 31, 2018 at 7:24 PM, Dave Chinner <david@fromorbit.com> wrote:
>>>>> > > On Thu, May 31, 2018 at 06:57:33PM -0700, Dan Williams wrote:
>>>>> > >> > FWIW, XFS+DAX used to just work on this setup (I hadn't even
>>>>> > >> > installed ndctl until this morning!) but after changing the kernel
>>>>> > >> > it no longer works. That would make it a regression, yes?
>>>>>
>>>>> [....]
>>>>>
>>>>> > >> I suspect your kernel does not have CONFIG_ZONE_DEVICE enabled which
>>>>> > >> has the following dependencies:
>>>>> > >>
>>>>> > >>         depends on MEMORY_HOTPLUG
>>>>> > >>         depends on MEMORY_HOTREMOVE
>>>>> > >>         depends on SPARSEMEM_VMEMMAP
>>>>> > >
>>>>> > > Filesystem DAX now has a dependency on memory hotplug?
>>>>>
>>>>> [....]
>>>>>
>>>>> > > OK, works now I've found the magic config incantations to turn
>>>>> > > everything I now need on.
>>>>>
>>>>> By enabling these options, my test VM now has a ~30s pause in the
>>>>> boot very soon after the nvdimm subsystem is initialised.
>>>>>
>>>>> [    1.523718] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>>>> [    1.550353] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
>>>>> [    1.552175] Non-volatile memory driver v1.3
>>>>> [    2.332045] tsc: Refined TSC clocksource calibration: 2199.909 MHz
>>>>> [    2.333280] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fb5dcd4620, max_idle_ns: 440795264143 ns
>>>>> [   37.217453] brd: module loaded
>>>>> [   37.225423] loop: module loaded
>>>>> [   37.228441] virtio_blk virtio2: [vda] 10485760 512-byte logical blocks (5.37 GB/5.00 GiB)
>>>>> [   37.245418] virtio_blk virtio3: [vdb] 146800640 512-byte logical blocks (75.2 GB/70.0 GiB)
>>>>> [   37.255794] virtio_blk virtio4: [vdc] 1073741824000 512-byte logical blocks (550 TB/500 TiB)
>>>>> [   37.265403] nd_pmem namespace1.0: unable to guarantee persistence of writes
>>>>> [   37.265618] nd_pmem namespace0.0: unable to guarantee persistence of writes
>>>>>
>>>>> The system does not appear to be consuming CPU, but it is blocking
>>>>> NMIs so I can't get a CPU trace. For a VM that I rely on booting in
>>>>> a few seconds because I reboot it tens of times a day, this is a
>>>>> problem....
>>>>
>>>> And when I turn on KASAN, the kernel fails to boot to a login prompt
>>>> because:
>>>
>>> What's your qemu and kernel command line? I'll take a look at this first
>>> thing tomorrow.
>>
>> I was able to reproduce this crash by just turning on KASAN...
>> investigating. It would still help to have your config for our own
>> regression testing purposes; it makes sense for us to prioritize
>> "Dave's test config", similar to the priority of not breaking Linus'
>> laptop.
>
> I believe this is a bug in KASAN, or a bug in devm_memremap_pages(),
> depending on your point of view. At the very least it is a mismatch of
> assumptions. KASAN learns of hot-added memory via the memory hotplug
> notifier. However, the devm_memremap_pages() implementation is
> intentionally limited to the "first half" of the memory hotplug
> procedure. I.e. it does just enough to set up the linear map for
> pfn_to_page() and initialize the "struct page" memmap, but then stops
> short of onlining the pages. This is why we are getting a NULL ptr
> deref and not a KASAN report, because KASAN has no shadow area set up
> for the linearly mapped pmem range.
>
> In terms of solving it we could refactor kasan_mem_notifier() so that
> devm_memremap_pages() can call it outside of the notifier... I'll give
> this a shot.

Well, the attached patch got me slightly further, but only slightly...

[   14.998394] BUG: KASAN: unknown-crash in pmem_do_bvec+0x19e/0x790 [nd_pmem]
[   15.000006] Read of size 4096 at addr ffff880200000000 by task systemd-udevd/915
[   15.001991]
[   15.002590] CPU: 15 PID: 915 Comm: systemd-udevd Tainted: G           OE     4.17.0-rc5+ #1982
[   15.004783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[   15.007652] Call Trace:
[   15.008339]  dump_stack+0x9a/0xeb
[   15.009344]  print_address_description+0x73/0x280
[   15.010524]  kasan_report+0x258/0x380
[   15.011528]  ? pmem_do_bvec+0x19e/0x790 [nd_pmem]
[   15.012747]  memcpy+0x1f/0x50
[   15.013659]  pmem_do_bvec+0x19e/0x790 [nd_pmem]

...I've exhausted my limited kasan internals knowledge, any ideas what
it's missing?
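
For reference, the shadow lookup that KASAN has to satisfy for the
address in the report can be sketched in userspace C. This is only an
illustration of the arithmetic, not the kernel code: the constants are
assumptions (x86_64, 4-level paging, the usual default shadow offset)
that are not stated in this thread, and mem_to_shadow() and
shadow_pages_for() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86_64 with 4-level paging; not taken from the thread. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_SCALE_SIZE  (1ULL << KASAN_SHADOW_SCALE_SHIFT)
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/*
 * Hypothetical helper: one shadow byte tracks 8 bytes of memory, so the
 * shadow address is the scaled-down address plus a fixed offset.
 */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * Hypothetical helper: number of shadow pages needed to cover nr_pages
 * of newly mapped memory. The range must be a multiple of the scale
 * size, otherwise the shadow would not cover whole pages.
 */
static uint64_t shadow_pages_for(uint64_t nr_pages)
{
	assert(nr_pages % KASAN_SHADOW_SCALE_SIZE == 0);
	return nr_pages >> KASAN_SHADOW_SCALE_SHIFT;
}
```

Under these assumptions, mem_to_shadow(0xffff880200000000) evaluates to
0xffffed0040000000. If nothing has populated the shadow mapping at that
address for the pmem range, the instrumented memcpy() cannot read a
valid shadow byte for it.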

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 3119 bytes --]

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 895e6b76b25e..b6f478658719 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -20,6 +20,10 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 
+/* for manually notifying of linear map additions */
+#include <linux/memory.h>
+#include "../mm/kasan/kasan.h"
+
 #ifndef ioremap_cache
 /* temporary while we convert existing ioremap_cache users to memremap */
 __weak void __iomem *ioremap_cache(resource_size_t offset, unsigned long size)
@@ -291,6 +295,7 @@ static void devm_memremap_pages_release(void *data)
 	struct device *dev = pgmap->dev;
 	struct resource *res = &pgmap->res;
 	resource_size_t align_start, align_size;
+	struct memory_notify arg;
 	unsigned long pfn;
 
 	for_each_device_pfn(pfn, pgmap)
@@ -309,6 +314,9 @@ static void devm_memremap_pages_release(void *data)
 	mem_hotplug_begin();
 	arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
 			&pgmap->altmap : NULL);
+	arg.start_pfn = align_start >> PAGE_SHIFT;
+	arg.nr_pages = align_size >> PAGE_SHIFT;
+	kasan_mem_notifier(NULL, MEM_GOING_OFFLINE, &arg);
 	mem_hotplug_done();
 
 	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
@@ -396,10 +404,16 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 
 	mem_hotplug_begin();
 	error = arch_add_memory(nid, align_start, align_size, altmap, false);
-	if (!error)
+	if (!error) {
+		struct memory_notify arg = {
+			.start_pfn = align_start >> PAGE_SHIFT,
+			.nr_pages = align_size >> PAGE_SHIFT,
+		};
+
 		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-					align_start >> PAGE_SHIFT,
-					align_size >> PAGE_SHIFT, altmap);
+					arg.start_pfn, arg.nr_pages, altmap);
+		kasan_mem_notifier(NULL, MEM_GOING_ONLINE, &arg);
+	}
 	mem_hotplug_done();
 	if (error)
 		goto err_add_memory;
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index bc0e68f7dc75..27560aea5e67 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -792,8 +792,8 @@ DEFINE_ASAN_SET_SHADOW(f5);
 DEFINE_ASAN_SET_SHADOW(f8);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int __meminit kasan_mem_notifier(struct notifier_block *nb,
-			unsigned long action, void *data)
+int kasan_mem_notifier(struct notifier_block *nb, unsigned long action,
+		void *data)
 {
 	struct memory_notify *mem_data = data;
 	unsigned long nr_shadow_pages, start_kaddr, shadow_start;
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index c12dcfde2ebd..f56d5ae60536 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -120,6 +120,17 @@ static inline void quarantine_reduce(void) { }
 static inline void quarantine_remove_cache(struct kmem_cache *cache) { }
 #endif
 
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_KASAN)
+int kasan_mem_notifier(struct notifier_block *nb, unsigned long action,
+		void *data);
+#else
+static inline int kasan_mem_notifier(struct notifier_block *nb,
+		unsigned long action, void *data)
+{
+	return NOTIFY_OK;
+}
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.

