* + media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch added to -mm tree
[not found] <20191206170123.cb3ad1f76af2b48505fabb33@linux-foundation.org>
@ 2019-12-10 1:16 ` Andrew Morton
2019-12-11 20:16 ` + revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch " Andrew Morton
` (2 subsequent siblings)
3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2019-12-10 1:16 UTC (permalink / raw)
To: airlied, alex.williamson, aneesh.kumar, axboe, benh, bjorn.topel,
corbet, dan.j.williams, daniel.vetter, daniel, davem, david, hch,
hverkuil-cisco, ira.weiny, jack, jgg, jgg, jglisse, jhubbard,
kirill.shutemov, leonro, magnus.karlsson, mchehab, mhocko,
mike.kravetz, mm-commits, mpe, paulus, rppt, shuah, stable,
vbabka, viro
The patch titled
Subject: media/v4l2-core: set pages dirty upon releasing DMA buffers
has been added to the -mm tree. Its filename is
media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: John Hubbard <jhubbard@nvidia.com>
Subject: media/v4l2-core: set pages dirty upon releasing DMA buffers
After DMA is complete, and the device and CPU caches are synchronized,
it's still required to mark the CPU pages as dirty, if the data was coming
from the device. However, this driver was just issuing a bare put_page()
call, without any set_page_dirty*() call.
Fix the problem, by calling set_page_dirty_lock() if the CPU pages were
potentially receiving data from the device.
Link: http://lkml.kernel.org/r/20191209225344.99740-18-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Airlie <airlied@linux.ie>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,11 @@ int videobuf_dma_free(struct videobuf_dm
BUG_ON(dma->sglen);
if (dma->pages) {
- for (i = 0; i < dma->nr_pages; i++)
+ for (i = 0; i < dma->nr_pages; i++) {
+ if (dma->direction == DMA_FROM_DEVICE)
+ set_page_dirty_lock(dma->pages[i]);
put_page(dma->pages[i]);
+ }
kfree(dma->pages);
dma->pages = NULL;
}
_
Patches currently in -mm which might be from jhubbard@nvidia.com are
mm-gup-factor-out-duplicate-code-from-four-routines.patch
mm-gup-move-try_get_compound_head-to-top-fix-minor-issues.patch
mm-devmap-refactor-1-based-refcounting-for-zone_device-pages.patch
goldish_pipe-rename-local-pin_user_pages-routine.patch
mm-fix-get_user_pages_remotes-handling-of-foll_longterm.patch
vfio-fix-foll_longterm-use-simplify-get_user_pages_remote-call.patch
mm-gup-allow-foll_force-for-get_user_pages_fast.patch
ib-umem-use-get_user_pages_fast-to-pin-dma-pages.patch
mm-gup-introduce-pin_user_pages-and-foll_pin.patch
goldish_pipe-convert-to-pin_user_pages-and-put_user_page.patch
ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp.patch
mm-process_vm_access-set-foll_pin-via-pin_user_pages_remote.patch
drm-via-set-foll_pin-via-pin_user_pages_fast.patch
fs-io_uring-set-foll_pin-via-pin_user_pages.patch
net-xdp-set-foll_pin-via-pin_user_pages.patch
media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch
media-v4l2-core-pin_user_pages-foll_pin-and-put_user_page-conversion.patch
vfio-mm-pin_user_pages-foll_pin-and-put_user_page-conversion.patch
powerpc-book3s64-convert-to-pin_user_pages-and-put_user_page.patch
powerpc-book3s64-convert-to-pin_user_pages-and-put_user_page-fix.patch
mm-gup_benchmark-use-proper-foll_write-flags-instead-of-hard-coding-1.patch
mm-tree-wide-rename-put_user_page-to-unpin_user_page.patch
mm-gup-pass-flags-arg-to-__gup_device_-functions.patch
mm-gup-track-foll_pin-pages.patch
mm-gup_benchmark-support-pin_user_pages-and-related-calls.patch
selftests-vm-run_vmtests-invoke-gup_benchmark-with-basic-foll_pin-coverage.patch
^ permalink raw reply [flat|nested] 4+ messages in thread
* + revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch added to -mm tree
[not found] <20191206170123.cb3ad1f76af2b48505fabb33@linux-foundation.org>
2019-12-10 1:16 ` + media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch added to -mm tree Andrew Morton
@ 2019-12-11 20:16 ` Andrew Morton
2019-12-11 21:02 ` + mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch " Andrew Morton
2019-12-11 21:13 ` + mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch " Andrew Morton
3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2019-12-11 20:16 UTC (permalink / raw)
To: arnd, catalin.marinas, dave, herton, ioanna-maria.alifieraki,
jay.vosburgh, joel, malat, manfred, mm-commits, stable
The patch titled
Subject: Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
has been added to the -mm tree. Its filename is
revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Subject: Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
This reverts commit a97955844807e327df11aa33869009d14d6b7de0.
Commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in
exit_sem()") removes a lock that is needed. This leads to a process
looping infinitely in exit_sem() and can also lead to a crash. There is a
reproducer available in [1] and with the commit reverted the issue does
not reproduce anymore.
Using the reproducer found in [1] is fairly easy to reach a point where
one of the child processes is looping infinitely in exit_sem between
for(;;) and if (semid == -1) block, while it's trying to free its last
sem_undo structure which has already been freed by freeary().
Each sem_undo struct is on two lists: one per semaphore set (list_id) and
one per process (list_proc). The list_id list tracks undos by semaphore
set, and the list_proc by process.
Undo structures are removed either by freeary() or by exit_sem(). The
freeary function is invoked when the user invokes a syscall to remove a
semaphore set. During this operation freeary() traverses the list_id
associated with the semaphore set and removes the undo structures from
both the list_id and list_proc lists.
For this case, exit_sem() is called at process exit. Each process
contains a struct sem_undo_list (referred to as "ulp") which contains the
head for the list_proc list. When the process exits, exit_sem() traverses
this list to remove each sem_undo struct. As in freeary(), whenever a
sem_undo struct is removed from list_proc, it is also removed from the
list_id list.
Removing elements from list_id is safe for both exit_sem() and freeary()
due to sem_lock(). Removing elements from list_proc is not safe;
freeary() locks &un->ulp->lock when it performs
list_del_rcu(&un->list_proc) but exit_sem() does not (locking was removed
by commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage
in exit_sem()").
This can result in the following situation while executing the reproducer
[1] : Consider a child process in exit_sem() and the parent in freeary()
(because of semctl(sid[i], NSEM, IPC_RMID)). The list_proc for the child
contains the last two undo structs A and B (the rest have been removed
either by exit_sem() or freeary()). The semid for A is 1 and semid for B
is 2. exit_sem() removes A and at the same time freeary() removes B.
Since A and B have different semid sem_lock() will acquire different locks
for each process and both can proceed. The bug is that they remove A and
B from the same list_proc at the same time because only freeary() acquires
the ulp lock. When exit_sem() removes A it makes ulp->list_proc.next to
point at B and at the same time freeary() removes B setting B->semid=-1.
At the next iteration of for(;;) loop exit_sem() will try to remove B.
The only way to break from for(;;) is for (&un->list_proc ==
&ulp->list_proc) to be true which is not. Then exit_sem() will check if
B->semid=-1 which is and will continue looping in for(;;) until the memory
for B is reallocated and the value at B->semid is changed. At that point,
exit_sem() will crash attempting to unlink B from the lists (this can be
easily triggered by running the reproducer [1] a second time).
To prove this scenario instrumentation was added to keep information about
each sem_undo (un) struct that is removed per process and per semaphore
set (sma).
CPU0 CPU1
[caller holds sem_lock(sma for A)] ...
freeary() exit_sem()
... ...
... sem_lock(sma for B)
spin_lock(A->ulp->lock) ...
list_del_rcu(un_A->list_proc) list_del_rcu(un_B->list_proc)
Undo structures A and B have different semid and sem_lock() operations
proceed. However they belong to the same list_proc list and they are
removed at the same time. This results into ulp->list_proc.next pointing
to the address of B which is already removed.
After reverting commit a97955844807 ("ipc,sem: remove uneeded
sem_undo_list lock usage in exit_sem()") the issue was no longer
reproducible.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779
Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
Fixes: a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Herton Krzesinski <herton@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <malat@debian.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
ipc/sem.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/ipc/sem.c~revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem
+++ a/ipc/sem.c
@@ -2368,11 +2368,9 @@ void exit_sem(struct task_struct *tsk)
ipc_assert_locked_object(&sma->sem_perm);
list_del(&un->list_id);
- /* we are the last process using this ulp, acquiring ulp->lock
- * isn't required. Besides that, we are also protected against
- * IPC_RMID as we hold sma->sem_perm lock now
- */
+ spin_lock(&ulp->lock);
list_del_rcu(&un->list_proc);
+ spin_unlock(&ulp->lock);
/* perform adjustments registered in un */
for (i = 0; i < sma->sem_nsems; i++) {
_
Patches currently in -mm which might be from ioanna-maria.alifieraki@canonical.com are
revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch
^ permalink raw reply [flat|nested] 4+ messages in thread
* + mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch added to -mm tree
[not found] <20191206170123.cb3ad1f76af2b48505fabb33@linux-foundation.org>
2019-12-10 1:16 ` + media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers.patch added to -mm tree Andrew Morton
2019-12-11 20:16 ` + revert-ipcsem-remove-uneeded-sem_undo_list-lock-usage-in-exit_sem.patch " Andrew Morton
@ 2019-12-11 21:02 ` Andrew Morton
2019-12-11 21:13 ` + mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch " Andrew Morton
3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2019-12-11 21:02 UTC (permalink / raw)
To: akpm, echron, idryomov, mhocko, mm-commits, rientjes, stable
The patch titled
Subject: mm/oom: fix pgtables units mismatch in Killed process message
has been added to the -mm tree. Its filename is
mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ilya Dryomov <idryomov@gmail.com>
Subject: mm/oom: fix pgtables units mismatch in Killed process message
pr_err() expects kB, but mm_pgtables_bytes() returns the number of bytes.
As everything else is printed in kB, I chose to fix the value rather than
the string.
Before:
[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
...
[ 1878] 1000 1878 217253 151144 1269760 0 0 python
...
Out of memory: Killed process 1878 (python) total-vm:869012kB, anon-rss:604572kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:1269760kB oom_score_adj:0
After:
[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
...
[ 1436] 1000 1436 217253 151890 1294336 0 0 python
...
Out of memory: Killed process 1436 (python) total-vm:869012kB, anon-rss:607516kB, file-rss:44kB, shmem-rss:0kB, UID:1000 pgtables:1264kB oom_score_adj:0
Link: http://lkml.kernel.org/r/20191211202830.1600-1-idryomov@gmail.com
Fixes: 70cb6d267790 ("mm/oom: add oom_score_adj and pgtables to Killed process message")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Edward Chron <echron@arista.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/oom_kill.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/oom_kill.c~mm-oom-fix-pgtables-units-mismatch-in-killed-process-message
+++ a/mm/oom_kill.c
@@ -890,7 +890,7 @@ static void __oom_kill_process(struct ta
K(get_mm_counter(mm, MM_FILEPAGES)),
K(get_mm_counter(mm, MM_SHMEMPAGES)),
from_kuid(&init_user_ns, task_uid(victim)),
- mm_pgtables_bytes(mm), victim->signal->oom_score_adj);
+ mm_pgtables_bytes(mm) >> 10, victim->signal->oom_score_adj);
task_unlock(victim);
/*
_
Patches currently in -mm which might be from idryomov@gmail.com are
mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch
^ permalink raw reply [flat|nested] 4+ messages in thread
* + mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch added to -mm tree
[not found] <20191206170123.cb3ad1f76af2b48505fabb33@linux-foundation.org>
` (2 preceding siblings ...)
2019-12-11 21:02 ` + mm-oom-fix-pgtables-units-mismatch-in-killed-process-message.patch " Andrew Morton
@ 2019-12-11 21:13 ` Andrew Morton
3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2019-12-11 21:13 UTC (permalink / raw)
To: adobriyan, akpm, bob.picco, dan.j.williams, daniel.m.jordan,
david, mhocko, mhocko, mm-commits, n-horiguchi, osalvador,
pasha.tatashin, sfr, stable, steven.sistare
The patch titled
Subject: mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
has been added to the -mm tree. Its filename is
mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
If max_pfn is not aligned to a section boundary, we can easily run into
BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB). I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).
The issue is, that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.
E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
4160M" - (see tools/vm/page-types.c):
[ 200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
[ 200.477500] #PF: supervisor read access in kernel mode
[ 200.478334] #PF: error_code(0x0000) - not-present page
[ 200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
[ 200.479557] Oops: 0000 [#4] SMP NOPTI
[ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-20191209 #93
[ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[ 200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
[ 200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
[ 200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
[ 200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[ 200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
[ 200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
[ 200.487130] FS: 00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
[ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
[ 200.488897] Call Trace:
[ 200.489115] kpageflags_read+0xe9/0x140
[ 200.489447] proc_reg_read+0x3c/0x60
[ 200.489755] vfs_read+0xc2/0x170
[ 200.490037] ksys_pread64+0x65/0xa0
[ 200.490352] do_syscall_64+0x5c/0xa0
[ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe
But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
after cold/hot plugging a DIMM to such a system:
[root@localhost ~]# cat /proc/kpageflags > /dev/null
[ 111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
[ 111.517907] #PF: supervisor read access in kernel mode
[ 111.518333] #PF: error_code(0x0000) - not-present page
[ 111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0
This patch fixes that by at least zero-ing out that memmap (so e.g.,
page_to_pfn() will not crash). Commit 907ec5fca3dc ("mm: zero remaining
unavailable struct pages") tried to fix a similar issue, but forgot to
consider this special case.
After this patch, there are still problems to solve. E.g., not all of
these pages falling into a memory hole will actually get initialized later
and set PageReserved - they are only zeroed out - but at least the
immediate crashes are gone. A follow-up patch will take care of this.
Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section
+++ a/mm/page_alloc.c
@@ -6932,7 +6932,8 @@ static u64 zero_pfn_range(unsigned long
* This function also addresses a similar issue where struct pages are left
* uninitialized because the physical address range is not covered by
* memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=.
+ * layout is manually configured via memmap=, or when the highest physical
+ * address (max_pfn) does not end on a section boundary.
*/
void __init zero_resv_unavail(void)
{
@@ -6950,7 +6951,16 @@ void __init zero_resv_unavail(void)
pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
next = end;
}
- pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
+
+ /*
+ * Early sections always have a fully populated memmap for the whole
+ * section - see pfn_valid(). If the last section has holes at the
+ * end and that section is marked "online", the memmap will be
+ * considered initialized. Make sure that memmap has a well defined
+ * state.
+ */
+ pgcnt += zero_pfn_range(PFN_DOWN(next),
+ round_up(max_pfn, PAGES_PER_SECTION));
/*
* Struct pages that do not have backing memory. This could be because
_
Patches currently in -mm which might be from david@redhat.com are
mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section.patch
fs-proc-pagec-allow-inspection-of-last-section-and-fix-end-detection.patch
mm-initialize-memmap-of-unavailable-memory-directly.patch
mm-memory_hotplug-shrink-zones-when-offlining-memory.patch
mm-memory_hotplug-poison-memmap-in-remove_pfn_range_from_zone.patch
mm-memory_hotplug-we-always-have-a-zone-in-find_smallestbiggest_section_pfn.patch
mm-memory_hotplug-dont-check-for-all-holes-in-shrink_zone_span.patch
mm-memory_hotplug-drop-local-variables-in-shrink_zone_span.patch
mm-memory_hotplug-cleanup-__remove_pages.patch
^ permalink raw reply [flat|nested] 4+ messages in thread