From: osstest service owner <osstest-admin@xenproject.org>
To: xen-devel@lists.xensource.com, osstest-admin@xenproject.org
Subject: [linux-3.18 bisection] complete test-amd64-amd64-xl-credit2
Date: Sat, 16 Jul 2016 14:52:57 +0000
Message-ID: <E1bOQxd-0003AY-SZ@osstest.test-lab.xenproject.org>
branch xen-unstable
xenbranch xen-unstable
job test-amd64-amd64-xl-credit2
testid xen-boot
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
*** Found and reproduced problem changeset ***
Bug is in tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Bug introduced: a2d8c514753276394d68414f563591f174ef86cb
Bug not present: 8f620446135b64ca6f96cf32066a76d64e79a388
Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97435/
commit a2d8c514753276394d68414f563591f174ef86cb
Author: Lukasz Odzioba <lukasz.odzioba@intel.com>
Date: Fri Jun 24 14:50:01 2016 -0700
mm/swap.c: flush lru pvecs on compound page arrival
[ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]
Currently, compound pages can be held on per-CPU pagevecs, which
leaves a lot of memory unavailable for reclaim when it is needed. On
systems with hundreds of processors this can amount to GBs of memory.
One way of reproducing the problem is to not call munmap explicitly
on all mapped regions (e.g. after receiving SIGTERM). After that, some
pages (with THP enabled, also huge pages) may end up on lru_add_pvec,
as in the example below.
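#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Build with OpenMP enabled, e.g. gcc -fopenmp, so each thread maps its own region. */
int main(void) {
#pragma omp parallel
    {
        size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED)
            memset(p, 0, size);
        //munmap(p, size); // uncomment to make the problem go away
    }
    return 0;
}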
When we run it with THP enabled, it leaves a significant amount of
memory on lru_add_pvec. This memory is not reclaimed if we hit OOM,
so when we run the above program in a loop:
for i in `seq 100`; do ./a.out; done
many processes (95% in my case) are killed by the OOM killer.
The primary point of the LRU add cache is to reduce contention on the
zone lru_lock, in the hope that more pages will belong to the same zone
so that their addition can be batched. A huge page is already a form of
batched addition (it adds 512 pages' worth of memory in one go), so
skipping the batching seems a safer option than risking excess caching,
which can be quite large and is much harder to fix: lru_add_drain_all
is far too expensive, and it is not clear what a good moment to call
it would be.
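To make the batching trade-off concrete, here is a simplified user-space model. It is an illustration, not the mm/swap.c code: every name in it (struct pvec, pvec_add(), pvec_drain(), lru_base_pages, lru_lock_acquisitions) is hypothetical, and PVEC_SIZE = 14 is an assumed batch size. Pages are queued in a small per-CPU batch and moved to the LRU under a single lock acquisition once the batch fills; with flush_compound set, a huge page triggers the drain immediately, which models the behaviour this patch introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed batch size (hypothetical model parameter). */
#define PVEC_SIZE 14

/* Hypothetical page model: only its size in base (4kB) pages matters. */
struct page_model {
    unsigned int nr_base_pages; /* 1 for a normal page, 512 for a 2MB THP */
};

/* Hypothetical per-CPU batch of pages waiting to be put on the LRU. */
struct pvec {
    unsigned int nr;
    struct page_model *pages[PVEC_SIZE];
};

static size_t lru_base_pages;        /* base pages moved to the simulated LRU */
static size_t lru_lock_acquisitions; /* times the simulated lru_lock was taken */

/* Move every batched page to the LRU under one lock acquisition. */
static void pvec_drain(struct pvec *pv)
{
    if (pv->nr == 0)
        return;
    lru_lock_acquisitions++;
    for (unsigned int i = 0; i < pv->nr; i++)
        lru_base_pages += pv->pages[i]->nr_base_pages;
    pv->nr = 0;
}

/*
 * Queue a page. Without flush_compound only a full batch is drained;
 * with it, a compound page is drained as soon as it arrives.
 */
static void pvec_add(struct pvec *pv, struct page_model *page, bool flush_compound)
{
    assert(pv->nr < PVEC_SIZE);
    pv->pages[pv->nr++] = page;
    if (pv->nr == PVEC_SIZE || (flush_compound && page->nr_base_pages > 1))
        pvec_drain(pv);
}

/* Base pages still sitting in the batch, i.e. invisible to reclaim. */
static size_t pvec_pending_base_pages(const struct pvec *pv)
{
    size_t total = 0;
    for (unsigned int i = 0; i < pv->nr; i++)
        total += pv->pages[i]->nr_base_pages;
    return total;
}
```

In this model, queuing 13 huge pages without flush_compound leaves 13 × 512 base pages pending in the batch; with flush_compound nothing is left pending, at the cost of one lock acquisition per huge page, which is cheap relative to the 512 base pages each one carries.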
Similarly, we can reproduce the problem on lru_deactivate_pvec by adding
madvise(p, size, MADV_FREE); after the memset.
This patch flushes the lru pvecs when a compound page arrives, making the
problem less severe: after applying it, the kill rate of the above example
drops to 0%, because the maximum amount of memory held on a pvec falls
from 28MB (with THP) to 56kB per CPU.
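The 28MB and 56kB figures follow from the batch size. A worked check, assuming 14 entries per pagevec, 4kB base pages and 2MB transparent huge pages (typical x86-64 values; the commit message does not state them): 14 × 2MB = 28MB and 14 × 4kB = 56kB. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 sizes: 4kB base pages and 2MB transparent huge pages. */
#define BASE_PAGE_KB 4ULL
#define HUGE_PAGE_KB (2ULL * 1024)
/* Assumed pagevec capacity (entries per per-CPU batch). */
#define PVEC_ENTRIES 14ULL

/* Worst-case memory (kB) one full pagevec can hide from reclaim on one CPU. */
static uint64_t pvec_worst_case_kb(uint64_t page_kb)
{
    return PVEC_ENTRIES * page_kb;
}

/* Worst case for a single pagevec type summed over all CPUs. */
static uint64_t system_worst_case_kb(uint64_t page_kb, uint64_t nr_cpus)
{
    assert(nr_cpus > 0);
    return nr_cpus * pvec_worst_case_kb(page_kb);
}
```

For a hypothetical 256-CPU machine, system_worst_case_kb(HUGE_PAGE_KB, 256) gives 7340032kB (7GB) for one pagevec type, consistent with the "GBs of memory" estimate at the start of the message.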
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Ming Li <mingli199x@qq.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
For bisection revision-tuple graph see:
http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-xl-credit2.xen-boot.html
Revision IDs in each graph node refer, respectively, to the Trees above.
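Each revision tuple in this report (for example the "Latest" and "Basis pass" lines below) lists one commit per tree, in the order of the Tree: lines: linux, linuxfirmware, qemu, qemuu, xen. The same order is used for the revision ranges passed to adhoc-revtuple-generator. A minimal sketch for extracting one tree's revision and checking which trees differ between two tuples; tuple_field(), tree_changed() and TREES are hypothetical helpers, not part of osstest, and the 64-byte buffers are assumed large enough for a 40-character SHA-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tree order as given by the "Tree:" lines of this report. */
static const char *const TREES[] = { "linux", "linuxfirmware", "qemu", "qemuu", "xen" };
#define NR_TREES (sizeof(TREES) / sizeof(TREES[0]))

/*
 * Copy the index-th space-separated revision of a tuple into out.
 * Returns false if the tuple has too few fields or out is too small.
 */
static bool tuple_field(const char *tuple, size_t index, char *out, size_t out_len)
{
    const char *p = tuple;
    for (size_t i = 0;; i++) {
        p += strspn(p, " ");          /* skip separators */
        if (*p == '\0')
            return false;             /* ran out of fields */
        size_t len = strcspn(p, " "); /* length of this field */
        if (i == index) {
            if (len + 1 > out_len)
                return false;
            memcpy(out, p, len);
            out[len] = '\0';
            return true;
        }
        p += len;
    }
}

/* True if tree idx has a different revision in tuples a and b. */
static bool tree_changed(const char *a, const char *b, size_t idx)
{
    char ra[64], rb[64];
    assert(idx < NR_TREES);
    if (!tuple_field(a, idx, ra, sizeof ra) || !tuple_field(b, idx, rb, sizeof rb))
        return true; /* treat a malformed tuple as changed */
    return strcmp(ra, rb) != 0;
}
```

Comparing the Latest and Basis pass tuples this way shows that only linuxfirmware is identical, matching the zero-length linuxfirmware range passed to adhoc-revtuple-generator.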
----------------------------------------
Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-amd64-xl-credit2.xen-boot --summary-out=tmp/97435.bisection-summary --basis-template=96188 --blessings=real,real-bisect linux-3.18 test-amd64-amd64-xl-credit2 xen-boot
Searching for failure / basis pass:
97377 fail [host=chardonnay0] / 96188 [host=italia0] 96161 [host=baroque1] 95844 [host=godello0] 95809 [host=godello1] 95597 [host=fiano0] 95521 [host=pinot1] 95458 [host=elbling1] 95406 [host=fiano1] 94728 [host=merlot1] 94153 [host=elbling0] 94083 ok.
Failure / basis pass flights: 97377 / 94083
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
Basis pass 6b12ebc0ecce75d7bd3660cd85f8b47a615c2071 c530a75c1e6a472b0eb9558310b518f0dfcd8860 e4ceb77cf88bc44f0b7fe39225c49d660735f327 62b3d206425c245ed0a020390a64640d40d97471 c79fc6c4bee28b40948838a760b4aaadf6b5cd47
Generating revisions with ./adhoc-revtuple-generator git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#6b12ebc0ecce75d7bd3660cd85f8b47a615c2071-0ac0a856d986c1ab240753479f5e50fdfab82b14 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860 git://xenbits.xen.org/qemu-xen-traditional.git#e4ceb77cf88bc44f0b7fe39225c49d660735f327-6e20809727261599e8527c456eb078c0e89139a1 git://xenbits.xen.org/qemu-xen.git#62b3d206425c245ed0a020390a64640d40d97471-44a072f0de0d57c95c2212bbce02888832b7b74f git://xenbits.xen.org/xen.git#c79fc6c4bee28b40948838a760b4aaadf6b5cd47-ea210c52abb6458e39f5365f7f2c3abb9c191c47
Loaded 12048 nodes in revision graph
Searching for test results:
94035 [host=huxelrebe0]
94083 pass 6b12ebc0ecce75d7bd3660cd85f8b47a615c2071 c530a75c1e6a472b0eb9558310b518f0dfcd8860 e4ceb77cf88bc44f0b7fe39225c49d660735f327 62b3d206425c245ed0a020390a64640d40d97471 c79fc6c4bee28b40948838a760b4aaadf6b5cd47
94056 [host=huxelrebe1]
94153 [host=elbling0]
94728 [host=merlot1]
95406 [host=fiano1]
95458 [host=elbling1]
95521 [host=pinot1]
95597 [host=fiano0]
95809 [host=godello1]
95844 [host=godello0]
96161 [host=baroque1]
96188 [host=italia0]
97278 fail irrelevant
97289 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
97321 pass 6b12ebc0ecce75d7bd3660cd85f8b47a615c2071 c530a75c1e6a472b0eb9558310b518f0dfcd8860 e4ceb77cf88bc44f0b7fe39225c49d660735f327 62b3d206425c245ed0a020390a64640d40d97471 c79fc6c4bee28b40948838a760b4aaadf6b5cd47
97346 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
97357 pass f27ca140ad82b5e76282cc5b54bfb0a665520d17 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97352 pass b5076139991c6b12c62346d9880eec1d4227d99f c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 87beb45e0b05be76755cac53322aae4f5b426aac
97319 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
97389 fail e23042d05035bd64c980ea8f1d9d311972b09104 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97404 pass faa35ed7c7dd74a62bb58340e0ba1819ec33e4e1 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97423 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97393 pass 1b9dc6680de288cb47e0a3c1587ba69879b3c26f c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97397 pass 4c2b0216cdf54e81f7c0e841b5bb1116701ae25b c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97364 fail 30888a2ea001e237ae9960de877d6f4d2351d8a2 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97376 blocked 71c879eb92223676c4583e130f1b0ce26cddb891 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97413 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97379 fail 6d94f01566e30c87ebd42e1175ade4f648735578 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97399 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97417 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97421 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97377 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
97428 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
97435 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
Searching for interesting versions
Result found: flight 94083 (pass), for basis pass
Result found: flight 97289 (fail), for basis failure
Repro found: flight 97321 (pass), for basis pass
Repro found: flight 97346 (fail), for basis failure
0 revisions at 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
No revisions left to test, checking graph state.
Result found: flight 97413 (pass), for last pass
Result found: flight 97417 (fail), for first failure
Repro found: flight 97421 (pass), for last pass
Repro found: flight 97423 (fail), for first failure
Repro found: flight 97428 (pass), for last pass
Repro found: flight 97435 (fail), for first failure
Revision graph left in /home/logs/results/bisect/linux-3.18/test-amd64-amd64-xl-credit2.xen-boot.{dot,ps,png,html,svg}.
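The search above narrows the failure to one adjacent pair: 8f620446 passes and a2d8c514 fails, each confirmed by repeated flights. A minimal sketch of the underlying idea, under simplifying assumptions that do not hold for osstest in general: one linear history instead of a multi-tree revision graph, deterministic results, and no blocked or irrelevant flights (such as 97376 and 97278 above). It is ordinary binary search, the same idea as git bisect on a linear range, not cs-bisection-step's algorithm. bisect_first_fail(), bisect_reproduced(), test_fn and the threshold_test() stand-in for a real test flight are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test hook: true if the build at position idx passes. */
typedef bool (*test_fn)(size_t idx, void *ctx);

/*
 * Linear-history bisection. Preconditions: idx 0 passes, idx n-1 fails,
 * and results are monotonic (pass ... pass fail ... fail).
 * Returns the index of the first failing commit.
 */
static size_t bisect_first_fail(size_t n, test_fn test, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1; /* invariant: good passes, bad fails */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (test(mid, ctx))
            good = mid;
        else
            bad = mid;
    }
    return bad;
}

/* Re-test both sides of the boundary, like the "Repro found" steps. */
static bool bisect_reproduced(size_t first_fail, size_t repeats, test_fn test, void *ctx)
{
    assert(first_fail > 0);
    for (size_t r = 0; r < repeats; r++)
        if (!test(first_fail - 1, ctx) || test(first_fail, ctx))
            return false;
    return true;
}

/* Hypothetical stand-in for a test flight: fails from first_bad onwards. */
struct threshold_ctx {
    size_t first_bad; /* simulated culprit position */
    size_t calls;     /* number of simulated flights run */
};

static bool threshold_test(size_t idx, void *ctx)
{
    struct threshold_ctx *t = ctx;
    t->calls++;
    return idx < t->first_bad;
}
```

With ctx.first_bad = 37, bisect_first_fail(100, threshold_test, &ctx) returns 37 after at most 7 simulated flights, because each probe halves the remaining range.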
----------------------------------------
97435: tolerable ALL FAIL
flight 97435 linux-3.18 real-bisect [real]
http://logs.test-lab.xenproject.org/osstest/logs/97435/
Failures :-/ but no regressions.
Tests which did not succeed,
including tests which could not be run:
test-amd64-amd64-xl-credit2 6 xen-boot fail baseline untested
jobs:
test-amd64-amd64-xl-credit2 fail
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel