Linux-mm Archive on lore.kernel.org
From: Dmitry Yakunin <zeil@yandex-team.ru>
To: osalvador@suse.de
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mhocko@kernel.org, mike.kravetz@oracle.com,
	n-horiguchi@ah.jp.nec.com, max7255@yandex-team.ru
Subject: Re: [RFC PATCH v2 00/16] Hwpoison rework {hard,soft}-offline
Date: Thu, 11 Jun 2020 19:43:19 +0300
Message-ID: <20200611164319.16860-1-zeil@yandex-team.ru>
In-Reply-To: <20191017142123.24245-1-osalvador@suse.de>

Hello!

We have run into similar problems with hwpoisoned pages on one of our
production clusters after updating the kernel to stable 4.19.
An application that does a lot of memory allocation sometimes received SIGBUS,
with a message in dmesg about a hardware memory corruption fault.
In the kernel and mce logs we saw messages about soft offlining of pages with
correctable errors. Those events always happened before the application
was killed. This is not the behavior we expect: we want our application to
continue working on a smaller set of available pages in the system.

This issue is difficult to reproduce, but we suspect the cause is that
compaction does not check whether free pages are poisoned while processing
them, so valid userspace data gets migrated to bad pages.
We wrote a simple test:
  - soft offline the first 4 pages in every 64 contiguous pages in ZONE_NORMAL
    by writing the pfn to /sys/devices/system/memory/soft_offline_page
  - force compaction with echo 1 >> /proc/sys/vm/compact_memory
Without this patch series, bash became unusable after these steps,
and every attempt to run a command led to SIGBUS with a message about a
hardware memory corruption fault. After applying this series to our kernel
tree, we can no longer reproduce these SIGBUSes with our test. On upstream
kernel 5.7 this behavior is still reproducible.
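
For illustration, the two test steps could be scripted roughly as follows.
This is a sketch under stated assumptions, not the script that was actually
run. The function names, the DO_OFFLINE knob, and the START_PFN/END_PFN
arguments are hypothetical, and 64-page blocks are counted from the given
start pfn. The soft_offline_page file is documented as taking a physical
address, so the sketch converts each pfn to an address; whether the original
test did the same is an assumption. By default it only prints what it would
write.

```shell
#!/bin/sh
# Sketch of the reproducer: soft-offline some pages, then force compaction.
# DRY RUN unless DO_OFFLINE=1 (hypothetical knob) is set; needs root then.
# Only use on a disposable test machine.

SOFT_OFFLINE=/sys/devices/system/memory/soft_offline_page
COMPACT=/proc/sys/vm/compact_memory

# Print the first 4 pfns of every 64-page block in [start, end).
# Blocks are counted from "start"; that alignment is an assumption.
select_pfns() {
    pfn=$1
    end=$2
    while [ "$pfn" -lt "$end" ]; do
        i=0
        while [ "$i" -lt 4 ] && [ $((pfn + i)) -lt "$end" ]; do
            echo $((pfn + i))
            i=$((i + 1))
        done
        pfn=$((pfn + 64))
    done
}

# Soft-offline the selected pages, then trigger compaction.
run_test() {
    page_size=$(getconf PAGESIZE)
    select_pfns "$1" "$2" | while read -r pfn; do
        # Convert pfn to a physical address (assumed input format).
        addr=$(printf '0x%x' $((pfn * page_size)))
        if [ "${DO_OFFLINE:-0}" = 1 ]; then
            echo "$addr" > "$SOFT_OFFLINE"
        else
            echo "would write $addr to $SOFT_OFFLINE"
        fi
    done
    if [ "${DO_OFFLINE:-0}" = 1 ]; then
        echo 1 > "$COMPACT"
    else
        echo "would write 1 to $COMPACT"
    fi
}

# Usage: ./repro.sh START_PFN END_PFN  (range chosen inside ZONE_NORMAL)
if [ "$#" -eq 2 ]; then
    run_test "$1" "$2"
fi
```

The pfn range has to be picked by hand so that it lies inside ZONE_NORMAL;
the sketch does not detect the zone boundaries itself.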

So we would like to know why this patchset was not merged upstream.
Are there any problems with this rework of {soft,hard}-offline handling?
BTW, the patchset would need to be updated for upstream changes in mm.

Thanks for your replies.

--
Dmitry Yakunin



Thread overview: 57+ messages
2019-10-17 14:21 Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 01/16] mm,hwpoison: cleanup unused PageHuge() check Oscar Salvador
2019-10-18 11:48   ` Michal Hocko
2019-10-21  7:00     ` Naoya Horiguchi
2019-10-21 12:16       ` Michal Hocko
2019-11-12 12:22       ` Aneesh Kumar K.V
2019-11-13  6:02         ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 02/16] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED Oscar Salvador
2019-10-18 11:52   ` Michal Hocko
2019-10-21  7:02     ` Naoya Horiguchi
2019-10-21 12:20       ` Michal Hocko
2019-10-17 14:21 ` [RFC PATCH v2 03/16] mm,madvise: Refactor madvise_inject_error Oscar Salvador
2019-10-21  7:03   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 04/16] mm,hwpoison-inject: don't pin for hwpoison_filter Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 05/16] mm,hwpoison: Un-export get_hwpoison_page and make it static Oscar Salvador
2019-10-21  7:03   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 06/16] mm,hwpoison: Kill put_hwpoison_page Oscar Salvador
2019-10-21  7:04   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 07/16] mm,hwpoison: remove MF_COUNT_INCREASED Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 08/16] mm,hwpoison: remove flag argument from soft offline functions Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 09/16] mm,hwpoison: Unify THP handling for hard and soft offline Oscar Salvador
2019-10-21  7:04   ` Naoya Horiguchi
2019-10-21  9:51     ` [PATCH 17/16] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP Naoya Horiguchi
2019-10-22  8:00       ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages Oscar Salvador
2019-10-18 12:06   ` Michal Hocko
2019-10-21 12:58     ` Oscar Salvador
2019-10-21 15:41       ` Michal Hocko
2019-10-22  7:46         ` Oscar Salvador
2019-10-22  8:26           ` Michal Hocko
2019-10-22  8:35             ` Oscar Salvador
2019-10-22  9:22               ` Michal Hocko
2019-10-22  9:58                 ` Oscar Salvador
2019-10-22 10:24                   ` Michal Hocko
2019-10-22 10:33                     ` Oscar Salvador
2019-10-23  2:15                       ` Naoya Horiguchi
2019-10-23  2:01                   ` Naoya Horiguchi
2019-10-21  7:45   ` Naoya Horiguchi
2019-10-22  8:00     ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 11/16] mm,hwpoison: Rework soft offline for in-use pages Oscar Salvador
2019-10-18 12:39   ` Michal Hocko
2019-10-21 13:48     ` Oscar Salvador
2019-10-21 14:06       ` Michal Hocko
2019-10-22  7:56         ` Oscar Salvador
2019-10-22  8:30           ` Michal Hocko
2019-10-22  9:40             ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 12/16] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 13/16] mm,hwpoison: Take pages off the buddy when hard-offlining Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 14/16] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline Oscar Salvador
2019-10-21  9:20   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 15/16] mm/hwpoison-inject: Rip off duplicated checks Oscar Salvador
2019-10-21  9:40   ` David Hildenbrand
2019-10-22  7:57     ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 16/16] mm, soft-offline: convert parameter to pfn Oscar Salvador
2019-10-18  8:15   ` David Hildenbrand
2020-06-11 16:43 ` Dmitry Yakunin [this message]
2020-06-15  6:19   ` [RFC PATCH v2 00/16] Hwpoison rework {hard,soft}-offline HORIGUCHI NAOYA(堀口 直也)

