ZzsKKwl1bnNpZ25lZCBsb25nIHNpemU7CisJTElTVF9IRUFEKHRva2lsbCk7CisJaW50IHJjID0g LUVCVVNZOworCWxvZmZfdCBzdGFydDsKKworCS8qCisJICogUHJldmVudCB0aGUgaW5vZGUgZnJv bSBiZWluZyBmcmVlZCB3aGlsZSB3ZSBhcmUgaW50ZXJyb2dhdGluZworCSAqIHRoZSBhZGRyZXNz X3NwYWNlLCB0eXBpY2FsbHkgdGhpcyB3b3VsZCBiZSBoYW5kbGVkIGJ5CisJICogbG9ja19wYWdl KCksIGJ1dCBkYXggcGFnZXMgZG8gbm90IHVzZSB0aGUgcGFnZSBsb2NrLgorCSAqLworCXJjdV9y ZWFkX2xvY2soKTsKKwltYXBwaW5nID0gcGFnZS0+bWFwcGluZzsKKwlpZiAoIW1hcHBpbmcpIHsK KwkJcmN1X3JlYWRfdW5sb2NrKCk7CisJCWdvdG8gb3V0OworCX0KKwlpZiAoIWlncmFiKG1hcHBp bmctPmhvc3QpKSB7CisJCW1hcHBpbmcgPSBOVUxMOworCQlyY3VfcmVhZF91bmxvY2soKTsKKwkJ Z290byBvdXQ7CisJfQorCXJjdV9yZWFkX3VubG9jaygpOworCisJaWYgKGh3cG9pc29uX2ZpbHRl cihwYWdlKSkgeworCQlyYyA9IDA7CisJCWdvdG8gb3V0OworCX0KKworCXN3aXRjaCAocGdtYXAt PnR5cGUpIHsKKwljYXNlIE1FTU9SWV9ERVZJQ0VfUFJJVkFURToKKwljYXNlIE1FTU9SWV9ERVZJ Q0VfUFVCTElDOgorCQkvKgorCQkgKiBUT0RPOiBIYW5kbGUgSE1NIHBhZ2VzIHdoaWNoIG1heSBu ZWVkIGNvb3JkaW5hdGlvbgorCQkgKiB3aXRoIGRldmljZS1zaWRlIG1lbW9yeS4KKwkJICovCisJ CWdvdG8gb3V0OworCWRlZmF1bHQ6CisJCWJyZWFrOworCX0KKworCS8qCisJICogSWYgdGhlIHBh Z2UgaXMgbm90IG1hcHBlZCBpbiB1c2Vyc3BhY2UgdGhlbiByZXBvcnQgaXQgYXMKKwkgKiB1bmhh bmRsZWQuCisJICovCisJc2l6ZSA9IGRheF9tYXBwaW5nX3NpemUobWFwcGluZywgcGFnZSk7CisJ aWYgKCFzaXplKSB7CisJCXByX2VycigiTWVtb3J5IGZhaWx1cmU6ICUjbHg6IGZhaWxlZCB0byB1 bm1hcCBwYWdlXG4iLCBwZm4pOworCQlnb3RvIG91dDsKKwl9CisKKwlTZXRQYWdlSFdQb2lzb24o cGFnZSk7CisKKwkvKgorCSAqIFVubGlrZSBTeXN0ZW0tUkFNIHRoZXJlIGlzIG5vIHBvc3NpYmls aXR5IHRvIHN3YXAgaW4gYQorCSAqIGRpZmZlcmVudCBwaHlzaWNhbCBwYWdlIGF0IGEgZ2l2ZW4g dmlydHVhbCBhZGRyZXNzLCBzbyBhbGwKKwkgKiB1c2Vyc3BhY2UgY29uc3VtcHRpb24gb2YgWk9O RV9ERVZJQ0UgbWVtb3J5IG5lY2Vzc2l0YXRlcworCSAqIFNJR0JVUyAoaS5lLiBNRl9NVVNUX0tJ TEwpCisJICovCisJZmxhZ3MgfD0gTUZfQUNUSU9OX1JFUVVJUkVEIHwgTUZfTVVTVF9LSUxMOwor CWNvbGxlY3RfcHJvY3MobWFwcGluZywgcGFnZSwgJnRva2lsbCwgZmxhZ3MgJiBNRl9BQ1RJT05f UkVRVUlSRUQpOworCisJc3RhcnQgPSAocGFnZS0+aW5kZXggPDwgUEFHRV9TSElGVCkgJiB+KHNp emUgLSAxKTsKKwl1bm1hcF9tYXBwaW5nX3JhbmdlKHBhZ2UtPm1hcHBpbmcsIHN0YXJ0LCBzdGFy dCArIHNpemUsIDApOworCisJa2lsbF9wcm9jcygmdG9raWxsLCBmbGFncyAmIE1GX01VU1RfS0lM TCwgIXVubWFwX3N1Y2Nlc3MsIGlsb2cyKHNpemUpLAorCQkJbWFwcGluZywgcGFnZSwgZmxhZ3Mp OworCXJjID0gMDsKK291dDoKKwlpZiAobWFwcGluZykKKwkJaXB1dChtYXBwaW5nLT5ob3N0KTsK KwlwdXRfZGV2X3BhZ2VtYXAocGdtYXApOworCWFjdGlvbl9yZXN1bHQocGZuLCBNRl9NU0dfREFY LCByYyA/IE1GX0ZBSUxFRCA6IE1GX1JFQ09WRVJFRCk7CisJcmV0dXJuIHJjOworfQorCiAvKioK ICAqIG1lbW9yeV9mYWlsdXJlIC0gSGFuZGxlIG1lbW9yeSBmYWlsdXJlIG9mIGEgcGFnZS4KICAq IEBwZm46IFBhZ2UgTnVtYmVyIG9mIHRoZSBjb3JydXB0ZWQgcGFnZQpAQCAtMTE1NCw2ICsxMjk0 LDcgQEAgaW50IG1lbW9yeV9mYWlsdXJlKHVuc2lnbmVkIGxvbmcgcGZuLCBpbnQgZmxhZ3MpCiAJ c3RydWN0IHBhZ2UgKnA7CiAJc3RydWN0IHBhZ2UgKmhwYWdlOwogCXN0cnVjdCBwYWdlICpvcmln X2hlYWQ7CisJc3RydWN0IGRldl9wYWdlbWFwICpwZ21hcDsKIAlpbnQgcmVzOwogCXVuc2lnbmVk IGxvbmcgcGFnZV9mbGFnczsKIApAQCAtMTE2Niw2ICsxMzA3LDEwIEBAIGludCBtZW1vcnlfZmFp bHVyZSh1bnNpZ25lZCBsb25nIHBmbiwgaW50IGZsYWdzKQogCQlyZXR1cm4gLUVOWElPOwogCX0K IAorCXBnbWFwID0gZ2V0X2Rldl9wYWdlbWFwKHBmbiwgTlVMTCk7CisJaWYgKHBnbWFwKQorCQly ZXR1cm4gbWVtb3J5X2ZhaWx1cmVfZGV2X3BhZ2VtYXAocGZuLCBmbGFncywgcGdtYXApOworCiAJ cCA9IHBmbl90b19wYWdlKHBmbik7CiAJaWYgKFBhZ2VIdWdlKHApKQogCQlyZXR1cm4gbWVtb3J5 X2ZhaWx1cmVfaHVnZXRsYihwZm4sIGZsYWdzKTsKCl9fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fCkxpbnV4LW52ZGltbSBtYWlsaW5nIGxpc3QKTGludXgtbnZk aW1tQGxpc3RzLjAxLm9yZwpodHRwczovL2xpc3RzLjAxLm9yZy9tYWlsbWFuL2xpc3RpbmZvL2xp bnV4LW52ZGltbQo= From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages From: 
Dan Williams To: linux-nvdimm@lists.01.org Cc: Jan Kara , Christoph Hellwig , =?utf-8?b?SsOpcsO0bWU=?= Glisse , Matthew Wilcox , Naoya Horiguchi , Ross Zwisler , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, jack@suse.cz Date: Sat, 02 Jun 2018 22:23:36 -0700 Message-ID: <152800341624.17112.13515547552061692915.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <152800336321.17112.3300876636370683279.stgit@dwillia2-desk3.amr.corp.intel.com> References: <152800336321.17112.3300876636370683279.stgit@dwillia2-desk3.amr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: mce: Uncorrected hardware memory error in user-access at af34214200 {1}[Hardware Error]: It has been corrected by h/w and requires no further action mce: [Hardware Error]: Machine check events logged {1}[Hardware Error]: event severity: corrected Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users [..] Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed mce: Memory error not recovered In contrast to typical memory, dev_pagemap pages may be dax mapped. With dax there is no possibility to map in another page dynamically since dax establishes 1:1 physical address to file offset associations. Also dev_pagemap pages associated with NVDIMM / persistent memory devices can internal remap/repair addresses with poison. While memory_failure() assumes that it can discard typical poisoned pages and keep them unmapped indefinitely, dev_pagemap pages may be returned to service after the error is cleared. Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST dev_pagemap pages that have poison consumed by userspace. Mark the memory as UC instead of unmapping it completely to allow ongoing access via the device driver (nd_pmem). Later, nd_pmem will grow support for marking the page back to WB when the error is cleared. 
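For illustration, a minimal userspace sketch of what a process that
consumes poison through a dax mapping observes with this change in
place. The mount path is hypothetical and the program is an editor's
illustration under those assumptions, not part of the patch:

    /*
     * Illustrative sketch: a consumer of poisoned fsdax memory.
     * Assumes /mnt/pmem0/data is a file on a hypothetical fsdax mount.
     */
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void bus_handler(int sig, siginfo_t *info, void *ctx)
    {
            (void)sig;
            (void)ctx;
            /* si_code is BUS_MCEERR_AR for an action-required machine check */
            fprintf(stderr, "SIGBUS at %p, si_code %d\n", info->si_addr,
                            info->si_code);
            _exit(1);
    }

    int main(void)
    {
            struct sigaction act = {
                    .sa_sigaction = bus_handler,
                    .sa_flags = SA_SIGINFO,
            };
            int fd = open("/mnt/pmem0/data", O_RDWR);
            char *p;

            if (fd < 0)
                    return 1;
            sigaction(SIGBUS, &act, NULL);
            p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    return 1;
            /*
             * dax maps this load 1:1 to the pmem media; there is no page
             * cache copy to substitute, so consuming poison must be fatal
             * (SIGBUS) rather than transparently remapped.
             */
            return p[0];
    }

Because the mapping is 1:1 with the media, the only recoverable
disposition is to kill the consumer; there is no clean page to swap in.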
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mm.h  |    1 
 mm/memory-failure.c |  145 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ac1f06a4be6..566c972e03e7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2669,6 +2669,7 @@ enum mf_action_page_type {
         MF_MSG_TRUNCATED_LRU,
         MF_MSG_BUDDY,
         MF_MSG_BUDDY_2ND,
+        MF_MSG_DAX,
         MF_MSG_UNKNOWN,
 };
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b6efb78ba49b..de0bc897d6e7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
+#include <linux/memremap.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include "internal.h"
@@ -531,6 +532,7 @@ static const char * const action_page_types[] = {
         [MF_MSG_TRUNCATED_LRU]          = "already truncated LRU page",
         [MF_MSG_BUDDY]                  = "free buddy page",
         [MF_MSG_BUDDY_2ND]              = "free buddy page (2nd try)",
+        [MF_MSG_DAX]                    = "dax page",
         [MF_MSG_UNKNOWN]                = "unknown page",
 };
 
@@ -1132,6 +1134,144 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
         return res;
 }
 
+static unsigned long dax_mapping_size(struct address_space *mapping,
+                struct page *page)
+{
+        pgoff_t pgoff = page_to_pgoff(page);
+        struct vm_area_struct *vma;
+        unsigned long size = 0;
+
+        i_mmap_lock_read(mapping);
+        xa_lock_irq(&mapping->i_pages);
+        /* validate that @page is still linked to @mapping */
+        if (page->mapping != mapping) {
+                xa_unlock_irq(&mapping->i_pages);
+                i_mmap_unlock_read(mapping);
+                return 0;
+        }
+        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+                unsigned long address = vma_address(page, vma);
+                pgd_t *pgd;
+                p4d_t *p4d;
+                pud_t *pud;
+                pmd_t *pmd;
+                pte_t *pte;
+
+                pgd = pgd_offset(vma->vm_mm, address);
+                if (!pgd_present(*pgd))
+                        continue;
+                p4d = p4d_offset(pgd, address);
+                if (!p4d_present(*p4d))
+                        continue;
+                pud = pud_offset(p4d, address);
+                if (!pud_present(*pud))
+                        continue;
+                if (pud_devmap(*pud)) {
+                        size = PUD_SIZE;
+                        break;
+                }
+                pmd = pmd_offset(pud, address);
+                if (!pmd_present(*pmd))
+                        continue;
+                if (pmd_devmap(*pmd)) {
+                        size = PMD_SIZE;
+                        break;
+                }
+                pte = pte_offset_map(pmd, address);
+                if (!pte_present(*pte))
+                        continue;
+                if (pte_devmap(*pte)) {
+                        size = PAGE_SIZE;
+                        break;
+                }
+        }
+        xa_unlock_irq(&mapping->i_pages);
+        i_mmap_unlock_read(mapping);
+
+        return size;
+}
+
+static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
+                struct dev_pagemap *pgmap)
+{
+        struct page *page = pfn_to_page(pfn);
+        const bool unmap_success = true;
+        struct address_space *mapping;
+        unsigned long size;
+        LIST_HEAD(tokill);
+        int rc = -EBUSY;
+        loff_t start;
+
+        /*
+         * Prevent the inode from being freed while we are interrogating
+         * the address_space, typically this would be handled by
+         * lock_page(), but dax pages do not use the page lock.
+         */
+        rcu_read_lock();
+        mapping = page->mapping;
+        if (!mapping) {
+                rcu_read_unlock();
+                goto out;
+        }
+        if (!igrab(mapping->host)) {
+                mapping = NULL;
+                rcu_read_unlock();
+                goto out;
+        }
+        rcu_read_unlock();
+
+        if (hwpoison_filter(page)) {
+                rc = 0;
+                goto out;
+        }
+
+        switch (pgmap->type) {
+        case MEMORY_DEVICE_PRIVATE:
+        case MEMORY_DEVICE_PUBLIC:
+                /*
+                 * TODO: Handle HMM pages which may need coordination
+                 * with device-side memory.
+                 */
+                goto out;
+        default:
+                break;
+        }
+
+        /*
+         * If the page is not mapped in userspace then report it as
+         * unhandled.
+         */
+        size = dax_mapping_size(mapping, page);
+        if (!size) {
+                pr_err("Memory failure: %#lx: failed to unmap page\n", pfn);
+                goto out;
+        }
+
+        SetPageHWPoison(page);
+
+        /*
+         * Unlike System-RAM there is no possibility to swap in a
+         * different physical page at a given virtual address, so all
+         * userspace consumption of ZONE_DEVICE memory necessitates
+         * SIGBUS (i.e. MF_MUST_KILL)
+         */
+        flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+        collect_procs(mapping, page, &tokill, flags & MF_ACTION_REQUIRED);
+
+        start = (page->index << PAGE_SHIFT) & ~(size - 1);
+        unmap_mapping_range(page->mapping, start, start + size, 0);
+
+        kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, ilog2(size),
+                        mapping, page, flags);
+        rc = 0;
+out:
+        if (mapping)
+                iput(mapping->host);
+        put_dev_pagemap(pgmap);
+        action_result(pfn, MF_MSG_DAX, rc ? MF_FAILED : MF_RECOVERED);
+        return rc;
+}
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1154,6 +1294,7 @@ int memory_failure(unsigned long pfn, int flags)
         struct page *p;
         struct page *hpage;
         struct page *orig_head;
+        struct dev_pagemap *pgmap;
         int res;
         unsigned long page_flags;
 
@@ -1166,6 +1307,10 @@ int memory_failure(unsigned long pfn, int flags)
                 return -ENXIO;
         }
 
+        pgmap = get_dev_pagemap(pfn, NULL);
+        if (pgmap)
+                return memory_failure_dev_pagemap(pfn, flags, pgmap);
+
         p = pfn_to_page(pfn);
         if (PageHuge(p))
                 return memory_failure_hugetlb(pfn, flags);
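The start/size arithmetic above rounds the failing file offset down to
the size of the mapping entry so that unmap_mapping_range() drops the
whole pte/pmd/pud worth of translation. A minimal sketch of that
rounding, assuming x86-64 constants and a made-up page->index (not
values from this patch):

    /*
     * Sketch of: start = (page->index << PAGE_SHIFT) & ~(size - 1).
     * A poisoned page in the middle of a pmd-sized dax mapping takes
     * the whole 2M extent with it, since one pmd entry maps all of it.
     */
    #include <stdio.h>

    #define PAGE_SHIFT      12
    #define PMD_SIZE        (1UL << 21)     /* 2M, x86-64 assumption */

    int main(void)
    {
            unsigned long index = 0xaf342;  /* made-up page->index */
            unsigned long size = PMD_SIZE;  /* as found by dax_mapping_size() */
            unsigned long start = (index << PAGE_SHIFT) & ~(size - 1);

            /* the byte range handed to unmap_mapping_range() */
            printf("unmap [%#lx, %#lx)\n", start, start + size);
            return 0;
    }

Running it prints "unmap [0xaf200000, 0xaf400000)": the 2M-aligned
extent containing the poisoned 4K page.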