* [PATCH 1/3] RAS/CEC: fix __find_elem
@ 2019-04-18  3:41 ` WANG Chao
  0 siblings, 0 replies; 21+ messages in thread
From: WANG Chao @ 2019-04-18  3:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

A leftover pfn at ca->array[n] (stale slots are not cleared after an
element is deleted) can be matched by __find_elem(). That match later
causes a memmove() size overflow in del_elem().

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 drivers/ras/cec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..2e0bf1269c31 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
 
 	this_pfn = PFN(ca->array[min]);
 
-	if (this_pfn == pfn)
+	if (this_pfn == pfn && ca->n > min)
 		return min;
 
 	return -ENOKEY;
-- 
2.21.0
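
A minimal user-space sketch of the failure mode, to show why the bound check matters. It models __find_elem() as a binary search over the valid prefix array[0..n-1] and models the memmove() size in del_elem() as I read it from drivers/ras/cec.c. The element layout (pfn above a 12-bit shift), MODEL_ENOKEY, find_elem_model() and del_elem_move_size() are assumptions and hypothetical names, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed element layout: pfn above a 12-bit shift, count/decay below. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PFN(e)		((e) >> MODEL_PAGE_SHIFT)
#define MODEL_ENOKEY		(-1)	/* stand-in for the kernel's -ENOKEY */

/*
 * Model of __find_elem(): binary search over the sorted, valid prefix
 * array[0..n-1]. Slots at index >= n may still hold stale elements.
 * With check_bound == 0 this mirrors the code before the patch.
 */
static int find_elem_model(const uint64_t *array, unsigned int cap,
			   unsigned int n, uint64_t pfn, unsigned int *to,
			   int check_bound)
{
	unsigned int min = 0, max = n;
	uint64_t this_pfn;

	/* array[n] can be read below, so the buffer must extend past n. */
	assert(n < cap);

	while (min < max) {
		unsigned int tmp = (min + max) / 2;

		this_pfn = MODEL_PFN(array[tmp]);
		if (this_pfn < pfn) {
			min = tmp + 1;
		} else if (this_pfn > pfn) {
			max = tmp;
		} else {
			min = tmp;
			break;
		}
	}

	if (to)
		*to = min;	/* insertion point on a miss */

	/* If pfn is above every valid entry, min == n: a stale slot. */
	this_pfn = MODEL_PFN(array[min]);
	if (this_pfn == pfn && (!check_bound || n > min))
		return (int)min;

	return MODEL_ENOKEY;
}

/* Byte count del_elem() would hand to memmove() when removing idx. */
static size_t del_elem_move_size(unsigned int n, int idx)
{
	/* Unsigned arithmetic: idx + 1 > n wraps to a huge value. */
	return (n - (idx + 1)) * sizeof(uint64_t);
}
```

With n = 2 and a leftover pfn 20 still sitting in array[2], the unchecked search reports a hit at index 2, and del_elem_move_size(2, 2) wraps to a size far larger than the array, which is the overflow described in the commit message. With the bound check the same lookup is a miss and returns 2 as the insertion point.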


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/3] RAS/CEC: make ces_entered smp safe
@ 2019-04-18  3:41   ` WANG Chao
  0 siblings, 0 replies; 21+ messages in thread
From: WANG Chao @ 2019-04-18  3:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

ces_entered should be incremented inside the ce_mutex critical section
to avoid a race condition.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 drivers/ras/cec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2e0bf1269c31..702e4c02c713 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
 	if (!ce_arr.array || ce_arr.disabled)
 		return -ENODEV;
 
-	ca->ces_entered++;
-
 	mutex_lock(&ce_mutex);
 
+	ca->ces_entered++;
+
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));
 
-- 
2.21.0
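
The race is on the unlocked read-modify-write of ces_entered when cec_add_elem() runs concurrently on several CPUs. A minimal user-space sketch, assuming a pthread mutex stands in for ce_mutex; struct ce_stats_model, add_elem_model() and run_model() are hypothetical names, not kernel code:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the statistics field in struct ce_array. */
struct ce_stats_model {
	unsigned long ces_entered;
};

static struct ce_stats_model stats;
static pthread_mutex_t ce_mutex_model = PTHREAD_MUTEX_INITIALIZER;

/* Mirrors the patched prologue: the counter is bumped under the lock. */
static void add_elem_model(void)
{
	pthread_mutex_lock(&ce_mutex_model);
	stats.ces_entered++;
	/* ... lookup/insertion in the array would follow here ... */
	pthread_mutex_unlock(&ce_mutex_model);
}

static void *worker(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++)
		add_elem_model();
	return NULL;
}

/* Run nthreads workers concurrently; return the final counter value. */
static unsigned long run_model(int nthreads, long iters)
{
	pthread_t tids[16];
	int i;

	assert(nthreads > 0 && nthreads <= 16);
	stats.ces_entered = 0;

	for (i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &iters);

		assert(rc == 0);
		(void)rc;
	}
	/* Joining before returning keeps &iters valid for every worker. */
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);

	return stats.ces_entered;
}
```

With the increment under the lock, run_model(4, 20000) always returns 80000. Moving stats.ces_entered++ above pthread_mutex_lock() turns it into a data race, and the total can then come up short.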


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-18  3:41   ` WANG Chao
  0 siblings, 0 replies; 21+ messages in thread
From: WANG Chao @ 2019-04-18  3:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

count_threshold == 1 isn't working as expected: CEC only soft-offlines
a page the second time the same pfn is hit by a correctable error.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 drivers/ras/cec.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 702e4c02c713..ac879c45377c 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -272,7 +272,22 @@ static u64 __maybe_unused del_lru_elem(void)
 	return pfn;
 }
 
+static void cec_valid_soft_offline(u64 pfn)
+{
+	if (!pfn_valid(pfn)) {
+		pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
+	} else {
+		/* We have reached max count for this page, soft-offline it. */
+		pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
+		memory_failure_queue(pfn, MF_SOFT_OFFLINE);
+		ce_arr.pfns_poisoned++;
+	}
+}
 
+/*
+ * Return a >0 value to denote that we've reached the offlining
+ * threshold.
+ */
 int cec_add_elem(u64 pfn)
 {
 	struct ce_array *ca = &ce_arr;
@@ -295,6 +310,11 @@ int cec_add_elem(u64 pfn)
 
 	ret = find_elem(ca, pfn, &to);
 	if (ret < 0) {
+		if (count_threshold == 1) {
+			cec_valid_soft_offline(pfn);
+			ret = 1;
+			goto unlock;
+		}
 		/*
 		 * Shift range [to-end] to make room for one more element.
 		 */
@@ -320,23 +340,9 @@ int cec_add_elem(u64 pfn)
 
 		ret = 0;
 	} else {
-		u64 pfn = ca->array[to] >> PAGE_SHIFT;
-
-		if (!pfn_valid(pfn)) {
-			pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
-		} else {
-			/* We have reached max count for this page, soft-offline it. */
-			pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
-			memory_failure_queue(pfn, MF_SOFT_OFFLINE);
-			ca->pfns_poisoned++;
-		}
-
+		cec_valid_soft_offline(pfn);
 		del_elem(ca, to);
 
-		/*
-		 * Return a >0 value to denote that we've reached the offlining
-		 * threshold.
-		 */
 		ret = 1;
 
 		goto unlock;
-- 
2.21.0
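
A minimal user-space sketch of the control flow in cec_add_elem() before and after this patch, for repeated errors on a single pfn. It ignores decay and LRU eviction; enum cec_flow and offline_hit() are hypothetical names, not kernel code:

```c
#include <assert.h>

/* Hypothetical single-page model: no decay, no LRU eviction. */
enum cec_flow { FLOW_OLD, FLOW_PATCHED };

/*
 * Feed errors for one pfn and return the 1-based hit on which the
 * page is soft-offlined, or 0 if that doesn't happen within max_hits.
 */
static unsigned int offline_hit(enum cec_flow flow,
				unsigned int count_threshold,
				unsigned int max_hits)
{
	int present = 0;		/* is the pfn tracked in the array? */
	unsigned int count = 0;		/* COUNT() of the tracked element */

	for (unsigned int hit = 1; hit <= max_hits; hit++) {
		if (!present) {
			/* This patch: offline right away instead of inserting. */
			if (flow == FLOW_PATCHED && count_threshold == 1)
				return hit;
			/* Insert with count 1; no threshold check on this path. */
			present = 1;
			count = 1;
			continue;
		}
		if (count < count_threshold) {
			count++;
			continue;
		}
		/* Threshold reached: soft-offline and drop the element. */
		return hit;
	}
	return 0;
}
```

Under this model, offline_hit(FLOW_OLD, 1, 10) is 2 and offline_hit(FLOW_PATCHED, 1, 10) is 1. For count_threshold values above 1 the two flows agree, since the patch only adds the == 1 shortcut on the not-found path.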


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
@ 2019-04-20 10:19     ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for WANG Chao @ 2019-04-20 10:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tony.luck, linux-kernel, hpa, chao.wang, tglx, bp, mingo, linux-edac

Commit-ID:  06e0fe2d8e9178bda874a75083bc13647fbf983f
Gitweb:     https://git.kernel.org/tip/06e0fe2d8e9178bda874a75083bc13647fbf983f
Author:     WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:13:13 +0200

RAS/CEC: Increment cec_entered under the mutex lock

Modify ->cec_entered in the critical section of the mutex.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
 drivers/ras/cec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
 	if (!ce_arr.array || ce_arr.disabled)
 		return -ENODEV;
 
-	ca->ces_entered++;
-
 	mutex_lock(&ce_mutex);
 
+	ca->ces_entered++;
+
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
@ 2019-04-20 10:22     ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for WANG Chao @ 2019-04-20 10:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, tony.luck, linux-edac, linux-kernel, chao.wang, hpa, bp

Commit-ID:  09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Gitweb:     https://git.kernel.org/tip/09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Author:     WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:16:52 +0200

RAS/CEC: Increment cec_entered under the mutex lock

Modify ->cec_entered in the critical section of the mutex.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
 drivers/ras/cec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
 	if (!ce_arr.array || ce_arr.disabled)
 		return -ENODEV;
 
-	ca->ces_entered++;
-
 	mutex_lock(&ce_mutex);
 
+	ca->ces_entered++;
+
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-20 11:57     ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2019-04-20 11:57 UTC (permalink / raw)
  To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac

On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> count_threshol == 1 isn't working as expected. CEC only does soft
> offline the second time the same pfn is hit by a correctable error.

So this?

---
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index b3c377ddf340..750a427e1a73 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
 
 	mutex_lock(&ce_mutex);
 
+	/* Array full, free the LRU slot. */
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));
 
@@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
 			(void *)&ca->array[to],
 			(ca->n - to) * sizeof(u64));
 
-		ca->array[to] = (pfn << PAGE_SHIFT) |
-				(DECAY_MASK << COUNT_BITS) | 1;
+		ca->array[to] = (pfn << PAGE_SHIFT) | 1;
 
 		ca->n++;
-
-		ret = 0;
-
-		goto decay;
 	}
 
 	count = COUNT(ca->array[to]);
@@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
 		goto unlock;
 	}
 
-decay:
 	ca->decay_count++;
 
 	if (ca->decay_count >= CLEAN_ELEMS)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-24  2:43       ` WANG Chao
  0 siblings, 0 replies; 21+ messages in thread
From: WANG Chao @ 2019-04-24  2:43 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

On 04/20/19 at 01:57P, Borislav Petkov wrote:
> On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> > count_threshol == 1 isn't working as expected. CEC only does soft
> > offline the second time the same pfn is hit by a correctable error.
> 
> So this?
> 
> ---
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index b3c377ddf340..750a427e1a73 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
>  
>  	mutex_lock(&ce_mutex);
>  
> +	/* Array full, free the LRU slot. */
>  	if (ca->n == MAX_ELEMS)
>  		WARN_ON(!del_lru_elem_unlocked(ca));
>  
> @@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
>  			(void *)&ca->array[to],
>  			(ca->n - to) * sizeof(u64));
>  
> -		ca->array[to] = (pfn << PAGE_SHIFT) |
> -				(DECAY_MASK << COUNT_BITS) | 1;
> +		ca->array[to] = (pfn << PAGE_SHIFT) | 1;
>  
>  		ca->n++;
> -
> -		ret = 0;
> -
> -		goto decay;
>  	}
>  
>  	count = COUNT(ca->array[to]);
> @@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
>  		goto unlock;
>  	}
>  
> -decay:
>  	ca->decay_count++;
>  
>  	if (ca->decay_count >= CLEAN_ELEMS)

It looks good to me. Thanks for a better fix.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-24 10:26         ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2019-04-24 10:26 UTC (permalink / raw)
  To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac

On Wed, Apr 24, 2019 at 10:43:04AM +0800, WANG Chao wrote:
> It looks good to me. Thanks for a better fix.

Latest version:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=tip-ras-core-cec&id=aad216775348c4aaf467069c2e5fbf7ff6c27695

I'll post soon after I've hammered more on this thing.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] RAS/CEC: fix __find_elem
@ 2019-04-25  7:56   ` WANG Chao
  0 siblings, 0 replies; 21+ messages in thread
From: WANG Chao @ 2019-04-25  7:56 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

On 04/18/19 at 11:41P, WANG Chao wrote:
> A left over pfn (because we don't clear) at ca->array[n] can be a match
> in __find_elem. Later it'd cause a memmove size overflow in del_elem.
> 
> Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> ---
>  drivers/ras/cec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index 2d9ec378a8bc..2e0bf1269c31 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
>  
>  	this_pfn = PFN(ca->array[min]);
>  
> -	if (this_pfn == pfn)
> +	if (this_pfn == pfn && ca->n > min)
>  		return min;
>  
>  	return -ENOKEY;

Any thought on this one?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] RAS/CEC: fix __find_elem
@ 2019-04-25  8:05     ` WANG Chao
  0 siblings, 0 replies; 21+ messages in thread
From: WANG Chao @ 2019-04-25  8:05 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

On 04/25/19 at 03:56P, WANG Chao wrote:
> On 04/18/19 at 11:41P, WANG Chao wrote:
> > A left over pfn (because we don't clear) at ca->array[n] can be a match
> > in __find_elem. Later it'd cause a memmove size overflow in del_elem.
> > 
> > Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> > ---
> >  drivers/ras/cec.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> > index 2d9ec378a8bc..2e0bf1269c31 100644
> > --- a/drivers/ras/cec.c
> > +++ b/drivers/ras/cec.c
> > @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> >  
> >  	this_pfn = PFN(ca->array[min]);
> >  
> > -	if (this_pfn == pfn)
> > +	if (this_pfn == pfn && ca->n > min)
> >  		return min;
> >  
> >  	return -ENOKEY;
> 
> Any thought on this one?

Aha, I see there's another fix queued. Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:ras/core] RAS/CEC: Check count_threshold unconditionally
  2019-04-18  3:41   ` [3/3] " WANG Chao
  (?)
  (?)
@ 2019-06-08 21:26   ` tip-bot for Borislav Petkov
  -1 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Borislav Petkov @ 2019-06-08 21:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: chao.wang, bp, mingo, linux-kernel, tony.luck, tglx, hpa

Commit-ID:  de0e0624d86ff9fc512dedb297f8978698abf21a
Gitweb:     https://git.kernel.org/tip/de0e0624d86ff9fc512dedb297f8978698abf21a
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Sat, 20 Apr 2019 14:06:37 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Sat, 8 Jun 2019 17:33:10 +0200

RAS/CEC: Check count_threshold unconditionally

The count_threshold should be checked unconditionally, after insertion
too, so that a count_threshold value of 1 can cause an immediate
offlining. I.e., offline the page on the *first* error encountered.

Add comments to make it clear what cec_add_elem() does, while at it.

Reported-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20190418034115.75954-3-chao.wang@ucloud.cn
---
 drivers/ras/cec.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index f5795adc5a6e..73a975c26f9f 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -294,6 +294,7 @@ int cec_add_elem(u64 pfn)
 
 	ca->ces_entered++;
 
+	/* Array full, free the LRU slot. */
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));
 
@@ -306,24 +307,17 @@ int cec_add_elem(u64 pfn)
 			(void *)&ca->array[to],
 			(ca->n - to) * sizeof(u64));
 
-		ca->array[to] = (pfn << PAGE_SHIFT) |
-				(DECAY_MASK << COUNT_BITS) | 1;
-
+		ca->array[to] = pfn << PAGE_SHIFT;
 		ca->n++;
-
-		ret = 0;
-
-		goto decay;
 	}
 
-	count = COUNT(ca->array[to]);
-
-	if (count < count_threshold) {
-		ca->array[to] |= (DECAY_MASK << COUNT_BITS);
-		ca->array[to]++;
+	/* Add/refresh element generation and increment count */
+	ca->array[to] |= DECAY_MASK << COUNT_BITS;
+	ca->array[to]++;
 
-		ret = 0;
-	} else {
+	/* Check action threshold and soft-offline, if reached. */
+	count = COUNT(ca->array[to]);
+	if (count >= count_threshold) {
 		u64 pfn = ca->array[to] >> PAGE_SHIFT;
 
 		if (!pfn_valid(pfn)) {
@@ -338,15 +332,14 @@ int cec_add_elem(u64 pfn)
 		del_elem(ca, to);
 
 		/*
-		 * Return a >0 value to denote that we've reached the offlining
-		 * threshold.
+		 * Return a >0 value to callers, to denote that we've reached
+		 * the offlining threshold.
 		 */
 		ret = 1;
 
 		goto unlock;
 	}
 
-decay:
 	ca->decay_count++;
 
 	if (ca->decay_count >= CLEAN_ELEMS)

^ permalink raw reply	[flat|nested] 21+ messages in thread
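
The committed flow (insert the element with a zero count, then unconditionally refresh the decay bits, increment the count, and check the threshold) can be sketched in user space as below. The constants assume 4K pages and two decay bits, modeled on my reading of drivers/ras/cec.c; the M_ macros, new_elem(), touch_elem() and reached_threshold() are hypothetical names, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed constants for 4K pages, modeled on drivers/ras/cec.c:
 * u64 element = [ pfn (bits 63..12) | decay (11..10) | count (9..0) ]
 */
#define M_PAGE_SHIFT	12
#define M_DECAY_BITS	2
#define M_DECAY_MASK	((1ULL << M_DECAY_BITS) - 1)
#define M_COUNT_BITS	(M_PAGE_SHIFT - M_DECAY_BITS)
#define M_COUNT_MASK	((1ULL << M_COUNT_BITS) - 1)

#define M_PFN(e)	((e) >> M_PAGE_SHIFT)
#define M_DECAY(e)	(((e) >> M_COUNT_BITS) & M_DECAY_MASK)
#define M_COUNT(e)	((unsigned int)((e) & M_COUNT_MASK))

/* A newly inserted element carries only the pfn (count 0, decay 0). */
static uint64_t new_elem(uint64_t pfn)
{
	return pfn << M_PAGE_SHIFT;
}

/* Refresh the generation (decay) bits and increment the error count. */
static uint64_t touch_elem(uint64_t e)
{
	e |= M_DECAY_MASK << M_COUNT_BITS;
	return e + 1;
}

/* Unconditional check: 1 means the caller should soft-offline the page. */
static int reached_threshold(uint64_t e, unsigned int count_threshold)
{
	return M_COUNT(e) >= count_threshold;
}
```

Because touch_elem() also runs on the insertion path, a freshly inserted element already has count 1 when it is checked, so count_threshold == 1 offlines the page on the first error.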


Thread overview: 21+ messages
2019-04-18  3:41 [PATCH 1/3] RAS/CEC: fix __find_elem WANG Chao
2019-04-18  3:41 ` [1/3] " WANG Chao
2019-04-18  3:41 ` [PATCH 2/3] RAS/CEC: make ces_entered smp safe WANG Chao
2019-04-18  3:41   ` [2/3] " WANG Chao
2019-04-20 10:19   ` [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock tip-bot for WANG Chao
2019-04-20 10:19     ` tip-bot for Borislav Petkov
2019-04-20 10:22   ` tip-bot for WANG Chao
2019-04-20 10:22     ` tip-bot for Borislav Petkov
2019-04-18  3:41 ` [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1 WANG Chao
2019-04-18  3:41   ` [3/3] " WANG Chao
2019-04-20 11:57   ` [PATCH 3/3] " Borislav Petkov
2019-04-20 11:57     ` [3/3] " Borislav Petkov
2019-04-24  2:43     ` [PATCH 3/3] " WANG Chao
2019-04-24  2:43       ` [3/3] " WANG Chao
2019-04-24 10:26       ` [PATCH 3/3] " Borislav Petkov
2019-04-24 10:26         ` [3/3] " Borislav Petkov
2019-06-08 21:26   ` [tip:ras/core] RAS/CEC: Check count_threshold unconditionally tip-bot for Borislav Petkov
2019-04-25  7:56 ` [PATCH 1/3] RAS/CEC: fix __find_elem WANG Chao
2019-04-25  7:56   ` [1/3] " WANG Chao
2019-04-25  8:05   ` [PATCH 1/3] " WANG Chao
2019-04-25  8:05     ` [1/3] " WANG Chao
