* [1/3] RAS/CEC: fix __find_elem
@ 2019-04-18 3:41 WANG Chao
2019-04-18 3:41 ` [PATCH 1/3] " WANG Chao
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-18 3:41 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
A left over pfn (because we don't clear) at ca->array[n] can be a match
in __find_elem. Later it'd cause a memmove size overflow in del_elem.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
drivers/ras/cec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..2e0bf1269c31 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
this_pfn = PFN(ca->array[min]);
- if (this_pfn == pfn)
+ if (this_pfn == pfn && ca->n > min)
return min;
return -ENOKEY;
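The failure mode described in the commit message can be reproduced outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `struct sketch_array`, `sketch_find()` and the `guard` flag are hypothetical names, and the array stores bare PFNs instead of the packed elements cec.c uses. It assumes only what the patch states, namely that the slot at index n is not cleared and can still hold an old value.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ELEMS 4

/* Hypothetical stand-in for struct ce_array: sorted PFNs, n valid entries. */
struct sketch_array {
	uint64_t array[SKETCH_ELEMS];
	unsigned int n;
};

/*
 * Binary search over the first n entries. Returns the index of pfn, or -1
 * if it is absent; *to receives the insertion position either way.
 * With guard == 0 the final comparison may look at array[n], a slot that
 * can still hold a value left behind by an earlier deletion.
 */
static int sketch_find(const struct sketch_array *ca, uint64_t pfn,
		       unsigned int *to, int guard)
{
	unsigned int min = 0, max = ca->n;

	while (min < max) {
		unsigned int mid = min + (max - min) / 2;

		if (ca->array[mid] < pfn)
			min = mid + 1;
		else if (ca->array[mid] > pfn)
			max = mid;
		else {
			min = mid;
			break;
		}
	}

	if (to)
		*to = min;

	/* min == n means "insert at the end": nothing valid lives there. */
	if (guard && min >= ca->n)
		return -1;

	/* Without the guard this can match a stale slot past the end. */
	if (min < SKETCH_ELEMS && ca->array[min] == pfn)
		return (int)min;

	return -1;
}
```

With three valid entries {10, 20, 30} and a stale 40 in the fourth slot, the unguarded lookup of 40 reports index 3, which equals n. A caller that then deletes "element 3" computes its move length from n - (idx + 1), which is no longer a small non-negative number. The guarded lookup reports "not found" and still returns 3 as the insertion position.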
* [PATCH 1/3] RAS/CEC: fix __find_elem
2019-04-18 3:41 [1/3] RAS/CEC: fix __find_elem WANG Chao
@ 2019-04-18 3:41 ` WANG Chao
2019-04-18 3:41 ` [2/3] RAS/CEC: make ces_entered smp safe WANG Chao
` (2 subsequent siblings)
3 siblings, 0 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-18 3:41 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
A left over pfn (because we don't clear) at ca->array[n] can be a match
in __find_elem. Later it'd cause a memmove size overflow in del_elem.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
drivers/ras/cec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..2e0bf1269c31 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
this_pfn = PFN(ca->array[min]);
- if (this_pfn == pfn)
+ if (this_pfn == pfn && ca->n > min)
return min;
return -ENOKEY;
--
2.21.0
* [2/3] RAS/CEC: make ces_entered smp safe
@ 2019-04-18 3:41 ` WANG Chao
2019-04-18 3:41 ` [PATCH 2/3] " WANG Chao
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-18 3:41 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
The ces_entered increment should happen inside the ce_mutex critical section to avoid a race condition.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
drivers/ras/cec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2e0bf1269c31..702e4c02c713 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
if (!ce_arr.array || ce_arr.disabled)
return -ENODEV;
- ca->ces_entered++;
-
mutex_lock(&ce_mutex);
+ ca->ces_entered++;
+
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
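The reordering in this patch follows the usual rule that a shared counter is only modified while the lock protecting it is held. A minimal userspace sketch of that pattern, using a pthread mutex in place of the kernel mutex; `sketch_entered`, `sketch_mutex` and `sketch_add_elem()` are hypothetical names, not identifiers from cec.c:

```c
#include <pthread.h>

/* Hypothetical stand-ins for ce_arr.ces_entered and ce_mutex. */
static unsigned long sketch_entered;
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Count one reported error; the increment happens only under the lock. */
static unsigned long sketch_add_elem(void)
{
	unsigned long seen;

	pthread_mutex_lock(&sketch_mutex);
	seen = ++sketch_entered;	/* read-modify-write is serialized here */
	pthread_mutex_unlock(&sketch_mutex);

	return seen;
}
```

Incrementing before taking the lock would let two callers read the same old value and lose one update; moving the increment after the lock call closes that window.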
* [PATCH 2/3] RAS/CEC: make ces_entered smp safe
2019-04-18 3:41 ` [2/3] RAS/CEC: make ces_entered smp safe WANG Chao
@ 2019-04-18 3:41 ` WANG Chao
2019-04-20 10:19 ` [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock tip-bot for Borislav Petkov
2019-04-20 10:22 ` tip-bot for Borislav Petkov
2 siblings, 0 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-18 3:41 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
The ces_entered increment should happen inside the ce_mutex critical section to avoid a race condition.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
drivers/ras/cec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2e0bf1269c31..702e4c02c713 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
if (!ce_arr.array || ce_arr.disabled)
return -ENODEV;
- ca->ces_entered++;
-
mutex_lock(&ce_mutex);
+ ca->ces_entered++;
+
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
--
2.21.0
* [3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-18 3:41 ` WANG Chao
2019-04-18 3:41 ` [PATCH 3/3] " WANG Chao
2019-04-20 11:57 ` [3/3] " Borislav Petkov
0 siblings, 2 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-18 3:41 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
count_threshold == 1 isn't working as expected. CEC only does soft
offline the second time the same pfn is hit by a correctable error.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
drivers/ras/cec.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 702e4c02c713..ac879c45377c 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -272,7 +272,22 @@ static u64 __maybe_unused del_lru_elem(void)
return pfn;
}
+static void cec_valid_soft_offline(u64 pfn)
+{
+ if (!pfn_valid(pfn)) {
+ pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
+ } else {
+ /* We have reached max count for this page, soft-offline it. */
+ pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
+ memory_failure_queue(pfn, MF_SOFT_OFFLINE, &cec_chain);
+ ce_arr.pfns_poisoned++;
+ }
+}
+/*
+ * Return a >0 value to denote that we've reached the offlining
+ * threshold.
+ */
int cec_add_elem(u64 pfn)
{
struct ce_array *ca = &ce_arr;
@@ -295,6 +310,11 @@ int cec_add_elem(u64 pfn)
ret = find_elem(ca, pfn, &to);
if (ret < 0) {
+ if (count_threshold == 1) {
+ cec_valid_soft_offline(pfn);
+ ret = 1;
+ goto unlock;
+ }
/*
* Shift range [to-end] to make room for one more element.
*/
@@ -320,23 +340,9 @@ int cec_add_elem(u64 pfn)
ret = 0;
} else {
- u64 pfn = ca->array[to] >> PAGE_SHIFT;
-
- if (!pfn_valid(pfn)) {
- pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
- } else {
- /* We have reached max count for this page, soft-offline it. */
- pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
- memory_failure_queue(pfn, MF_SOFT_OFFLINE);
- ca->pfns_poisoned++;
- }
-
+ cec_valid_soft_offline(pfn);
del_elem(ca, to);
- /*
- * Return a >0 value to denote that we've reached the offlining
- * threshold.
- */
ret = 1;
goto unlock;
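The behaviour this patch targets can be modelled in a few lines. The sketch below is a deliberately simplified model, not the kernel implementation: it uses an unsorted table with a linear search, and `struct sketch_cec`, `sketch_add()` and the `check_on_insert` flag are hypothetical. It assumes the control flow visible in the diff: a miss inserts the pfn with a count of 1 and returns, and the threshold is only compared on a later hit.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX 8

struct sketch_cec {
	uint64_t pfn[SKETCH_MAX];
	unsigned int count[SKETCH_MAX];
	size_t n;
	unsigned int threshold;
};

/*
 * Returns 1 when the page would be soft-offlined, 0 when the error was
 * only recorded, -1 when the table is full.
 */
static int sketch_add(struct sketch_cec *c, uint64_t pfn, int check_on_insert)
{
	size_t i;

	for (i = 0; i < c->n; i++)
		if (c->pfn[i] == pfn)
			break;

	if (i == c->n) {
		/* First error seen on this page. */
		if (check_on_insert && c->threshold == 1)
			return 1;	/* patched path: act immediately */
		if (c->n == SKETCH_MAX)
			return -1;
		c->pfn[i] = pfn;
		c->count[i] = 1;
		c->n++;
		return 0;		/* unpatched path: threshold not consulted */
	}

	if (c->count[i] < c->threshold) {
		c->count[i]++;
		return 0;
	}

	/* Threshold reached: drop the entry and report it. */
	c->pfn[i] = c->pfn[c->n - 1];
	c->count[i] = c->count[c->n - 1];
	c->n--;
	return 1;
}
```

With threshold set to 1 and check_on_insert set to 0, the first call for a pfn returns 0 and only the second returns 1, which is the complaint in the commit message. With check_on_insert set to 1, the first call already returns 1.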
* [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
2019-04-18 3:41 ` [3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1 WANG Chao
@ 2019-04-18 3:41 ` WANG Chao
2019-04-20 11:57 ` [3/3] " Borislav Petkov
1 sibling, 0 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-18 3:41 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
count_threshold == 1 isn't working as expected. CEC only does soft
offline the second time the same pfn is hit by a correctable error.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
drivers/ras/cec.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 702e4c02c713..ac879c45377c 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -272,7 +272,22 @@ static u64 __maybe_unused del_lru_elem(void)
return pfn;
}
+static void cec_valid_soft_offline(u64 pfn)
+{
+ if (!pfn_valid(pfn)) {
+ pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
+ } else {
+ /* We have reached max count for this page, soft-offline it. */
+ pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
+ memory_failure_queue(pfn, MF_SOFT_OFFLINE, &cec_chain);
+ ce_arr.pfns_poisoned++;
+ }
+}
+/*
+ * Return a >0 value to denote that we've reached the offlining
+ * threshold.
+ */
int cec_add_elem(u64 pfn)
{
struct ce_array *ca = &ce_arr;
@@ -295,6 +310,11 @@ int cec_add_elem(u64 pfn)
ret = find_elem(ca, pfn, &to);
if (ret < 0) {
+ if (count_threshold == 1) {
+ cec_valid_soft_offline(pfn);
+ ret = 1;
+ goto unlock;
+ }
/*
* Shift range [to-end] to make room for one more element.
*/
@@ -320,23 +340,9 @@ int cec_add_elem(u64 pfn)
ret = 0;
} else {
- u64 pfn = ca->array[to] >> PAGE_SHIFT;
-
- if (!pfn_valid(pfn)) {
- pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
- } else {
- /* We have reached max count for this page, soft-offline it. */
- pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
- memory_failure_queue(pfn, MF_SOFT_OFFLINE);
- ca->pfns_poisoned++;
- }
-
+ cec_valid_soft_offline(pfn);
del_elem(ca, to);
- /*
- * Return a >0 value to denote that we've reached the offlining
- * threshold.
- */
ret = 1;
goto unlock;
--
2.21.0
* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
@ 2019-04-20 10:19 ` tip-bot for Borislav Petkov
2019-04-20 10:19 ` tip-bot for WANG Chao
0 siblings, 1 reply; 20+ messages in thread
From: tip-bot for Borislav Petkov @ 2019-04-20 10:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: tony.luck, linux-kernel, hpa, chao.wang, tglx, bp, mingo, linux-edac
Commit-ID: 06e0fe2d8e9178bda874a75083bc13647fbf983f
Gitweb: https://git.kernel.org/tip/06e0fe2d8e9178bda874a75083bc13647fbf983f
Author: WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer: Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:13:13 +0200
RAS/CEC: Increment cec_entered under the mutex lock
Modify ->ces_entered in the critical section of the mutex.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
drivers/ras/cec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
if (!ce_arr.array || ce_arr.disabled)
return -ENODEV;
- ca->ces_entered++;
-
mutex_lock(&ce_mutex);
+ ca->ces_entered++;
+
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
2019-04-20 10:19 ` [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock tip-bot for Borislav Petkov
@ 2019-04-20 10:19 ` tip-bot for WANG Chao
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for WANG Chao @ 2019-04-20 10:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: tony.luck, linux-kernel, hpa, chao.wang, tglx, bp, mingo, linux-edac
Commit-ID: 06e0fe2d8e9178bda874a75083bc13647fbf983f
Gitweb: https://git.kernel.org/tip/06e0fe2d8e9178bda874a75083bc13647fbf983f
Author: WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer: Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:13:13 +0200
RAS/CEC: Increment cec_entered under the mutex lock
Modify ->ces_entered in the critical section of the mutex.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
drivers/ras/cec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
if (!ce_arr.array || ce_arr.disabled)
return -ENODEV;
- ca->ces_entered++;
-
mutex_lock(&ce_mutex);
+ ca->ces_entered++;
+
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
@ 2019-04-20 10:22 ` tip-bot for Borislav Petkov
2019-04-20 10:22 ` tip-bot for WANG Chao
0 siblings, 1 reply; 20+ messages in thread
From: tip-bot for Borislav Petkov @ 2019-04-20 10:22 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, mingo, tony.luck, linux-edac, linux-kernel, chao.wang, hpa, bp
Commit-ID: 09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Gitweb: https://git.kernel.org/tip/09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Author: WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer: Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:16:52 +0200
RAS/CEC: Increment cec_entered under the mutex lock
Modify ->ces_entered in the critical section of the mutex.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
drivers/ras/cec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
if (!ce_arr.array || ce_arr.disabled)
return -ENODEV;
- ca->ces_entered++;
-
mutex_lock(&ce_mutex);
+ ca->ces_entered++;
+
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
2019-04-20 10:22 ` tip-bot for Borislav Petkov
@ 2019-04-20 10:22 ` tip-bot for WANG Chao
0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for WANG Chao @ 2019-04-20 10:22 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, mingo, tony.luck, linux-edac, linux-kernel, chao.wang, hpa, bp
Commit-ID: 09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Gitweb: https://git.kernel.org/tip/09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Author: WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer: Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:16:52 +0200
RAS/CEC: Increment cec_entered under the mutex lock
Modify ->ces_entered in the critical section of the mutex.
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
drivers/ras/cec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
if (!ce_arr.array || ce_arr.disabled)
return -ENODEV;
- ca->ces_entered++;
-
mutex_lock(&ce_mutex);
+ ca->ces_entered++;
+
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
* [3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-20 11:57 ` Borislav Petkov
2019-04-20 11:57 ` [PATCH 3/3] " Borislav Petkov
2019-04-24 2:43 ` [3/3] " WANG Chao
0 siblings, 2 replies; 20+ messages in thread
From: Borislav Petkov @ 2019-04-20 11:57 UTC (permalink / raw)
To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac
On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> count_threshold == 1 isn't working as expected. CEC only does soft
> offline the second time the same pfn is hit by a correctable error.
So this?
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index b3c377ddf340..750a427e1a73 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
mutex_lock(&ce_mutex);
+ /* Array full, free the LRU slot. */
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
@@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
(void *)&ca->array[to],
(ca->n - to) * sizeof(u64));
- ca->array[to] = (pfn << PAGE_SHIFT) |
- (DECAY_MASK << COUNT_BITS) | 1;
+ ca->array[to] = (pfn << PAGE_SHIFT) | 1;
ca->n++;
-
- ret = 0;
-
- goto decay;
}
count = COUNT(ca->array[to]);
@@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
goto unlock;
}
-decay:
ca->decay_count++;
if (ca->decay_count >= CLEAN_ELEMS)
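The second hunk changes how a freshly inserted element is packed. A small userspace sketch of that packing may help when reading the diff. The layout (pfn in the high bits, decay bits above the count, count in the low bits) is taken from the expressions in the hunk; the concrete widths below (a page shift of 12 and 2 decay bits) and all `SK_`/`sk_` names are assumptions made for illustration only.

```c
#include <stdint.h>

/* Assumed values for illustration; the kernel takes these from its headers. */
#define SK_PAGE_SHIFT	12
#define SK_DECAY_BITS	2
#define SK_COUNT_BITS	(SK_PAGE_SHIFT - SK_DECAY_BITS)
#define SK_DECAY_MASK	((1ULL << SK_DECAY_BITS) - 1)
#define SK_COUNT_MASK	((1ULL << SK_COUNT_BITS) - 1)

/* Accessors for the three packed fields. */
static uint64_t sk_pfn(uint64_t e)   { return e >> SK_PAGE_SHIFT; }
static uint64_t sk_decay(uint64_t e) { return (e >> SK_COUNT_BITS) & SK_DECAY_MASK; }
static uint64_t sk_count(uint64_t e) { return e & SK_COUNT_MASK; }

/* Removed form: a new element starts with all decay bits set and count 1. */
static uint64_t sk_pack_old(uint64_t pfn)
{
	return (pfn << SK_PAGE_SHIFT) | (SK_DECAY_MASK << SK_COUNT_BITS) | 1;
}

/* Proposed form: only the pfn and a count of 1 are set at insertion. */
static uint64_t sk_pack_new(uint64_t pfn)
{
	return (pfn << SK_PAGE_SHIFT) | 1;
}
```

Under these assumed widths, both forms store the same pfn and a count of 1; they differ only in the decay field, which the removed form sets to its maximum and the proposed form leaves at zero for the code that follows the insertion to fill in.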
* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
2019-04-20 11:57 ` [3/3] " Borislav Petkov
@ 2019-04-20 11:57 ` Borislav Petkov
2019-04-24 2:43 ` [3/3] " WANG Chao
1 sibling, 0 replies; 20+ messages in thread
From: Borislav Petkov @ 2019-04-20 11:57 UTC (permalink / raw)
To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac
On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> count_threshold == 1 isn't working as expected. CEC only does soft
> offline the second time the same pfn is hit by a correctable error.
So this?
---
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index b3c377ddf340..750a427e1a73 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
mutex_lock(&ce_mutex);
+ /* Array full, free the LRU slot. */
if (ca->n == MAX_ELEMS)
WARN_ON(!del_lru_elem_unlocked(ca));
@@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
(void *)&ca->array[to],
(ca->n - to) * sizeof(u64));
- ca->array[to] = (pfn << PAGE_SHIFT) |
- (DECAY_MASK << COUNT_BITS) | 1;
+ ca->array[to] = (pfn << PAGE_SHIFT) | 1;
ca->n++;
-
- ret = 0;
-
- goto decay;
}
count = COUNT(ca->array[to]);
@@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
goto unlock;
}
-decay:
ca->decay_count++;
if (ca->decay_count >= CLEAN_ELEMS)
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* [3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-24 2:43 ` WANG Chao
2019-04-24 2:43 ` [PATCH 3/3] " WANG Chao
2019-04-24 10:26 ` [3/3] " Borislav Petkov
0 siblings, 2 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-24 2:43 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
On 04/20/19 at 01:57P, Borislav Petkov wrote:
> On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> > count_threshold == 1 isn't working as expected. CEC only does soft
> > offline the second time the same pfn is hit by a correctable error.
>
> So this?
>
> ---
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index b3c377ddf340..750a427e1a73 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
>
> mutex_lock(&ce_mutex);
>
> + /* Array full, free the LRU slot. */
> if (ca->n == MAX_ELEMS)
> WARN_ON(!del_lru_elem_unlocked(ca));
>
> @@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
> (void *)&ca->array[to],
> (ca->n - to) * sizeof(u64));
>
> - ca->array[to] = (pfn << PAGE_SHIFT) |
> - (DECAY_MASK << COUNT_BITS) | 1;
> + ca->array[to] = (pfn << PAGE_SHIFT) | 1;
>
> ca->n++;
> -
> - ret = 0;
> -
> - goto decay;
> }
>
> count = COUNT(ca->array[to]);
> @@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
> goto unlock;
> }
>
> -decay:
> ca->decay_count++;
>
> if (ca->decay_count >= CLEAN_ELEMS)
It looks good to me. Thanks for a better fix.
* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
2019-04-24 2:43 ` [3/3] " WANG Chao
@ 2019-04-24 2:43 ` WANG Chao
2019-04-24 10:26 ` [3/3] " Borislav Petkov
1 sibling, 0 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-24 2:43 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
On 04/20/19 at 01:57P, Borislav Petkov wrote:
> On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> > count_threshold == 1 isn't working as expected. CEC only does soft
> > offline the second time the same pfn is hit by a correctable error.
>
> So this?
>
> ---
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index b3c377ddf340..750a427e1a73 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
>
> mutex_lock(&ce_mutex);
>
> + /* Array full, free the LRU slot. */
> if (ca->n == MAX_ELEMS)
> WARN_ON(!del_lru_elem_unlocked(ca));
>
> @@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
> (void *)&ca->array[to],
> (ca->n - to) * sizeof(u64));
>
> - ca->array[to] = (pfn << PAGE_SHIFT) |
> - (DECAY_MASK << COUNT_BITS) | 1;
> + ca->array[to] = (pfn << PAGE_SHIFT) | 1;
>
> ca->n++;
> -
> - ret = 0;
> -
> - goto decay;
> }
>
> count = COUNT(ca->array[to]);
> @@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
> goto unlock;
> }
>
> -decay:
> ca->decay_count++;
>
> if (ca->decay_count >= CLEAN_ELEMS)
It looks good to me. Thanks for a better fix.
* [3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
@ 2019-04-24 10:26 ` Borislav Petkov
2019-04-24 10:26 ` [PATCH 3/3] " Borislav Petkov
0 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2019-04-24 10:26 UTC (permalink / raw)
To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac
On Wed, Apr 24, 2019 at 10:43:04AM +0800, WANG Chao wrote:
> It looks good to me. Thanks for a better fix.
Latest version:
https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=tip-ras-core-cec&id=aad216775348c4aaf467069c2e5fbf7ff6c27695
I'll post soon after I've hammered more on this thing.
Thx.
* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
2019-04-24 10:26 ` [3/3] " Borislav Petkov
@ 2019-04-24 10:26 ` Borislav Petkov
0 siblings, 0 replies; 20+ messages in thread
From: Borislav Petkov @ 2019-04-24 10:26 UTC (permalink / raw)
To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac
On Wed, Apr 24, 2019 at 10:43:04AM +0800, WANG Chao wrote:
> It looks good to me. Thanks for a better fix.
Latest version:
https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=tip-ras-core-cec&id=aad216775348c4aaf467069c2e5fbf7ff6c27695
I'll post soon after I've hammered more on this thing.
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* [1/3] RAS/CEC: fix __find_elem
@ 2019-04-25 7:56 ` WANG Chao
2019-04-25 7:56 ` [PATCH 1/3] " WANG Chao
2019-04-25 8:05 ` [1/3] " WANG Chao
0 siblings, 2 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-25 7:56 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
On 04/18/19 at 11:41P, WANG Chao wrote:
> A left over pfn (because we don't clear) at ca->array[n] can be a match
> in __find_elem. Later it'd cause a memmove size overflow in del_elem.
>
> Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> ---
> drivers/ras/cec.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index 2d9ec378a8bc..2e0bf1269c31 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
>
> this_pfn = PFN(ca->array[min]);
>
> - if (this_pfn == pfn)
> + if (this_pfn == pfn && ca->n > min)
> return min;
>
> return -ENOKEY;
Any thought on this one?
* Re: [PATCH 1/3] RAS/CEC: fix __find_elem
2019-04-25 7:56 ` [1/3] RAS/CEC: fix __find_elem WANG Chao
@ 2019-04-25 7:56 ` WANG Chao
2019-04-25 8:05 ` [1/3] " WANG Chao
1 sibling, 0 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-25 7:56 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
On 04/18/19 at 11:41P, WANG Chao wrote:
> A left over pfn (because we don't clear) at ca->array[n] can be a match
> in __find_elem. Later it'd cause a memmove size overflow in del_elem.
>
> Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> ---
> drivers/ras/cec.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index 2d9ec378a8bc..2e0bf1269c31 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
>
> this_pfn = PFN(ca->array[min]);
>
> - if (this_pfn == pfn)
> + if (this_pfn == pfn && ca->n > min)
> return min;
>
> return -ENOKEY;
Any thought on this one?
* [1/3] RAS/CEC: fix __find_elem
@ 2019-04-25 8:05 ` WANG Chao
2019-04-25 8:05 ` [PATCH 1/3] " WANG Chao
0 siblings, 1 reply; 20+ messages in thread
From: WANG Chao @ 2019-04-25 8:05 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
On 04/25/19 at 03:56P, WANG Chao wrote:
> On 04/18/19 at 11:41P, WANG Chao wrote:
> > A left over pfn (because we don't clear) at ca->array[n] can be a match
> > in __find_elem. Later it'd cause a memmove size overflow in del_elem.
> >
> > Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> > ---
> > drivers/ras/cec.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> > index 2d9ec378a8bc..2e0bf1269c31 100644
> > --- a/drivers/ras/cec.c
> > +++ b/drivers/ras/cec.c
> > @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> >
> > this_pfn = PFN(ca->array[min]);
> >
> > - if (this_pfn == pfn)
> > + if (this_pfn == pfn && ca->n > min)
> > return min;
> >
> > return -ENOKEY;
>
> Any thought on this one?
Aha, I see there's another fix queued. Thanks.
* Re: [PATCH 1/3] RAS/CEC: fix __find_elem
2019-04-25 8:05 ` [1/3] " WANG Chao
@ 2019-04-25 8:05 ` WANG Chao
0 siblings, 0 replies; 20+ messages in thread
From: WANG Chao @ 2019-04-25 8:05 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac
On 04/25/19 at 03:56P, WANG Chao wrote:
> On 04/18/19 at 11:41P, WANG Chao wrote:
> > A left over pfn (because we don't clear) at ca->array[n] can be a match
> > in __find_elem. Later it'd cause a memmove size overflow in del_elem.
> >
> > Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> > ---
> > drivers/ras/cec.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> > index 2d9ec378a8bc..2e0bf1269c31 100644
> > --- a/drivers/ras/cec.c
> > +++ b/drivers/ras/cec.c
> > @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> >
> > this_pfn = PFN(ca->array[min]);
> >
> > - if (this_pfn == pfn)
> > + if (this_pfn == pfn && ca->n > min)
> > return min;
> >
> > return -ENOKEY;
>
> Any thought on this one?
Aha, I see there's another fix queued. Thanks.
Thread overview: 20+ messages
2019-04-18 3:41 [1/3] RAS/CEC: fix __find_elem WANG Chao
2019-04-18 3:41 ` [PATCH 1/3] " WANG Chao
2019-04-18 3:41 ` [2/3] RAS/CEC: make ces_entered smp safe WANG Chao
2019-04-18 3:41 ` [PATCH 2/3] " WANG Chao
2019-04-20 10:19 ` [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock tip-bot for Borislav Petkov
2019-04-20 10:19 ` tip-bot for WANG Chao
2019-04-20 10:22 ` tip-bot for Borislav Petkov
2019-04-20 10:22 ` tip-bot for WANG Chao
2019-04-18 3:41 ` [3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1 WANG Chao
2019-04-18 3:41 ` [PATCH 3/3] " WANG Chao
2019-04-20 11:57 ` [3/3] " Borislav Petkov
2019-04-20 11:57 ` [PATCH 3/3] " Borislav Petkov
2019-04-24 2:43 ` [3/3] " WANG Chao
2019-04-24 2:43 ` [PATCH 3/3] " WANG Chao
2019-04-24 10:26 ` [3/3] " Borislav Petkov
2019-04-24 10:26 ` [PATCH 3/3] " Borislav Petkov
2019-04-25 7:56 ` [1/3] RAS/CEC: fix __find_elem WANG Chao
2019-04-25 7:56 ` [PATCH 1/3] " WANG Chao
2019-04-25 8:05 ` [1/3] " WANG Chao
2019-04-25 8:05 ` [PATCH 1/3] " WANG Chao