* [PATCH 1/3] RAS/CEC: fix __find_elem
From: WANG Chao @ 2019-04-18 3:41 UTC
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

A left over pfn (because we don't clear) at ca->array[n] can be a match
in __find_elem. Later it'd cause a memmove size overflow in del_elem.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 drivers/ras/cec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..2e0bf1269c31 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)

 	this_pfn = PFN(ca->array[min]);

-	if (this_pfn == pfn)
+	if (this_pfn == pfn && ca->n > min)
 		return min;

 	return -ENOKEY;
-- 
2.21.0

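To make the failure mode described in this patch concrete, here is a minimal userspace sketch, not the kernel code. It assumes 4 KiB pages (PAGE_SHIFT of 12); NOT_FOUND, find_elem_sketch() and tail_elems() are hypothetical names introduced only for illustration, and the guard flag switches between the pre-patch comparison and the check proposed above. Later in the thread a different fix is reported as queued, so this only illustrates the problem and this particular proposal.

```c
#include <limits.h>
#include <stdint.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PFN(e)     ((e) >> PAGE_SHIFT)
#define NOT_FOUND  (-1)                 /* hypothetical stand-in for -ENOKEY */

struct ce_array {
	uint64_t *array;        /* sorted by PFN; only slots [0, n) are valid */
	unsigned int n;         /* number of valid elements */
};

/*
 * Binary search for pfn. *to receives the insertion slot, which can be n.
 * With guard == 0 the final comparison reads slot n unchecked (pre-patch
 * behavior); with guard != 0 a hit is accepted only if n > min.
 * The caller must provide storage for more than n slots.
 */
static int find_elem_sketch(const struct ce_array *ca, uint64_t pfn,
			    unsigned int *to, int guard)
{
	unsigned int min = 0, max = ca->n;

	while (min < max) {
		unsigned int mid = (min + max) >> 1;
		uint64_t this_pfn = PFN(ca->array[mid]);

		if (this_pfn < pfn) {
			min = mid + 1;
		} else if (this_pfn > pfn) {
			max = mid;
		} else {
			min = mid;
			break;
		}
	}

	if (to)
		*to = min;

	/* Slot min may lie past the valid range and hold a stale entry. */
	if ((!guard || ca->n > min) && PFN(ca->array[min]) == pfn)
		return (int)min;

	return NOT_FOUND;
}

/* Element count a del_elem()-style memmove() would use for index idx. */
static unsigned int tail_elems(unsigned int n, int idx)
{
	return n - (idx + 1);   /* unsigned arithmetic: wraps when idx >= n */
}
```

With two valid elements and a stale third slot that still holds the searched pfn, the unguarded lookup reports index 2, and tail_elems(2, 2) wraps to UINT_MAX. Multiplied by sizeof(u64), that is the oversized memmove length the commit message refers to.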
* [PATCH 2/3] RAS/CEC: make ces_entered smp safe
From: WANG Chao @ 2019-04-18 3:41 UTC
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

ces_entered should be put in a critical section to avoid race condition.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 drivers/ras/cec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2e0bf1269c31..702e4c02c713 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
 	if (!ce_arr.array || ce_arr.disabled)
 		return -ENODEV;

-	ca->ces_entered++;
-
 	mutex_lock(&ce_mutex);

+	ca->ces_entered++;
+
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

-- 
2.21.0

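The race this patch closes can be modeled without threads. The sketch below is a deterministic, single-threaded illustration and not kernel code: struct stats, load_count(), store_count() and the two driver functions are hypothetical names. It splits the increment into its load and store halves and replays one bad interleaving by hand, then replays the serialized order that holding the mutex would enforce.

```c
/* Hypothetical stand-in for the structure holding the ces_entered statistic. */
struct stats {
	unsigned long ces_entered;
};

/* The two halves of a non-atomic increment, split so they can be interleaved. */
static unsigned long load_count(const struct stats *s)
{
	return s->ces_entered;
}

static void store_count(struct stats *s, unsigned long v)
{
	s->ces_entered = v;
}

/* Two CPUs increment without a lock; both load before either stores. */
static unsigned long racy_two_increments(void)
{
	struct stats s = { 0 };
	unsigned long a = load_count(&s);       /* CPU A reads 0 */
	unsigned long b = load_count(&s);       /* CPU B reads 0 */

	store_count(&s, a + 1);                 /* CPU A writes 1 */
	store_count(&s, b + 1);                 /* CPU B writes 1, A's update is lost */
	return s.ces_entered;
}

/* Inside the critical section each load/store pair runs to completion. */
static unsigned long locked_two_increments(void)
{
	struct stats s = { 0 };

	store_count(&s, load_count(&s) + 1);    /* CPU A, holding the mutex */
	store_count(&s, load_count(&s) + 1);    /* CPU B, holding the mutex */
	return s.ces_entered;
}
```

In the unlocked replay one of the two increments is lost, while the serialized replay counts both. Moving the increment below mutex_lock() rules out the first interleaving for callers of cec_add_elem().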
* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
From: tip-bot for WANG Chao @ 2019-04-20 10:19 UTC
To: linux-tip-commits
Cc: tony.luck, linux-kernel, hpa, chao.wang, tglx, bp, mingo, linux-edac

Commit-ID:  06e0fe2d8e9178bda874a75083bc13647fbf983f
Gitweb:     https://git.kernel.org/tip/06e0fe2d8e9178bda874a75083bc13647fbf983f
Author:     WANG Chao <chao.wang () ucloud ! cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:13:13 +0200

RAS/CEC: Increment cec_entered under the mutex lock

Modify ->cec_entered in the critical section of the mutex.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
 drivers/ras/cec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
 	if (!ce_arr.array || ce_arr.disabled)
 		return -ENODEV;

-	ca->ces_entered++;
-
 	mutex_lock(&ce_mutex);

+	ca->ces_entered++;
+
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

* [tip:ras/core] RAS/CEC: Increment cec_entered under the mutex lock
From: tip-bot for WANG Chao @ 2019-04-20 10:22 UTC
To: linux-tip-commits
Cc: tglx, mingo, tony.luck, linux-edac, linux-kernel, chao.wang, hpa, bp

Commit-ID:  09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Gitweb:     https://git.kernel.org/tip/09cbd2197e9291d6a3d3f42873f06ca1f388c1a4
Author:     WANG Chao <chao.wang@ucloud.cn>
AuthorDate: Thu, 18 Apr 2019 03:41:14 +0000
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Sat, 20 Apr 2019 12:16:52 +0200

RAS/CEC: Increment cec_entered under the mutex lock

Modify ->cec_entered in the critical section of the mutex.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190418034115.75954-2-chao.wang@ucloud.cn
---
 drivers/ras/cec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..88e4f3ff0cb8 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -286,10 +286,10 @@ int cec_add_elem(u64 pfn)
 	if (!ce_arr.array || ce_arr.disabled)
 		return -ENODEV;

-	ca->ces_entered++;
-
 	mutex_lock(&ce_mutex);

+	ca->ces_entered++;
+
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

* [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
From: WANG Chao @ 2019-04-18 3:41 UTC
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

count_threshold == 1 isn't working as expected. CEC only does soft
offline the second time the same pfn is hit by a correctable error.

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 drivers/ras/cec.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 702e4c02c713..ac879c45377c 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -272,7 +272,22 @@ static u64 __maybe_unused del_lru_elem(void)
 	return pfn;
 }

+static void cec_valid_soft_offline(u64 pfn)
+{
+	if (!pfn_valid(pfn)) {
+		pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
+	} else {
+		/* We have reached max count for this page, soft-offline it. */
+		pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
+		memory_failure_queue(pfn, MF_SOFT_OFFLINE, &cec_chain);
+		ce_arr.pfns_poisoned++;
+	}
+}

+/*
+ * Return a >0 value to denote that we've reached the offlining
+ * threshold.
+ */
 int cec_add_elem(u64 pfn)
 {
 	struct ce_array *ca = &ce_arr;
@@ -295,6 +310,11 @@ int cec_add_elem(u64 pfn)

 	ret = find_elem(ca, pfn, &to);
 	if (ret < 0) {
+		if (count_threshold == 1) {
+			cec_valid_soft_offline(pfn);
+			ret = 1;
+			goto unlock;
+		}
 		/*
 		 * Shift range [to-end] to make room for one more element.
 		 */
@@ -320,23 +340,9 @@ int cec_add_elem(u64 pfn)

 		ret = 0;
 	} else {
-		u64 pfn = ca->array[to] >> PAGE_SHIFT;
-
-		if (!pfn_valid(pfn)) {
-			pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
-		} else {
-			/* We have reached max count for this page, soft-offline it. */
-			pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
-			memory_failure_queue(pfn, MF_SOFT_OFFLINE);
-			ca->pfns_poisoned++;
-		}
-
+		cec_valid_soft_offline(pfn);
 		del_elem(ca, to);

-		/*
-		 * Return a >0 value to denote that we've reached the offlining
-		 * threshold.
-		 */
 		ret = 1;

 		goto unlock;
-- 
2.21.0

* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
From: Borislav Petkov @ 2019-04-20 11:57 UTC
To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac

On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> count_threshold == 1 isn't working as expected. CEC only does soft
> offline the second time the same pfn is hit by a correctable error.

So this?

---
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index b3c377ddf340..750a427e1a73 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)

 	mutex_lock(&ce_mutex);

+	/* Array full, free the LRU slot. */
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

@@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
 			(void *)&ca->array[to],
 			(ca->n - to) * sizeof(u64));

-		ca->array[to] = (pfn << PAGE_SHIFT) |
-				(DECAY_MASK << COUNT_BITS) | 1;
+		ca->array[to] = (pfn << PAGE_SHIFT) | 1;

 		ca->n++;
-
-		ret = 0;
-
-		goto decay;
 	}

 	count = COUNT(ca->array[to]);
@@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
 		goto unlock;
 	}

-decay:
 	ca->decay_count++;

 	if (ca->decay_count >= CLEAN_ELEMS)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
From: WANG Chao @ 2019-04-24 2:43 UTC
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

On 04/20/19 at 01:57P, Borislav Petkov wrote:
> On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> > count_threshold == 1 isn't working as expected. CEC only does soft
> > offline the second time the same pfn is hit by a correctable error.
>
> So this?
>
> ---
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index b3c377ddf340..750a427e1a73 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
>
>  	mutex_lock(&ce_mutex);
>
> +	/* Array full, free the LRU slot. */
>  	if (ca->n == MAX_ELEMS)
>  		WARN_ON(!del_lru_elem_unlocked(ca));
>
> @@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
>  			(void *)&ca->array[to],
>  			(ca->n - to) * sizeof(u64));
>
> -		ca->array[to] = (pfn << PAGE_SHIFT) |
> -				(DECAY_MASK << COUNT_BITS) | 1;
> +		ca->array[to] = (pfn << PAGE_SHIFT) | 1;
>
>  		ca->n++;
> -
> -		ret = 0;
> -
> -		goto decay;
>  	}
>
>  	count = COUNT(ca->array[to]);
> @@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
>  		goto unlock;
>  	}
>
> -decay:
>  	ca->decay_count++;
>
>  	if (ca->decay_count >= CLEAN_ELEMS)

It looks good to me. Thanks for a better fix.

* Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1
From: Borislav Petkov @ 2019-04-24 10:26 UTC
To: WANG Chao; +Cc: Tony Luck, linux-kernel, linux-edac

On Wed, Apr 24, 2019 at 10:43:04AM +0800, WANG Chao wrote:
> It looks good to me. Thanks for a better fix.

Latest version:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=tip-ras-core-cec&id=aad216775348c4aaf467069c2e5fbf7ff6c27695

I'll post soon after I've hammered more on this thing.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* [tip:ras/core] RAS/CEC: Check count_threshold unconditionally
From: tip-bot for Borislav Petkov @ 2019-06-08 21:26 UTC
To: linux-tip-commits
Cc: chao.wang, bp, mingo, linux-kernel, tony.luck, tglx, hpa

Commit-ID:  de0e0624d86ff9fc512dedb297f8978698abf21a
Gitweb:     https://git.kernel.org/tip/de0e0624d86ff9fc512dedb297f8978698abf21a
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Sat, 20 Apr 2019 14:06:37 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Sat, 8 Jun 2019 17:33:10 +0200

RAS/CEC: Check count_threshold unconditionally

The count_threshold should be checked unconditionally, after insertion
too, so that a count_threshold value of 1 can cause an immediate
offlining. I.e., offline the page on the *first* error encountered.

Add comments to make it clear what cec_add_elem() does, while at it.

Reported-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20190418034115.75954-3-chao.wang@ucloud.cn
---
 drivers/ras/cec.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index f5795adc5a6e..73a975c26f9f 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -294,6 +294,7 @@ int cec_add_elem(u64 pfn)

 	ca->ces_entered++;

+	/* Array full, free the LRU slot. */
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

@@ -306,24 +307,17 @@ int cec_add_elem(u64 pfn)
 			(void *)&ca->array[to],
 			(ca->n - to) * sizeof(u64));

-		ca->array[to] = (pfn << PAGE_SHIFT) |
-				(DECAY_MASK << COUNT_BITS) | 1;
-
+		ca->array[to] = pfn << PAGE_SHIFT;
 		ca->n++;
-
-		ret = 0;
-
-		goto decay;
 	}

-	count = COUNT(ca->array[to]);
-
-	if (count < count_threshold) {
-		ca->array[to] |= (DECAY_MASK << COUNT_BITS);
-		ca->array[to]++;
+	/* Add/refresh element generation and increment count */
+	ca->array[to] |= DECAY_MASK << COUNT_BITS;
+	ca->array[to]++;

-		ret = 0;
-	} else {
+	/* Check action threshold and soft-offline, if reached. */
+	count = COUNT(ca->array[to]);
+	if (count >= count_threshold) {
 		u64 pfn = ca->array[to] >> PAGE_SHIFT;

 		if (!pfn_valid(pfn)) {
@@ -338,15 +332,14 @@ int cec_add_elem(u64 pfn)
 		del_elem(ca, to);

 		/*
-		 * Return a >0 value to denote that we've reached the offlining
-		 * threshold.
+		 * Return a >0 value to callers, to denote that we've reached
+		 * the offlining threshold.
 		 */
 		ret = 1;

 		goto unlock;
 	}

-decay:
 	ca->decay_count++;

 	if (ca->decay_count >= CLEAN_ELEMS)

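The behavioral difference between the old and the committed control flow can be checked with a small userspace model. This is a sketch under stated assumptions, not the kernel code: it tracks a single element, assumes PAGE_SHIFT is 12 and DECAY_BITS is 2 (the diff only shows DECAY_MASK, COUNT_BITS and COUNT() by name), and add_error_old(), add_error_new() and errors_until_offline() are hypothetical helpers.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                           /* assumption: 4 KiB pages */
#define DECAY_BITS 2                            /* assumption */
#define DECAY_MASK ((1ULL << DECAY_BITS) - 1)
#define COUNT_BITS (PAGE_SHIFT - DECAY_BITS)
#define COUNT_MASK ((1ULL << COUNT_BITS) - 1)
#define COUNT(e)   ((unsigned int)((e) & COUNT_MASK))

/* Old flow: insertion stores count 1 and skips the threshold check. */
static int add_error_old(uint64_t *elem, int *present, uint64_t pfn,
			 unsigned int threshold)
{
	if (!*present) {
		*elem = (pfn << PAGE_SHIFT) | (DECAY_MASK << COUNT_BITS) | 1;
		*present = 1;
		return 0;
	}

	if (COUNT(*elem) < threshold) {
		*elem |= DECAY_MASK << COUNT_BITS;
		(*elem)++;
		return 0;
	}

	*present = 0;           /* element deleted, page would be offlined */
	return 1;
}

/* New flow: insert with count 0, always increment, always check. */
static int add_error_new(uint64_t *elem, int *present, uint64_t pfn,
			 unsigned int threshold)
{
	if (!*present) {
		*elem = pfn << PAGE_SHIFT;
		*present = 1;
	}

	*elem |= DECAY_MASK << COUNT_BITS;      /* refresh generation */
	(*elem)++;                              /* increment count */

	if (COUNT(*elem) >= threshold) {
		*present = 0;
		return 1;
	}

	return 0;
}

typedef int (*add_fn)(uint64_t *, int *, uint64_t, unsigned int);

/* Number of errors on one pfn until the model reports offlining. */
static unsigned int errors_until_offline(add_fn add, unsigned int threshold)
{
	uint64_t elem = 0;
	int present = 0;
	unsigned int errors = 0;

	do {
		errors++;
	} while (!add(&elem, &present, 0x1234, threshold));

	return errors;
}
```

In this model, with a threshold of 1 the old flow reports offlining on the second error and the new flow on the first, which matches the report and the commit message. More generally the old flow needs threshold + 1 errors, while the new flow needs exactly threshold.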
* Re: [PATCH 1/3] RAS/CEC: fix __find_elem
From: WANG Chao @ 2019-04-25 7:56 UTC
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

On 04/18/19 at 11:41P, WANG Chao wrote:
> A left over pfn (because we don't clear) at ca->array[n] can be a match
> in __find_elem. Later it'd cause a memmove size overflow in del_elem.
>
> Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> ---
>  drivers/ras/cec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index 2d9ec378a8bc..2e0bf1269c31 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
>
>  	this_pfn = PFN(ca->array[min]);
>
> -	if (this_pfn == pfn)
> +	if (this_pfn == pfn && ca->n > min)
>  		return min;
>
>  	return -ENOKEY;

Any thought on this one?

* Re: [PATCH 1/3] RAS/CEC: fix __find_elem
From: WANG Chao @ 2019-04-25 8:05 UTC
To: Borislav Petkov; +Cc: Tony Luck, linux-kernel, linux-edac

On 04/25/19 at 03:56P, WANG Chao wrote:
> On 04/18/19 at 11:41P, WANG Chao wrote:
> > A left over pfn (because we don't clear) at ca->array[n] can be a match
> > in __find_elem. Later it'd cause a memmove size overflow in del_elem.
> >
> > Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> > ---
> >  drivers/ras/cec.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> > index 2d9ec378a8bc..2e0bf1269c31 100644
> > --- a/drivers/ras/cec.c
> > +++ b/drivers/ras/cec.c
> > @@ -206,7 +206,7 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> >
> >  	this_pfn = PFN(ca->array[min]);
> >
> > -	if (this_pfn == pfn)
> > +	if (this_pfn == pfn && ca->n > min)
> >  		return min;
> >
> >  	return -ENOKEY;
>
> Any thought on this one?

Aha, I see there's another fix queued. Thanks.