* [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16  1:20 ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16  1:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-edac, Cong Wang, Tony Luck, Borislav Petkov, Thomas Gleixner

ce_arr.array[] is always within the range [0, ce_arr.n-1].
However, the binary search code in __find_elem() uses ce_arr.n
as the maximum index, which could lead to an off-by-one
out-of-bounds access when the element right after the last one
is exactly the one that was just deleted, that is, when 'min' is
returned to the caller as 'ce_arr.n'.

Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/ras/cec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..61332c9aab5a 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -184,7 +184,7 @@ static void cec_timer_fn(struct timer_list *unused)
 static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
 {
 	u64 this_pfn;
-	int min = 0, max = ca->n;
+	int min = 0, max = ca->n - 1;
 
 	while (min < max) {
 		int tmp = (max + min) >> 1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/2] ras: close the race condition with timer
@ 2019-04-16  1:20   ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16  1:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-edac, Cong Wang, Tony Luck, Borislav Petkov, Thomas Gleixner

cec_timer_fn() is a timer callback which reads ce_arr.array[]
and updates its decay values. Elements could be added to or
removed from this global array in parallel, although the array
itself will not grow or shrink. del_lru_elem_unlocked() uses
FULL_COUNT() as the key to find the right element to remove,
and that key could be changed by the timer running in parallel.

Fix this by converting the mutex to a spinlock and holding it
inside the timer.

Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/ras/cec.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 61332c9aab5a..a82c9d08d47a 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -117,7 +117,7 @@ static struct ce_array {
 	};
 } ce_arr;
 
-static DEFINE_MUTEX(ce_mutex);
+static DEFINE_SPINLOCK(ce_lock);
 static u64 dfs_pfn;
 
 /* Amount of errors after which we offline */
@@ -171,7 +171,9 @@ static void cec_mod_timer(struct timer_list *t, unsigned long interval)
 
 static void cec_timer_fn(struct timer_list *unused)
 {
+	spin_lock(&ce_lock);
 	do_spring_cleaning(&ce_arr);
+	spin_unlock(&ce_lock);
 
 	cec_mod_timer(&cec_timer, timer_interval);
 }
@@ -265,9 +267,9 @@ static u64 __maybe_unused del_lru_elem(void)
 	if (!ca->n)
 		return 0;
 
-	mutex_lock(&ce_mutex);
+	spin_lock_bh(&ce_lock);
 	pfn = del_lru_elem_unlocked(ca);
-	mutex_unlock(&ce_mutex);
+	spin_unlock_bh(&ce_lock);
 
 	return pfn;
 }
@@ -288,7 +290,7 @@ int cec_add_elem(u64 pfn)
 
 	ca->ces_entered++;
 
-	mutex_lock(&ce_mutex);
+	spin_lock_bh(&ce_lock);
 
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));
@@ -349,7 +351,7 @@ int cec_add_elem(u64 pfn)
 		do_spring_cleaning(ca);
 
 unlock:
-	mutex_unlock(&ce_mutex);
+	spin_unlock_bh(&ce_lock);
 
 	return ret;
 }
@@ -406,7 +408,7 @@ static int array_dump(struct seq_file *m, void *v)
 	u64 prev = 0;
 	int i;
 
-	mutex_lock(&ce_mutex);
+	spin_lock_bh(&ce_lock);
 
 	seq_printf(m, "{ n: %d\n", ca->n);
 	for (i = 0; i < ca->n; i++) {
@@ -431,7 +433,7 @@ static int array_dump(struct seq_file *m, void *v)
 
 	seq_printf(m, "Action threshold: %d\n", count_threshold);
 
-	mutex_unlock(&ce_mutex);
+	spin_unlock_bh(&ce_lock);
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16  9:07   ` Borislav Petkov
  0 siblings, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-04-16  9:07 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-kernel, linux-edac, Tony Luck, Thomas Gleixner

On Mon, Apr 15, 2019 at 06:20:00PM -0700, Cong Wang wrote:
> ce_arr.array[] is always within the range [0, ce_arr.n-1].
> However, the binary search code in __find_elem() uses ce_arr.n
> as the maximum index, which could lead to an off-by-one
> out-of-bounds access when the element right after the last one
> is exactly the one that was just deleted, that is, when 'min' is
> returned to the caller as 'ce_arr.n'.

Sorry, I don't follow.

There's a debugfs interface in /sys/kernel/debug/ras/cec/ with which you
can input random PFNs and test the thing.

Show me pls how this can happen with an example.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] ras: close the race condition with timer
@ 2019-04-16  9:58     ` Borislav Petkov
  0 siblings, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-04-16  9:58 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-kernel, linux-edac, Tony Luck, Thomas Gleixner

On Mon, Apr 15, 2019 at 06:20:01PM -0700, Cong Wang wrote:
> cec_timer_fn() is a timer callback which reads ce_arr.array[]
> and updates its decay values. Elements could be added to or
> removed from this global array in parallel, although the array
> itself will not grow or shrink. del_lru_elem_unlocked() uses
> FULL_COUNT() as the key to find the right element to remove,
> and that key could be changed by the timer running in parallel.
> 
> Fix this by converting the mutex to a spinlock and holding it
> inside the timer.
> 
> Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  drivers/ras/cec.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index 61332c9aab5a..a82c9d08d47a 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -117,7 +117,7 @@ static struct ce_array {
>  	};
>  } ce_arr;
>  
> -static DEFINE_MUTEX(ce_mutex);
> +static DEFINE_SPINLOCK(ce_lock);

Nah, pls keep the simplicity here by retaining the mutex. Use a
workqueue instead to queue the spring cleaning from IRQ context and then
do it when back in preemptible context.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16 17:01     ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16 17:01 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: LKML, linux-edac, Tony Luck, Thomas Gleixner

On Tue, Apr 16, 2019 at 2:07 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Apr 15, 2019 at 06:20:00PM -0700, Cong Wang wrote:
> > ce_arr.array[] is always within the range [0, ce_arr.n-1].
> > However, the binary search code in __find_elem() uses ce_arr.n
> > as the maximum index, which could lead to an off-by-one
> > out-of-bounds access when the element right after the last one
> > is exactly the one that was just deleted, that is, when 'min' is
> > returned to the caller as 'ce_arr.n'.
>
> Sorry, I don't follow.

Sorry for the confusion here. Let me try to make it clear with
the example I wrote down on paper before submitting this patch.

Imagine we have ca->array[] with ca->n = 4, where the elements inside
are 0, 1, 2, 3, and we are trying to find 5 in this array. (This is just to
simplify the following iterations of the while loop.)

So in this specific scenario, without my patch we have the following
inside the while loop:

min = 0, max = 4
tmp = 2
min = 3, max = 4
tmp = 3
min = 4, max = 4
break

It is okay to have min==4 after this loop, as we still need to check
whether array[4] is 5. The problem is that array[4] could really still
be 5, left over from before we deleted array[4], so 4 could be returned
to the caller. After that, 4 could be passed to del_elem() inside
cec_add_elem(), the if check inside passes as it is an unsigned
operation, and then the memmove() accesses index 5...

To actually crash the kernel, we have to replace 4 with MAX_ELEMS
in the above example; the kernel would then crash either when reading
array[MAX_ELEMS] or in the memmove().
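
If it helps, here is a tiny standalone userspace sketch (my own
re-implementation of the loop above, not the driver code) that replays
the same search and prints min == 4 == ca->n at the end:

#include <stdio.h>

int main(void)
{
	unsigned long long array[] = { 0, 1, 2, 3 };
	unsigned long long pfn = 5;
	int n = 4, min = 0, max = n;	/* max = n, as in the unpatched code */

	while (min < max) {
		int tmp = (max + min) >> 1;

		if (array[tmp] < pfn)
			min = tmp + 1;
		else if (array[tmp] > pfn)
			max = tmp;
		else {
			min = tmp;
			break;
		}
	}

	/* prints "min = 4 (n = 4)", i.e. one past the last valid index */
	printf("min = %d (n = %d)\n", min, n);
	return 0;
}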


>
> There's a debugfs interface in /sys/kernel/debug/ras/cec/ with which you
> can input random PFNs and test the thing.
>
> Show me pls how this can happen with an example.
>

Let me see if I can figure out how to add and remove PFNs.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] ras: close the race condition with timer
@ 2019-04-16 17:09       ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16 17:09 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: LKML, linux-edac, Tony Luck, Thomas Gleixner

On Tue, Apr 16, 2019 at 2:58 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Apr 15, 2019 at 06:20:01PM -0700, Cong Wang wrote:
> > cec_timer_fn() is a timer callback which reads ce_arr.array[]
> > and updates its decay values. Elements could be added to or
> > removed from this global array in parallel, although the array
> > itself will not grow or shrink. del_lru_elem_unlocked() uses
> > FULL_COUNT() as the key to find the right element to remove,
> > and that key could be changed by the timer running in parallel.
> >
> > Fix this by converting the mutex to a spinlock and holding it
> > inside the timer.
> >
> > Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  drivers/ras/cec.c | 16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> > index 61332c9aab5a..a82c9d08d47a 100644
> > --- a/drivers/ras/cec.c
> > +++ b/drivers/ras/cec.c
> > @@ -117,7 +117,7 @@ static struct ce_array {
> >       };
> >  } ce_arr;
> >
> > -static DEFINE_MUTEX(ce_mutex);
> > +static DEFINE_SPINLOCK(ce_lock);
>
> Nah, pls keep the simplicity here by retaining the mutex. Use a

I don't understand why a mutex is simpler than a spinlock here.

They are just locks requiring different contexts; I don't see how one is
simpler than the other. Do you mind being more specific?


> workqueue instead to queue the spring cleaning from IRQ context and then
> do it when back in preemptible context.

By workqueue, you must mean to say delayed work, right?

But the global workqueue is not serialized either, and if you really prefer
to move this to the same workqueue for serialization, we have to switch to
an ordered workqueue here. It certainly could work, but I don't see how
that is any simpler than just using a spinlock.

What do you think?

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] ras: close the race condition with timer
@ 2019-04-16 17:42         ` Borislav Petkov
  0 siblings, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-04-16 17:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: LKML, linux-edac, Tony Luck, Thomas Gleixner

On Tue, Apr 16, 2019 at 10:09:49AM -0700, Cong Wang wrote:
> They are just locks requiring different contexts; I don't see how one is
> simpler than the other. Do you mind being more specific?

Yes, I'd like for this whole CEC code to be lazy and preemptible as it
is not at all important when it does its work, as long as it gets it
done eventually.

Can't be preemptible with spinlocks.

> By workqueue, you must mean to say delayed work, right?
>
> But the global workqueue is not serialized either,

Serialized with what? Insertions?

That's what the mutex is for and the insertions happen in process
context.

So yeah, delayed_work sounds like what it should do. I.e.,
queue_delayed_work() and decay_interval_set() should do
mod_delayed_work(). Something along those lines, anyways.
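
Roughly along these lines, completely untested and just to show the
direction (cec_work/cec_work_fn are made-up names here and the work
item would still need an INIT_DELAYED_WORK() during init):

static struct delayed_work cec_work;

static void cec_work_fn(struct work_struct *work)
{
	mutex_lock(&ce_mutex);
	do_spring_cleaning(&ce_arr);
	mutex_unlock(&ce_mutex);

	/* re-arm, like cec_timer_fn() does today */
	queue_delayed_work(system_wq, &cec_work,
			   round_jiffies_relative(timer_interval * HZ));
}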

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] ras: close the race condition with timer
@ 2019-04-16 18:00           ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16 18:00 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: LKML, linux-edac, Tony Luck, Thomas Gleixner

On Tue, Apr 16, 2019 at 10:43 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Apr 16, 2019 at 10:09:49AM -0700, Cong Wang wrote:
> > They are just locks requiring different contexts; I don't see how one is
> > simpler than the other. Do you mind being more specific?
>
> Yes, I'd like for this whole CEC code to be lazy and preemptible as it
> is not at all important when it does its work, as long as it gets it
> done eventually.
>
> Can't be preemptible with spinlocks.

Got it, but the work done while holding the spinlock isn't heavy, given
that there are at most 512 elements in the array.

>
> > By workqueue, you must mean to say delayed work, right?
> >
> > But the global workqueue is not serialized either,
>
> Serialized with what? Insertions?

Hmm? We still have to serialize either the timer callback or
a delayed work with the rest of array updates (add or delete),
right?

As far as I understand, two work items in the global workqueue could
still be run in parallel on different CPUs, which is why I said it has to
be an ordered queue to guarantee the serialization.


>
> That's what the mutex is for and the insertions happen in process
> context.
>
> So yeah, delayed_work sounds like what it should do. I.e.,
> queue_delayed_work() and decay_interval_set() should do
> mod_delayed_work(). Something along those lines, anyways.

This part is perfectly understood.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] ras: close the race condition with timer
@ 2019-04-16 18:06             ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16 18:06 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: LKML, linux-edac, Tony Luck, Thomas Gleixner

On Tue, Apr 16, 2019 at 11:00 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> As far as I understand, two work items in the global workqueue could
> still be run in parallel on different CPUs, which is why I said it has to
> be an ordered queue to guarantee the serialization.

Never mind, I think we both agree to hold the mutex inside the work
too. I misunderstood it.

I will update my patch to use delayed work.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16 22:18     ` Luck, Tony
  0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2019-04-16 22:18 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Cong Wang, linux-kernel, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 11:07:26AM +0200, Borislav Petkov wrote:
> On Mon, Apr 15, 2019 at 06:20:00PM -0700, Cong Wang wrote:
> > ce_arr.array[] is always within the range [0, ce_arr.n-1].
> > However, the binary search code in __find_elem() uses ce_arr.n
> > as the maximum index, which could lead to an off-by-one
> > out-of-bounds access when the element right after the last one
> > is exactly the one that was just deleted, that is, when 'min' is
> > returned to the caller as 'ce_arr.n'.
> 
> Sorry, I don't follow.
> 
> There's a debugfs interface in /sys/kernel/debug/ras/cec/ with which you
> can input random PFNs and test the thing.
> 
> Show me pls how this can happen with an example.

The array of previously seen pfn values is one page.

The problem case occurs when we've seen enough distinct
errors that we have filled every entry, then we try to
look up a pfn that is larger than any seen before.

The loop:

	while (min < max) {
		...
	}

will terminate with "min" set to MAX_ELEMS. Then we
execute:

	this_pfn = PFN(ca->array[min]);

which references beyond the end of the space allocated
for ca->array.

Probably won't crash, but we will read a garbage value
from whatever memory is allocated next.

Chances are high that the test:

	if (this_pfn == pfn)

won't find that the garbage value matches the pfn that
we were looking for ... so we will likely be lucky and
not do anything too dumb. But we shouldn't just cross
our fingers and hope.

Fix looks mostly OK, but we should probably move the

	if (to)
		*to = min;

inside the new

	if (min < ca->n) {
		...
	}

clause.
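
I.e. the tail of __find_elem() would then read something like this
(just a sketch of what I mean, not tested):

	if (min < ca->n) {
		if (to)
			*to = min;

		this_pfn = PFN(ca->array[min]);

		if (this_pfn == pfn)
			return min;
	}

	return -ENOKEY;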

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16 23:18       ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16 23:18 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 3:18 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Tue, Apr 16, 2019 at 11:07:26AM +0200, Borislav Petkov wrote:
> > On Mon, Apr 15, 2019 at 06:20:00PM -0700, Cong Wang wrote:
> > > ce_arr.array[] is always within the range [0, ce_arr.n-1].
> > > However, the binary search code in __find_elem() uses ce_arr.n
> > > as the maximum index, which could lead to an off-by-one
> > > out-of-bounds access when the element right after the last one
> > > is exactly the one that was just deleted, that is, when 'min' is
> > > returned to the caller as 'ce_arr.n'.
> >
> > Sorry, I don't follow.
> >
> > There's a debugfs interface in /sys/kernel/debug/ras/cec/ with which you
> > can input random PFNs and test the thing.
> >
> > Show me pls how this can happen with an example.
>
> The array of previously seen pfn values is one page.
>
> The problem case occurs when we've seen enough distinct
> errors that we have filled every entry, then we try to
> look up a pfn that is larger that any seen before.
>
> The loop:
>
>         while (min < max) {
>                 ...
>         }
>
> will terminate with "min" set to MAX_ELEMS. Then we
> execute:
>
>         this_pfn = PFN(ca->array[min]);
>
> which references beyond the end of the space allocated
> for ca->array.

Exactly.


>
> Probably won't crash, but we will read a garbage value
> from whatever memory is allocated next.


It will eventually crash; this can be proven. :)


>
> Chances are high that the test:
>
>         if (this_pfn == pfn)
>
> won't find that the garbage value matches the pfn that
> we were looking for ... so we will likely be lucky and
> not do anything too dumb. But we shouldn't just cross
> our fingers and hope.
>
> Fix looks mostly OK, but we should probably move the
>
>         if (to)
>                 *to = min;
>
> inside the new
>
>         if (min < ca->n) {
>                 ...
>         }
>
> clause.

No, we can't: in the -ENOKEY case, we still have to save the index
so the caller can insert at the end of the array, therefore *to must
always be saved.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16 23:28         ` Luck, Tony
  0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2019-04-16 23:28 UTC (permalink / raw)
  To: Cong Wang; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 04:18:57PM -0700, Cong Wang wrote:
> > The problem case occurs when we've seen enough distinct
> > errors that we have filled every entry, then we try to
> > look up a pfn that is larger than any seen before.
> >
> > The loop:
> >
> >         while (min < max) {
> >                 ...
> >         }
> >
> > will terminate with "min" set to MAX_ELEMS. Then we
> > execute:
> >
> >         this_pfn = PFN(ca->array[min]);
> >
> > which references beyond the end of the space allocated
> > for ca->array.
> 
> Exactly.

Hmmm. But can we ever really have this happen?  The call
sequence to get here looks like:


        mutex_lock(&ce_mutex);

        if (ca->n == MAX_ELEMS)
                WARN_ON(!del_lru_elem_unlocked(ca));

        ret = find_elem(ca, pfn, &to);

I.e. if the array was all the way full, we delete one element
before calling find_elem().  So when we get here:

static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
{
        u64 this_pfn;
        int min = 0, max = ca->n;

The biggest value "max" can have is MAX_ELEMS-1


-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-16 23:47           ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-16 23:47 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 4:28 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Tue, Apr 16, 2019 at 04:18:57PM -0700, Cong Wang wrote:
> > > The problem case occurs when we've seen enough distinct
> > > errors that we have filled every entry, then we try to
> > > look up a pfn that is larger than any seen before.
> > >
> > > The loop:
> > >
> > >         while (min < max) {
> > >                 ...
> > >         }
> > >
> > > will terminate with "min" set to MAX_ELEMS. Then we
> > > execute:
> > >
> > >         this_pfn = PFN(ca->array[min]);
> > >
> > > which references beyond the end of the space allocated
> > > for ca->array.
> >
> > Exactly.
>
> Hmmm. But can we ever really have this happen?  The call
> sequence to get here looks like:
>
>
>         mutex_lock(&ce_mutex);
>
>         if (ca->n == MAX_ELEMS)
>                 WARN_ON(!del_lru_elem_unlocked(ca));
>
>         ret = find_elem(ca, pfn, &to);
>
> I.e. if the array was all the way full, we delete one element
> before calling find_elem().  So when we get here:
>
> static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> {
>         u64 this_pfn;
>         int min = 0, max = ca->n;
>
> The biggest value "max" can have is MAX_ELEMS-1

This is exactly the explanation for why the crash is inside
memmove() rather than inside find_elem(). del_elem() actually
makes an off-by-two access once we pass its 'if' check at line 232:

229 static void del_elem(struct ce_array *ca, int idx)
230 {
231         /* Save us a function call when deleting the last element. */
232         if (ca->n - (idx + 1))
233                 memmove((void *)&ca->array[idx],
234                         (void *)&ca->array[idx + 1],
235                         (ca->n - (idx + 1)) * sizeof(u64));
236
237         ca->n--;
238 }

idx is ca->n and ca->n is MAX_ELEMS-1; the above if statement then
becomes true, therefore idx+1 is MAX_ELEMS, which is just beyond
the valid range.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-17  1:53             ` Luck, Tony
  0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2019-04-17  1:53 UTC (permalink / raw)
  To: Cong Wang; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 04:47:55PM -0700, Cong Wang wrote:
> 229 static void del_elem(struct ce_array *ca, int idx)
> 230 {
> 231         /* Save us a function call when deleting the last element. */
> 232         if (ca->n - (idx + 1))
> 233                 memmove((void *)&ca->array[idx],
> 234                         (void *)&ca->array[idx + 1],
> 235                         (ca->n - (idx + 1)) * sizeof(u64));
> 236
> 237         ca->n--;
> 238 }
> 
> idx is ca->n and ca->n is MAX_ELEMS-1; the above if statement then
> becomes true, therefore idx+1 is MAX_ELEMS, which is just beyond
> the valid range.

Is that really the memmove() where we die?  It looks like
it has a special case for dealing with the last element.

But this:

296         ret = find_elem(ca, pfn, &to);
297         if (ret < 0) {
298                 /*
299                  * Shift range [to-end] to make room for one more element.
300                  */
301                 memmove((void *)&ca->array[to + 1],
302                         (void *)&ca->array[to],
303                         (ca->n - to) * sizeof(u64));
304

looks like it also needs a special case for when "to ==  MAX_ELEMS-1"
(we don't need to memmove).

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-17  2:31               ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-17  2:31 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 6:53 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Tue, Apr 16, 2019 at 04:47:55PM -0700, Cong Wang wrote:
> > 229 static void del_elem(struct ce_array *ca, int idx)
> > 230 {
> > 231         /* Save us a function call when deleting the last element. */
> > 232         if (ca->n - (idx + 1))
> > 233                 memmove((void *)&ca->array[idx],
> > 234                         (void *)&ca->array[idx + 1],
> > 235                         (ca->n - (idx + 1)) * sizeof(u64));
> > 236
> > 237         ca->n--;
> > 238 }
> >
> > idx is ca->n and ca->n is MAX_ELEMS-1, then the above if statement
> > becomes true, therefore idx+1 is MAX_ELEMS which is just beyond
> > the valid range.
>
> Is that really the memmove() where we die?  It looks like
> it has a special case for dealing with the last element.

Yes it is, I have a stacktrace in production which clearly shows
del_elem.isra.1+0x34/0x40, unlike the one I triggered via fake
PFN's. I can show you if you want, it is on 4.14, so very unlikely
it is interesting to anyone here.



>
> But this:
>
> 296         ret = find_elem(ca, pfn, &to);
> 297         if (ret < 0) {
> 298                 /*
> 299                  * Shift range [to-end] to make room for one more element.
> 300                  */
> 301                 memmove((void *)&ca->array[to + 1],
> 302                         (void *)&ca->array[to],
> 303                         (ca->n - to) * sizeof(u64));
> 304
>
> looks like it also needs a special case for when "to ==  MAX_ELEMS-1"
> (we don't need to memmove).

In the specific case I talked about, find_elem() returns non-negative, so it won't
even hit this branch. Remember how it passed the if check
(this_pfn == pfn)? ;)


Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-17  2:37                 ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-17  2:37 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 7:31 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Yes it is, I have a stacktrace in production which clearly shows
> del_elem.isra.1+0x34/0x40, unlike the one I triggered via fake
> PFN's. I can show you if you want, it is on 4.14, so very unlikely
> it is interesting to anyone here.

Correct myself:

The one triggered via fake PFN's also shows
del_elem.constprop.1+0x39/0x40.

Sorry for the mistake, I kinda read into another crash...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-17 21:15                   ` Luck, Tony
  0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2019-04-17 21:15 UTC (permalink / raw)
  To: Cong Wang; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Tue, Apr 16, 2019 at 07:37:41PM -0700, Cong Wang wrote:
> On Tue, Apr 16, 2019 at 7:31 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > Yes it is, I have a stacktrace in production which clearly shows
> > del_elem.isra.1+0x34/0x40, unlike the one I triggered via fake
> > PFN's. I can show you if you want, it is on 4.14, so very unlikely
> > it is interesting to anyone here.
> 
> Correct myself:
> 
> The one triggered via fake PFN's also shows
> del_elem.constprop.1+0x39/0x40.
> 
> Sorry for the mistake, I kinda read into another crash...

Ok. Now I follow what you are saying. I'll try to rephrase
here.

1) Initial setup is that we filled the array. Set the threshold
to a low value, and then repeatedly added a new pfn (with a
value higher than any in the array).

2) First time:
We find the array is full, so delete one item. Add the new pfn
to the end (in slot MAX_ELEMS-1).

3) Next time. Array is full. Delete one item (assume not the one
for our target pfn). This moves our target to slot MAX_ELEMS-2.

4) Next time. Array isn't full. We get a hit on our pfn. Bump
the count. Find that it exceeded the threshold, so ask to
offline it. We delete this from the array. It is a tail
deletion so we just decrement ca->n.

5) Final time. Array isn't full. When we look for our pfn in
the array it isn't in the array[0,ca->n) range. So we ought
to return ENOKEY. But we don't. When we fall out of the
search loop:

136         while (min < max) {
137                 int tmp = (max + min) >> 1;
138
139                 this_pfn = PFN(ca->array[tmp]);
140
141                 if (this_pfn < pfn)
142                         min = tmp + 1;
143                 else if (this_pfn > pfn)
144                         max = tmp;
145                 else {
146                         min = tmp;
147                         break;
148                 }
149         }

We exit because "min" is no longer less than "max", and we assign
"*to" the index of the available slot.

151         if (to)
152                 *to = min;

This line is just wrong. "min" is still within the allocated
bounds of the array, but the value is "ca->n", i.e. the first
unused slot. The "garbage" value we pick up from here is what
was last assigned to the slot when it was valid. Because we have
been adding the same pfn over and over, the value we pick up is
the pfn that we "deleted" in step "4" above. But remember we
deleted it just by adjusting ca->n, not by overwriting this
entry.

154         this_pfn = PFN(ca->array[min]);


So now this test succeeds, when it should fail. We return
an index of "ca->n".
156         if (this_pfn == pfn)
157                 return min;
158
159         return -ENOKEY;
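
To make step 5 concrete, here is a minimal user-space sketch of that
lookup (a hypothetical reproduction, not the kernel code: it uses a
plain uint64_t array, made-up pfn values, and assumes PAGE_SHIFT == 12
for the PFN() macro):

#include <stdio.h>
#include <stdint.h>

#define PFN(e)	((e) >> 12)		/* assumes PAGE_SHIFT == 12 */

static uint64_t array[4];		/* stand-in for ca->array[] */
static int n;				/* stand-in for ca->n */

/* Same search logic as the unfixed __find_elem(). */
static int find_elem(uint64_t pfn, unsigned int *to)
{
	int min = 0, max = n;		/* off-by-one: max should be n - 1 */

	while (min < max) {
		int tmp = (max + min) >> 1;

		if (PFN(array[tmp]) < pfn)
			min = tmp + 1;
		else if (PFN(array[tmp]) > pfn)
			max = tmp;
		else {
			min = tmp;
			break;
		}
	}

	if (to)
		*to = min;

	/* No "min < n" check here, so this can read the stale slot. */
	if (PFN(array[min]) == pfn)
		return min;

	return -1;			/* stands in for -ENOKEY */
}

int main(void)
{
	unsigned int to;

	/* Step 4: pfn 0x1234 was tail-deleted by decrementing n only,
	 * so its value is still sitting in the now-unused slot 1. */
	array[0] = 0x100ULL  << 12;
	array[1] = 0x1234ULL << 12;
	n = 1;

	/* Step 5: the lookup "finds" it again at index n. */
	printf("find_elem() = %d, n = %d\n", find_elem(0x1234, &to), n);
	return 0;
}

With the bound check from the patch below (min < ca->n before the
final compare), the same lookup falls through to -ENOKEY instead.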

Things go downhill fast from here. We update the count for this
entry, find that it is above the threshold, offline the page, and
try to delete the entry.

226 static void del_elem(struct ce_array *ca, int idx)
227 {
228         /* Save us a function call when deleting the last element. */

idx is equal to ca->n, so "ca->n - (idx + 1)" evaluates to -1, which
is non-zero, so we decide we need to memmove some entries.  The array
indices "idx" and "idx+1" are not out of bounds for the allocated
size, but they aren't what we want. The real problem is the count
passed to memmove: -1 multiplied by sizeof(u64), which wraps to a
huge size_t and sends memmove far past the end of the array.

229         if (ca->n - (idx + 1))
230                 memmove((void *)&ca->array[idx],
231                         (void *)&ca->array[idx + 1],
232                         (ca->n - (idx + 1)) * sizeof(u64));
233
234         ca->n--;
235 }
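
A tiny sketch of that count calculation, with hypothetical idx/n
values just to show the size_t wrap-around:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int n = 3, idx = 3;	/* as above: idx == ca->n after the bad lookup */

	/* Same expression as in del_elem(): (n - (idx + 1)) is -1, but it
	 * is converted to size_t for the multiplication, so it wraps. */
	size_t len = (n - (idx + 1)) * sizeof(uint64_t);

	printf("%zu\n", len);	/* 18446744073709551608 on 64-bit */
	return 0;
}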


So I'm now liking this version of the patch by Cong:


diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..b7cb0b9b1346 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -204,10 +204,12 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
 	if (to)
 		*to = min;
 
-	this_pfn = PFN(ca->array[min]);
+	if (min < ca->n) {
+		this_pfn = PFN(ca->array[min]);
 
-	if (this_pfn == pfn)
-		return min;
+		if (this_pfn == pfn)
+			return min;
+	}
 
 	return -ENOKEY;
 }

Reviewed-by: Tony Luck <tony.luck@intel.com>

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()
@ 2019-04-18 22:54                     ` Cong Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Cong Wang @ 2019-04-18 22:54 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, LKML, linux-edac, Thomas Gleixner

On Wed, Apr 17, 2019 at 2:15 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Tue, Apr 16, 2019 at 07:37:41PM -0700, Cong Wang wrote:
> > On Tue, Apr 16, 2019 at 7:31 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > Yes it is, I have a stacktrace in production which clearly shows
> > > del_elem.isra.1+0x34/0x40, unlike the one I triggered via fake
> > > PFN's. I can show you if you want, it is on 4.14, so very unlikely
> > > it is interesting to anyone here.
> >
> > Correct myself:
> >
> > The one triggered via fake PFN's also shows
> > del_elem.constprop.1+0x39/0x40.
> >
> > Sorry for the mistake, I kinda read into another crash...
>
> Ok. Now I follow what you are saying. I'll try to rephrase
> here.

Yes, your detailed analysis is correct. Sorry if I caused any
confusion here.

>
>
> So I'm now liking this version of the patch by Cong:
>
...
> Reviewed-by: Tony Luck <tony.luck@intel.com>


Thanks for reviewing my patch!

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-04-18 22:54 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-16  1:20 [PATCH 1/2] ras: fix an off-by-one error in __find_elem() Cong Wang
2019-04-16  1:20 ` [1/2] " Cong Wang
2019-04-16  1:20 ` [PATCH 2/2] ras: close the race condition with timer Cong Wang
2019-04-16  1:20   ` [2/2] " Cong Wang
2019-04-16  9:58   ` [PATCH 2/2] " Borislav Petkov
2019-04-16  9:58     ` [2/2] " Borislav Petkov
2019-04-16 17:09     ` [PATCH 2/2] " Cong Wang
2019-04-16 17:09       ` [2/2] " Cong Wang
2019-04-16 17:42       ` [PATCH 2/2] " Borislav Petkov
2019-04-16 17:42         ` [2/2] " Borislav Petkov
2019-04-16 18:00         ` [PATCH 2/2] " Cong Wang
2019-04-16 18:00           ` [2/2] " Cong Wang
2019-04-16 18:06           ` [PATCH 2/2] " Cong Wang
2019-04-16 18:06             ` [2/2] " Cong Wang
2019-04-16  9:07 ` [PATCH 1/2] ras: fix an off-by-one error in __find_elem() Borislav Petkov
2019-04-16  9:07   ` [1/2] " Borislav Petkov
2019-04-16 17:01   ` [PATCH 1/2] " Cong Wang
2019-04-16 17:01     ` [1/2] " Cong Wang
2019-04-16 22:18   ` [PATCH 1/2] " Luck, Tony
2019-04-16 22:18     ` [1/2] " Luck, Tony
2019-04-16 23:18     ` [PATCH 1/2] " Cong Wang
2019-04-16 23:18       ` [1/2] " Cong Wang
2019-04-16 23:28       ` [PATCH 1/2] " Luck, Tony
2019-04-16 23:28         ` [1/2] " Luck, Tony
2019-04-16 23:47         ` [PATCH 1/2] " Cong Wang
2019-04-16 23:47           ` [1/2] " Cong Wang
2019-04-17  1:53           ` [PATCH 1/2] " Luck, Tony
2019-04-17  1:53             ` [1/2] " Luck, Tony
2019-04-17  2:31             ` [PATCH 1/2] " Cong Wang
2019-04-17  2:31               ` [1/2] " Cong Wang
2019-04-17  2:37               ` [PATCH 1/2] " Cong Wang
2019-04-17  2:37                 ` [1/2] " Cong Wang
2019-04-17 21:15                 ` [PATCH 1/2] " Luck, Tony
2019-04-17 21:15                   ` [1/2] " Luck, Tony
2019-04-18 22:54                   ` [PATCH 1/2] " Cong Wang
2019-04-18 22:54                     ` [1/2] " Cong Wang
