All of lore.kernel.org
 help / color / mirror / Atom feed
* [kernel-hardening] Conversion from atomic_t to refcount_t: summary of issues
@ 2016-11-28 11:56 Reshetova, Elena
  2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Reshetova, Elena @ 2016-11-28 11:56 UTC (permalink / raw)
  To: Peter Zijlstra, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng
  Cc: Hans Liljestrand, David Windsor

Hi,

Now that we are almost through the huge set of conversions, I would like to summarize the issues we have found to be most common,
so that we can then decide how to address them.

First, about the types. 
We do have a number of instances of atomic_long_t used as refcounters, see below:

linux/perf_event.h:
struct perf_event {
...
 atomic_long_t refcount;
 ...
}

kernel/audit_tree.c:
struct audit_chunk {
...
    atomic_long_t refs;
..
}

kernel/acct.c:
struct bsd_acct_struct {
...
    atomic_long_t       count;
...
};

linux/fs.h:
struct file {
    ...
    atomic_long_t       f_count;
    ...
}

block/blk-ioc.c:
struct io_context {
    atomic_long_t refcount;
    ...
}

There might be more since we are not 100% finished, but at least struct file is a pretty important one that we should cover. 

And yes, we *do* have at least one instance (again not 100% finished, more might show up) of atomic64_t used as refcounter:

arch/powerpc/mm/mmu_context_iommu.c:
struct mm_iommu_table_group_mem_t {
...
    atomic64_t mapped;
...
}

Next, with regards to the API: networking code surely wins the competition for giving the most trouble.
The biggest overall issue seems to be that freeing the object happens not when the refcount is zero, but when it is -1,
which is obviously impossible to implement with the current API, which only returns an unsigned int. 

Most common constructions that are hard to fit into current API are:

-    if (atomic_cmpxchg(&cur->refcnt, 1, 0) == 1) {...} (typical for networking code)
-    if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {..} (typical for networking code)
-    if (atomic_add_unless(&inode->i_count, -1, 1)) (typical for fs and other code)
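
The semantics of the first and third patterns can be modeled in user space. The sketch below uses C11 atomics and hypothetical names (refcnt_dec_if_one, refcnt_dec_not_one) purely to pin down what these constructions do; it is not the kernel's atomic_t or refcount_t implementation.

```c
/* Illustrative sketch only: models the semantics of the problematic
 * patterns above with C11 atomics; not the kernel implementation. */
#include <stdatomic.h>
#include <stdbool.h>

/* Pattern 1: free only if we are the last holder,
 * i.e. atomic_cmpxchg(&refcnt, 1, 0) == 1. */
static bool refcnt_dec_if_one(atomic_int *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(r, &expected, 0);
}

/* Pattern 3: decrement unless the counter is 1,
 * i.e. atomic_add_unless(&count, -1, 1). */
static bool refcnt_dec_not_one(atomic_int *r)
{
	int old = atomic_load(r);

	do {
		if (old == 1)
			return false;	/* would drop to zero; caller decides */
	} while (!atomic_compare_exchange_weak(r, &old, old - 1));

	return true;
}
```

Both boil down to a compare-and-swap loop with a guard value, which is why they are awkward to express through an API that only exposes inc/dec/test.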

Also, refcount_add() seems to be needed in a number of places, since it looks like refcounts are in some cases increased by two or by some constant. 
Luckily we haven't seen a need for sub().

The following functions are also needed quite commonly:
refcount_inc_return()
refcount_dec_return()

There is also a need for something like refcount_dec_if_one(), i.e. decrement only if the counter equals one, and then do some housekeeping. 

I also saw one use of this from net/ipv4/udp.c:
    if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))

Lastly, as I mentioned previously, almost half of the invocations of dec() in the code are plain atomic_dec() without any if statements or checks on what happens as a result of the dec(). 
Peter previously suggested turning them into WARN_ON(refcount_dec_and_test()), but looking at the code, it is not really clear what this would help to achieve. 
It is clear that in those places the caller explicitly doesn't care about how the dec() goes and what the end result is....

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-28 11:56 [kernel-hardening] Conversion from atomic_t to refcount_t: summary of issues Reshetova, Elena
@ 2016-11-28 12:13 ` Peter Zijlstra
  2016-11-28 12:44   ` Peter Zijlstra
                     ` (4 more replies)
  0 siblings, 5 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-11-28 12:13 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

On Mon, Nov 28, 2016 at 11:56:17AM +0000, Reshetova, Elena wrote:
> First, about the types. 
> We do have a number of instances of atomic_long_t used as refcounters, see below:

Right, those were expected. We could do long_refcount_t I suppose.

> And yes, we *do* have at least one instance (again not 100% finished,
> more might show up) of atomic64_t used as refcounter:
> 
> arch/powerpc/mm/mmu_context_iommu.c:
> struct mm_iommu_table_group_mem_t {
> ...
>     atomic64_t mapped;
> ...
> }

*urgh*, Alexey does that really need to be atomic64_t ? Wouldn't
atomic_long_t work for you?

> Next with regards to API. Networking code surely wins the competitions
> of giving the most trouble.  The biggest overall issue seem to be in
> fact that freeing the object happens not when refcount is zero, but
> when it is -1, which is obviously impossible to implement with current
> API that only returns unsigned int. 
> 
> Most common constructions that are hard to fit into current API are:
> 
> -    if (atomic_cmpxchg(&cur->refcnt, 1, 0) == 1) {...} (typical for networking code)

Right, we spoke about this before, and the dec_if_one() you mentioned
below could replace that.

> -    if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {..} (typical for networking code)

That's really weird, a refcount of -1 doesn't really make sense.

> -    if (atomic_add_unless(&inode->i_count, -1, 1)) (typical for fs and other code)

And that's dec_not_one(), really weird that, why do they need that?

> Also, refcount_add() seems to be needed in number of places since it
> looks like refcounts in some cases are increased by two or by some
> constant.  Luckily we haven't seen a need a sub().

There is sub_and_test() usage in for example memcontrol.c.

> The following functions are also needed quite commonly:

> refcount_inc_return()
> refcount_dec_return()

What for? They don't typically make sense for refcounting? Other than the
trivial pattern of dec_return() == 0, which is already well covered.

> I also saw one use of this from net/ipv4/udp.c:
>     if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))

Yes, that one is quite unfortunate, we can trivially support that
of course, but it does make a bit of a mess of things.

> Lastly as I mentioned previously, almost half of invocations of dec()
> in the code is plain atomic_dec() without any if statements and any
> checks on what happens as a result of dec().  Peter previously
> suggested to turn them into WARN_ON(refcount_dec_and_test()), but
> looking in the code, it is not really clear what would this help to
> achieve?  

Well, it clearly marks where refcounting goes bad and we leak crap. A
regular decrement should _never_ hit 0.

> It is clear that in that places the caller explicitly
> doesn't care about how the dec() goes and what is the end result....

No, the typical usage would be you _know_ it will not hit 0. Any other
usage is broken and bad.
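
The idea Peter describes, that a bare decrement should never be the one that reaches zero, can be sketched as follows. This is only an illustration of the WARN_ON(refcount_dec_and_test()) proposal using C11 atomics and a hypothetical refcnt_dec(); the flag stands in for the kernel's WARN_ON.

```c
/* Illustrative sketch: a "bare" decrement that flags, rather than
 * frees, if it unexpectedly drops the count to zero. Hypothetical
 * names; not the kernel implementation. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static bool refcnt_warned;	/* stands in for a WARN_ON firing */

static void refcnt_dec(atomic_int *r)
{
	/* atomic_fetch_sub returns the previous value. */
	if (atomic_fetch_sub(r, 1) == 1) {
		refcnt_warned = true;
		fprintf(stderr, "refcount unexpectedly hit zero\n");
	}
}
```

If the caller really knows the count cannot hit zero, the warning never fires; if it does fire, it marks the exact spot where the refcounting scheme went bad and an object is about to leak.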

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
@ 2016-11-28 12:44   ` Peter Zijlstra
  2016-11-28 12:48   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-11-28 12:44 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

On Mon, Nov 28, 2016 at 01:13:47PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 28, 2016 at 11:56:17AM +0000, Reshetova, Elena wrote:
> > -    if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {..} (typical for networking code)
> 
> That's really weird, a refcount of -1 doesn't really make sense.

I looked at the one in inetpeer.c, and I think we can simply do +1 on
the entire refcount scheme and it'll work.

There is no dec_and_test anywhere, the only one doing deletion is that
GC. If that were to do dec_if_one(), then lookup_rcu() can do the normal
inc_not_zero().
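
The inc_not_zero() half of that scheme can be sketched like this, with 0 meaning "object is on its way out" once the whole counting scheme is shifted up by one. C11 atomics and a hypothetical name, for illustration only.

```c
/* Sketch of inc_not_zero() under the "+1 on the entire scheme" idea:
 * lookups only take a reference if the object is still live (count > 0).
 * Hypothetical name; not the kernel implementation. */
#include <stdatomic.h>
#include <stdbool.h>

static bool refcnt_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	do {
		if (old == 0)
			return false;	/* already claimed for deletion */
	} while (!atomic_compare_exchange_weak(r, &old, old + 1));

	return true;
}
```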

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
  2016-11-28 12:44   ` Peter Zijlstra
@ 2016-11-28 12:48   ` Peter Zijlstra
  2016-11-28 14:12   ` [kernel-hardening] " Reshetova, Elena
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-11-28 12:48 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

On Mon, Nov 28, 2016 at 01:13:47PM +0100, Peter Zijlstra wrote:
> 
> > -    if (atomic_add_unless(&inode->i_count, -1, 1)) (typical for fs and other code)
> 

Many instances are variants of dec_and_lock/dec_and_mutex_lock(), like
for example the one in put_css_set().

However, there are a few, like in super.c and in XFS that are more
complex, but I have too much of a head-ache to actually think about that
atm.
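
The dec_and_lock() shape Peter refers to can be sketched as below: decrement, and only when the count hits zero, return with the lock held so the caller can tear the object down under it. This uses pthreads purely for illustration; the names and locking are assumptions, not the kernel's spinlock-based atomic_dec_and_lock().

```c
/* Sketch of the atomic_dec_and_lock() pattern: returns true with the
 * lock held iff the decrement took the count to zero. Illustration
 * only; not the kernel implementation. */
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>

static bool refcnt_dec_and_lock(atomic_int *r, pthread_mutex_t *lock)
{
	int old = atomic_load(r);

	/* Fast path: if we are not the last reference, skip the lock. */
	while (old != 1) {
		if (atomic_compare_exchange_weak(r, &old, old - 1))
			return false;
	}

	/* Slow path: take the lock, then re-check under it. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(r, 1) == 1)
		return true;	/* caller destroys object, then unlocks */
	pthread_mutex_unlock(lock);
	return false;
}
```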

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
  2016-11-28 12:44   ` Peter Zijlstra
  2016-11-28 12:48   ` Peter Zijlstra
@ 2016-11-28 14:12   ` Reshetova, Elena
  2016-11-29  3:19   ` [kernel-hardening] " Alexey Kardashevskiy
  2016-11-29 15:35   ` [kernel-hardening] " Reshetova, Elena
  4 siblings, 0 replies; 48+ messages in thread
From: Reshetova, Elena @ 2016-11-28 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

> > Also, refcount_add() seems to be needed in number of places since it
> > looks like refcounts in some cases are increased by two or by some
> > constant.  Luckily we haven't seen a need a sub().
> 
> There is sub_and_test() usage in for example memcontrol.c.

Right (sub_and_test is one pattern I haven't processed yet, but I am *hoping* that there are not that many).
Also, I have now run into atomic_sub(len -1, &sk->sk_wmem_alloc) in net/core/sock.c.

> 
> > The following functions are also needed quite commonly:
> 
> > refcount_inc_return()
> > refcount_dec_return()
> 
> What for? They don't typicaly make sense for refcounting? Other than the
> trivial pattern of dec_return() == 0, which is already well covered.

Well, I guess you have to ask the developers what for. They do check the return value against different numbers or values...
Sometimes it is ">0", sometimes "==1", sometimes "!=-1". These all seem to be border cases that are checked in some scenarios. 

> > I also saw one use of this from net/ipv4/udp.c:
> >     if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))
> 
> Yes, that one is quite unfortunate, we can trivially support that
> ofcourse, but it does make a bit of a mess of things.
> 
> > Lastly as I mentioned previously, almost half of invocations of dec()
> > in the code is plain atomic_dec() without any if statements and any
> > checks on what happens as a result of dec().  Peter previously
> > suggested to turn them into WARN_ON(refcount_dec_and_test()), but
> > looking in the code, it is not really clear what would this help to
> > achieve?
> 
> Well, it clearly marks where refcounting goes bad and we leak crap. A
> regular decrement should _never_ hit 0.
> 
> > It is clear that in that places the caller explicitly
> > doesn't care about how the dec() goes and what is the end result....
> 
> No, the typical usage would be you _know_ it will not hit 0. Any other
> usage is broken and bad.

Ok, makes sense. 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
                     ` (2 preceding siblings ...)
  2016-11-28 14:12   ` [kernel-hardening] " Reshetova, Elena
@ 2016-11-29  3:19   ` Alexey Kardashevskiy
  2016-11-29  9:31     ` Peter Zijlstra
  2016-11-29 15:35   ` [kernel-hardening] " Reshetova, Elena
  4 siblings, 1 reply; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-11-29  3:19 UTC (permalink / raw)
  To: Peter Zijlstra, Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, david

On 28/11/16 23:13, Peter Zijlstra wrote:
> On Mon, Nov 28, 2016 at 11:56:17AM +0000, Reshetova, Elena wrote:
>> First, about the types. 
>> We do have a number of instances of atomic_long_t used as refcounters, see below:
> 
> Right, those were expected. We could do long_refcount_t I suppose.
> 
>> And yes, we *do* have at least one instance (again not 100% finished,
>> more might show up) of atomic64_t used as refcounter:
>>
>> arch/powerpc/mm/mmu_context_iommu.c:
>> struct mm_iommu_table_group_mem_t {
>> ...
>>     atomic64_t mapped;
>> ...
>> }
> 
> *urgh*, Alexey does that really need to be atomic64_t ? Wouldn't
> atomic_long_t work for you?


It would; this code only works on 64bit, where long==64bit anyway (in fact
even a 32bit variant would do).



-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-29  3:19   ` [kernel-hardening] " Alexey Kardashevskiy
@ 2016-11-29  9:31     ` Peter Zijlstra
  2016-11-30  0:23       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2016-11-29  9:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, Hans Liljestrand, David Windsor, david

On Tue, Nov 29, 2016 at 02:19:56PM +1100, Alexey Kardashevskiy wrote:
> On 28/11/16 23:13, Peter Zijlstra wrote:
> > On Mon, Nov 28, 2016 at 11:56:17AM +0000, Reshetova, Elena wrote:
> >> First, about the types. 
> >> We do have a number of instances of atomic_long_t used as refcounters, see below:
> > 
> > Right, those were expected. We could do long_refcount_t I suppose.
> > 
> >> And yes, we *do* have at least one instance (again not 100% finished,
> >> more might show up) of atomic64_t used as refcounter:
> >>
> >> arch/powerpc/mm/mmu_context_iommu.c:
> >> struct mm_iommu_table_group_mem_t {
> >> ...
> >>     atomic64_t mapped;
> >> ...
> >> }
> > 
> > *urgh*, Alexey does that really need to be atomic64_t ? Wouldn't
> > atomic_long_t work for you?
> 
> 
> It would, this code only works in 64bit where long==64bit anyway (in fact
> even 32bit variant would do).
> 

Thanks, we'll convert it to a 32bit refcount then.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
                     ` (3 preceding siblings ...)
  2016-11-29  3:19   ` [kernel-hardening] " Alexey Kardashevskiy
@ 2016-11-29 15:35   ` Reshetova, Elena
  2016-11-29 15:47     ` Peter Zijlstra
                       ` (2 more replies)
  4 siblings, 3 replies; 48+ messages in thread
From: Reshetova, Elena @ 2016-11-29 15:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

So, could we agree on the following additions that are needed to refcount_t API:

- refcount_long_t and all related functions
- refcount_add(), refcount_sub(), refcount_sub_and_test()
- refcount_dec_return(), refcount_inc_return()
- refcount_dec_if_one()

With the above set we can hopefully convert almost everything we already saw and then we can decide what to do with remaining extreme cases. 
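
To pin down the intended semantics of the proposed additions, here is a minimal single-threaded model in C11 atomics. Everything here is an assumption for discussion: the names follow the list above, but the eventual kernel signatures and overflow-checking behavior may differ.

```c
/* Hypothetical, minimal models of the proposed refcount_t additions;
 * semantics sketch only, not the eventual kernel implementation. */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int refs; } refcount_t;

static void refcount_add(int i, refcount_t *r)
{
	atomic_fetch_add(&r->refs, i);
}

static int refcount_inc_return(refcount_t *r)
{
	return atomic_fetch_add(&r->refs, 1) + 1;
}

static int refcount_dec_return(refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, 1) - 1;
}

/* True iff subtracting i took the count to zero. */
static bool refcount_sub_and_test(int i, refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, i) == i;
}

/* Decrement 1 -> 0 only; lets the caller do housekeeping as the
 * exclusive holder. */
static bool refcount_dec_if_one(refcount_t *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&r->refs, &expected, 0);
}
```

A refcount_long_t variant would be the same shape over atomic_long.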

Peter, if you would be able to send the new patch providing the above API, it would be great!
I am on vacation from tomorrow until December 7, but Hans will be finishing processing the rest of the cases, and he can also do the remaining conversions, given that we agree on the additional API. 
We could also provide the above API ourselves, but it is easier to maintain it in one place, so Peter, if you keep updating your version, it is better that you send it out. 

Best Regards,
Elena.

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Monday, November 28, 2016 2:14 PM
> To: Reshetova, Elena <elena.reshetova@intel.com>
> Cc: kernel-hardening@lists.openwall.com; Greg KH
> <gregkh@linuxfoundation.org>; Kees Cook <keescook@chromium.org>;
> will.deacon@arm.com; Boqun Feng <boqun.feng@gmail.com>; Hans Liljestrand
> <ishkamiel@gmail.com>; David Windsor <dwindsor@gmail.com>; aik@ozlabs.ru;
> david@gibson.dropbear.id.au
> Subject: Re: Conversion from atomic_t to refcount_t: summary of issues
> 
> On Mon, Nov 28, 2016 at 11:56:17AM +0000, Reshetova, Elena wrote:
> > First, about the types.
> > We do have a number of instances of atomic_long_t used as refcounters, see
> below:
> 
> Right, those were expected. We could do long_refcount_t I suppose.
> 
> > And yes, we *do* have at least one instance (again not 100% finished,
> > more might show up) of atomic64_t used as refcounter:
> >
> > arch/powerpc/mm/mmu_context_iommu.c:
> > struct mm_iommu_table_group_mem_t {
> > ...
> >     atomic64_t mapped;
> > ...
> > }
> 
> *urgh*, Alexey does that really need to be atomic64_t ? Wouldn't
> atomic_long_t work for you?
> 
> > Next with regards to API. Networking code surely wins the competitions
> > of giving the most trouble.  The biggest overall issue seem to be in
> > fact that freeing the object happens not when refcount is zero, but
> > when it is -1, which is obviously impossible to implement with current
> > API that only returns unsigned int.
> >
> > Most common constructions that are hard to fit into current API are:
> >
> > -    if (atomic_cmpxchg(&cur->refcnt, 1, 0) == 1) {...} (typical for networking
> code)
> 
> Right, we spoke about this before, and the dec_if_one() you mentioned
> below could replace that.
> 
> > -    if (atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {..} (typical for networking code)
> 
> That's really weird, a refcount of -1 doesn't really make sense.
> 
> > -    if (atomic_add_unless(&inode->i_count, -1, 1)) (typical for fs and other code)
> 
> And that's dec_not_one(), really weird that, why do they need that?
> 
> > Also, refcount_add() seems to be needed in number of places since it
> > looks like refcounts in some cases are increased by two or by some
> > constant.  Luckily we haven't seen a need a sub().
> 
> There is sub_and_test() usage in for example memcontrol.c.
> 
> > The following functions are also needed quite commonly:
> 
> > refcount_inc_return()
> > refcount_dec_return()
> 
> What for? They don't typicaly make sense for refcounting? Other than the
> trivial pattern of dec_return() == 0, which is already well covered.
> 
> > I also saw one use of this from net/ipv4/udp.c:
> >     if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))
> 
> Yes, that one is quite unfortunate, we can trivially support that
> ofcourse, but it does make a bit of a mess of things.
> 
> > Lastly as I mentioned previously, almost half of invocations of dec()
> > in the code is plain atomic_dec() without any if statements and any
> > checks on what happens as a result of dec().  Peter previously
> > suggested to turn them into WARN_ON(refcount_dec_and_test()), but
> > looking in the code, it is not really clear what would this help to
> > achieve?
> 
> Well, it clearly marks where refcounting goes bad and we leak crap. A
> regular decrement should _never_ hit 0.
> 
> > It is clear that in that places the caller explicitly
> > doesn't care about how the dec() goes and what is the end result....
> 
> No, the typical usage would be you _know_ it will not hit 0. Any other
> usage is broken and bad.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-29 15:35   ` [kernel-hardening] " Reshetova, Elena
@ 2016-11-29 15:47     ` Peter Zijlstra
  2016-12-01 19:15     ` [kernel-hardening] " Peter Zijlstra
  2016-12-07 14:13     ` Peter Zijlstra
  2 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-11-29 15:47 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david



On 29 November 2016 16:35:15 CET, "Reshetova, Elena" <elena.reshetova@intel.com> wrote:
>So, could we agree on the following additions that are needed to
>refcount_t API:
>
>- refcount_long_t and all related functions
>- refcount_add(), refcount_sub(), refcount_sub_and_test()
>- refcount_dec_return(), refcount_inc_return()
>- refcount_dec_if_one()
>
>With the above set we can hopefully convert almost everything we
>already saw and then we can decide what to do with remaining extreme
>cases. 
>
>Peter, if you would be able to send the new patch providing the above
>API, it would be great!

I'm not convinced on the return variants. I'll have to look at a number of usage sites to determine wth  they're doing. (Or someone else needs to and explain it here)

Also, I'm ill atm, so I'm not particularly productive.

Also, it would be very good to hear other opinions on the add/sub thing; last time they came up, tglx in particular hated them.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-29  9:31     ` Peter Zijlstra
@ 2016-11-30  0:23       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-11-30  0:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, Hans Liljestrand, David Windsor, david

On 29/11/16 20:31, Peter Zijlstra wrote:
> On Tue, Nov 29, 2016 at 02:19:56PM +1100, Alexey Kardashevskiy wrote:
>> On 28/11/16 23:13, Peter Zijlstra wrote:
>>> On Mon, Nov 28, 2016 at 11:56:17AM +0000, Reshetova, Elena wrote:
>>>> First, about the types. 
>>>> We do have a number of instances of atomic_long_t used as refcounters, see below:
>>>
>>> Right, those were expected. We could do long_refcount_t I suppose.
>>>
>>>> And yes, we *do* have at least one instance (again not 100% finished,
>>>> more might show up) of atomic64_t used as refcounter:
>>>>
>>>> arch/powerpc/mm/mmu_context_iommu.c:
>>>> struct mm_iommu_table_group_mem_t {
>>>> ...
>>>>     atomic64_t mapped;
>>>> ...
>>>> }
>>>
>>> *urgh*, Alexey does that really need to be atomic64_t ? Wouldn't
>>> atomic_long_t work for you?
>>
>>
>> It would, this code only works in 64bit where long==64bit anyway (in fact
>> even 32bit variant would do).
>>
> 
> Thanks, we'll convert it to a 32bit refcount then.


I'd rather make it "long" as everything else in that struct is long (i.e.
64bit), and having a 32bit value in the middle won't save space but will
create a useless gap.



-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-29 15:35   ` [kernel-hardening] " Reshetova, Elena
  2016-11-29 15:47     ` Peter Zijlstra
@ 2016-12-01 19:15     ` Peter Zijlstra
  2016-12-01 21:31       ` David Windsor
  2016-12-02 15:44       ` Liljestrand Hans
  2016-12-07 14:13     ` Peter Zijlstra
  2 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-01 19:15 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

On Tue, Nov 29, 2016 at 03:35:15PM +0000, Reshetova, Elena wrote:
> but Hans will be finishing processing 

> > > The following functions are also needed quite commonly:
> > 
> > > refcount_inc_return()
> > > refcount_dec_return()
> > 
> > What for? They don't typicaly make sense for refcounting? Other than the
> > trivial pattern of dec_return() == 0, which is already well covered.

Hans, could you point me to a few users of {inc,dec}_return() that I can
audit for (in)sanity?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 19:15     ` [kernel-hardening] " Peter Zijlstra
@ 2016-12-01 21:31       ` David Windsor
  2016-12-01 23:03         ` Peter Zijlstra
  2016-12-02 15:44       ` Liljestrand Hans
  1 sibling, 1 reply; 48+ messages in thread
From: David Windsor @ 2016-12-01 21:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david


Also, I'd like to point out that while identifying stats_t instances, I
have found a similar distribution of non-standard functions (as agreed upon
for the stats_t API).

First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
planned for implementation):

(Use
https://github.com/ereshetova/linux-stable/blob/hardened_atomic_next_stats
to view these snippets in context.  Line numbers are accurate to within a
few lines).

include/linux/mm.h:2360:
    extern atomic_long_wrap_t num_poisoned_pages;

include/linux/mmzone.h:695
    atomic_long_wrap_t      vm_stat[NR_VM_NODE_STAT_ITEMS];

mm/memory-failure.c:64:
    atomic_long_wrap_t num_poisoned_pages __read_mostly =
ATOMIC_LONG_INIT(0);

... and several others.  Note, these are only from stats_t conversions for
the mm subsystem.

Next, API calls outside of the proposed stats_t API:

kernel/auditsc.c:2029:
   if (uid_valid(loginuid))
        sessionid = (unsigned int)atomic_inc_return_wrap(&session_id);

kernel/padata.c:58:
    seq_nr = atomic_inc_return_wrap(&pd->seq_nr);

kernel/rcu/tree_trace.c:192:
    s0 += atomic_long_read_wrap(&rdp->exp_workdone0);

kernel/trace/trace_mmiotrace.c:123
    atomic_xchg_wrap(&dropped_count, 0);

... and several others.  Note, these are only from stats_t conversions in
the kernel/ directory.

I haven't yet completed my audit of the entire kernel source tree in my
atomic_t-to-stats_t conversion efforts, so I don't yet have an exhaustive
list of non-supported functions, but will at some point soon.

Thanks,
David

On Thu, Dec 1, 2016 at 2:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 29, 2016 at 03:35:15PM +0000, Reshetova, Elena wrote:
> > but Hans will be finishing processing
>
> > > > The following functions are also needed quite commonly:
> > >
> > > > refcount_inc_return()
> > > > refcount_dec_return()
> > >
> > > What for? They don't typicaly make sense for refcounting? Other than
> the
> > > trivial pattern of dec_return() == 0, which is already well covered.
>
> Hans, could you point me to a few users of {inc,dec}_return() that I can
> audit for (in)sanity?
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 21:31       ` David Windsor
@ 2016-12-01 23:03         ` Peter Zijlstra
  2016-12-01 23:20           ` Kees Cook
  2016-12-01 23:20           ` David Windsor
  0 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-01 23:03 UTC (permalink / raw)
  To: David Windsor
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david

On Thu, Dec 01, 2016 at 04:31:16PM -0500, David Windsor wrote:
> Also, I'd like to point out that while identifying stats_t instances, I
> have found a similar distribution of non-standard functions (as agreed upon
> for the stats_t API).

> First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
> planned for implementation):

There isn't even a stats_t planned. I'm still very much not convinced
stats_t is needed or even makes sense.

It wouldn't have any different semantics from atomic_t, and the only
argument Kees made was that reduced atomic_t usage would make it easier
to spot refcounts, but you're already building tools to find those.

Once the tools work, who cares.

> Next, API calls outside of the proposed stats_t API:
> 
> kernel/auditsc.c:2029:
>    if (uid_valid(loginuid))
>         sessionid = (unsigned int)atomic_inc_return_wrap(&session_id);

This is _NOT_ a statistic counter and should not be stats_t

> kernel/padata.c:58:
>     seq_nr = atomic_inc_return_wrap(&pd->seq_nr);

Idem

> kernel/rcu/tree_trace.c:192:
>     s0 += atomic_long_read_wrap(&rdp->exp_workdone0);

Don't get the point, this one is actually trivial.

> kernel/trace/trace_mmiotrace.c:123
>     atomic_xchg_wrap(&dropped_count, 0);
> 
> ... and several others.  Note, these are only from stats_t conversions in
> the kernel/ directory.

And while that thing counts, its not actually a statistics thing.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 23:03         ` Peter Zijlstra
@ 2016-12-01 23:20           ` Kees Cook
  2016-12-01 23:29             ` David Windsor
                               ` (2 more replies)
  2016-12-01 23:20           ` David Windsor
  1 sibling, 3 replies; 48+ messages in thread
From: Kees Cook @ 2016-12-01 23:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Windsor, Reshetova, Elena, kernel-hardening, Greg KH,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david

On Thu, Dec 1, 2016 at 3:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 01, 2016 at 04:31:16PM -0500, David Windsor wrote:
>> Also, I'd like to point out that while identifying stats_t instances, I
>> have found a similar distribution of non-standard functions (as agreed upon
>> for the stats_t API).
>
>> First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
>> planned for implementation):
>
> There isn't even a stats_t planned. I'm still very much not convinced
> stats_t is needed or even makes sense.
>
> It wouldn't have any different semantics from atomic_t, and the only
> argument Kees made was that reduced atomic_t usage would make it easier
> to spot refcounts, but you're already building tools to find those.
>
> Once the tools work, who cares.

The tool is only part of the whole thing. By distinctly splitting the
other major atomic_t usage pattern away from atomic_t, it solidifies a
stats_t as NOT a reference counter. It's the slow feature-creep or bad
example situations that I'd like to avoid. Also, tools won't catch
everything, and doing manual inspection is much easier if we know a
stats_t cannot be misused.

There doesn't seem to be a good reason NOT to have stats_t, beyond the
work needed to create it and audit the places it should be used.

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 23:03         ` Peter Zijlstra
  2016-12-01 23:20           ` Kees Cook
@ 2016-12-01 23:20           ` David Windsor
  2016-12-07 13:21             ` Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: David Windsor @ 2016-12-01 23:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david

On Thu, Dec 1, 2016 at 6:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Dec 01, 2016 at 04:31:16PM -0500, David Windsor wrote:
> > Also, I'd like to point out that while identifying stats_t instances, I
> > have found a similar distribution of non-standard functions (as agreed upon
> > for the stats_t API).
>
> > First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
> > planned for implementation):
>
> There isn't even a stats_t planned. I'm still very much not convinced
> stats_t is needed or even makes sense.
>
> It wouldn't have any different semantics from atomic_t, and the only
> argument Kees made was that reduced atomic_t usage would make it easier
> to spot refcounts, but you're already building tools to find those.
>

Then are we decided that overflow protection is going to be opt-in?
If atomic_t isn't protected by default (opt-in case), then yes, new
users will have no need to use a type other than atomic_t for their
shared statistical counters.

> Once the tools work, who cares.
>

> > Next, API calls outside of the proposed stats_t API:
> >
> > kernel/auditsc.c:2029:
> >    if (uid_valid(loginuid))
> >         sessionid = (unsigned int)atomic_inc_return_wrap(&session_id);
>
> This is _NOT_ a statistic counter and should not be stats_t
>
> > kernel/padata.c:58:
> >     seq_nr = atomic_inc_return_wrap(&pd->seq_nr);
>
> Idem
>
> > kernel/rcu/tree_trace.c:192:
> >     s0 += atomic_long_read_wrap(&rdp->exp_workdone0);
>
> Don't get the point, this one is actually trivial.
>
> > kernel/trace/trace_mmiotrace.c:123
> >     atomic_xchg_wrap(&dropped_count, 0);
> >
> > ... and several others.  Note, these are only from stats_t conversions in
> > the kernel/ directory.
>
> And while that thing counts, its not actually a statistics thing.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 23:20           ` Kees Cook
@ 2016-12-01 23:29             ` David Windsor
  2016-12-02  1:17             ` Boqun Feng
  2016-12-07 13:36             ` Peter Zijlstra
  2 siblings, 0 replies; 48+ messages in thread
From: David Windsor @ 2016-12-01 23:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Reshetova, Elena, kernel-hardening, Greg KH,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david

On Thu, Dec 1, 2016 at 6:20 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Dec 1, 2016 at 3:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, Dec 01, 2016 at 04:31:16PM -0500, David Windsor wrote:
>>> Also, I'd like to point out that while identifying stats_t instances, I
>>> have found a similar distribution of non-standard functions (as agreed upon
>>> for the stats_t API).
>>
>>> First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
>>> planned for implementation):
>>
>> There isn't even a stats_t planned. I'm still very much not convinced
>> stats_t is needed or even makes sense.
>>
>> It wouldn't have any different semantics from atomic_t, and the only
>> argument Kees made was that reduced atomic_t usage would make it easier
>> to spot refcounts, but you're already building tools to find those.
>>
>> Once the tools work, who cares.
>
> The tool is only part of the whole thing. By distinctly splitting the
> other major atomic_t usage pattern away from atomic_t, it solidifies a
> stats_t as NOT a reference counter.

Further, as you pointed out in another thread, there is value in
having a type with an obviously descriptive name that naturally leads
the developers to the correct API.  Helping reduce misuse of types
definitely has value.

> It's the slow feature-creep or bad
> example situations that I'd like to avoid. Also, tools won't catch
> everything, and doing manual inspection is much easier if we know a
> stats_t cannot be misused.
>
> There doesn't seem to be a good reason NOT to have stats_t, beyond the
> work needed to create it and audit the places it should be used.
>
> -Kees
>
> --
> Kees Cook
> Nexus Security


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 23:20           ` Kees Cook
  2016-12-01 23:29             ` David Windsor
@ 2016-12-02  1:17             ` Boqun Feng
  2016-12-02 20:25               ` David Windsor
  2016-12-07 13:36             ` Peter Zijlstra
  2 siblings, 1 reply; 48+ messages in thread
From: Boqun Feng @ 2016-12-02  1:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, David Windsor, Reshetova, Elena,
	kernel-hardening, Greg KH, will.deacon, Hans Liljestrand, aik,
	david

[-- Attachment #1: Type: text/plain, Size: 2515 bytes --]

On Thu, Dec 01, 2016 at 03:20:29PM -0800, Kees Cook wrote:
> On Thu, Dec 1, 2016 at 3:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Dec 01, 2016 at 04:31:16PM -0500, David Windsor wrote:
> >> Also, I'd like to point out that while identifying stats_t instances, I
> >> have found a similar distribution of non-standard functions (as agreed upon
> >> for the stats_t API).
> >
> >> First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
> >> planned for implementation):
> >
> > There isn't even a stats_t planned. I'm still very much not convinced
> > stats_t is needed or even makes sense.
> >
> > It wouldn't have any different semantics from atomic_t, and the only
> > argument Kees made was that reduced atomic_t usage would make it easier
> > to spot refcounts, but you're already building tools to find those.
> >
> > Once the tools work, who cares.
> 
> The tool is only part of the whole thing. By distinctly splitting the
> other major atomic_t usage pattern away from atomic_t, it solidifies a
> stats_t as NOT a reference counter. It's the slow feature-creep or bad
> example situations that I'd like to avoid. Also, tools won't catch
> everything, and doing manual inspection is much easier if we know a
> stats_t cannot be misused.
> 
> There doesn't seem to be a good reason NOT to have stats_t, beyond the
> work needed to create it and audit the places it should be used.
> 

So we currently don't have a clear semantics for stats_t, do we? What
kind of atomic_t should be replaced with stats_t? In the link David
pointed out, there are a few cases where a stats_t is put on a
correctness-related variable. I don't think that's a good place to use
stats_t.

Besides, there are many statistics-purpose variables in the kernel which
are not atomic; given those, calling only the atomic statistics
variables stats_t seems inappropriate and misleading.

How about providing a modifier, say __stats, like __rcu or __percpu? We
could add it to the types of variables that exist only for statistics
purposes. It shouldn't be difficult to find all the callsites touching a
__stats variable with the help of compiler frontend tools; we could then
flag a likely problem, such as a cmpxchg on a __stats, which is unlikely
to be a proper use of a statistics variable. And we wouldn't need to
change or add special APIs; we would just mark variables and fields.

Thoughts?

Regards,
Boqun

> -Kees
> 
> -- 
> Kees Cook
> Nexus Security



* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 19:15     ` [kernel-hardening] " Peter Zijlstra
  2016-12-01 21:31       ` David Windsor
@ 2016-12-02 15:44       ` Liljestrand Hans
  2016-12-02 16:14         ` Greg KH
  2016-12-07 13:52         ` Peter Zijlstra
  1 sibling, 2 replies; 48+ messages in thread
From: Liljestrand Hans @ 2016-12-02 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Thu, 2016-12-01 at 20:15 +0100, Peter Zijlstra wrote:
> On Tue, Nov 29, 2016 at 03:35:15PM +0000, Reshetova, Elena wrote:
> > but Hans will be finishing processing 
> 
> > > > The following functions are also needed quite commonly:
> > > 
> > > > refcount_inc_return()
> > > > refcount_dec_return()
> > > 
> > > What for? They don't typically make sense for refcounting? Other than the
> > > trivial pattern of dec_return() == 0, which is already well covered.
> 
> Hans, could you point me to a few users of {inc,dec}_return() that I can
> audit for (in)sanity?

Hi, sorry for the slow response, I have been a bit otherwise engaged.
Here's at least some of the cases we've already encountered.


There's a lot of uses like this (but unless I'm missing something they
should mostly go under the trivial dec_return() == 0 pattern already
mentioned):

if (!atomic_dec_return(&buf->refcount))
-
if (atomic_dec_return(&mcast->refcount) <= 1)
-
map_guard = atomic_add_return(-1, mux->map_guard);
if (!map_guard)


Then there's at least include/net/ip_vs.h that does unchecked decs and
instead has this dedicated free function that checks for negative values
(so with unsigned refcount it is broken anyway, guess we could do a
conditional dec with a _read, but then it's no longer atomic):

http://lxr.free-electrons.com/source/include/net/ip_vs.h#L1424

 static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest) 
 {
 	if (atomic_dec_return(&dest->refcnt) < 0)
 		kfree(dest);
 }


Then there's cases that check for the first increment, like here (maybe
something like inc_and_one could allow these without too much leeway?):

http://lxr.free-electrons.com/source/drivers/tty/serial/zs.c#L764

 irq_guard = atomic_add_return(1, &scc->irq_guard);
 if (irq_guard == 1) {

http://lxr.free-electrons.com/source/drivers/usb/gadget/function/f_fs.c#L1497

 if (atomic_add_return(1, &ffs->opened) == 1 &&
 	ffs->state == FFS_DEACTIVATED) {


And finally some cases with other uses/values:

http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L231

 if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {

http://lxr.free-electrons.com/source/drivers/staging/lustre/lustre/ptlrpc/client.c#L3081

 if (atomic_inc_return(&req->rq_refcount) == 2)


Regards,
-hans


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-02 15:44       ` Liljestrand Hans
@ 2016-12-02 16:14         ` Greg KH
  2016-12-07 13:52         ` Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Greg KH @ 2016-12-02 16:14 UTC (permalink / raw)
  To: Liljestrand Hans
  Cc: Peter Zijlstra, Reshetova, Elena, kernel-hardening, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Fri, Dec 02, 2016 at 05:44:34PM +0200, Liljestrand Hans wrote:
> Then there's cases that check for the first increment, like here (maybe
> something like inc_and_one could allow these without too much leeway?):
> 
> http://lxr.free-electrons.com/source/drivers/tty/serial/zs.c#L764
> 
>  irq_guard = atomic_add_return(1, &scc->irq_guard);
>  	if (irq_guard == 1) {

That's horrid; let's fix it correctly. It just wants to know whether the
driver has been initialized or not. Make it a real lock and a variable
and all is good.

> http://lxr.free-electrons.com/source/drivers/usb/gadget/function/f_fs.c#L1497
> 
>  if (atomic_add_return(1, &ffs->opened) == 1 &&
>  	ffs->state == FFS_DEACTIVATED) {

Another horrid hack to try to be "cute" about only allowing one open to
succeed.  Again, let's do this correctly with a lock.

> And finally some cases with other uses/values:
> 
> http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L231
> 
>  if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {

A "don't allow any more than X things through at once" type counter, a
normal atomic type should be fine for this, it's not a "real" reference
counter for a data structure.

> http://lxr.free-electrons.com/source/drivers/staging/lustre/lustre/ptlrpc/client.c#L3081
> 
>  if (atomic_inc_return(&req->rq_refcount) == 2)

lustre should never be used as an excuse for anything, except for how to
not do things.  That's some messed up code that is slowly getting
better...

This audit is turning up good stuff, it will be nice to clean this crud
up!

thanks,

greg k-h


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-02  1:17             ` Boqun Feng
@ 2016-12-02 20:25               ` David Windsor
  2016-12-07 13:24                 ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: David Windsor @ 2016-12-02 20:25 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Kees Cook, Peter Zijlstra, Reshetova, Elena, kernel-hardening,
	Greg KH, will.deacon, Hans Liljestrand, aik, david

On Thu, Dec 1, 2016 at 8:17 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
> On Thu, Dec 01, 2016 at 03:20:29PM -0800, Kees Cook wrote:
>> On Thu, Dec 1, 2016 at 3:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Dec 01, 2016 at 04:31:16PM -0500, David Windsor wrote:
>> >> Also, I'd like to point out that while identifying stats_t instances, I
>> >> have found a similar distribution of non-standard functions (as agreed upon
>> >> for the stats_t API).
>> >
>> >> First, usage of atomic_long_wrap_t (there currently isn't a stats_long_t
>> >> planned for implementation):
>> >
>> > There isn't even a stats_t planned. I'm still very much not convinced
>> > stats_t is needed or even makes sense.
>> >
>> > It wouldn't have any different semantics from atomic_t, and the only
>> > argument Kees made was that reduced atomic_t usage would make it easier
>> > to spot refcounts, but you're already building tools to find those.
>> >
>> > Once the tools work, who cares.
>>
>> The tool is only part of the whole thing. By distinctly splitting the
>> other major atomic_t usage pattern away from atomic_t, it solidifies a
>> stats_t as NOT a reference counter. It's the slow feature-creep or bad
>> example situations that I'd like to avoid. Also, tools won't catch
>> everything, and doing manual inspection is much easier if we know a
>> stats_t cannot be misused.
>>
>> There doesn't seem to be a good reason NOT to have stats_t, beyond the
>> work needed to create it and audit the places it should be used.
>>
>
> So we currently don't have a clear semantics for stats_t, do we?

We had a discussion about the stats_t API in another thread.  We
agreed upon add(), sub(), inc(), dec(), read() and set().

> What kind of atomic_t should be replaced with stats_t?

stats_t is used for those cases in which an atomic variable is
required, but the overflow of this variable isn't of much concern.
Typically, these types of variables are counters of some kind (rx/tx
counts, etc), but not always.  Perhaps "stats_t" isn't the best type
name.  We actually used "atomic_wrap_t" in previous iterations.

> In the link David  pointed out, there are a few cases where a stats_t is put on a
> correctness-related variable. I don't think that's a good place to use
> stats_t.
>

Yeah, I just grabbed a few examples I noted during my stats_t
conversion work.  The drivers/ tree is littered with stats_t
instances.

> Besides, there are many statistics-purpose variables in the kernel which
> are not atomic; given those, calling only the atomic statistics
> variables stats_t seems inappropriate and misleading.
>
> How about providing a modifier, say __stats, like __rcu or __percpu? We
> could add it to the types of variables that exist only for statistics
> purposes. It shouldn't be difficult to find all the callsites touching a
> __stats variable with the help of compiler frontend tools; we could then
> flag a likely problem, such as a cmpxchg on a __stats, which is unlikely
> to be a proper use of a statistics variable. And we wouldn't need to
> change or add special APIs; we would just mark variables and fields.
>

The modifier would definitely allow us to identify abuse of "stats"
variables.  But, this should be used in conjunction with mandatory
overflow protection on atomic_t.  As Kees said, tools won't find
everything; we need to secure the atomic_t type itself.

> Thoughts?
>
> Regards,
> Boqun
>
>> -Kees
>>
>> --
>> Kees Cook
>> Nexus Security


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 23:20           ` David Windsor
@ 2016-12-07 13:21             ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-07 13:21 UTC (permalink / raw)
  To: David Windsor
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david

On Thu, Dec 01, 2016 at 06:20:55PM -0500, David Windsor wrote:

> Then are we decided that overflow protection is going to be opt-in?

The wholesale rape of atomic_t as proposed earlier is not going to
happen.

That results in a terrible and inconsistent API, which will only lead to
more terrible code.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-02 20:25               ` David Windsor
@ 2016-12-07 13:24                 ` Peter Zijlstra
  2016-12-07 19:03                   ` David Windsor
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-07 13:24 UTC (permalink / raw)
  To: David Windsor
  Cc: Boqun Feng, Kees Cook, Reshetova, Elena, kernel-hardening,
	Greg KH, will.deacon, Hans Liljestrand, aik, david

On Fri, Dec 02, 2016 at 03:25:42PM -0500, David Windsor wrote:
> On Thu, Dec 1, 2016 at 8:17 PM, Boqun Feng <boqun.feng@gmail.com> wrote:

> > So we currently don't have a clear semantics for stats_t, do we?
> 
> We had a discussion about the stats_t API in another thread.  We
> agreed upon add(), sub(), inc(), dec(), read() and set().
> 
> > What kind of atomic_t should be replaced with stats_t?
> 
> stats_t is used for those cases in which an atomic variable is
> required, but the overflow of this variable isn't of much concern.
> Typically, these types of variables are counters of some kind (rx/tx
> counts, etc), but not always.  Perhaps "stats_t" isn't the best type
> name.  We actually used "atomic_wrap_t" in previous iterations.

And atomic_wrap_t is a horrid trainwreck. Please explain the semantics
of atomic_wrap_cmpxchg(): how does "wrapping" apply to an operation that
doesn't involve sign bits at all?

> > In the link David  pointed out, there are a few cases where a
> > stats_t is put on a correctness-related variable. I don't think
> > that's a good place to use stats_t.
> >
> 
> Yeah, I just grabbed a few examples I noted during my stats_t
> conversion work.  The drivers/ tree is littered with stats_t
> instances.

Not sure how to respond to this, if you're converting all that to
stats_t then you're doing it wrong.

Most of what you showed should very emphatically not be stats_t.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-01 23:20           ` Kees Cook
  2016-12-01 23:29             ` David Windsor
  2016-12-02  1:17             ` Boqun Feng
@ 2016-12-07 13:36             ` Peter Zijlstra
  2 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-07 13:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: David Windsor, Reshetova, Elena, kernel-hardening, Greg KH,
	will.deacon, Boqun Feng, Hans Liljestrand, aik, david

On Thu, Dec 01, 2016 at 03:20:29PM -0800, Kees Cook wrote:

> There doesn't seem to be a good reason NOT to have stats_t, beyond the
> work needed to create it and audit the places it should be used.

API proliferation is a negative. Esp. if you have APIs that provide
overlapping / redundant functionality.

It's why I currently detest kref, and why I never used it (and actively
moved people away from it for code that I work on). It's daft wrappery
(and I know Greg disagrees).

Sure, you can add atomic_stats_t or whatever name gets decided upon (I
think Boqun had a good point in that there's plenty stats that are
!atomic -- also, I don't see you proposing to split "int" into separate
types per use-case), but don't expect me to use it for the code that I
write.

refcount_t makes sense in that it has clearly defined semantics that are
not elsewhere provided and doesn't provide operations that defeat those
semantics or the use-case.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-02 15:44       ` Liljestrand Hans
  2016-12-02 16:14         ` Greg KH
@ 2016-12-07 13:52         ` Peter Zijlstra
  2016-12-07 15:59           ` David Windsor
  2016-12-16 12:10           ` [kernel-hardening] " Reshetova, Elena
  1 sibling, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-07 13:52 UTC (permalink / raw)
  To: Liljestrand Hans
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Fri, Dec 02, 2016 at 05:44:34PM +0200, Liljestrand Hans wrote:
> 
> Then there's at least include/net/ip_vs.h that does unchecked decs and
> instead has this dedicated free function that checks for negative values
> (so with unsigned refcount it is broken anyway, guess we could do a
> conditional dec with a _read, but then it's no longer atomic):
> 
> http://lxr.free-electrons.com/source/include/net/ip_vs.h#L1424
> 
>  static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest) 
>  {
>  	if (atomic_dec_return(&dest->refcnt) < 0)
>  		kfree(dest);
>  }

This looks like one that uses -1 to free, so doing a +1 on the entire
scheme would restore 'sanity', but that's fairly thick code and I
couldn't say for sure.

> Then there's cases that check for the first increment, like here (maybe
> something like inc_and_one could allow these without too much leeway?):
> 
> http://lxr.free-electrons.com/source/drivers/tty/serial/zs.c#L764
> 
>  irq_guard = atomic_add_return(1, &scc->irq_guard);
>  	if (irq_guard == 1) {
> 
> http://lxr.free-electrons.com/source/drivers/usb/gadget/function/f_fs.c#L1497
> 
>  if (atomic_add_return(1, &ffs->opened) == 1 &&
>  	ffs->state == FFS_DEACTIVATED) {
> 
> 
> And finally some cases with other uses/values:
> 
> http://lxr.free-electrons.com/source/drivers/staging/lustre/lustre/ptlrpc/client.c#L3081
> 
>  if (atomic_inc_return(&req->rq_refcount) == 2)

Greg already went through these, they're not proper refcounts.


> http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L231
> 
>  if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {

I think this one already got discussed; it's a custom refcount limit
scheme (with holes in it).

All in all I'm not inclined to add {add,sub,inc,dec}_return() to
refcount, as previously stated, they don't make sense.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-11-29 15:35   ` [kernel-hardening] " Reshetova, Elena
  2016-11-29 15:47     ` Peter Zijlstra
  2016-12-01 19:15     ` [kernel-hardening] " Peter Zijlstra
@ 2016-12-07 14:13     ` Peter Zijlstra
  2 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-07 14:13 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	Hans Liljestrand, David Windsor, aik, david

On Tue, Nov 29, 2016 at 03:35:15PM +0000, Reshetova, Elena wrote:
> So, could we agree on the following additions that are needed to refcount_t API:
> 
> - refcount_long_t and all related functions

sed -e 's/refcount_/&long_/g' -e 's/atomic_/&long_/g' \
	-e 's/REFCOUNT_/&LONG_/g' -e 's/ATOMIC_/&LONG_/g' \
	-e 's/unsigned int/unsigned long/g' -e 's/UINT/ULONG/g'

I can do a proper patch once the interface settles down.

> - refcount_add(), refcount_sub(), refcount_sub_and_test()
> - refcount_dec_if_one()

See below, completely untested.

> - refcount_dec_return(), refcount_inc_return()

Skipped those.

---
Subject: refcount_t: A special purpose refcount type
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon Nov 14 18:06:19 CET 2016

Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/refcount.h |  262 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/Makefile             |    3 
 2 files changed, 264 insertions(+), 1 deletion(-)

--- /dev/null
+++ b/include/linux/refcount.h
@@ -0,0 +1,262 @@
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+
+/*
+ * Variant of atomic_t specialized for reference counts.
+ *
+ * The interface matches the atomic_t interface (to aid in porting) but only
+ * provides the few functions one should use for reference counting.
+ *
+ * It differs in that the counter saturates at UINT_MAX and will not move once
+ * there. This avoids wrapping the counter and causing 'spurious'
+ * use-after-free issues.
+ *
+ * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
+ * and provide only what is strictly required for refcounts.
+ *
+ * The increments are fully relaxed; these will not provide ordering. The
+ * rationale is that whatever is used to obtain the object we're increasing the
+ * reference count on will provide the ordering. For locked data structures,
+ * it's the lock acquire, for RCU/lockless data structures it's the dependent
+ * load.
+ *
+ * Do note that inc_not_zero() provides a control dependency which will order
+ * future stores against the inc, this ensures we'll never modify the object
+ * if we did not in fact acquire a reference.
+ *
+ * The decrements will provide release order, such that all the prior loads and
+ * stores will be issued before, it also provides a control dependency, which
+ * will order us against the subsequent free().
+ *
+ * The control dependency is against the load of the cmpxchg (ll/sc) that
+ * succeeded. This means the stores aren't fully ordered, but this is fine
+ * because the 1->0 transition indicates no concurrency.
+ *
+ * Note that the allocator is responsible for ordering things between free()
+ * and alloc().
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+
+typedef struct refcount_struct {
+	atomic_t refs;
+} refcount_t;
+
+#define REFCOUNT_INIT(n)	{ .refs = ATOMIC_INIT(n), }
+
+static inline void refcount_set(refcount_t *r, unsigned int n)
+{
+	atomic_set(&r->refs, n);
+}
+
+static inline unsigned int refcount_read(const refcount_t *r)
+{
+	return atomic_read(&r->refs);
+}
+
+static inline __must_check
+bool refcount_add_not_zero(unsigned int i, refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (!val)
+			return false;
+
+		if (unlikely(val == UINT_MAX))
+			return true;
+
+		new = val + i;
+		if (new < val)
+			new = UINT_MAX;
+		old = atomic_cmpxchg_relaxed(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+
+	return true;
+}
+
+/*
+ * Similar to atomic_inc_not_zero(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores. See the comment on top.
+ */
+static inline __must_check
+bool refcount_inc_not_zero(refcount_t *r)
+{
+	return refcount_add_not_zero(1, r);
+}
+
+/*
+ * Similar to atomic_inc(), will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller already has a
+ * reference on the object, will WARN when this is not so.
+ */
+static inline void refcount_inc(refcount_t *r)
+{
+	WARN(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
+}
+
+static inline void refcount_add(unsigned int i, refcount_t *r)
+{
+	WARN(!refcount_add_not_zero(i, r), "refcount_t: addition on 0; use-after-free.\n");
+}
+
+/*
+ * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __must_check
+bool refcount_sub_and_test(unsigned int i, refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return false;
+
+		new = val - i;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
+			return false;
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return !new;
+}
+
+static inline __must_check
+bool refcount_dec_and_test(refcount_t *r)
+{
+	return refcount_sub_and_test(1, r);
+}
+
+/*
+ * Similar to atomic_dec(), it will WARN on underflow and fail to decrement
+ * when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before.
+ */
+static inline
+void refcount_dec(refcount_t *r)
+{
+	WARN(refcount_dec_and_test(r), "refcount_t: decrement hit 0; leaking memory.\n");
+}
+
+/*
+ * No atomic_t counterpart, it attempts a 1 -> 0 transition and returns the
+ * success thereof.
+ *
+ * Like all decrement operations, it provides release memory order and provides
+ * a control dependency.
+ *
+ * It can be used like a try-delete operator; this explicit case is provided
+ * rather than a generic cmpxchg, because the latter would allow implementing
+ * unsafe operations.
+ */
+static inline __must_check
+bool refcount_dec_if_one(refcount_t *r)
+{
+	return atomic_cmpxchg_release(&r->refs, 1, 0) == 1;
+}
+
+/*
+ * No atomic_t counterpart, it decrements unless the value is 1, in which case
+ * it will return false.
+ *
+ * Was often done like: atomic_add_unless(&var, -1, 1)
+ */
+static inline __must_check
+bool refcount_dec_not_one(refcount_t *r)
+{
+	unsigned int old, new, val = atomic_read(&r->refs);
+
+	for (;;) {
+		if (val == UINT_MAX)
+			return true;
+
+		if (val == 1)
+			return false;
+
+		new = val - 1;
+		if (WARN(new > val, "refcount_t: underflow; use-after-free.\n"))
+			return true;
+
+		old = atomic_cmpxchg_release(&r->refs, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
+ * to decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __must_check
+bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	mutex_lock(lock);
+	if (!refcount_dec_and_test(r)) {
+		mutex_unlock(lock);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ */
+static inline __must_check
+bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	spin_lock(lock);
+	if (!refcount_dec_and_test(r)) {
+		spin_unlock(lock);
+		return false;
+	}
+
+	return true;
+}
+
+#endif /* _LINUX_REFCOUNT_H */


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-07 13:52         ` Peter Zijlstra
@ 2016-12-07 15:59           ` David Windsor
  2016-12-07 16:26             ` Peter Zijlstra
  2016-12-16 12:10           ` [kernel-hardening] " Reshetova, Elena
  1 sibling, 1 reply; 48+ messages in thread
From: David Windsor @ 2016-12-07 15:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liljestrand Hans, Reshetova, Elena, kernel-hardening, Greg KH,
	Kees Cook, will.deacon, Boqun Feng, aik, david

On Wed, Dec 7, 2016 at 8:52 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Dec 02, 2016 at 05:44:34PM +0200, Liljestrand Hans wrote:
>>
>> Then there's at least include/net/ip_vs.h that does unchecked decs and
>> instead has this dedicated free function that checks for negative values
>> (so with unsigned refcount it is broken anyway, guess we could do a
>> conditional dec with a _read, but then its no longer atomic):
>>
>> http://lxr.free-electrons.com/source/include/net/ip_vs.h#L1424
>>
>>  static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
>>  {
>>       if (atomic_dec_return(&dest->refcnt) < 0)
>>               kfree(dest);
>>  }
>
> This looks like one that uses -1 to free, so doing a +1 on the entire
> scheme would restore 'sanity', but that's fairly thick code and I
> couldn't say for sure.
>
>> Then there's cases that check for the first increment, like here (maybe
>> something like inc_and_one could allow these without too much leeway?):
>>
>> http://lxr.free-electrons.com/source/drivers/tty/serial/zs.c#L764
>>
>>  irq_guard = atomic_add_return(1, &scc->irq_guard);
>>       if (irq_guard == 1) {
>>
>> http://lxr.free-electrons.com/source/drivers/usb/gadget/function/f_fs.c#L1497
>>
>>  if (atomic_add_return(1, &ffs->opened) == 1 &&
>>       ffs->state == FFS_DEACTIVATED) {
>>
>>
>> And finally some cases with other uses/values:
>>
>> http://lxr.free-electrons.com/source/drivers/staging/lustre/lustre/ptlrpc/client.c#L3081
>>
>>  if (atomic_inc_return(&req->rq_refcount) == 2)
>
> Greg already went through these, they're not proper refcounts.
>
>
>> http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L231
>>
>>  if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {
>
> I think this one already got discussed, its a custom refcount limit
> scheme (with holes in).
>
> All in all I'm not inclined to add {add,sub,inc,dec}_return() to
> refcount, as previously stated, they don't make sense.

Is the plan now to audit all {add,sub,inc,dec}_return() call sites?
This should probably happen anyway, due to the amount of funkiness
uncovered by Hans' mini-audit. Then we can rewrite actual reference
counting code that calls the unsupported {add,sub,inc,dec}_return() to
use something else?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-07 15:59           ` David Windsor
@ 2016-12-07 16:26             ` Peter Zijlstra
  2016-12-07 16:31               ` David Windsor
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-07 16:26 UTC (permalink / raw)
  To: David Windsor
  Cc: Liljestrand Hans, Reshetova, Elena, kernel-hardening, Greg KH,
	Kees Cook, will.deacon, Boqun Feng, aik, david

On Wed, Dec 07, 2016 at 10:59:47AM -0500, David Windsor wrote:
> On Wed, Dec 7, 2016 at 8:52 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> > All in all I'm not inclined to add {add,sub,inc,dec}_return() to
> > refcount, as previously stated, they don't make sense.
> 
> Is the plan now to audit all {add,sub,inc,dec}_return() call sites?
> This should probably happen anyway, due to the amount of funkiness
> uncovered by Hans' mini-audit. Then we can rewrite actual reference
> counting code that calls the unsupported {add,sub,inc,dec}_return() to
> use something else?

The ip_vs_dest cache thing would receive 2 patches, one doing the global
+1, the second doing the conversion to refcount_t.

For BPF we'd need to talk to Alexei to see if the custom limit still
makes sense, but I'd be inclined to simply drop that in the refcount_t
conversion.

As to the tty and usb-gadget ones, those constructs are actually racy,
but I'm not sure the races matter. But I would certainly prefer to
rework them to be race-free.

But I wouldn't go so far as to audit all *_return calls, just those that
pop up while hunting refcounts.
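[The "global +1" rework mentioned for ip_vs_dest can be illustrated with a
small userspace model -- hypothetical helpers, not the actual ip_vs code.
Every stored counter value is shifted up by one, so the free condition moves
from -1 to 0 and the scheme becomes expressible with an unsigned refcount_t:]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Old ip_vs-style scheme: with one holder the counter sits at 0, and
 * put() frees when the counter drops below zero (0 -> -1). */
static bool old_put_frees(atomic_int *refcnt)
{
	return atomic_fetch_sub(refcnt, 1) - 1 < 0;
}

/* After the global +1: the same logical state is stored one higher, so
 * the last put takes it from 1 to 0 -- the standard refcount pattern
 * that refcount_dec_and_test() implements. */
static bool new_put_frees(atomic_uint *refcnt)
{
	return atomic_fetch_sub(refcnt, 1) == 1;
}
```

[Both init/get/put sites would need the shift applied consistently, which
is why it makes sense as its own patch before the refcount_t conversion.]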

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-07 16:26             ` Peter Zijlstra
@ 2016-12-07 16:31               ` David Windsor
  0 siblings, 0 replies; 48+ messages in thread
From: David Windsor @ 2016-12-07 16:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liljestrand Hans, Reshetova, Elena, kernel-hardening, Greg KH,
	Kees Cook, will.deacon, Boqun Feng, aik, david

On Wed, Dec 7, 2016 at 11:26 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Dec 07, 2016 at 10:59:47AM -0500, David Windsor wrote:
>> On Wed, Dec 7, 2016 at 8:52 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> > All in all I'm not inclined to add {add,sub,inc,dec}_return() to
>> > refcount, as previously stated, they don't make sense.
>>
>> Is the plan now to audit all {add,sub,inc,dec}_return() call sites?
>> This should probably happen anyway, due to the amount of funkiness
>> uncovered by Hans' mini-audit. Then we can rewrite actual reference
>> counting code that calls the unsupported {add,sub,inc,dec}_return() to
>> use something else?
>
> The ip_vs_dest cache thing would receive 2 patches, one doing the global
> +1, the second conversion to refcount_t.
>
> For BPF we'd need to talk to Alexei to see if the custom limit still
> makes sense, but I'd be inclined to simply drop that in the refcount_t
> conversion.
>
> As to the tty and usb-gadget ones, those constructs are actually racy,
> but I'm not sure the races matter. But I would certainly prefer to
> rework them to be race-free.
>
> But I wouldn't go so far as to audit all *_return calls, just those that
> pop up while hunting refcounts.

Yeah, we definitely don't want to go about finding refcounts by
looking for *_return calls, but it's worth auditing the *_return calls
we've already identified in previous patches and each time we
encounter them in future refcount_t hunting.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-07 13:24                 ` Peter Zijlstra
@ 2016-12-07 19:03                   ` David Windsor
  2016-12-09 14:48                     ` David Windsor
  0 siblings, 1 reply; 48+ messages in thread
From: David Windsor @ 2016-12-07 19:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Kees Cook, Reshetova, Elena, kernel-hardening,
	Greg KH, will.deacon, Hans Liljestrand, aik, david

On Wed, Dec 7, 2016 at 8:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Dec 02, 2016 at 03:25:42PM -0500, David Windsor wrote:
>> On Thu, Dec 1, 2016 at 8:17 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
>
>> > So we currently don't have a clear semantics for stats_t, do we?
>>
>> We had a discussion about the stats_t API in another thread.  We
>> agreed upon add(), sub(), inc(), dec(), read() and set().
>>
>> > What kind of atomic_t should be replaced with stats_t?
>>
>> stats_t is used for those cases in which an atomic variable is
>> required, but the overflow of this variable isn't of much concern.
>> Typically, these types of variables are counters of some kind (rx/tx
>> counts, etc), but not always.  Perhaps "stats_t" isn't the best type
>> name.  We actually used "atomic_wrap_t" in previous iterations.
>
> And atomic_wrap_t is a horrid trainwreck. Please as to explain the
> semantics of atomic_wrap_cmpxchg(). How does wrapping apply to something
> that doesn't do sign bits.
>

atomic_wrap_t grew out of a desire to fix an already broken system for
doing reference counting.  atomic_t is widely used for reference
counting (which should not overflow), for non-reference counting, and
for other operations.  atomic_wrap_t provides the semantics
of the "original" atomic_t: atomic operations with no overflow
protection.  Thus, atomic_wrap_cmpxchg(): the cmpxchg operation for
atomic_wrap_t types.  Abominations like this sometimes exist in RFC
series.  In this case, we came up with the atomic_wrap API by mostly
creating an atomic_wrap-ified version of each atomic_t API function.

>> > In the link David  pointed out, there are a few cases where a
>> > stats_t is put on a correctness-related variable. I don't think
>> > that's a good place to use stats_t.
>> >
>>
>> Yeah, I just grabbed a few examples I noted during my stats_t
>> conversion work.  The drivers/ tree is littered with stats_t
>> instances.
>
> Not sure how to respond to this, if you're converting all that to
> stats_t then you're doing it wrong.
>
> Most of what you showed should very emphatically not be stats_t.
>

Yes, none of those examples were appropriate.  My apologies; they were
quickly chosen from a pool of potential stats_t conversions.

Further audit of atomic_wrap_t indicates that much of what we marked
as atomic_wrap_t is actually things like sequence numbers, IDs, etc.
Examples:

http://lxr.free-electrons.com/source/include/linux/padata.h#L132
struct parallel_data.seq_nr

http://lxr.free-electrons.com/source/fs/btrfs/delayed-inode.h#L46
struct btrfs_delayed_root.items_seq

http://lxr.free-electrons.com/source/drivers/ata/libata-core.c#L108
ata_print_id

These "identifier" types make up a good number of atomic_wrap_t cases.

With respect to stats_t variables, here are two more examples.

First, cm_listens_created and cm_listens_destroyed from
drivers/infiniband/hw/nes/nes_cm.c:
http://lxr.free-electrons.com/source/drivers/infiniband/hw/nes/nes_cm.c#L72.
Only atomic_inc() and atomic_read() are used on these variables:

http://lxr.free-electrons.com/source/drivers/infiniband/hw/nes/nes_cm.c#L3499
atomic_inc(&cm_listens_created);

http://lxr.free-electrons.com/source/drivers/infiniband/hw/nes/nes_cm.c#L1336
atomic_inc(&cm_listens_destroyed);

http://lxr.free-electrons.com/source/drivers/infiniband/hw/nes/nes_nic.c#L1284
target_stat_values[++index] = atomic_read(&cm_listens_created);

http://lxr.free-electrons.com/source/drivers/infiniband/hw/nes/nes_nic.c#L1285
target_stat_values[++index] = atomic_read(&cm_listens_destroyed);


Next, struct irq_desc.threads_handled.  It is defined in
include/linux/irqdesc.h and, in the struct's comments, is described
as, "[a] stats field for deferred spurious detection of threaded
handlers."

Only atomic_inc() and atomic_read() are called to manage this variable:

atomic_read():
kernel/irq/spurious.c:
handled = atomic_read(&desc->threads_handled);

irq_desc.threads_handled is part of threaded interrupt handlers and is
incremented in irq_thread():

atomic_inc():
kernel/irq/manage.c:949
static int irq_thread(void *data)
{
    ....
    while (!irq_wait_for_interrupt(action)) {
        irqreturn_t action_ret;

        irq_thread_check_affinity(desc, action);

        action_ret = handler_fn(desc, action);
        if (action_ret == IRQ_HANDLED)
            atomic_inc(&desc->threads_handled);
    ...
}
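Counters like cm_listens_created and threads_handled -- only inc() and
read(), with wrap-around harmless -- are exactly what stats_t would capture.
A hypothetical userspace sketch of the six-operation API agreed on earlier
in the thread (stats_t does not exist; the names, layout, and C11-atomics
backing are my assumptions, the real type would presumably wrap atomic_t):

```c
#include <stdatomic.h>

/* Sketch of the agreed stats_t API: a plain wrapping counter with no
 * overflow protection, for statistics that may safely wrap. */
typedef struct { atomic_int counter; } stats_t;

static inline void stats_set(stats_t *s, int v)  { atomic_store(&s->counter, v); }
static inline int  stats_read(stats_t *s)        { return atomic_load(&s->counter); }
static inline void stats_add(stats_t *s, int v)  { atomic_fetch_add(&s->counter, v); }
static inline void stats_sub(stats_t *s, int v)  { atomic_fetch_sub(&s->counter, v); }
static inline void stats_inc(stats_t *s)         { stats_add(s, 1); }
static inline void stats_dec(stats_t *s)         { stats_sub(s, 1); }
```

The point of the narrow API is that anything not in this list (cmpxchg,
*_return, etc.) simply doesn't exist for stats, so misuse as a refcount
becomes a compile error.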

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-07 19:03                   ` David Windsor
@ 2016-12-09 14:48                     ` David Windsor
  0 siblings, 0 replies; 48+ messages in thread
From: David Windsor @ 2016-12-09 14:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Kees Cook, Reshetova, Elena, kernel-hardening,
	Greg KH, will.deacon, Hans Liljestrand, aik, david

On Wed, Dec 7, 2016 at 2:03 PM, David Windsor <dwindsor@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 8:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Fri, Dec 02, 2016 at 03:25:42PM -0500, David Windsor wrote:
>>> On Thu, Dec 1, 2016 at 8:17 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
>>
>>> > So we currently don't have a clear semantics for stats_t, do we?
>>>
>>> We had a discussion about the stats_t API in another thread.  We
>>> agreed upon add(), sub(), inc(), dec(), read() and set().
>>>
>>> > What kind of atomic_t should be replaced with stats_t?
>>>
>>> stats_t is used for those cases in which an atomic variable is
>>> required, but the overflow of this variable isn't of much concern.
>>> Typically, these types of variables are counters of some kind (rx/tx
>>> counts, etc), but not always.  Perhaps "stats_t" isn't the best type
>>> name.  We actually used "atomic_wrap_t" in previous iterations.
>>
>> And atomic_wrap_t is a horrid trainwreck. Please as to explain the
>> semantics of atomic_wrap_cmpxchg(). How does wrapping apply to something
>> that doesn't do sign bits.
>>
>
> atomic_wrap_t grew out of a desire to fix an already broken system for
> doing reference counting.  atomic_t is being widely used for both
> reference counting (which should not overflow), non-reference
> counting, and other operations.  atomic_wrap_t provides the semantics
> of the "original" atomic_t: atomic operations with no overflow
> protection.  Thus, atomic_wrap_cmpxchg(): the cmpxchg operation for
> atomic_wrap_t types.  Abominations like this sometimes exist in RFC
> series.  In this case, we came up with the atomic_wrap API by mostly
> creating an atomic_wrap-ified version of each atomic_t API function.
>

After giving this some more thought, I feel I should mention that
kernel reference counting is obviously not "broken" as stated above.
I was just referring to the fact that a single type is used for both
reference counting and non-reference counting, so any protection (and
its performance hit) added to atomic_t will be picked up in the
non-reference counter cases as well.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-07 13:52         ` Peter Zijlstra
  2016-12-07 15:59           ` David Windsor
@ 2016-12-16 12:10           ` Reshetova, Elena
  2016-12-16 14:01             ` [kernel-hardening] " Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-16 12:10 UTC (permalink / raw)
  To: Peter Zijlstra, Liljestrand Hans
  Cc: kernel-hardening, Greg KH, Kees Cook, will.deacon, Boqun Feng,
	David Windsor, aik, david

> On Fri, Dec 02, 2016 at 05:44:34PM +0200, Liljestrand Hans wrote:
> >
> > Then there's at least include/net/ip_vs.h that does unchecked decs and
> > instead has this dedicated free function that checks for negative values
> > (so with unsigned refcount it is broken anyway, guess we could do a
> > conditional dec with a _read, but then its no longer atomic):
> >
> > http://lxr.free-electrons.com/source/include/net/ip_vs.h#L1424
> >
> >  static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
> >  {
> >  	if (atomic_dec_return(&dest->refcnt) < 0)
> >  		kfree(dest);
> >  }
> 
> This looks like one that uses -1 to free, so doing a +1 on the entire
> scheme would restore 'sanity', but that's fairly thick code and I
> couldn't say for sure.
> 
> > Then there's cases that check for the first increment, like here (maybe
> > something like inc_and_one could allow these without too much leeway?):
> >
> > http://lxr.free-electrons.com/source/drivers/tty/serial/zs.c#L764
> >
> >  irq_guard = atomic_add_return(1, &scc->irq_guard);
> >  	if (irq_guard == 1) {
> >
> > http://lxr.free-
> electrons.com/source/drivers/usb/gadget/function/f_fs.c#L1497
> >
> >  if (atomic_add_return(1, &ffs->opened) == 1 &&
> >  	ffs->state == FFS_DEACTIVATED) {
> >
> >
> > And finally some cases with other uses/values:
> >
> > http://lxr.free-
> electrons.com/source/drivers/staging/lustre/lustre/ptlrpc/client.c#L3081
> >
> >  if (atomic_inc_return(&req->rq_refcount) == 2)
> 
> Greg already went through these, they're not proper refcounts.
> 
> 
> > http://lxr.free-electrons.com/source/kernel/bpf/syscall.c#L231
> >
> >  if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {
> 
> I think this one already got discussed, its a custom refcount limit
> scheme (with holes in).
> 
> All in all I'm not inclined to add {add,sub,inc,dec}_return() to
> refcount, as previously stated, they don't make sense.

Is it ok to add at least refcount_inc_if_zero()?
We already have refcount_dec_if_one(), refcount_dec_not_one() and refcount_inc_not_zero(), so this one is the only missing one and would greatly help in a couple of cases. 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-16 12:10           ` [kernel-hardening] " Reshetova, Elena
@ 2016-12-16 14:01             ` Peter Zijlstra
  2016-12-19  7:55               ` [kernel-hardening] " Reshetova, Elena
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-16 14:01 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Liljestrand Hans, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Fri, Dec 16, 2016 at 12:10:21PM +0000, Reshetova, Elena wrote:

> Is it ok to add at least refcount_inc_if_zero() ? 

Of course not.

> We already have refcount_dec_if_one(), refcount_dec_not_one() and
> refcount_inc_not_zero(), so this one is the only missing one and would
> greatly help in couple of cases. 

No, it's absolutely insane. 0 means it's freed, you cannot get another
reference at that point.

If you have code that relies on that, it's broken.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-16 14:01             ` [kernel-hardening] " Peter Zijlstra
@ 2016-12-19  7:55               ` Reshetova, Elena
  2016-12-19 10:12                 ` [kernel-hardening] " Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-19  7:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liljestrand Hans, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

> On Fri, Dec 16, 2016 at 12:10:21PM +0000, Reshetova, Elena wrote:
> 
> > Is it ok to add at least refcount_inc_if_zero() ?
> 
> Of course not.
> 
> > We already have refcount_dec_if_one(), refcount_dec_not_one() and
> > refcount_inc_not_zero(), so this one is the only missing one and would
> > greatly help in couple of cases.
> 
> No, its absolutely insane. 0 means its freed, you cannot get another
> reference at that point.
> 
> If you have code that relies on that, its broken.

Well, again, you are right in theory, but in practice for example for struct sched_group { atomic_t ref; ... }:

http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178

To me this is a refcounter that needs the protection.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-19  7:55               ` [kernel-hardening] " Reshetova, Elena
@ 2016-12-19 10:12                 ` Peter Zijlstra
  2016-12-20  9:13                   ` [kernel-hardening] " Reshetova, Elena
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-19 10:12 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Liljestrand Hans, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> Well, again, you are right in theory, but in practice for example for struct sched_group { atomic_t ref; ... }:
> 
> http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> 
> To me this is a refcounter that needs the protection.

Only if you have more than UINT_MAX CPUs or something like that.

And if you really really want to use refcount_t there, you could +1 the
scheme and it'd work again.

One could also split the refcount and initialized state and avoid the
problem that way.
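[Splitting the refcount and initialized state, as suggested, might look
roughly like this userspace sketch -- illustrative only; the field and
function names are mine, not the sched_group code. The count starts at 1
for the builder and never needs an increment from zero; a separate one-shot
flag decides who performs the initialization:]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Instead of overloading the 0 -> 1 transition of the refcount to mean
 * "first user, please initialize", keep a separate one-shot flag. */
struct obj {
	atomic_uint refs;		/* starts at 1, freed at 0 */
	atomic_bool initialized;	/* one-shot init claim */
};

static bool obj_claim_init(struct obj *o)
{
	bool expected = false;

	/* returns true for exactly one caller, who then initializes */
	return atomic_compare_exchange_strong(&o->initialized,
					      &expected, true);
}
```

[With this split, every increment is a plain refcount_inc() on a live
object, so the strict "no inc on zero" semantics hold without a +1 scheme.]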

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-19 10:12                 ` [kernel-hardening] " Peter Zijlstra
@ 2016-12-20  9:13                   ` Reshetova, Elena
  2016-12-20  9:30                     ` [kernel-hardening] " Greg KH
  2016-12-20  9:41                     ` Peter Zijlstra
  0 siblings, 2 replies; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-20  9:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liljestrand Hans, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

> On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > Well, again, you are right in theory, but in practice for example for struct
> sched_group { atomic_t ref; ... }:
> >
> > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> >
> > To me this is a refcounter that needs the protection.
> 
> Only if you have more than UINT_MAX CPUs or something like that.
> 
> And if you really really want to use refcount_t there, you could +1 the
> scheme and it'd work again.

Well, yes, probably, but there are many cases like this in practice, so we would need a good plan for how to get it all submitted and tested properly. The current patch set is already bigger than what we had before and it is only growing. 
Hans will provide more info later today based on his testing, which shows many places in the kernel core where we DO actually have increment on zero happening in practice, and the whole kernel doesn't even boot with the strictest approach (refusing to inc on zero). And we are only able to test on x86.... 

Given the massive amount of changes, it would be good to merge this at least in couple of stages: 

1) a first, softer version of the refcount_t API that at least allows increment on zero, plus all atomic_t-used-as-refcounter conversions that don't require a reference counting scheme change (+1 or other)
2) a patch set that fixes all problematic places (potentially with code rewrites) 
3) a patch that removes the possibility of inc on zero from refcount_t

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:13                   ` [kernel-hardening] " Reshetova, Elena
@ 2016-12-20  9:30                     ` Greg KH
  2016-12-20  9:40                       ` [kernel-hardening] " Reshetova, Elena
  2016-12-20  9:41                     ` Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Greg KH @ 2016-12-20  9:30 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, Liljestrand Hans, kernel-hardening, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > Well, again, you are right in theory, but in practice for example for struct
> > sched_group { atomic_t ref; ... }:
> > >
> > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > >
> > > To me this is a refcounter that needs the protection.
> > 
> > Only if you have more than UINT_MAX CPUs or something like that.
> > 
> > And if you really really want to use refcount_t there, you could +1 the
> > scheme and it'd work again.
> 
> Well, yes, probably, but there are many cases like this in practice,
> so we would need to have a good plan how to get it all submitted and
> tested properly. The current patch set is already bigger than what we
> had before and it is only growing. 

kernel programming is hard :)

Don't get frustrated, it's going to be a lot of work, just break it up
into chunks and go at it...

> Hans will provide more info later today based on his testing, which
> shows many places in kernel core where we DO actually have increment
> on zero happening in practice and whole kernel doesn't even boot with
> the strictest approach (refusing to inc on zero). And we are only able
> to test for x86.... 
> 
> Given the massive amount of changes, it would be good to merge this at
> least in couple of stages: 
> 
> 1) first soft version of refcount_t API which at least allows
> increment on zero and all atomic_t used as refcounter occurrences that
> don't require reference counter scheme change (+1 or other)

Why not merge the "correct" implementation?  Don't submit something that
doesn't work well.  Then fix up the instances that are broken when you
convert them to this new api.

> 2) patch set that fixes all problematic places (potentially with code rewrite) 
> 3) patch that removes possibility of inc on zero from refcount_t

That implies that 3) would not happen for another year or so, not good.
Do it right the first time.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:30                     ` [kernel-hardening] " Greg KH
@ 2016-12-20  9:40                       ` Reshetova, Elena
  2016-12-20  9:51                         ` [kernel-hardening] " Greg KH
  0 siblings, 1 reply; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-20  9:40 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Liljestrand Hans, kernel-hardening, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david


> On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > > Well, again, you are right in theory, but in practice for example for struct
> > > sched_group { atomic_t ref; ... }:
> > > >
> > > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > > >
> > > > To me this is a refcounter that needs the protection.
> > >
> > > Only if you have more than UINT_MAX CPUs or something like that.
> > >
> > > And if you really really want to use refcount_t there, you could +1 the
> > > scheme and it'd work again.
> >
> > Well, yes, probably, but there are many cases like this in practice,
> > so we would need to have a good plan how to get it all submitted and
> > tested properly. The current patch set is already bigger than what we
> > had before and it is only growing.
> 
> kernel programming is hard :)
> 
> Don't get frustrated, it's going to be a lot of work, just break it up
> into chunks and go at it...
> 
> > Hans will provide more info later today based on his testing, which
> > shows many places in kernel core where we DO actually have increment
> > on zero happening in practice and whole kernel doesn't even boot with
> > the strictest approach (refusing to inc on zero). And we are only able
> > to test for x86....
> >
> > Given the massive amount of changes, it would be good to merge this at
> > least in couple of stages:
> >
> > 1) first soft version of refcount_t API which at least allows
> > increment on zero and all atomic_t used as refcounter occurrences that
> > don't require reference counter scheme change (+1 or other)
> 
> Why not merge the "correct" implementation?  Don't submit something that
> doesn't work well.  Then fix up the instances that are broken when you
> convert them to this new api.

It is not that the implementation is incorrect, it is just a less radical change in logical behavior. The main issue is going to be testing. It is hard to make sure we don't break things, which is why such big changes are usually done in parts, with a softer approach. We can test on x86 and at least compile for arm, but what about the rest? It is a bigger logical change than we had before, and the consequences might be severe if we miss something.


> > 2) patch set that fixes all problematic places (potentially with code rewrite)
> > 3) patch that removes possibility of inc on zero from refcount_t
> 
> That implies that 3) would not happen for another year or so, not good.
> Do it right the first time.

I didn't have that timetable in mind; I would say a couple of months at most. 

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:13                   ` [kernel-hardening] " Reshetova, Elena
  2016-12-20  9:30                     ` [kernel-hardening] " Greg KH
@ 2016-12-20  9:41                     ` Peter Zijlstra
  2016-12-20  9:58                       ` [kernel-hardening] " Reshetova, Elena
  2016-12-20 10:55                       ` [kernel-hardening] " Liljestrand Hans
  1 sibling, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-20  9:41 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Liljestrand Hans, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > Well, again, you are right in theory, but in practice for example for struct
> > sched_group { atomic_t ref; ... }:
> > >
> > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > >
> > > To me this is a refcounter that needs the protection.
> > 
> > Only if you have more than UINT_MAX CPUs or something like that.
> > 
> > And if you really really want to use refcount_t there, you could +1 the
> > scheme and it'd work again.
> 
> Well, yes, probably, but there are many cases like this in practice,
> so we would need to have a good plan how to get it all submitted and
> tested properly. The current patch set is already bigger than what we
> had before and it is only growing.  Hans will provide more info later
> today based on his testing, which shows many places in kernel core
> where we DO actually have increment on zero happening in practice and
> whole kernel doesn't even boot with the strictest approach (refusing
> to inc on zero). And we are only able to test for x86.... 
> 
> Given the massive amount of changes, it would be good to merge this at
> least in couple of stages: 
> 
> 1) first soft version of refcount_t API which at least allows
> increment on zero and all atomic_t used as refcounter occurrences that
> don't require reference counter scheme change (+1 or other) 2) patch
> set that fixes all problematic places (potentially with code rewrite)
> 3) patch that removes possibility of inc on zero from refcount_t

I don't get it. Why?

Just leave the weird and problematic cases using atomic_t. It's far
harder to remove crap later.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:40                       ` [kernel-hardening] " Reshetova, Elena
@ 2016-12-20  9:51                         ` Greg KH
  2016-12-20  9:55                           ` [kernel-hardening] " Reshetova, Elena
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2016-12-20  9:51 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, Liljestrand Hans, kernel-hardening, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 09:40:29AM +0000, Reshetova, Elena wrote:
> 
> > On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > > > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > > > Well, again, you are right in theory, but in practice for example for struct
> > > > sched_group { atomic_t ref; ... }:
> > > > >
> > > > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > > > >
> > > > > To me this is a refcounter that needs the protection.
> > > >
> > > > Only if you have more than UINT_MAX CPUs or something like that.
> > > >
> > > > And if you really really want to use refcount_t there, you could +1 the
> > > > scheme and it'd work again.
> > >
> > > Well, yes, probably, but there are many cases like this in practice,
> > > so we would need to have a good plan how to get it all submitted and
> > > tested properly. The current patch set is already bigger than what we
> > > had before and it is only growing.
> > 
> > kernel programming is hard :)
> > 
> > Don't get frustrated, it's going to be a lot of work, just break it up
> > into chunks and go at it...
> > 
> > > Hans will provide more info later today based on his testing, which
> > > shows many places in kernel core where we DO actually have increment
> > > on zero happening in practice and whole kernel doesn't even boot with
> > > the strictest approach (refusing to inc on zero). And we are only able
> > > to test for x86....
> > >
> > > Given the massive amount of changes, it would be good to merge this at
> > > least in couple of stages:
> > >
> > > 1) first soft version of refcount_t API which at least allows
> > > increment on zero and all atomic_t used as refcounter occurrences that
> > > don't require reference counter scheme change (+1 or other)
> > 
> > Why not merge the "correct" implementation?  Don't submit something that
> > doesn't work well.  Then fix up the instances that are broken when you
> > convert them to this new api.
> 
> It is not that the implementation is incorrect, it is just less
> radical change in logical behavior. The main issue is going to be
> testing.

Again, kernel programming is hard :)

> It is hard to make sure we don't break things up, so that's why
> usually a softer approach is to do such big changes in parts. We can
> test on x86 and do at least compilation for arm, but what about the
> rest? It is a logical change which is bigger than we had before and
> consequences might be severe if we miss smth.

You add the correct implementation of refcount_t, and then push the
individual conversions through the various subsystem maintainers who
will review and test the code for correctness.  Just like any other api
change we do.  Why is this somehow "different"?

> > > 2) patch set that fixes all problematic places (potentially with code rewrite)
> > > 3) patch that removes possibility of inc on zero from refcount_t
> > 
> > That implies that 3) would not happen for another year or so, not good.
> > Do it right the first time.
> 
> I didn't have that timetable in mind, I would say couple of months the most. 

3 months is just one kernel development cycle, this is going to take
longer than that, but optimism is nice to have :)

thanks,

greg k-h


* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:51                         ` [kernel-hardening] " Greg KH
@ 2016-12-20  9:55                           ` Reshetova, Elena
  2016-12-20 10:26                             ` [kernel-hardening] " Greg KH
  0 siblings, 1 reply; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-20  9:55 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Liljestrand Hans, kernel-hardening, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david


> > > On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > > > > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > > > > Well, again, you are right in theory, but in practice for example for
> struct
> > > > > sched_group { atomic_t ref; ... }:
> > > > > >
> > > > > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > > > > >
> > > > > > To me this is a refcounter that needs the protection.
> > > > >
> > > > > Only if you have more than UINT_MAX CPUs or something like that.
> > > > >
> > > > > And if you really really want to use refcount_t there, you could +1 the
> > > > > scheme and it'd work again.
> > > >
> > > > Well, yes, probably, but there are many cases like this in practice,
> > > > so we would need to have a good plan how to get it all submitted and
> > > > tested properly. The current patch set is already bigger than what we
> > > > had before and it is only growing.
> > >
> > > kernel programming is hard :)
> > >
> > > Don't get frustrated, it's going to be a lot of work, just break it up
> > > into chunks and go at it...
> > >
> > > > Hans will provide more info later today based on his testing, which
> > > > shows many places in kernel core where we DO actually have increment
> > > > on zero happening in practice and whole kernel doesn't even boot with
> > > > the strictest approach (refusing to inc on zero). And we are only able
> > > > to test for x86....
> > > >
> > > > Given the massive amount of changes, it would be good to merge this at
> > > > least in couple of stages:
> > > >
> > > > 1) first soft version of refcount_t API which at least allows
> > > > increment on zero and all atomic_t used as refcounter occurrences that
> > > > don't require reference counter scheme change (+1 or other)
> > >
> > > Why not merge the "correct" implementation?  Don't submit something
> that
> > > doesn't work well.  Then fix up the instances that are broken when you
> > > convert them to this new api.
> >
> > It is not that the implementation is incorrect, it is just less
> > radical change in logical behavior. The main issue is going to be
> > testing.
> 
> Again, kernel programming is hard :)
> 
> > It is hard to make sure we don't break things up, so that's why
> > usually a softer approach is to do such big changes in parts. We can
> > test on x86 and do at least compilation for arm, but what about the
> > rest? It is a logical change which is bigger than we had before and
> > consequences might be severe if we miss smth.
> 
> You add the correct implementation of refcount_t, and then push the
> individual conversions through the various subsystem maintainers who
> will review and test the code for correctness.  Just like any other api
> change we do.  Why is this somehow "different"?

Can we really count on maintainers' help with testing this on all archs?
If so, that helps greatly.

> > > > 2) patch set that fixes all problematic places (potentially with code
> rewrite)
> > > > 3) patch that removes possibility of inc on zero from refcount_t
> > >
> > > That implies that 3) would not happen for another year or so, not good.
> > > Do it right the first time.
> >
> > I didn't have that timetable in mind, I would say couple of months the most.
> 
> 3 months is just one kernel development cycle, this is going to take
> longer than that, but optimism is nice to have :)
> 
> thanks,
> 
> greg k-h


* [kernel-hardening] RE: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:41                     ` Peter Zijlstra
@ 2016-12-20  9:58                       ` Reshetova, Elena
  2016-12-20 10:55                       ` [kernel-hardening] " Liljestrand Hans
  1 sibling, 0 replies; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-20  9:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liljestrand Hans, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

> On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > > Well, again, you are right in theory, but in practice for example for struct
> > > sched_group { atomic_t ref; ... }:
> > > >
> > > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > > >
> > > > To me this is a refcounter that needs the protection.
> > >
> > > Only if you have more than UINT_MAX CPUs or something like that.
> > >
> > > And if you really really want to use refcount_t there, you could +1 the
> > > scheme and it'd work again.
> >
> > Well, yes, probably, but there are many cases like this in practice,
> > so we would need to have a good plan how to get it all submitted and
> > tested properly. The current patch set is already bigger than what we
> > had before and it is only growing.  Hans will provide more info later
> > today based on his testing, which shows many places in kernel core
> > where we DO actually have increment on zero happening in practice and
> > whole kernel doesn't even boot with the strictest approach (refusing
> > to inc on zero). And we are only able to test for x86....
> >
> > Given the massive amount of changes, it would be good to merge this at
> > least in couple of stages:
> >
> > 1) first soft version of refcount_t API which at least allows
> > increment on zero and all atomic_t used as refcounter occurrences that
> > don't require reference counter scheme change (+1 or other) 2) patch
> > set that fixes all problematic places (potentially with code rewrite)
> > 3) patch that removes possibility of inc on zero from refcount_t
> 
> I don't get it. Why ?
> 
> Just leave the weird and problematic cases using atomic_t. Its far
> harder to remove crap later.

If it is an internal change (not API), it should be quite easy to remove later. But OK, I guess for the really weird cases it might be easier to postpone their conversion until the next patch set. And if we get help with testing, then hopefully it will go without major breakage.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:55                           ` [kernel-hardening] " Reshetova, Elena
@ 2016-12-20 10:26                             ` Greg KH
  0 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2016-12-20 10:26 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, Liljestrand Hans, kernel-hardening, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 09:55:58AM +0000, Reshetova, Elena wrote:
> 
> > > > On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > > > > > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > > > > > Well, again, you are right in theory, but in practice for example for
> > struct
> > > > > > sched_group { atomic_t ref; ... }:
> > > > > > >
> > > > > > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > > > > > >
> > > > > > > To me this is a refcounter that needs the protection.
> > > > > >
> > > > > > Only if you have more than UINT_MAX CPUs or something like that.
> > > > > >
> > > > > > And if you really really want to use refcount_t there, you could +1 the
> > > > > > scheme and it'd work again.
> > > > >
> > > > > Well, yes, probably, but there are many cases like this in practice,
> > > > > so we would need to have a good plan how to get it all submitted and
> > > > > tested properly. The current patch set is already bigger than what we
> > > > > had before and it is only growing.
> > > >
> > > > kernel programming is hard :)
> > > >
> > > > Don't get frustrated, it's going to be a lot of work, just break it up
> > > > into chunks and go at it...
> > > >
> > > > > Hans will provide more info later today based on his testing, which
> > > > > shows many places in kernel core where we DO actually have increment
> > > > > on zero happening in practice and whole kernel doesn't even boot with
> > > > > the strictest approach (refusing to inc on zero). And we are only able
> > > > > to test for x86....
> > > > >
> > > > > Given the massive amount of changes, it would be good to merge this at
> > > > > least in couple of stages:
> > > > >
> > > > > 1) first soft version of refcount_t API which at least allows
> > > > > increment on zero and all atomic_t used as refcounter occurrences that
> > > > > don't require reference counter scheme change (+1 or other)
> > > >
> > > > Why not merge the "correct" implementation?  Don't submit something
> > that
> > > > doesn't work well.  Then fix up the instances that are broken when you
> > > > convert them to this new api.
> > >
> > > It is not that the implementation is incorrect, it is just less
> > > radical change in logical behavior. The main issue is going to be
> > > testing.
> > 
> > Again, kernel programming is hard :)
> > 
> > > It is hard to make sure we don't break things up, so that's why
> > > usually a softer approach is to do such big changes in parts. We can
> > > test on x86 and do at least compilation for arm, but what about the
> > > rest? It is a logical change which is bigger than we had before and
> > > consequences might be severe if we miss smth.
> > 
> > You add the correct implementation of refcount_t, and then push the
> > individual conversions through the various subsystem maintainers who
> > will review and test the code for correctness.  Just like any other api
> > change we do.  Why is this somehow "different"?
> 
> Can we really assume help on this for testing on all archs from maintainers?

Why wouldn't you be able to?  What makes this type of work different
from any other kernel change?

I don't understand the problem here...

greg k-h


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20  9:41                     ` Peter Zijlstra
  2016-12-20  9:58                       ` [kernel-hardening] " Reshetova, Elena
@ 2016-12-20 10:55                       ` Liljestrand Hans
  2016-12-20 13:13                         ` Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Liljestrand Hans @ 2016-12-20 10:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, 2016-12-20 at 10:41 +0100, Peter Zijlstra wrote:
> On Tue, Dec 20, 2016 at 09:13:58AM +0000, Reshetova, Elena wrote:
> > > On Mon, Dec 19, 2016 at 07:55:15AM +0000, Reshetova, Elena wrote:
> > > > Well, again, you are right in theory, but in practice for example for struct
> > > sched_group { atomic_t ref; ... }:
> > > >
> > > > http://lxr.free-electrons.com/source/kernel/sched/core.c#L6178
> > > >
> > > > To me this is a refcounter that needs the protection.
> > > 
> > > Only if you have more than UINT_MAX CPUs or something like that.
> > > 
> > > And if you really really want to use refcount_t there, you could +1 the
> > > scheme and it'd work again.
> > 
> > Well, yes, probably, but there are many cases like this in practice,
> > so we would need to have a good plan how to get it all submitted and
> > tested properly. The current patch set is already bigger than what we
> > had before and it is only growing.  Hans will provide more info later
> > today based on his testing, which shows many places in kernel core
> > where we DO actually have increment on zero happening in practice and
> > whole kernel doesn't even boot with the strictest approach (refusing
> > to inc on zero). And we are only able to test for x86.... 
> > 
> > Given the massive amount of changes, it would be good to merge this at
> > least in couple of stages: 
> > 
> > 1) first soft version of refcount_t API which at least allows
> > increment on zero and all atomic_t used as refcounter occurrences that
> > don't require reference counter scheme change (+1 or other) 2) patch
> > set that fixes all problematic places (potentially with code rewrite)
> > 3) patch that removes possibility of inc on zero from refcount_t
> 
> I don't get it. Why ?
> 
> Just leave the weird and problematic cases using atomic_t. Its far
> harder to remove crap later.

Yes, ideally we would either fix them or leave them as atomic_t. One
reason for the proposal is the subtle places that might not get caught
in audit/testing; in those cases, allowing refcount_inc() to increment
on 0 (with a WARN) would ensure the code still works.
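
To make that concrete, here is a minimal userspace sketch of the
proposed behavior (hypothetical names, C11 atomics; an illustration
only, not the kernel's refcount_t implementation): an increment from
zero is flagged but still performed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Sketch only: a "soft" refcount whose increment-on-zero is allowed
 * but warned about, rather than refused outright. */
typedef struct { atomic_uint refs; } softref_t;

static bool softref_inc(softref_t *r)
{
	unsigned int old = atomic_fetch_add(&r->refs, 1);

	if (old == 0)
		fprintf(stderr, "WARN: refcount increment on zero\n");
	return old != 0;	/* false means the suspicious case was hit */
}
```

A stricter variant would instead leave the counter untouched and return
failure when the old value is 0; the staging question discussed here is
which of those two behaviors to merge first.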

We were also hoping review might be easier with that separation, but
perhaps that was misguided; separating or skipping the weird places
might serve the same purpose without mucking with the API.


For reference, I've listed here the places that were causing "increment
on 0" WARNs on my previous boot (temporarily allowed inc on 0 to make
boot possible). These seem to be mostly related to resource reuse, but
we haven't yet looked in detail at how to deal with them.

fs/ext4/mballoc.c:3399          ext4_mb_use_preallocated
        Seems to have separate tracking of destruction
net/ipv4/fib_semantics.c:994    fib_create_info
net/ipv4/devinet.c:233          inetdev_init
net/ipv4/tcp_ipv4.c:1793        inet_sk_rx_dst_set
net/ipv4/route.c:2153:          __ip_route_output_key_hash
net/ipv6/ip6_fib.c:949          fib6_add
net/ipv6/route.c:1048           ip6_pol_route
net/ipv6/addrconf.c:930         ipv6_add_addr
net/ipv6/addrconf.c:357         ipv6_add_dev
net/core/filter.c:940           sk_filter_charge
        net stuff related to caching?
fs/inode.c:813                  find_inode_fast
        Seems to reuse objects that are being freed?
mm/backing-dev.c:399            wb_congested_get_create
        Initializes to 0

There are also some places that initialize the refcounts to zero (either
using REFCOUNT_INIT or refcount_set). Some of these places are quite
confusing (at least to me), so the idea was that doing the changes
incrementally might keep them more manageable.

Regards,
-hans liljestrand


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20 10:55                       ` [kernel-hardening] " Liljestrand Hans
@ 2016-12-20 13:13                         ` Peter Zijlstra
  2016-12-20 13:35                           ` Reshetova, Elena
  2016-12-20 15:20                           ` Liljestrand Hans
  0 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-20 13:13 UTC (permalink / raw)
  To: Liljestrand Hans
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 12:55:02PM +0200, Liljestrand Hans wrote:

> For reference, I've listed here the places that were causing "increment
> on 0" WARNs on my previous boot (temporarily allowed inc on 0 to make
> boot possible). These seem to be mostly related to resource reuse, but
> we haven't yet to looked in detail on how to deal with them.
> 
> fs/ext4/mballoc.c:3399          ext4_mb_use_preallocated
>         Seems to have separate tracking of destruction

This one seems particularly daft, since afaict all pa_count usage is
under pa_lock. No need for it to be atomic. Also, that code is weird:
it has separate pa_free and pa_deleted state.

This should definitely not be converted to refcount_t until it's
sanitized.
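
The pa_count point can be sketched in userspace (hypothetical names,
with a pthread mutex standing in for pa_lock): if every access is
already serialized by the lock, a plain int does the job and the atomic
buys nothing.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical analogue of the ext4 case: every access to 'count'
 * happens under 'lock', so the counter can be a plain int. */
struct pa_like {
	pthread_mutex_t lock;
	int count;
};

static void pa_get(struct pa_like *pa)
{
	pthread_mutex_lock(&pa->lock);
	pa->count++;
	pthread_mutex_unlock(&pa->lock);
}

/* Returns the new count; the caller frees the object when it hits 0. */
static int pa_put(struct pa_like *pa)
{
	int now;

	pthread_mutex_lock(&pa->lock);
	now = --pa->count;
	pthread_mutex_unlock(&pa->lock);
	return now;
}
```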

> net/ipv4/fib_semantics.c:994    fib_create_info

I'm not sure how this one isn't broken.

It does something like:

		ofi = fib_find_info(fi);
		if (ofi) {
			// use ofi, free fi
		}

		spin_lock_hb();
		atomic_inc(&fi->fib_clntref);
		// insert fi
		spin_unlock_hb();


If that races against itself, both instances can fail to find an
existing matching fi, and both will insert fi, resulting in a duplicate
fi.

Also, note that at the point of atomic_inc(), fi isn't in fact published
and therefore this need not be an atomic operation.

I could of course miss something subtle, since I only read part of this
code. In any case, even if that were somehow incorrect, I think you can
init fib_clntref with 1 and make it work with that.
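
The shape of the fix can be sketched in userspace (all names here are
hypothetical; the real code is in net/ipv4/fib_semantics.c): repeat the
lookup inside the same critical section that performs the insert, so
two racing creators cannot both publish.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One-slot "table" standing in for the fib hash chain. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static void *table_slot;

/* Insert 'candidate' unless an entry already exists; returns whichever
 * object ends up published.  The re-check under the lock is what
 * closes the find/insert race described above. */
static void *publish_once(void *candidate)
{
	void *winner;

	pthread_mutex_lock(&table_lock);
	if (table_slot)
		winner = table_slot;	/* lost: caller drops candidate */
	else
		winner = table_slot = candidate;
	pthread_mutex_unlock(&table_lock);
	return winner;
}
```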

> net/ipv4/devinet.c:233          inetdev_init

This seems to use atomic_inc before the object is published, and could
therefore simply use atomic_set().
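
That pattern, a plain initial set on a not-yet-published object, looks
like this in a userspace sketch (hypothetical names; in the kernel it
would be atomic_set() or, after conversion, refcount_set()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
	atomic_uint ref;
	/* ... payload ... */
};

/* Before the object is reachable by any other thread, nothing can race
 * on its counter, so a plain store of the initial reference suffices;
 * no read-modify-write increment is needed. */
static struct obj *obj_alloc(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		atomic_store(&o->ref, 1);
	return o;
}
```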

> net/ipv4/tcp_ipv4.c:1793        inet_sk_rx_dst_set

That needs more context to be evaluated; the code also seems very dodgy
in any case. We need to see the caller that invokes this function on 0.

> net/ipv4/route.c:2153:          __ip_route_output_key_hash

Needs more context; there isn't in fact an atomic_ in that function, and
it's a giant function.

> net/ipv6/ip6_fib.c:949          fib6_add

Can't make sense of this :/



Didn't look at the rest, but going by the above blindly converting to
refcount_t without prior cleanups isn't a good idea.


* RE: [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20 13:13                         ` Peter Zijlstra
@ 2016-12-20 13:35                           ` Reshetova, Elena
  2016-12-20 15:20                           ` Liljestrand Hans
  1 sibling, 0 replies; 48+ messages in thread
From: Reshetova, Elena @ 2016-12-20 13:35 UTC (permalink / raw)
  To: kernel-hardening, Liljestrand Hans
  Cc: Greg KH, Kees Cook, will.deacon, Boqun Feng, David Windsor, aik, david


> On Tue, Dec 20, 2016 at 12:55:02PM +0200, Liljestrand Hans wrote:
> 
> > For reference, I've listed here the places that were causing "increment
> > on 0" WARNs on my previous boot (temporarily allowed inc on 0 to make
> > boot possible). These seem to be mostly related to resource reuse, but
> > we haven't yet to looked in detail on how to deal with them.
> >
> > fs/ext4/mballoc.c:3399          ext4_mb_use_preallocated
> >         Seems to have separate tracking of destruction
> 
> This one seems particularly daft, since afaict all pa_count usage is
> under pa_lock. No need for it to be atomic. Also, that code is weird, it
> has separate pa_free and pa_deleted state.
> 
> This should definitely not be converted to refcount_t until its
> sanitized.
> 
> > net/ipv4/fib_semantics.c:994    fib_create_info
> 
> This one I'm not sure how its not broken.
> 
> It does something like:
> 
> 		ofi = fib_find_info(fi);
> 		if (ofi) {
> 			// use ofi, free fi
> 		}
> 
> 		spin_lock_hb();
> 		atomic_inc(&fi->fib_clntref);
> 		// insert fi
> 		spin_unlock_hb();
> 
> 
> If that races against itself, both instances can fail to find an
> existing matching fi, and both will insert fi, resulting in a duplicate
> fi.
> 
> Also, note that at the point of atomic_inc(), fi isn't in fact published
> and therefore this need not be an atomic operation.
> 
> I could of course miss something subtle, since I only read part of this
> code. In any case, even if that were somehow incorrect, I think you can
> init fib_clntref with 1 and make it work with that.
> 
> > net/ipv4/devinet.c:233          inetdev_init
> 
> This seems to use atomic_inc before the object is published, and could
> therefore simply use atomic_set().
> 
> > net/ipv4/tcp_ipv4.c:1793        inet_sk_rx_dst_set
> 
> That needs more context to be evaluated. Also seems very dodgy code in
> any case. We need the caller of this function that calls it on 0.
> 
> > net/ipv4/route.c:2153:          __ip_route_output_key_hash
> 
> need more context, there's not in fact an atomic_ in that function, and
> its a giant function.
> 
> > net/ipv6/ip6_fib.c:949          fib6_add
> 
> Can't make sense of this :/
> 
> 
> 
> Didn't look at the rest, but going by the above blindly converting to
> refcount_t without prior cleanups isn't a good idea.

I think then, for now, we would only convert the cases that can reasonably be converted (and that is still a huge patch set) and record all the other cases, with the reasons, in a doc. After we are done with the first part and the refcount_t API is merged upstream, we can proceed to the more complex cases and, if needed, involve people who know the code better than we do.

Any objections to this approach? 


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20 13:13                         ` Peter Zijlstra
  2016-12-20 13:35                           ` Reshetova, Elena
@ 2016-12-20 15:20                           ` Liljestrand Hans
  2016-12-20 15:52                             ` Peter Zijlstra
  2017-01-10 14:58                             ` Peter Zijlstra
  1 sibling, 2 replies; 48+ messages in thread
From: Liljestrand Hans @ 2016-12-20 15:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, 2016-12-20 at 14:13 +0100, Peter Zijlstra wrote:
> On Tue, Dec 20, 2016 at 12:55:02PM +0200, Liljestrand Hans wrote:
> 
> > For reference, I've listed here the places that were causing "increment
> > on 0" WARNs on my previous boot (temporarily allowed inc on 0 to make
> > boot possible). These seem to be mostly related to resource reuse, but
> > we haven't yet to looked in detail on how to deal with them.
> > 
> > fs/ext4/mballoc.c:3399          ext4_mb_use_preallocated
> >         Seems to have separate tracking of destruction
> 
> This one seems particularly daft, since afaict all pa_count usage is
> under pa_lock. No need for it to be atomic. Also, that code is weird, it
> has separate pa_free and pa_deleted state.
> 
> This should definitely not be converted to refcount_t until its
> sanitized.

Thanks, I guess that's a relief; I was just trying to figure this out
and came to pretty much the same conclusion about the weirdness.

> 
> > net/ipv4/fib_semantics.c:994    fib_create_info
> 
> This one I'm not sure how its not broken.
> 
> It does something like:
> 
> 		ofi = fib_find_info(fi);
> 		if (ofi) {
> 			// use ofi, free fi
> 		}
> 
> 		spin_lock_hb();
> 		atomic_inc(&fi->fib_clntref);
> 		// insert fi
> 		spin_unlock_hb();
> 
> 
> If that races against itself, both instances can fail to find an
> existing matching fi, and both will insert fi, resulting in a duplicate
> fi.
> 
> Also, note that at the point of atomic_inc(), fi isn't in fact published
> and therefore this need not be an atomic operation.
> 
> I could of course miss something subtle, since I only read part of this
> code. In any case, even if that were somehow incorrect, I think you can
> init fib_clntref with 1 and make it work with that.
> > net/ipv4/devinet.c:233          inetdev_init
> 
> This seems to use atomic_inc before the object is published, and could
> therefore simply use atomic_set().

Oh. Thank you, this seems to apply to some of the later cases
too.

> 
> > net/ipv4/tcp_ipv4.c:1793        inet_sk_rx_dst_set
> 
> That needs more context to be evaluated. Also seems very dodgy code in
> any case. We need the caller of this function that calls it on 0.
> 
> > net/ipv4/route.c:2153:          __ip_route_output_key_hash
> 
> need more context, there's not in fact an atomic_ in that function, and
> its a giant function.

Yes, I couldn't find that either. It is possible I made some mistake
when extracting these from the dmesg logs.

> > net/ipv6/ip6_fib.c:949          fib6_add
> 
> Can't make sense of this :/
> 
> 
> 
> Didn't look at the rest, but going by the above blindly converting to
> refcount_t without prior cleanups isn't a good idea.

Yes, I agree. Do you propose we just leave the weirder cases as
atomic_t, or should we try to incorporate the needed cleanup into this
initial patch set?


As for the remaining locations, the following all seem to be incs on
unpublished objects, so the refcount_set strategy should work:

net/ipv6/route.c:1048           ip6_pol_route
net/ipv6/addrconf.c:930         ipv6_add_addr
net/ipv6/addrconf.c:357         ipv6_add_dev
mm/backing-dev.c:399            wb_congested_get_create


net/core/filter.c:940           sk_filter_charge

sk_filter_charge is a bit iffier, since the refcount is passed in by
the caller. Still, both calling sites allocate/set the refcount just
beforehand, so I guess we could use refcount_set here too?


fs/inode.c:813                  find_inode_fast

This seems to search for freed-up objects that are then reused, maybe.
I'm not sure we can guarantee the refcount is 0, nor that refcount_set
would be appropriate even if we could.

Thanks,
-hans


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20 15:20                           ` Liljestrand Hans
@ 2016-12-20 15:52                             ` Peter Zijlstra
  2017-01-10 14:58                             ` Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2016-12-20 15:52 UTC (permalink / raw)
  To: Liljestrand Hans
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 05:20:08PM +0200, Liljestrand Hans wrote:
> > 
> > Didn't look at the rest, but going by the above blindly converting to
> > refcount_t without prior cleanups isn't a good idea.
> 
> Yes, I agree. Do you propose we just leave the weirder cases as
> atmoic_t, or should we try to incorporate needed cleanup in this initial
> patchset?
> 

Yes, I would leave them be for now. I imagine all the 'easy' ones are
still a giant pile of patches.


* [kernel-hardening] Re: Conversion from atomic_t to refcount_t: summary of issues
  2016-12-20 15:20                           ` Liljestrand Hans
  2016-12-20 15:52                             ` Peter Zijlstra
@ 2017-01-10 14:58                             ` Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2017-01-10 14:58 UTC (permalink / raw)
  To: Liljestrand Hans
  Cc: Reshetova, Elena, kernel-hardening, Greg KH, Kees Cook,
	will.deacon, Boqun Feng, David Windsor, aik, david

On Tue, Dec 20, 2016 at 05:20:08PM +0200, Liljestrand Hans wrote:
> fs/inode.c:813                  find_inode_fast
> 
> This seems to be doing a search for freed up objects that are then
> reused, maybe. Not sure if we can guarantee the refcount is 0, nor if it
> would be appropriate to use refcount_set even if we could?

This one is actually quite difficult. I'll try and poke at it some.


end of thread, other threads:[~2017-01-10 14:58 UTC | newest]

Thread overview: 48+ messages
2016-11-28 11:56 [kernel-hardening] Conversion from atomic_t to refcount_t: summary of issues Reshetova, Elena
2016-11-28 12:13 ` [kernel-hardening] " Peter Zijlstra
2016-11-28 12:44   ` Peter Zijlstra
2016-11-28 12:48   ` Peter Zijlstra
2016-11-28 14:12   ` [kernel-hardening] " Reshetova, Elena
2016-11-29  3:19   ` [kernel-hardening] " Alexey Kardashevskiy
2016-11-29  9:31     ` Peter Zijlstra
2016-11-30  0:23       ` Alexey Kardashevskiy
2016-11-29 15:35   ` [kernel-hardening] " Reshetova, Elena
2016-11-29 15:47     ` Peter Zijlstra
2016-12-01 19:15     ` [kernel-hardening] " Peter Zijlstra
2016-12-01 21:31       ` David Windsor
2016-12-01 23:03         ` Peter Zijlstra
2016-12-01 23:20           ` Kees Cook
2016-12-01 23:29             ` David Windsor
2016-12-02  1:17             ` Boqun Feng
2016-12-02 20:25               ` David Windsor
2016-12-07 13:24                 ` Peter Zijlstra
2016-12-07 19:03                   ` David Windsor
2016-12-09 14:48                     ` David Windsor
2016-12-07 13:36             ` Peter Zijlstra
2016-12-01 23:20           ` David Windsor
2016-12-07 13:21             ` Peter Zijlstra
2016-12-02 15:44       ` Liljestrand Hans
2016-12-02 16:14         ` Greg KH
2016-12-07 13:52         ` Peter Zijlstra
2016-12-07 15:59           ` David Windsor
2016-12-07 16:26             ` Peter Zijlstra
2016-12-07 16:31               ` David Windsor
2016-12-16 12:10           ` [kernel-hardening] " Reshetova, Elena
2016-12-16 14:01             ` [kernel-hardening] " Peter Zijlstra
2016-12-19  7:55               ` [kernel-hardening] " Reshetova, Elena
2016-12-19 10:12                 ` [kernel-hardening] " Peter Zijlstra
2016-12-20  9:13                   ` [kernel-hardening] " Reshetova, Elena
2016-12-20  9:30                     ` [kernel-hardening] " Greg KH
2016-12-20  9:40                       ` [kernel-hardening] " Reshetova, Elena
2016-12-20  9:51                         ` [kernel-hardening] " Greg KH
2016-12-20  9:55                           ` [kernel-hardening] " Reshetova, Elena
2016-12-20 10:26                             ` [kernel-hardening] " Greg KH
2016-12-20  9:41                     ` Peter Zijlstra
2016-12-20  9:58                       ` [kernel-hardening] " Reshetova, Elena
2016-12-20 10:55                       ` [kernel-hardening] " Liljestrand Hans
2016-12-20 13:13                         ` Peter Zijlstra
2016-12-20 13:35                           ` Reshetova, Elena
2016-12-20 15:20                           ` Liljestrand Hans
2016-12-20 15:52                             ` Peter Zijlstra
2017-01-10 14:58                             ` Peter Zijlstra
2016-12-07 14:13     ` Peter Zijlstra
