On Thu, 28 Feb 2019, Christopher Lameter wrote:

> On Thu, 28 Feb 2019, Paul Walmsley wrote:
> 
> > On Fri, 22 Feb 2019, Christopher Lameter wrote:
> >
> > > On Fri, 22 Feb 2019, Björn Töpel wrote:
> > >
> > > > > The problem is that the register needs to be referenced by the amo
> > > > > instruction. Can amoadd increment an address relative to a scratch
> > > > > register?
> > > > >
> > > >
> > > > It cannot. So, we'll end up with (at least) a preempt disabled region...
> > >
> > > If we need to do that then we already have so much overhead due to that
> > > processing that it may not be worthwhile. lr/sc overhead is minor compared
> > > to switching preempt on and off I think.
> >
> > Is it strictly necessary to disable and re-enable preemption?
> 
> It is necessary if CONFIG_PREEMPT is set because then the execution can be
> interrupted at any time and rescheduled on another hardware thread.
> 
> > On a UMA system, it might be preferable to risk the cost of the cache line
> > bounce than to flip preemption off and on - assuming there's no other
> > impact.
> 
> It's not the cost of a cache line bounce. The counter data will be
> corrupted since the RMW operation is not properly serialized. The counter
> data will be read from one processor, then the execution context may
> change and the writeback may occur either to the new execution context's
> per cpu area (where we overwrite the old value) or to the old execution context's
> per cpu area (where the counter may already have been incremented by other
> code that ran in that context).

This is the AMO-based sequence that Palmer wrote in his E-mail:

   li t0, 1
   la t1, counter
   amoadd.w zero, t0, 0(t1)

The RMW takes place in the amoadd, and executes atomically from the point 
of view of all cores in the cache coherency domain. 
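
For illustration, here is a rough C11 sketch of the same single-instruction
RMW. The names (counter, counter_inc, worker, run_increments, MAX_THREADS)
are made up for this example and are not kernel code. My assumption is that
a compiler targeting RISC-V with the A extension will typically turn the
relaxed fetch-add with an unused result into an amoadd.w, but I have not
checked that for every toolchain. The sketch only shows that concurrent
increments of one address are not lost; it says nothing about which CPU's
counter that address belongs to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 8   /* arbitrary limit for this sketch */

/* Hypothetical shared counter, standing in for "counter" above. */
static atomic_int counter;

/*
 * One atomic read-modify-write with relaxed ordering and the old value
 * discarded, analogous to "amoadd.w zero, t0, 0(t1)".
 */
static void counter_inc(void)
{
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

/* Each worker increments the shared counter *arg times. */
static void *worker(void *arg)
{
    int n = *(int *)arg;

    for (int i = 0; i < n; i++)
        counter_inc();
    return NULL;
}

/*
 * Start nthreads workers, each doing per_thread increments.
 * Returns the final counter value, or -1 if a thread could not be created.
 */
static int run_increments(int nthreads, int per_thread)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    atomic_store(&counter, 0);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &per_thread) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == nthreads ? atomic_load(&counter) : -1;
}
```

With, say, run_increments(4, 100000), every increment should be accounted
for, since each one is a single atomic RMW on the same address.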

Or am I misunderstanding you?


- Paul