> On 29 May 2018, at 15.07, Konopko, Igor J <igor.j.konopko@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Do you mean random writes? On fully sequential, a line will either be
>> fully written, fully invalidated or on its way to be written. When
>> invalidating the line, then the whole line will be invalidated and GC
>> will free it without having to move valid data.
> 
> I meant sequential writes, since this is the easiest way to reach
> rl->rb_state = PBLK_RL_LOW. When we updating this values inside
> __pblk_rl_update_rates(), most of the times rl->rb_user_cnt will
> became equal to rl->rb_user_max - which means that we cannot handle
> any more user IOs. In that case pblk_rl_user_may_insert() will return
> false, no one will call pblk_rb_update_l2p(), rl->rb_user_cnt will not
> be decreased, so user IOs will stuck for forever.
> 

What OP are you using? Even with a 5%, full lines should start being
freed as they are completely invalid. But fair enough, if you decide to
optimize for a guaranteed sequential workload where the OP is only to
support grown bad blocks, you will run into this issue.

>> I can see why you might have problems with very low OP due to the rate
>> limiter, but unfortunately this is not a good way of solving the
>> problem. When you do this, you basically make the L2P to point to the
>> device instead of pointing to the write cache, which in essence bypasses
>> mw_cuints. As a result, if a read comes in to one of the synced entries,
>> it will violate the device-host contract and most probably fail (for
>> sure fail on > SLC).
> 
> What about using on that path some modified version of
> pblk_rb_sync_l2p() which will synchronize all the RB entries except
> last mw_cunits number of elements?
> 

Typically, the size of the buffer is the closest upper power of two to
nr_luns * mw_cuints. So in essence, we only update the L2P as we wrap
up. You could go down to match nr_luns * mw_cuints exactly, which
depending on the media geometry can gain you some extra user entries.

> Also you wrote about mw_cuints definitely makes sense, but even
> without my changes I believe that we can lead into such a situation -
> especially for pblk with small number of LUNs assigned under write IOs
> with high sector count. Pblk_rb_update_l2p() does not explicitly takes
> mw_cunints this value into consideration right now.
> 

As mentioned, the size of the buffer is always over nr_luns * mw_cuints
and we only update when we wrap up - this is, when the mem pointer
(which writes new data to the write buffer), wraps and starts writing
new data. So this case is guaranteed not to happen. In fact, this
particular case is what inspired the design of the write buffer and the
current 5 pointers to extend the typical head and tail.

>> I think that the right way of solving this problem is separating the
>> write and GC buffers and then assigning tokens to them. The write thread
>> will then consume both buffers based on these tokens. In this case, user
>> I/O will have a buffer for itself, which will be guaranteed to advance
>> at the rate the rate-limiter is allowing it to. Note that the 2 buffers
>> can be a single buffer with a new set of pointers so that the lookup can
>> be done with a single bit.
>> 
>> I have been looking for time to implement this for a while. If you want
>> to give it a go, we can talk and I can give you some pointers on
>> potential issues I have thought about.
> 
> I believe this is interesting option - we can discuss this for one of
> next releases.

Sure. Ping me if you want to discuss in more detail.

Javier