> On 18 Mar 2019, at 14.02, Igor Konopko wrote:
>
> On 17.03.2019 20:44, Matias Bjørling wrote:
>> On 3/14/19 9:04 AM, Igor Konopko wrote:
>>> When we are trying to switch to a new line, we need to ensure that
>>> the emeta for the n-2 line is already written. Otherwise we can end
>>> up in a deadlock scenario, where the writer has no more requests to
>>> write and thus there is no way to trigger emeta writes from the
>>> writer thread. This is a corner-case scenario which occurs after
>>> multiple write errors, which effectively close the line early due to
>>> lack of line space.
>>>
>>> Signed-off-by: Igor Konopko
>>> ---
>>>  drivers/lightnvm/pblk-core.c  |  2 ++
>>>  drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
>>>  drivers/lightnvm/pblk.h       |  1 +
>>>  3 files changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index 38e26fe..a683d1f 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>>>  					struct pblk_line_mgmt *l_mg,
>>>  					struct pblk_line_meta *lm)
>>>  {
>>> +	struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
>>>  	int meta_line;
>>>  	lockdep_assert_held(&l_mg->free_lock);
>>> @@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>>>  	meta_line = find_first_zero_bit(&l_mg->meta_bitmap, PBLK_DATA_LINES);
>>>  	if (meta_line == PBLK_DATA_LINES) {
>>>  		spin_unlock(&l_mg->free_lock);
>>> +		pblk_write_emeta_force(pblk);
>>>  		io_schedule();
>>>  		spin_lock(&l_mg->free_lock);
>>>  		goto retry_meta;
>>> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
>>> index 4e63f9b..4fbb9b2 100644
>>> --- a/drivers/lightnvm/pblk-write.c
>>> +++ b/drivers/lightnvm/pblk-write.c
>>> @@ -505,6 +505,30 @@ static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
>>>  	return meta_line;
>>>  }
>>> +void pblk_write_emeta_force(struct pblk *pblk)
>>> +{
>>> +	struct pblk_line_meta *lm = &pblk->lm;
>>> +	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>> +	struct pblk_line *meta_line;
>>> +
>>> +	while (true) {
>>> +		spin_lock(&l_mg->close_lock);
>>> +		if (list_empty(&l_mg->emeta_list)) {
>>> +			spin_unlock(&l_mg->close_lock);
>>> +			break;
>>> +		}
>>> +		meta_line = list_first_entry(&l_mg->emeta_list,
>>> +						struct pblk_line, list);
>>> +		if (meta_line->emeta->mem >= lm->emeta_len[0]) {
>>> +			spin_unlock(&l_mg->close_lock);
>>> +			io_schedule();
>>> +			continue;
>>> +		}
>>> +		spin_unlock(&l_mg->close_lock);
>>> +		pblk_submit_meta_io(pblk, meta_line);
>>> +	}
>>> +}
>>> +
>>>  static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>>  {
>>>  	struct ppa_addr erase_ppa;
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index 0a85990..a42bbfb 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
>>>  void pblk_write_timer_fn(struct timer_list *t);
>>>  void pblk_write_should_kick(struct pblk *pblk);
>>>  void pblk_write_kick(struct pblk *pblk);
>>> +void pblk_write_emeta_force(struct pblk *pblk);
>>>  /*
>>>   * pblk read path
>>
>> Hi Igor,
>> Is this an error that qemu can force pblk to expose? Can you provide a
>> specific example of what is needed to force the error?
>
> So I hit this error on pblk instances with a low number of LUNs and
> multiple write IO errors (should be reproducible with error injection).
> pblk_map_remaining() then quickly mapped all the sectors in the line,
> so the writer thread was not able to issue all the necessary emeta IO
> writes and got stuck when trying to replace the line with a new one.
> So this is definitely an error/corner-case scenario.

If the cause is emeta writes, then there is a bug in
pblk_line_close_meta(), as the logic to prevent this case is already in
place.
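
To make that safeguard concrete: the prevention logic amounts to queueing a
closed line's emeta and making sure it gets flushed without depending on
further user writes arriving. The fragment below is only an illustrative
sketch of that idea, not the actual pblk_line_close_meta() implementation;
apart from pblk_write_kick(), l_mg->emeta_list and l_mg->close_lock (all
visible in the patch above), the helper name and structure are assumptions.

/* Illustrative sketch only -- not the real pblk_line_close_meta() body.
 * Idea: when a line is closed, queue its emeta and wake the writer
 * thread, so the pending emeta gets written out even if no further user
 * writes arrive (the situation Igor describes above).
 */
static void line_close_queue_emeta_sketch(struct pblk *pblk,
					  struct pblk_line *line)
{
	struct pblk_line_mgmt *l_mg = &pblk->l_mg;

	spin_lock(&l_mg->close_lock);
	list_add_tail(&line->list, &l_mg->emeta_list);	/* emeta now pending */
	spin_unlock(&l_mg->close_lock);

	/* Without some kick/flush at this point, the writer thread may
	 * never run again once the write buffer drains, and
	 * pblk_line_setup_metadata() can spin forever waiting for a free
	 * meta line.
	 */
	pblk_write_kick(pblk);
}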