> On 6 May 2019, at 07.16, Heiner Litz wrote:
>
> Igor, Javier,
>
> both of you are right. Here is what I came up with after some more thinking.
>
> We can avoid the races in 2. and 3. with the following two invariants:
> I1: If we have a GC line with seq_id X, only garbage collect from
> lines older than X (this addresses 2.)
> I2: Guarantee that the open GC line always has a smaller seq_id than
> all open user lines (this addresses 3)
>
> We can enforce I2 by adding a minor seq_id. The major sequence id is
> only incremented when allocating a user line. Whenever a GC line is
> allocated we read the current major seq_id (open user line) and
> increment the minor seq_id. This allows us to order all GC lines
> before the open user line during recovery.
>
> Problem with this approach:
> Consider the following example: There exist user lines U0, U1, U2
> (where 0,1,2 are seq_ids) and a non-empty GC5 line (with seq_id 5). If
> we now do only sequential writes all user lines will be overwritten
> without GC being required. As a result, data will now reside on U6,
> U7, U8. If we now need to GC we cannot because of I1.
> Solution: We cannot fast-forward the GC line's seq_id because it
> contains old data, so pad the GC line with zeros, close it and open a
> new GC9 line.
>
> Generality:
> This approach extends to schemes that use e.g. hot, warm, cold open
> lines (adding a minor_minor_seq_id)
>
> Heiner

Looks like a good solution that allows us to maintain the current mapping model.

Javier
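
For reference, a minimal Python sketch of the major/minor scheme quoted above. It is not pblk code: the names Line, Allocator, recovery_key and may_collect are hypothetical. It also assumes two things the mail does not spell out: that lines with the same major id are ordered with GC lines before the user line, and that the minor counter is never reset. The walk-through replays the stale-GC-line problem using major ids 0..5 rather than the flat ids of the original example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    kind: str        # "user" or "gc"
    major: int       # bumped only when a user line is allocated
    minor: int = 0   # bumped only when a GC line is allocated

    def recovery_key(self):
        # Assumed tie-break: for equal major ids, GC lines (0) sort
        # before the user line (1), so recovery replays them first.
        return (self.major, 1 if self.kind == "user" else 0, self.minor)


class Allocator:
    """Hypothetical allocator handing out (major, minor) sequence ids."""

    def __init__(self):
        self.major = -1
        self.minor = 0

    def alloc_user_line(self):
        self.major += 1
        return Line("user", self.major)

    def alloc_gc_line(self):
        # Read the current major id (open user line), bump the minor id.
        self.minor += 1
        return Line("gc", self.major, self.minor)


def may_collect(victim, gc_line):
    # I1: only garbage collect from lines older than the open GC line.
    return victim.recovery_key() < gc_line.recovery_key()


alloc = Allocator()
u0, u1, u2 = (alloc.alloc_user_line() for _ in range(3))
gc = alloc.alloc_gc_line()                      # major 2, minor 1

# I2: the open GC line orders before the open user line.
assert gc.recovery_key() < u2.recovery_key()
assert may_collect(u0, gc) and may_collect(u1, gc)

# Sequential overwrites: new user lines appear, no GC was needed.
u3, u4, u5 = (alloc.alloc_user_line() for _ in range(3))
stale_gc = gc
assert not any(may_collect(u, stale_gc) for u in (u3, u4, u5))

# Fix from the mail: pad and close the stale GC line, open a new one.
gc = alloc.alloc_gc_line()                      # major 5, minor 2
assert may_collect(u3, gc) and may_collect(u4, gc)
assert gc.recovery_key() < u5.recovery_key()    # I2 still holds
```

The padding itself is not modelled. The sketch only shows that a freshly allocated GC line picks up the current major id, which is what makes the newer user lines eligible again under I1.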