priority 1
----------

- [x] add deltaEncode chunks function
- [x] do not merge consecutive smaller chunks: each of them could still be
  stored as a plain chunk if no similar chunk is found, so every chunk has to
  be of `chunkSize` or less, otherwise it could not possibly be used for
  deduplication. The matching loop (Go sketch in the sketches section below):

```
for each new chunk:
    find similar in sketchMap
    if exists:
        delta encode
    else:
        calculate fingerprint
        store in fingerprintMap
        store in sketchMap
```

- [ ] read from repo
  - [x] store recipe
  - [x] load recipe
  - [ ] read chunks in-order into a stream (sketch below)
  - [ ] read individual files
- [ ] properly store information to be DNA-encoded
  - [ ] tar the source to keep file metadata?
  - [ ] store chunks compressed (sketch below)
    - [ ] compress before storing
    - [ ] uncompress before loading
  - [ ] store compressed chunks into tracks of `trackSize` (1024 bytes)
- [ ] add a chunk cache that would look like this (minimal implementation
  sketched below):

```go
type ChunkCache map[ChunkId][]byte // Do we really want to keep only the chunk content?

type Cache interface {
	Get(id ChunkId) Chunk
	Set(id ChunkId, chunk Chunk)
}
```

priority 2
----------

- [ ] make more use of the `Reader` API (analogous to Java's `InputStream`)
- [ ] refactor `matchStream`, as it is currently quite complex
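
sketches
--------

A minimal Go sketch of the per-chunk matching loop above. Everything here is
a placeholder rather than the actual repo API: the type names (`ChunkId`,
`Fingerprint`, `Sketch`), the `Repo` layout, and the FNV-based stand-ins for
the real fingerprint and similarity sketch are all assumptions.

```go
package repo

import "hash/fnv"

// Hypothetical placeholder types; the real repo defines its own.
type (
	ChunkId     int
	Fingerprint uint64
	Sketch      uint64
)

type Repo struct {
	chunks         [][]byte                // stored base chunks, indexed by ChunkId
	fingerprintMap map[Fingerprint]ChunkId // exact-duplicate lookup
	sketchMap      map[Sketch]ChunkId      // similar-chunk lookup
}

func NewRepo() *Repo {
	return &Repo{
		fingerprintMap: make(map[Fingerprint]ChunkId),
		sketchMap:      make(map[Sketch]ChunkId),
	}
}

// fingerprint stands in for a strong content hash over the whole chunk.
func fingerprint(data []byte) Fingerprint {
	h := fnv.New64a()
	h.Write(data)
	return Fingerprint(h.Sum64())
}

// sketch stands in for a similarity sketch; a real one (e.g. super-features)
// would map similar, not just identical, chunks to the same value.
func sketch(data []byte) Sketch {
	n := len(data)
	if n > 64 {
		n = 64
	}
	h := fnv.New32a()
	h.Write(data[:n])
	return Sketch(h.Sum32())
}

// processChunk follows the pseudocode: delta-encode against a similar chunk
// when one exists, otherwise store a new base chunk and index it by both
// fingerprint and sketch.
func (r *Repo) processChunk(data []byte) {
	s := sketch(data)
	if baseId, ok := r.sketchMap[s]; ok {
		r.storeDelta(baseId, data)
		return
	}
	id := ChunkId(len(r.chunks))
	r.chunks = append(r.chunks, data)
	r.fingerprintMap[fingerprint(data)] = id
	r.sketchMap[s] = id
}

// storeDelta would encode data as a delta against the base chunk; the actual
// delta format is out of scope for this sketch.
func (r *Repo) storeDelta(base ChunkId, data []byte) {
	_ = r.chunks[base] // a real implementation would diff against this
}
```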
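
For "read chunks in-order into a stream", one way to lean on the `Reader` API
is to expose the recipe's chunk sequence as an `io.Reader`. The `chunkStream`
type and its representation of the recipe as an ordered slice of chunk
contents are assumptions for illustration; chunks are assumed non-empty.

```go
package repo

import "io"

// chunkStream exposes an in-order sequence of chunk contents as an
// io.Reader, assuming every chunk is non-empty.
type chunkStream struct {
	chunks [][]byte // remaining chunk contents, in recipe order
	offset int      // read offset inside chunks[0]
}

func (s *chunkStream) Read(p []byte) (int, error) {
	if len(s.chunks) == 0 {
		return 0, io.EOF
	}
	n := copy(p, s.chunks[0][s.offset:])
	s.offset += n
	if s.offset == len(s.chunks[0]) { // current chunk fully consumed
		s.chunks = s.chunks[1:]
		s.offset = 0
	}
	return n, nil
}
```

A caller can then restore a whole stream with
`io.Copy(dst, &chunkStream{chunks: contents})`, and the same shape composes
naturally with the decompression step sketched next.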
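
For "store chunks compressed", a straightforward option is the standard
`compress/zlib` package: deflate each chunk before writing it out and inflate
it again on load. The function names here are hypothetical.

```go
package repo

import (
	"bytes"
	"compress/zlib"
	"io"
)

// compressChunk deflates a chunk's content before it is stored.
func compressChunk(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the zlib stream
		return nil, err
	}
	return buf.Bytes(), nil
}

// uncompressChunk inflates a stored chunk back to its original content.
func uncompressChunk(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```

Since compressed chunks vary in size, packing them into fixed `trackSize`
tracks then becomes a separate placement step.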
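
Finally, a minimal in-memory implementation of the `Cache` interface drafted
above, assuming a hypothetical `Chunk` type backed by raw bytes. Note that
`Get` as drafted cannot distinguish a miss from a cached zero value; a
`Get(id ChunkId) (Chunk, bool)` shape mirroring Go map lookups may be worth
considering.

```go
package repo

// Hypothetical stand-ins for the repo's own types.
type ChunkId int
type Chunk []byte

type Cache interface {
	Get(id ChunkId) Chunk
	Set(id ChunkId, chunk Chunk)
}

// mapCache is the simplest possible Cache: an unbounded map. A real cache
// would likely add an eviction policy (e.g. LRU) to bound memory use.
type mapCache map[ChunkId]Chunk

// Get returns the cached chunk, or nil when the id is absent.
func (c mapCache) Get(id ChunkId) Chunk { return c[id] }

// Set stores or replaces the chunk for the given id.
func (c mapCache) Set(id ChunkId, chunk Chunk) { c[id] = chunk }
```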