author     n-peugnet <n.peugnet@free.fr>  2021-09-13 17:31:41 +0200
committer  n-peugnet <n.peugnet@free.fr>  2021-09-13 17:31:41 +0200
commit     3da2cd3fc7ca694a7b263c099be29c9c8b46af22 (patch)
tree       095b960cba9e778f4afd934be29fbbab00aa0620 /TODO.md
parent     3e0123b9d8a1097e74fcd500b72cbdaa00c7a49a (diff)
first tests on real data
Diffstat (limited to 'TODO.md')
 -rw-r--r--  TODO.md | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)
@@ -5,20 +5,6 @@ priority 1
 chunks if no similar chunk is found. Thus, it will need to be of `chunkSize` or
 less. Otherwise, it could not be possibly used for deduplication.
 
-```
-for each new chunk:
-    find similar in sketchMap
-    if exists:
-        delta encode
-    else:
-        calculate fingerprint
-        store in fingerprintMap
-        store in sketchMap
-```
-- [x] read from repo (Restore function)
-  - [x] store recipe
-  - [x] load recipe
-  - [x] read chunks in-order into a stream
 - [ ] read individual files
 - [ ] properly store information to be DNA encoded
 - [ ] tar source to keep files metadata ?
@@ -39,8 +25,8 @@ priority 2
 fact just `TempChunks` for now)
 - [ ] option to commit without deltas to save new base chunks
 - [ ] custom binary marshal and unmarshal for chunks
-- [ ] use `loadChunkContent` in `loadChunks`
-- [ ] store hashes for faster maps rebuild
+- [x] use `loadChunkContent` in `loadChunks`
+- [ ] TODO: store hashes for faster maps rebuild
 - [ ] try [Fdelta](https://github.com/amlwwalker/fdelta) and
   [Xdelta](https://github.com/nine-lives-later/go-xdelta) instead of Bsdiff
 - [ ] maybe use an LRU cache instead of the current FIFO one.
@@ -53,3 +39,13 @@ reunion 7/09
 - [ ] store recipe and files incrementally
 - [ ] compress recipe
 - [ ] make size comparison between recipe and chunks with some datasets
+
+ideas
+-----
+1. Would it be a good idea to store the compressed size for each chunk?
+   Maybe this way we could only load the chunks needed for each file read.
+
+2. Implement the `fs` interface of Go? Not sure if this will be useful.
+
+3. If we don't need to reduce read amplification we could compress all chunks if
+   it reduces the space used.
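The pseudocode removed in this commit (find a similar chunk in `sketchMap`, delta-encode against it, otherwise fingerprint and store the chunk as a new base) can be sketched in Go roughly as follows. This is only an illustration of the control flow, not the dna-backup implementation: the `chunk` type, the CRC64 fingerprint, and the trivial first-byte similarity key are hypothetical stand-ins (the real project would use a proper resemblance sketch).

```go
package main

import (
	"fmt"
	"hash/crc64"
)

// Hypothetical stand-ins for dna-backup's types; for illustration only.
type chunk []byte

var crcTable = crc64.MakeTable(crc64.ISO)

// fingerprintMap indexes chunks by an exact-match fingerprint.
var fingerprintMap = map[uint64]chunk{}

// sketchMap indexes chunks by a similarity key. A real resemblance
// sketch would go here; we use the first byte purely as a placeholder.
var sketchMap = map[byte]chunk{}

func sketchKey(c chunk) byte {
	if len(c) == 0 {
		return 0
	}
	return c[0]
}

// process mirrors the removed pseudocode: if a similar chunk exists,
// delta-encode against it; otherwise fingerprint the chunk and store
// it in both maps as a new base chunk.
func process(c chunk) string {
	if _, ok := sketchMap[sketchKey(c)]; ok {
		return "delta" // delta-encode against the similar base chunk
	}
	fp := crc64.Checksum(c, crcTable) // calculate fingerprint
	fingerprintMap[fp] = c            // store in fingerprintMap
	sketchMap[sketchKey(c)] = c       // store in sketchMap
	return "store" // new base chunk
}

func main() {
	fmt.Println(process(chunk("aaaa"))) // store: no similar chunk yet
	fmt.Println(process(chunk("aaab"))) // delta: shares a sketch key with "aaaa"
}
```

With a real sketch function the `"delta"` branch would feed both chunks to the delta coder (Bsdiff, or the Fdelta/Xdelta alternatives listed above); the placeholder key only demonstrates the lookup order.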