author | n-peugnet <n.peugnet@free.fr> | 2021-09-15 19:20:33 +0200 |
---|---|---|
committer | n-peugnet <n.peugnet@free.fr> | 2021-09-15 19:20:33 +0200 |
commit | 9a7f82c57d23b532c16ffdc446c248364a8b70e0 (patch) | |
tree | 2006f770c249d869faaf27d43e4f455d8935a091 | |
parent | 2f20511f442cecc764c817709c4358a149984766 (diff) | |
add perf output
-rw-r--r-- | TODO.md | 4
-rw-r--r-- | docs/note-2021-09-14.md | 43
-rw-r--r-- | docs/note-2021-09-15.md | 25
3 files changed, 63 insertions, 9 deletions
diff --git a/TODO.md b/TODO.md
--- a/TODO.md
+++ b/TODO.md
@@ -35,7 +35,6 @@ priority 2
 - [ ] maybe use an LRU cache instead of the current FIFO one.
 - [x] remove `LoadedChunk` and only use `StoredChunk` instead now that the
   cache is implemented
-- [ ] store file list compressed
 - [ ] keep hash workers so that they reuse the same hasher and reset it
   instead of creating a new one each time. This could save some processing
   time
@@ -44,6 +43,7 @@ reunion 7/09
 - [ ] save recipe consecutive chunks as extents
 - [ ] store recipe and files incrementally
 - [ ] compress recipe
+- [ ] TODO: compress file list
 - [ ] make size comparison between recipe and chunks with some datasets
 
 ideas
@@ -55,3 +55,5 @@ ideas
 
 3. If we don't need to reduce read amplification we could compress all
    chunks if it reduces the space used.
+
+4. Command line with subcommands (like, hmm... git ? for instance)
diff --git a/docs/note-2021-09-14.md b/docs/note-2021-09-14.md
index 14fb973..9e399da 100644
--- a/docs/note-2021-09-14.md
+++ b/docs/note-2021-09-14.md
@@ -1,17 +1,48 @@
-Perf improvements with concurent hash calculation
+Perf improvements with concurrent hash calculation
 =================================================
 
-Using the source code dataset here are the new times:
+Using the source code dataset, here are the new perf numbers:
+
+```
+
+        254 541,85 msec task-clock          #    1,333 CPUs utilized
+           489 390      context-switches    #    0,002 M/sec
+            14 491      cpu-migrations      #    0,057 K/sec
+           109 170      page-faults         #    0,429 K/sec
+   702 598 342 141      cycles              #    2,760 GHz
+ 1 191 229 091 705      instructions        #    1,70  insn per cycle
+   172 579 644 365      branches            #  678,001 M/sec
+     2 502 920 412      branch-misses       #    1,45% of all branches
+
+     191,024430360 seconds time elapsed
+
+     247,992304000 seconds user
+      15,759037000 seconds sys
+```
 
-`19:38:46.745` -> `19:41:56.652` = `00:03:09,907`
 
 But this time I also had the good idea to close all my processes and to use a
 tmp directory for writing.
 
+-------------------------------------------------------------------------------
+
 With the same setup, the previous perf was:
 
-`19:26:05.954` -> `19:29:20.805` = `00:03:14,851`
+```
+
+        277 665,78 msec task-clock          #    1,411 CPUs utilized
+           853 639      context-switches    #    0,003 M/sec
+            27 276      cpu-migrations      #    0,098 K/sec
+           110 187      page-faults         #    0,397 K/sec
+   764 443 227 093      cycles              #    2,753 GHz
+ 1 221 696 199 089      instructions        #    1,60  insn per cycle
+   178 891 873 274      branches            #  644,271 M/sec
+     2 578 200 052      branch-misses       #    1,44% of all branches
+
+     196,744991354 seconds time elapsed
+
+     270,030535000 seconds user
+      18,285378000 seconds sys
+```
 
-So not that big of an improvement but it seems that at the same time CPU usage
-has decreased a bit. Maybe because less synchronisation calls were made ?
+So not that big of an improvement, but it seems that at the same time CPU usage
+has decreased a bit. Maybe because fewer synchronization calls were made?
diff --git a/docs/note-2021-09-15.md b/docs/note-2021-09-15.md
index 8caf1cc..4b9dc9c 100644
--- a/docs/note-2021-09-15.md
+++ b/docs/note-2021-09-15.md
@@ -1,6 +1,27 @@
 Added storage worker
 ====================
 
-## Time
+This worker is in charge of making IO calls when storing chunk data. This way
+writes are asynchronous and `matchChunks` can continue without interruption.
 
-`14:49:00.221` -> `14:51:34.009` = `00:02:33.788`
+perf output
+-----------
+
+```
+        238 109,33 msec task-clock          #    1,597 CPUs utilized
+           683 140      context-switches    #    0,003 M/sec
+            13 181      cpu-migrations      #    0,055 K/sec
+            65 129      page-faults         #    0,274 K/sec
+   657 925 115 947      cycles              #    2,763 GHz
+ 1 007 554 842 920      instructions        #    1,53  insn per cycle
+   135 946 178 728      branches            #  570,940 M/sec
+     2 360 293 394      branch-misses       #    1,74% of all branches
+
+     149,118546976 seconds time elapsed
+
+     226,961825000 seconds user
+      18,354254000 seconds sys
+```
+
+This is not a huge improvement, but at the same time hashes are stored, which
+makes them a lot faster to recover (no need to rehash every existing chunk).