Analyse the distribution of files and their contents. Note that the order of the output is not stable.
sudo apt update -y && sudo apt upgrade -y
sudo apt install curl build-essential gcc make -y
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo build
When running the compiled binary directly, replace cargo run -- with glowing-happiness.
Provide the input path of the directory or repository that you want to analyse. The command returns a JSON object with the tool distribution.
cargo run -- --input .
# when using the ready binary
glowing-happiness --input .
Output:
{"c":3,"json":261,"spring-boot":1,"test":14,"rust":7,"python":2,"objective-c":1,"gitignore":2,"javascript":261,"circleci":1,"git":1,"xml":3,"toml":2,"swift":1,"cargo":2,"github":1,"markdown":1,"spark":1}
For visualisation, see the Visualisation section.
With --mode you can define the aggregation level of the output. When no mode is set, it defaults to "count_by_tool".
Both calls are equivalent:
cargo run -- --input .
cargo run -- --input . --mode count_by_tool
The list_by_file mode returns the result per file:
cargo run -- --input . --mode list_by_file
The list mode returns only a distinct JSON list of the tools used:
cargo run -- --input . --mode list
["toml","gitignore","spark","json","test","github","c","cargo","javascript","rust","python","swift","git","spring-boot","markdown","objective-c","xml","circleci"]
To execute the application for several repositories, you can glue it together with some Python:
import os
import subprocess
root = "/home/darius"
for directory in os.listdir(root):
    subprocess.call(f"cargo run -- --input {os.path.join(root, directory)} > {directory}.json", shell=True)
You can also run it in Bash with some awk magic:
ls -dl /home/darius/*/ | awk -F'[[:space:]]' '{print "cargo run -- --input " $NF " > " substr($NF, 1, length($NF)-1) ".json"}' | bash
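The per-repository JSON files written by either variant can then be merged into one overall distribution, for example with this sketch (it assumes the default count_by_tool output and that the *.json files sit in the current directory):

import glob
import json
from collections import Counter

# Sum the count_by_tool results of all analysed repositories.
total = Counter()
for path in glob.glob("*.json"):
    with open(path) as f:
        total.update(json.load(f))

print(json.dumps(dict(total)))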
Now you can visualise it. But before we get to that, here are some more examples with real repositories.
Here you will find some popular repositories and the execution times of the analysis. Please note that, as with every benchmark, this can only give you an idea of how fast it will run on your device. Also note that these repositories are very large; fetching their content will take longer than the analysis.
git clone --branch main https://github.com/Microsoft/vscode/ repositories/vscode
find repositories/vscode | wc -l
time cargo run -- --input repositories/vscode
Example Output:
{"objective-c":1,"java":1,"npm":107,"xml property list":3,"docker":2,"dart":1,"yaml":58,"rust":2,"python":2,"shell":45,"gitignore":18,"javascript":248,"svg":72,"css":211,"github":2,"json":644,"swift":1,"xml":5,"git":1,"jupyter notebook":1,"yarn":100,"markdown":75,"go":2,"png":71,"html":43,"typescript":3987,"c":1}
4 seconds for 8k files
git clone --branch master https://github.com/flutter/flutter.git repositories/flutter
find repositories/flutter | wc -l
time cargo run -- --input repositories/flutter
4 seconds for 8800 files.
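If you prefer to take these measurements from Python instead of time, a rough sketch (repository paths as cloned above; note that the first cargo run also pays the compilation cost, so do one warm-up run first):

import subprocess
import time

def time_analysis(path: str) -> float:
    # Wall-clock time of one analysis run, output discarded.
    start = time.perf_counter()
    subprocess.run(["cargo", "run", "--", "--input", path],
                   check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

for repo in ("repositories/vscode", "repositories/flutter"):
    print(f"{repo}: {time_analysis(repo):.1f} s")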
You can use the Streamlit code in Python to visualise the repositories. Something like this may get you started:
cargo run -- --input . > app_1.json
python3 -m venv ./venv
. ./venv/bin/activate
pip3 install -r requirements.txt
streamlit run streamlit/app.py
In case you want a Rust solution here, feel free to contribute.
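To get a feel for what such an app involves, here is a minimal sketch. It is not the bundled streamlit/app.py; it assumes the app_1.json file produced above contains the count_by_tool output:

import json

import pandas as pd
import streamlit as st

st.title("Tool distribution")

# Load the count_by_tool output written earlier (file name from the example above).
with open("app_1.json") as f:
    counts = json.load(f)

# One bar per tool, sorted by the number of files.
df = (
    pd.DataFrame(list(counts.items()), columns=["tool", "files"])
    .sort_values("files", ascending=False)
    .set_index("tool")
)
st.bar_chart(df)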
The files will be analysed in parallel. To reduce the IO load on your device, you can set
RAYON_NUM_THREADS=4
to use only 4 threads instead of all available ones.
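When driving the tool from Python as in the examples above, the thread limit can be passed through the environment, for example (the value 4 is just the example from above):

import os
import subprocess

# Limit rayon to 4 worker threads to reduce the IO pressure.
env = {**os.environ, "RAYON_NUM_THREADS": "4"}
subprocess.run(["cargo", "run", "--", "--input", "."], check=True, env=env)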