
bigcode-embeddings

NOTE: data must be generated with bigcode-ast-tools before this tool can be used.

bigcode-embeddings generates and visualizes embeddings for AST nodes.

Install

This project should be used with Python 3.

To install the package, either run

pip install bigcode-embeddings

or clone the repository and run

cd bigcode-embeddings
pip install -r requirements.txt
python setup.py install

NOTE: tensorflow needs to be installed separately.
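Since tensorflow is not pulled in automatically, it can help to confirm that it is importable before training. The snippet below is a minimal sketch using only the Python standard library; the helper name `is_installed` is hypothetical and not part of bigcode-embeddings.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if not is_installed("tensorflow"):
        print("tensorflow is missing; install it separately before training")
```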

Usage

Training embeddings

Training data can be generated using bigcode-ast-tools.

Given a data.txt.gz file generated with a vocabulary of size 30000, 100-dimensional embeddings can be trained using

./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz

TensorBoard can be used to visualize training progress

tensorboard --logdir embeddings/

After the first epoch, the embeddings visualization becomes available in TensorBoard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded to label the embeddings.

Visualizing the embeddings

Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the above embeddings can be visualized with the following command

./bin/data-explorer visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv

where STEP should be the largest value found in the embeddings/ directory.

The -i flag can be passed to generate an interactive plot.
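Finding the largest STEP by hand can be tedious when many checkpoints have been saved. The following is a minimal sketch that scans a directory for it, assuming the files are named `embeddings.bin-STEP`, optionally followed by an extension such as `.index` or `.meta`. That naming is inferred from the command above, and the helper `latest_step` is hypothetical, not part of bigcode-embeddings.

```python
import re
from pathlib import Path

# Assumption: checkpoints are named "embeddings.bin-STEP", optionally with
# an extra extension (for example ".index" or ".meta").
STEP_PATTERN = re.compile(r"^embeddings\.bin-(\d+)(?:\..*)?$")


def latest_step(directory):
    """Return the largest STEP found among checkpoint files in directory."""
    steps = set()
    for path in Path(directory).iterdir():
        match = STEP_PATTERN.match(path.name)
        if match:
            steps.add(int(match.group(1)))
    if not steps:
        raise FileNotFoundError(f"no embeddings.bin-STEP files in {directory}")
    return max(steps)
```

The returned value can then be substituted for STEP in the `-m embeddings/embeddings.bin-STEP` argument.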
