Binary Autoencoder for Text Modeling.

For full details, see the paper: https://easychair.org/publications/preprint/9rjT

This repository contains the main language modeling experiments from the paper:

binary autoencoder (the main subject of the paper)
variational autoencoder (strong baseline)
bottleneck autoencoder (weak baseline)
plain RNN language model
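What distinguishes the binary autoencoder from the other models is that its latent code is a vector of bits rather than real numbers. The exact binarization used in model.py is not reproduced here; the following is a minimal, hypothetical sketch of one standard way to turn encoder outputs into bits, either by thresholding sigmoid probabilities at 0.5 or by sampling each bit from a Bernoulli distribution. It shows the forward pass only: thresholding and sampling have no useful gradient, so training such a model typically needs a workaround such as a straight-through estimator. The names `sigmoid` and `binarize` are illustrative, not taken from the repository.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(logits, stochastic=False, rng=None):
    """Map real-valued encoder outputs to a {0, 1} code (hypothetical helper).

    Deterministic mode: bit = 1 if sigmoid(logit) >= 0.5.
    Stochastic mode: bit ~ Bernoulli(sigmoid(logit)).
    """
    probs = [sigmoid(z) for z in logits]
    if stochastic:
        rng = rng or random.Random()
        return [1 if rng.random() < p else 0 for p in probs]
    return [1 if p >= 0.5 else 0 for p in probs]


print(binarize([2.3, -0.7, 0.0, -4.1]))  # [1, 0, 1, 0]
```

Deterministic thresholding gives a reproducible code for each sentence, while stochastic sampling is a common choice during training because it injects noise into the bottleneck.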

Setup

Install requirements:

sudo pip3 install -r requirements.txt

TensorFlow is also required; I use tensorflow-gpu==1.9.0.
Then download datasets and embeddings:

# download the data (every experiment uses the same data)
python3 run.py download_dataset.py download_embeddings.py experiment_configs/binary.json

How to reproduce experiments

For example, to reproduce the binary autoencoder experiment, open experiment_configs/binary.json and review the hyperparameters. Note that all paths listed in this file will be created on your computer.
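Because the config's paths are created on your machine, it can help to list them before training. The key names inside binary.json are not documented here, so this hypothetical helper simply walks the whole JSON tree and treats any string starting with "~" or "/" as a path; that heuristic is an assumption and may miss relative paths.

```python
import json
import os


def collect_paths(node, found=None):
    """Recursively collect string values that look like filesystem paths.

    Heuristic (assumption): a path is any string starting with "~" or "/".
    Numbers, booleans and other strings are ignored.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            collect_paths(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_paths(value, found)
    elif isinstance(node, str) and node.startswith(("~", "/")):
        found.append(os.path.expanduser(node))
    return found


def preview_config(config_path):
    """Print every path-like value in a JSON config and whether it exists yet."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    for path in collect_paths(config):
        status = "exists" if os.path.exists(path) else "will be created"
        print(f"{path}: {status}")
```

Usage: `preview_config("experiment_configs/binary.json")`.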

# train autoencoder
python3 train.py experiment_configs/binary.json
# evaluate on test set
python3 evaluate.py experiment_configs/binary.json
# encode test set to binary vectors
python3 encode_file.py experiment_configs/binary.json
# view the resulting file
head ~/data/language_modeling/model_output/binary/vectors.txt
# decode binary vectors back to text
python3 decode_file.py experiment_configs/binary.json
# compare the original text and the decoded text
sdiff \
    ~/data/language_modeling/data/test.txt \
    ~/data/language_modeling/model_output/binary/decoded.txt \
    | head -n 100
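sdiff is good for eyeballing individual sentences; a numeric summary can complement it. The sketch below assumes, as the sdiff comparison suggests, that decoded.txt has one sentence per line aligned with test.txt. It reports the fraction of sentences reconstructed exactly and a simple position-wise token accuracy over whitespace tokens. These metrics are illustrative and are not necessarily the ones computed by evaluate.py.

```python
from pathlib import Path


def reconstruction_stats(original_lines, decoded_lines):
    """Compare aligned sentences.

    Returns (exact-match rate, token accuracy). Token accuracy counts
    position-wise matches over the original's whitespace tokens.
    zip() stops at the shorter list, so unmatched trailing lines are ignored.
    """
    pairs = list(zip(original_lines, decoded_lines))
    if not pairs:
        return 0.0, 0.0
    exact = sum(o.strip() == d.strip() for o, d in pairs)
    matched = total = 0
    for o, d in pairs:
        o_tok, d_tok = o.split(), d.split()
        total += len(o_tok)
        matched += sum(a == b for a, b in zip(o_tok, d_tok))
    token_acc = matched / total if total else 0.0
    return exact / len(pairs), token_acc


def compare_files(original_path, decoded_path):
    """Read two line-aligned text files and compute reconstruction_stats."""
    original = Path(original_path).expanduser().read_text(encoding="utf-8").splitlines()
    decoded = Path(decoded_path).expanduser().read_text(encoding="utf-8").splitlines()
    return reconstruction_stats(original, decoded)
```

For instance, `compare_files("~/data/language_modeling/data/test.txt", "~/data/language_modeling/model_output/binary/decoded.txt")` uses the same paths as the sdiff command.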

Other experiments can be reproduced in the same way. Configs for them are also stored in experiment_configs. Training data and embeddings do not need to be downloaded twice.
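To run training and evaluation for every config in one go, the commands above can be looped over. This is a hypothetical convenience script, not part of the repository; it only runs train.py and evaluate.py, since encoding and decoding apply to the autoencoders and may not make sense for the plain RNN language model config.

```python
import subprocess
import sys
from pathlib import Path

# Steps shared by every experiment (assumption: encode/decode are skipped).
STEPS = ["train.py", "evaluate.py"]


def build_commands(config_paths, steps=STEPS):
    """Build one command per (config, step), in config-major order."""
    return [[sys.executable, script, str(cfg)] for cfg in config_paths for script in steps]


def run_all(config_dir="experiment_configs"):
    """Run every step for every *.json config, stopping on the first failure."""
    configs = sorted(Path(config_dir).glob("*.json"))
    for cmd in build_commands(configs):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root processes the configs in alphabetical order; `check=True` raises `CalledProcessError` as soon as one step fails, so a broken experiment does not silently continue.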
