Folder organization

Zaid Kokaja edited this page Jan 5, 2021 · 1 revision

We should aim to adopt a folder structure similar to the one used by Code Ocean:

.
├── README.md
├── code
├── data
│   └── .gitignore
└── results
    └── .gitignore
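To set up a new project with this layout, a small script can create the folders and the two `.gitignore` files. The following is a minimal sketch, not an existing tool; the function name `scaffold_project` is hypothetical. It creates an empty README.md, the code, data and results folders, and a `.gitignore` in data and results with the contents described in the .gitignore section.

```python
from pathlib import Path

# Contents of the .gitignore placed in data/ and results/:
# ignore everything in the folder except the .gitignore file itself.
GITIGNORE_CONTENTS = "*\n!.gitignore\n"


def scaffold_project(root: Path) -> None:
    """Create the README.md / code / data / results layout under root (hypothetical helper)."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "README.md").touch(exist_ok=True)  # leaves an existing README untouched
    (root / "code").mkdir(exist_ok=True)
    for name in ("data", "results"):
        folder = root / name
        folder.mkdir(exist_ok=True)
        (folder / ".gitignore").write_text(GITIGNORE_CONTENTS)
```

For example, `scaffold_project(Path("my-project"))` (project name hypothetical) creates the tree shown above. Running it again is harmless: existing folders and the README are kept, and the `.gitignore` files are rewritten with the same contents.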

There should be separate projects/repositories for pickling, encoding, decoding, and extracting contextual embeddings.

code

The code folder can contain multiple subdirectories to organize its Python code, scripts, and analyses as you see fit.

data

The data folder holds all data. If a project saves modified versions of the data it uses, split the folder into raw and preprocessed subfolders. Most projects will use a single pickle file that serializes all the data. Data files must not be checked into git.
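Reading and writing such a pickle needs only the standard library. The helpers below are a minimal sketch with hypothetical names (`save_pickle`, `load_pickle`); the file name `all_data.pkl` in the usage note is also only an illustration. Note that pickling serializes data but does not compress it.

```python
import pickle
from pathlib import Path
from typing import Any


def save_pickle(obj: Any, path: Path) -> None:
    """Serialize obj to path, creating parent folders (e.g. data/preprocessed) as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path: Path) -> Any:
    """Load an object written by save_pickle. Only unpickle files you trust."""
    with path.open("rb") as f:
        return pickle.load(f)
```

For instance, a preprocessing script might call `save_pickle(data, Path("data/preprocessed/all_data.pkl"))`, keeping the untouched inputs under data/raw.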

You may find it useful to use symlinks to refer to pickles rather than copying data files; see man ln. More on this soon.
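From Python, `pathlib.Path.symlink_to` does the same job as `ln -s`. This is a sketch under the assumption that the shared pickle lives somewhere outside the project; the helper name `link_pickle` and the paths are hypothetical.

```python
from pathlib import Path


def link_pickle(source: Path, link: Path) -> None:
    """Make link a symbolic link to source, like `ln -s source link` (hypothetical helper)."""
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace an existing link; a real file at `link` is never deleted
    # An absolute target keeps the link valid no matter which folder it sits in.
    link.symlink_to(source.resolve())
```

If `link` already exists as a regular file, `symlink_to` raises `FileExistsError`, so real data is not silently replaced. On Windows, creating symlinks may require extra privileges.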

results

Create a new directory inside this folder for each new analysis, containing all relevant information. Ideally, prefix its name with a date stamp in the format YYYYMMDD-. Results are not checked in either; see .gitignore.
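Generating the date-stamped name in code avoids typos in the prefix. A minimal sketch, with the hypothetical helper `new_results_dir` and an illustrative analysis name:

```python
from datetime import date
from pathlib import Path
from typing import Optional


def new_results_dir(results: Path, name: str, day: Optional[date] = None) -> Path:
    """Create and return results/YYYYMMDD-name for a new analysis (hypothetical helper)."""
    day = day or date.today()
    folder = results / f"{day:%Y%m%d}-{name}"
    # exist_ok=False: fail loudly instead of mixing output into an earlier analysis.
    folder.mkdir(parents=True, exist_ok=False)
    return folder
```

For example, `new_results_dir(Path("results"), "encoding-baseline", date(2021, 1, 5))` creates `results/20210105-encoding-baseline`.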

.gitignore

A .gitignore file inside data and results makes git keep these folders in the repository (git does not track empty folders, but it does track this file) while ignoring everything in them except the .gitignore file itself. Its contents are:

*
!.gitignore