Commit

docs: Update README with .env details
iusztinpaul committed Oct 5, 2024
1 parent 8cb27d9 commit 2c65a1c
Showing 7 changed files with 247 additions and 87 deletions.
28 changes: 17 additions & 11 deletions .env.example
@@ -1,6 +1,4 @@
# MongoDB Config
DATABASE_HOST=mongodb://decodingml:[email protected]:27017
DATABASE_NAME=twin
# --- Required settings even when working locally. ---

# OpenAI API Config
OPENAI_MODEL_ID=gpt-4o-mini
@@ -9,15 +7,23 @@ OPENAI_API_KEY=str
# Huggingface API Config
HUGGINGFACE_ACCESS_TOKEN=str

# RAG
RAG_MODEL_DEVICE=cpu
# Comet ML (during training)
COMET_API_KEY=str
COMET_WORKSPACE=llm-engineers-handbook

# AWS Credentials
# --- Required settings when deploying the code. ---
# --- Otherwise, the default values work fine. ---

# MongoDB database
DATABASE_HOST="mongodb://decodingml:[email protected]:27017"

# Qdrant vector database
USE_QDRANT_CLOUD=false
QDRANT_CLOUD_URL="str"
QDRANT_APIKEY="str"

# AWS Authentication
AWS_ARN_ROLE=str
AWS_REGION=eu-central-1
AWS_ACCESS_KEY=str
AWS_SECRET_KEY=str
AWS_REGION=eu-central-1

# LinkedIn Credentials
LINKEDIN_USERNAME=str
LINKEDIN_PASSWORD=str
161 changes: 128 additions & 33 deletions README.md
@@ -1,34 +1,108 @@
# LLM-Engineering

## Dependencies
Repository that contains all the code used throughout the [LLM Engineer's Handbook](https://www.amazon.com/LLM-Engineers-Handbook-engineering-production/dp/1836200072/).

- Python 3.11
- Poetry 1.8.3
- Docker 26.0.0
![Book Cover](/images/book_cover.png)

## Install
# Dependencies

## Local dependencies

To install and run the project locally, you need the following dependencies (the code was tested with the versions listed below):

- [pyenv 2.3.36](https://github.com/pyenv/pyenv) (optional: for installing multiple Python versions on your machine)
- [Python 3.11](https://www.python.org/downloads/)
- [Poetry 1.8.3](https://python-poetry.org/docs/#installation)
- [Docker 27.1.1](https://docs.docker.com/engine/install/)

## Cloud services

The code also uses and depends on the following cloud services. For now, you don't have to do anything. We will guide you in the installation and deployment sections on how to use them:

- [HuggingFace](https://huggingface.com/): Model registry
- [Comet ML](https://www.comet.com/site/): Experiment tracker
- [Opik](https://www.comet.com/site/products/opik/): LLM evaluation and prompt monitoring
- [ZenML](https://www.zenml.io/): Orchestrator
- [AWS](https://aws.amazon.com/): Compute and storage
- [MongoDB](https://www.mongodb.com/): NoSQL database
- [Qdrant](https://qdrant.tech/): Vector database

In the [LLM Engineer's Handbook](https://www.amazon.com/LLM-Engineers-Handbook-engineering-production/dp/1836200072/), Chapter 2 will walk you through each tool, and Chapters 10 and 11 provide step-by-step guides on how to set up everything you need.

# Install

## Install Python 3.11 using pyenv (Optional)

If you have a different global Python version than Python 3.11, you can use pyenv to install Python 3.11 at the project level. Verify your Python version with:
```shell
python --version
```

First, verify that you have pyenv installed:
```shell
pyenv --version
# Output: pyenv 2.3.36
```

Install Python 3.11:
```shell
pyenv install 3.11
```

From the root of your repository, run the following to verify that everything works fine:
```shell
pyenv versions
# Output:
# system
# * 3.11.8 (set by <path/to/repo>/LLM-Engineers-Handbook/.python-version)
```

Because we defined a `.python-version` file within the repository, pyenv will know to pick up the version from that file and use it locally whenever you are working within that folder. To double-check that, run the following command while you are in the repository:
```shell
python --version
# Output: Python 3.11.8
```
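If you also want to guard against the wrong interpreter from inside Python, a minimal sketch could look like this (the `is_supported` helper is our own illustration, not part of the project):

```python
import sys

def is_supported(version_info: tuple) -> bool:
    """Return True when the interpreter matches the project's Python 3.11 requirement."""
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (3, 11)

if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} supported: {is_supported(sys.version_info)}")
```

Calling `is_supported(sys.version_info)` returns `True` only on a 3.11 interpreter.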

If you move out of this repository, both `pyenv versions` and `python --version` might output different Python versions.

## Install project dependencies

The first step is to verify that you have Poetry installed:
```shell
poetry --version
# Output: Poetry (version 1.8.3)
```

Use Poetry to install all the project's requirements for running it locally, so we don't need to install any AWS dependencies. We also install Poe the Poet as a Poetry plugin to manage our CLI commands and pre-commit to verify our code before committing changes to git:
```shell
poetry install --without aws
poetry self add 'poethepoet[poetry_plugin]'
poetry self add 'poethepoet[poetry_plugin]==0.29.0'
pre-commit install
```

We run all the scripts using [Poe the Poet](https://poethepoet.natn.io/index.html). You don't have to do anything else but install it as a Poetry plugin.
We run all the scripts using [Poe the Poet](https://poethepoet.natn.io/index.html). You don't have to do anything else but install Poe the Poet as a Poetry plugin, as described above: `poetry self add 'poethepoet[poetry_plugin]'`

To activate the environment created by Poetry, run:
```shell
poetry shell
```

## Set up .env settings file (for local development)

### Configure sensitive information
After you have installed all the dependencies, you must create a `.env` file with sensitive credentials to run the project.
After you have installed all the dependencies, you must create and fill in a `.env` file with your credentials to properly interact with the other services and run the project.

First, copy our example by running the following:
```shell
cp .env.example .env # The file has to be at the root of your repository!
cp .env.example .env # The file must be at your repository's root!
```

Now, let's understand how to fill in all the variables inside the `.env` file to get you started.

We will begin by reviewing the mandatory settings we must complete when working locally or in the cloud.

### OpenAI

To authenticate to OpenAI, you must fill out the `OPENAI_API_KEY` env var with an authentication token.
To authenticate to OpenAI's API, you must fill out the `OPENAI_API_KEY` env var with an authentication token.

→ Check out this [tutorial](https://platform.openai.com/docs/quickstart) to learn how to provide one from OpenAI.

@@ -38,32 +112,53 @@ To authenticate to HuggingFace, you must fill out the `HUGGINGFACE_ACCESS_TOKEN`

→ Check out this [tutorial](https://huggingface.co/docs/hub/en/security-tokens) to learn how to provide one from HuggingFace.

### LinkedIn Crawling [Optional]
This step is optional. You can finish the project without it.

But in case you want to enable LinkedIn crawling, you have to fill in your username and password:
```shell
LINKEDIN_USERNAME = "str"
LINKEDIN_PASSWORD = "str"
```

### Comet ML

Comet ML is required only during training.

To authenticate to Comet ML, you must fill out the `COMET_API_KEY` and `COMET_WORKSPACE` env vars with an authentication token and workspace name.

→ Check out this [tutorial](https://www.comet.com/docs/v2/api-and-sdk/rest-api/overview/) to learn how to fill in the Comet ML variables from above.

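With the mandatory settings (the OpenAI, HuggingFace, and Comet ML variables) in place, you can sanity-check your `.env` before running anything. The helper below is our own illustration, not part of the repository; it assumes the plain KEY=value format used in `.env.example`:

```python
from pathlib import Path

# Hypothetical list of mandatory keys; adjust it to match your .env.example.
REQUIRED_KEYS = ["OPENAI_API_KEY", "HUGGINGFACE_ACCESS_TOKEN", "COMET_API_KEY"]

def missing_env_keys(env_path: str, required: list) -> list:
    """Parse a KEY=value dotenv file and return the required keys it does not define."""
    defined = set()
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, _ = line.partition("=")
        defined.add(key.strip())
    return [key for key in required if key not in defined]
```

Running `missing_env_keys(".env", REQUIRED_KEYS)` returns an empty list when every mandatory setting is present.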
### Opik

> Soon

## Set up .env settings file (for deployment)

When deploying the project to the cloud, we must set additional settings for MongoDB, Qdrant, and AWS.

If you are just working locally, the default values of these env vars will work out of the box.

We will only highlight what has to be configured here, as **Chapter 11** of the [LLM Engineer's Handbook](https://www.amazon.com/LLM-Engineers-Handbook-engineering-production/dp/1836200072/) provides step-by-step details on how to deploy the whole system to the cloud.

### MongoDB

We must change the `DATABASE_HOST` env var to the URL pointing to your cloud MongoDB cluster.
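Before deploying, it can help to sanity-check the connection string you paste into `DATABASE_HOST`. This stdlib-only sketch is our own illustration, not project code:

```python
from urllib.parse import urlparse

def describe_mongo_uri(uri: str) -> dict:
    """Split a MongoDB connection string into its user, host, and port parts."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,  # "mongodb" locally; cloud clusters often use "mongodb+srv"
        "username": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,  # None when the URI omits the port (e.g. mongodb+srv)
    }
```

If `describe_mongo_uri` reports an unexpected host or a missing username, the URI was probably pasted incorrectly.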

### Qdrant

Set `USE_QDRANT_CLOUD` to `true`, `QDRANT_CLOUD_URL` to the URL of your cloud Qdrant cluster, and `QDRANT_APIKEY` to its API key.

To work with Qdrant cloud, the env vars will look like this:
```env
USE_QDRANT_CLOUD=true
QDRANT_CLOUD_URL="<your_qdrant_cloud_url>"
QDRANT_APIKEY="<your_qdrant_api_key>"
```
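Conceptually, the switch between the local container and Qdrant Cloud is driven by `USE_QDRANT_CLOUD`. A hedged sketch of that selection logic (our own illustration, with the local URL assumed from the default Docker setup, not the project's actual code):

```python
def qdrant_endpoint(use_cloud: str, cloud_url: str = "", api_key: str = "") -> tuple:
    """Return the (url, api_key) pair to use, mirroring the USE_QDRANT_CLOUD flag."""
    if use_cloud.strip().lower() in {"1", "true", "yes"}:
        if not cloud_url or not api_key:
            raise ValueError("Cloud mode needs QDRANT_CLOUD_URL and QDRANT_APIKEY")
        return cloud_url, api_key
    # Local mode: the Docker container exposes the REST API on port 6333.
    return "http://localhost:6333", None
```

With the defaults (`USE_QDRANT_CLOUD=false`), the code would target the local container and ignore the cloud URL and API key.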

For this to work, you also have to:
- disable 2FA
- disable suspicious activity

We also recommend that you:
- create a dummy profile for crawling
- crawl only your own data

### AWS

> [!IMPORTANT]
> Find more configuration options in the [settings.py](https://github.com/PacktPublishing/LLM-Engineering/blob/main/llm_engineering/settings.py) file. Every variable from the `Settings` class can be configured through the `.env` file.
> Find more configuration options in the [settings.py](https://github.com/PacktPublishing/LLM-Engineers-Handbook/blob/main/llm_engineering/settings.py) file. Every variable from the `Settings` class can be configured through the `.env` file.
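As an illustration of how such a `Settings` class typically maps env vars to defaults, here is a stdlib-only sketch with a hypothetical subset of fields; the project's actual implementation in `settings.py` may differ:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Illustrative only: each field falls back to its default when the env var is unset."""
    openai_model_id: str = "gpt-4o-mini"
    use_qdrant_cloud: bool = False
    aws_region: str = "eu-central-1"

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(
            openai_model_id=os.environ.get("OPENAI_MODEL_ID", cls.openai_model_id),
            use_qdrant_cloud=os.environ.get("USE_QDRANT_CLOUD", "false").lower() == "true",
            aws_region=os.environ.get("AWS_REGION", cls.aws_region),
        )
```

This is why the defaults "work out of the box" locally: every variable you leave out of `.env` simply resolves to its default.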

## Run Locally
# Run the project locally

### Local Infrastructure
## Local infrastructure

> [!WARNING]
> You need Docker installed (v27.1.1 or higher)
@@ -84,7 +179,7 @@ poetry poe local-infrastructure-down
> `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`
> Otherwise, the connection between the local server and pipeline will break. 🔗 More details in [this issue](https://github.com/zenml-io/zenml/issues/2369).
#### ZenML is now accessible at:
### ZenML is now accessible at:

Web UI: [localhost:8237](localhost:8237)

@@ -94,15 +189,15 @@ Default credentials:

→🔗 [More on ZenML](https://docs.zenml.io/)

#### Qdrant is now accessible at:
### Qdrant is now accessible at:

REST API: [localhost:6333](localhost:6333)
Web UI: [localhost:6333/dashboard](localhost:6333/dashboard)
GRPC API: [localhost:6334](localhost:6334)

→🔗 [More on Qdrant](https://qdrant.tech/documentation/quick-start/)

#### MongoDB is now accessible at:
### MongoDB is now accessible at:

database URI: `mongodb://decodingml:[email protected]:27017`
database name: `twin`
@@ -113,7 +208,7 @@ database name: `twin`
We will fill in this section in the future. For now, it is covered only in Chapter 11 of the book.


### Run Pipelines
## Run Pipelines

All the pipelines will be orchestrated behind the scenes by ZenML.

@@ -126,7 +221,7 @@ To see the pipelines running and their results:

**But first, let's understand how we can run all our ML pipelines**

#### Data pipelines
### Data pipelines

Run the data collection ETL:
```shell
@@ -155,14 +250,14 @@ poetry poe run-end-to-end-data-pipeline
```


#### Utility pipelines
### Utility pipelines

Export ZenML artifacts to JSON:
```shell
poetry poe run-export-artifact-to-json-pipeline
```

#### Training pipelines
### Training pipelines

```shell
poetry poe run-training-pipeline
Binary file added images/book_cover.png
