
📀 Tutorial: Large language model

1-click run with Docker.

The fastest way to deploy your LLM inference service is to build and run our Docker image, as shown below:

# build the image
sudo docker build -t "ritual_infernet_ml_llm:1" -f llm_inference_service.Dockerfile .

# start the container
sudo docker run --name=llm_inf_service -d --mount source=llm_inf_service,target=/app -p 4999:3000 --env-file llm_inference_service.env "ritual_infernet_ml_llm:1" --bind=0.0.0.0:3000 --workers=2
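
To confirm the container started cleanly, you can tail its logs using the container name from the command above:

# follow the service logs (Ctrl+C to stop tailing)
sudo docker logs -f llm_inf_service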

You can also check your service by running it locally, outside of Docker:

pip install -r requirements.txt                  # install dependencies
export PYTHONPATH=src                            # make the service sources importable
flask --app llm_inference_service run -p 4999    # start the Flask development server on port 4999
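
With the server listening on port 4999 (whether through the container's published port or the local flask run above), you can send it a test request. The /service_output route and JSON body below are illustrative assumptions, not the service's documented API; consult the service README for the exact request format.

# hypothetical request for illustration only; the real route and payload
# schema are defined by the LLM inference service itself
curl -X POST http://localhost:4999/service_output \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?"}'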

Customize your environment

Set up your Dockerfile environment with the custom variables below. Example configuration files can be found in the Docker instructions, and a sketch of an example environment file follows this list.

  • FLASK_LLM_WORKFLOW_CLASS - str - Fully qualified name of the workflow class. For example, 'ml.workflows.inference'.
  • FLASK_LLM_WORKFLOW_POSITIONAL_ARGS - list - Any positional arguments required to instantiate the LLM inference workflow.
  • FLASK_LLM_WORKFLOW_KW_ARGS - dict - Any keyword arguments required to instantiate the LLM inference workflow.
  • HUGGING_FACE_HUB_TOKEN (optional) - Token required if any files are needed from the Hugging Face Hub.
  • PYTHONPATH - If deploying via Docker, this should point to the code location inside the container ('/app', the mount target in the docker run command above). If starting the service with flask run, set it to 'src', as shown above.
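
As a rough sketch, an environment file such as llm_inference_service.env (passed via --env-file in the docker run command above) might look like the following. The workflow class path, arguments, and token are placeholders, and the JSON encoding of the list/dict values is an assumption rather than a documented format:

# llm_inference_service.env -- illustrative placeholder values only
# (JSON-encoded list/dict values are an assumption; check the service docs)
FLASK_LLM_WORKFLOW_CLASS=ml.workflows.inference.your_workflow.YourLLMInferenceWorkflow
FLASK_LLM_WORKFLOW_POSITIONAL_ARGS=["https://your-model-endpoint.example.com"]
FLASK_LLM_WORKFLOW_KW_ARGS={"retries": 2}
HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx
PYTHONPATH=/app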