Installation

Weaviate is completely containerized; you can use Docker Compose or Kubernetes to run it.

Introduction

There are multiple ways to set up a Weaviate instance. For a try-out setup, we recommend starting with Docker Compose. Cloud deployment can be used for both small and larger projects. For production setups and large-scale projects, we encourage you to use Kubernetes.

Docker Compose

If you want to try out Weaviate locally and on a small scale, you can use Docker Compose.

To start Weaviate with docker-compose, you need a docker-compose configuration file. Environment variables set in this file control your Weaviate setup, authentication and authorization, Contextionary settings, and data storage settings.

Configuration tool

To configure the docker-compose setup, retrieve a docker-compose.yml file from https://configuration.semi.technology. Use the drop-down menus there to generate the URL with parameters, then run the resulting curl command to download the file.
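
As a rough sketch of the download step, the snippet below wraps curl in a small helper. The function name fetch_compose_file is hypothetical, and its argument stands for whatever URL the configuration tool generated for you; that URL is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: download the generated configuration as docker-compose.yml.
# $1 is the URL produced by the configuration tool (not filled in here).
fetch_compose_file() {
  url="$1"
  if [ -z "$url" ]; then
    echo "usage: fetch_compose_file <generated-url>" >&2
    return 2
  fi
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL -o docker-compose.yml "$url"
}
```

Using -f means an HTTP error does not leave an error page saved under the name docker-compose.yml.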


Next, you can run the setup as follows:

$ docker-compose up

Notes:

  • Default parameters can be omitted.
  • The Dutch, German, Italian and Czech Contextionaries are experimental. Any feedback? Share it with us on GitHub or Stack Overflow.
  • For more information about Compound Splitting and other Contextionary parameters, see the Contextionary documentation.
  • You can modify the configuration file to add, for example, authentication or authorization.

Environment variables

An overview of environment variables in the docker-compose file:

| Variable | Description | Type | Example Value |
| --- | --- | --- | --- |
| ORIGIN | Set the http(s) origin for Weaviate | string - HTTP origin | https://my-weaviate-deployment.com |
| CONFIGURATION_STORAGE_URL | Service-Discovery for the (etcd) config store | string - URL | http://etcd:2379 |
| CONTEXTIONARY_URL | Service-Discovery for the contextionary container | string - URL | http://contextionary |
| ESVECTOR_URL | Service-Discovery for the Elasticsearch instance | string - URL | http://esvector:9200 |
| ESVECTOR_NUMBER_OF_SHARDS | Configure default number of ES shards | int | 1 |
| ESVECTOR_AUTO_EXPAND_REPLICAS | Whether ES should auto expand replicas | string | 1-3 |
| STANDALONE_MODE | Turn on experimental standalone mode | string - true/false | false |
| PERSISTENCE_DATA_PATH | Only if STANDALONE_MODE=true: where Weaviate Standalone should store its data | string - file path | /var/lib/weaviate |
| AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED | Allow users to interact with Weaviate without auth | string - true/false | true |
| AUTHENTICATION_OIDC_ENABLED | Enable OIDC Auth | string - true/false | false |
| AUTHENTICATION_OIDC_ISSUER | OIDC Token Issuer | string - URL | https://myissuer.com |
| AUTHENTICATION_OIDC_CLIENT_ID | OIDC Client ID | string | my-client-id |
| AUTHENTICATION_OIDC_USERNAME_CLAIM | OIDC Username Claim | string | email |
| AUTHENTICATION_OIDC_GROUPS_CLAIM | OIDC Groups Claim | string | groups |
| AUTHORIZATION_ADMINLIST_ENABLED | Enable AdminList Authorization mode | string - true/false | true |
| AUTHORIZATION_ADMINLIST_USERS | Users with admin permission | string - comma-separated list | jane@example.com,john@example.com |
| AUTHORIZATION_ADMINLIST_READONLY_USERS | Users with read-only permission | string - comma-separated list | alice@example.com,dave@example.com |

Attaching to the log output of only Weaviate

The output is quite verbose. You can attach the logs only to Weaviate itself, for example by running the following command instead of docker-compose up:

# Run Docker Compose
$ docker-compose up -d && docker-compose logs -f weaviate

Alternatively, you can run docker-compose entirely detached with docker-compose up -d and poll {bindaddress}:{port}/v1/meta until you receive status 200 OK.
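
The polling approach can be sketched as a small shell function. This is only an illustration: the function name wait_for_weaviate, the two-second pause, and the default of 30 attempts are arbitrary choices, and the address in the example call is an assumption that you should replace with your own {bindaddress}:{port}.

```shell
#!/bin/sh
# Hypothetical helper: poll <base-url>/v1/meta until it answers 200 OK.
# $1: base URL, i.e. your {bindaddress}:{port} including the scheme
# $2: maximum number of attempts (default 30, an arbitrary choice)
wait_for_weaviate() {
  base_url="$1"
  max_attempts="${2:-30}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    status=$(curl -s -o /dev/null -w '%{http_code}' "$base_url/v1/meta" || true)
    if [ "$status" = "200" ]; then
      echo "Weaviate is ready"
      return 0
    fi
    sleep 2
    attempt=$((attempt + 1))
  done
  echo "Weaviate did not become ready after $max_attempts attempts" >&2
  return 1
}

# Example call (the address is an assumption; substitute your own):
#   docker-compose up -d && wait_for_weaviate "http://localhost:8080"
```

Bounding the number of attempts keeps a misconfigured setup from blocking a script forever.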

Manual installation

You can also download the files manually if you have trouble with the above script.

  1. $ mkdir weaviate && cd weaviate
  2. Save the docker-compose configuration file as docker-compose.yml.
  3. Run docker-compose up in the directory where you saved the file (or, for less verbose output, attach to the log output of only Weaviate).

Cloud deployment

Weaviate is available on Google Cloud Marketplace, where you can find more details on deployment on the cloud.

Weaviate Cluster Service

The Weaviate Cluster Service (WCS) is currently in beta. You can create a Weaviate cluster that lasts for one week, completely free of charge. If you try it out, we would love to hear your feedback.

Kubernetes

Note I: the Kubernetes setup is only for large-scale deployments of Weaviate. If you want to work with smaller deployments, you can always use Docker Compose or deploy on the cloud.

Note II: tested up to Kubernetes 1.14.x.

Note III: if you are running a very small setup, we advise using Docker Compose, but you can also use this minimal configuration.

To run Weaviate with Kubernetes, take the following steps.

# Check if helm is installed
$ helm version
# Check if pods are running properly
$ kubectl -n kube-system get pods

Get the Helm Chart

Get the Helm chart and configuration files.

# Set the Weaviate chart version
export CHART_VERSION="v10.1.0"
# Download Helm charts
wget https://github.com/semi-technologies/weaviate-helm/releases/download/$CHART_VERSION/weaviate.tgz
# Download configuration values
wget https://raw.githubusercontent.com/semi-technologies/weaviate-helm/$CHART_VERSION/weaviate/values.yaml

K8s configuration

In the values.yaml file you can tweak the configuration to align it with your setup. The file is extensively documented to help you do so.

Out of the box, the configuration file is set up for:

  • 1 Weaviate replica.
  • anonymous_access = enabled.
  • 3 esvector replicas.
  • 3 etcd replicas.

As a rule of thumb, you can:

  • increase Weaviate replicas if you have a high load.
  • increase esvector replicas if you have a high load and/or a lot of data.
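
To find the relevant settings before editing, one option is to search the downloaded values.yaml for replica counts. The exact key names depend on the chart version, so this sketch only searches for them and does not assume any; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of a values file that mention replicas,
# with line numbers, so the settings are easy to locate before editing.
# $1 defaults to ./values.yaml (the file downloaded earlier).
show_replica_settings() {
  grep -n "replicas" "${1:-./values.yaml}"
}
```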

Deploy

You can deploy the helm charts as follows:

# Init helm (if you use Helm 2)
$ helm init --upgrade
# Create a Weaviate namespace
$ kubectl create namespace weaviate
# Deploy
$ helm upgrade \
  --values ./values.yaml \
  --install \
  --wait \
  --namespace "weaviate" \
  "weaviate" \
  weaviate.tgz

Additional Configuration Help

etcd Disaster Recovery

The Weaviate chart depends on the Bitnami etcd chart to provision etcd in the namespace. etcd is a vital component of Weaviate, as it provides distributed read/write locking as well as consistent configuration for critical areas.

Unfortunately, without disaster recovery enabled, the etcd cluster can end up in a deadlock situation without a possibility to recover. If a majority of etcd pods become unavailable, it’s impossible for new members to join. So especially with small cluster sizes, such as three pods, it only takes the simultaneous death of two pods for the cluster to be unrecoverable.
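
The arithmetic behind this can be made concrete. etcd needs a majority (quorum) of its members to make progress, so a cluster of n members has a quorum of floor(n/2) + 1 and tolerates n minus quorum failures. The helper names below are illustrative only.

```shell
#!/bin/sh
# Majority needed for an etcd cluster of $1 members: floor(n / 2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be lost while a majority remains.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

etcd_quorum 3            # prints 2
etcd_fault_tolerance 3   # prints 1: losing two of three pods breaks quorum
```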

As a mitigation for this disaster scenario, the etcd chart (>= v3.0.0) provides a disaster recovery option, where the etcd cluster can be resurrected without a minimum number of pods. For this a snapshot is created at a regular interval, which can then be read back to bootstrap a “new” cluster.

When should this feature be enabled?

We recommend this feature to be enabled in any scenario where Weaviate should be able to survive cluster node upgrades, cluster auto-scaling or random node deaths (as they are quite common on Kubernetes).

Why is it not enabled by default if it’s so important?

This snapshotting process requires an nfs volume. This in turn requires an nfs provisioner, such as @stable/nfs-server-provisioner. Since we cannot assume that the provisioner is present on a random cluster, the chart has to default to etcd.disasterRecovery.enabled: false (see values.yaml). Nevertheless, we recommend turning this on in most cases.

Unfortunately, bundling an nfs provisioner with Weaviate is impossible because of the different life cycles. The provisioner should be deployed before Weaviate is deployed and only removed after Weaviate is removed. Otherwise, if the provisioner were torn down together with Weaviate, it would be impossible to destroy the volumes it created when deploying Weaviate.

How can I turn it on?

Step 1: Make sure the cluster supports nfs volumes

The easiest way to do so is to deploy @stable/nfs-server-provisioner into the default namespace. For example, run:

NFS_VERSION="0.3.0"
helm upgrade \
  --install \
  --namespace default \
  --version "$NFS_VERSION" \
  nfs-server-provisioner \
  stable/nfs-server-provisioner \
  --set persistence.enabled=true \
  --set persistence.size=10Gi

Step 2: Turn on disaster recovery

In your values.yaml set etcd.disasterRecovery.enabled to true, then deploy Weaviate normally with your values.yaml.

Alternatively, if you don’t want to use a values.yaml, include --set etcd.disasterRecovery.enabled=true in your helm install or helm upgrade command.
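
For illustration, the deploy command from the Deploy section with this flag in place of --values might look like the sketch below. The wrapper function and the DRY_RUN variable are hypothetical conveniences for previewing the command; the Helm flags, release name, and namespace are the ones used in the Deploy section.

```shell
#!/bin/sh
# Hypothetical wrapper: deploy Weaviate with etcd disaster recovery enabled
# via --set instead of editing values.yaml.
# Set DRY_RUN=1 to print the command instead of running it.
deploy_with_disaster_recovery() {
  set -- helm upgrade \
    --install \
    --wait \
    --namespace "weaviate" \
    --set etcd.disasterRecovery.enabled=true \
    "weaviate" \
    weaviate.tgz
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running it once with DRY_RUN=1 shows the exact helm invocation before anything is changed on the cluster.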

More Resources

If you can’t find the answer to your question here, please look at:

  1. The Frequently Asked Questions.
  2. The knowledge base of old issues.
  3. Stack Overflow, for questions.
  4. GitHub, for issues.
  5. The Slack channel, to ask your question directly.