
How to create a schema?

A schema is used to define the concepts of the data you will be adding to Weaviate. On this page you will learn how to define the concepts in the Weaviate schema.

Introduction

When you start with an empty Weaviate instance, you need to define a schema that describes the kind of data you will add. Because Weaviate is a search graph, language plays an important role: when you create concepts, Weaviate uses the Contextionary to validate whether it can understand the class and property names you want to add. This is why many definitions resemble everyday language, and it leads to the first best practice: define the schema the way you would explain your data to another person, not as the tables and columns you would add to a traditional data solution.

Basics

  • A schema consists of classes and properties, which define concepts.
  • Things are distinguished from Actions in schema classes.
  • Words in the schema (names of classes and properties) must be part of the Contextionary.
  • The schema can be modified through the RESTful API. Python and JavaScript clients are available.
  • Once created, a class or property in Weaviate is immutable, but the schema can always be extended.
  • Learn about Concepts, Classes, Properties and dataTypes in the API reference guide.

Prerequisites

1. Connect to a Weaviate instance.
If you haven’t set up a Weaviate instance yet, check the Getting started guide or the Installation guide. In this guide we assume your instance is running at http://localhost:8080.

Creating your first schema (with the Python client)

Let’s say you want to create a schema for a news publications dataset. This dataset consists of random news articles from publications such as Financial Times, New York Times, CNN and Wired. You also want to capture the authors, and some metadata about these objects, such as publication dates.

Follow these steps to create and upload the schema.

1. Start with an empty schema in JSON format.

Schemas are defined in JSON format. Start with the following empty schema:

{
  "actions": {
    "classes": [],
    "type": "action"
  },
  "things": {
    "classes": [],
    "type": "thing"
  }
}

2. Define classes and properties.

Let’s say there are three classes you want to capture from this dataset in Weaviate: Publication, Article and Author. Notice that these words are nouns (because they belong to the semantic kind “Things”) and singular (which is best practice, each data object is one of these classes).

Class names always start with a capital letter, and property names always start with a lowercase letter. To combine several words into one class name or property name, use camelCase. Read more about schema classes, properties and data types in the API reference guide.
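The naming rules above can be checked locally before a schema is uploaded. The following is a minimal sketch, not part of the Weaviate client: the function names are hypothetical, and it assumes names consist of ASCII letters only, as in the examples in this guide. It does not check whether the words are part of the Contextionary.

```python
import re
from typing import List

# Assumption: names contain ASCII letters only, as in this guide's examples.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z]*$")


def is_valid_class_name(name: str) -> bool:
    """Class names start with a capital letter."""
    return bool(CLASS_NAME.match(name))


def is_valid_property_name(name: str) -> bool:
    """Property names start with a lowercase letter."""
    return bool(PROPERTY_NAME.match(name))


def split_camel_case(name: str) -> List[str]:
    """Split a camelCased name into its lowercase words."""
    return [word.lower() for word in re.findall(r"[A-Z]?[a-z]+", name)]


print(is_valid_class_name("Publication"))           # True
print(is_valid_property_name("hasArticles"))        # True
print(split_camel_case("headquartersGeoLocation"))  # ['headquarters', 'geo', 'location']
```

Splitting a camelCased name shows the individual words that need to be understood, which is a quick way to spot abbreviations or made-up words in a name.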

Let’s define the class Publication with the properties name, hasArticles and headquartersGeoLocation in JSON format. name will be the name of the Publication, in string format. hasArticles will be a reference to Article objects, so the class Article must be defined in the same schema for the reference to be valid. headquartersGeoLocation will be of the special dataType geoCoordinates.

{
  "class": "Publication",
  "description": "A publication with an online source",
  "properties": [
    { 
      "dataType": [
        "string"
      ],
      "description": "Name of the publication",
      "name": "name"
    },
    { 
      "dataType": [
        "Article"
      ],
      "description": "The articles this publication has",
      "name": "hasArticles"
    },
    {
        "dataType": [
            "geoCoordinates"
        ],
        "description": "Geo location of the HQ",
        "name": "headquartersGeoLocation"
    }
  ]
}

Add the classes Article and Author to the same schema, so you will end up with the following classes:

[{
  "class": "Publication",
  "description": "A publication with an online source",
  "properties": [
    { 
      "dataType": [
        "string"
      ],
      "description": "Name of the publication",
      "name": "name"
    },
    { 
      "dataType": [
        "Article"
      ],
      "description": "The articles this publication has",
      "name": "hasArticles"
    },
    {
      "dataType": [
          "geoCoordinates"
      ],
      "description": "Geo location of the HQ",
      "name": "headquartersGeoLocation"
    }
  ]
}, {
  "class": "Article",
  "description": "A written text, for example a news article or blog post",
  "properties": [
    { 
      "dataType": [
        "string"
      ],
      "description": "Title of the article",
      "name": "title"
    },
    { 
      "dataType": [
        "text"
      ],
      "description": "The content of the article",
      "name": "content"
    }
  ]
}, {
  "class": "Author",
  "description": "The writer of an article",
  "properties": [
      {
        "dataType": [
            "string"
        ],
        "description": "Name of the author",
        "name": "name"
      },
      { 
        "dataType": [
            "Article"
        ],
        "description": "Articles this author wrote",
        "name": "wroteArticles"
      },
      { 
        "dataType": [
            "Publication"
        ],
        "description": "The publication this author writes for",
        "name": "writesFor"
      }
  ]
}]

Now, add this list of classes to the schema, which will look like this:

{
  "actions": {
    "classes": [],
    "type": "action"
  },
  "things": {
    "classes": [{
      "class": "Publication",
      "description": "A publication with an online source",
      "properties": [
        {
          "dataType": [
            "string"
          ],
          "description": "Name of the publication",
          "name": "name"
        },
        {
          "dataType": [
            "Article"
          ],
          "description": "The articles this publication has",
          "name": "hasArticles"
        },
        {
          "dataType": [
              "geoCoordinates"
          ],
          "description": "Geo location of the HQ",
          "name": "headquartersGeoLocation"
        }
      ]
    }, {
      "class": "Article",
      "description": "A written text, for example a news article or blog post",
      "properties": [
        {
          "dataType": [
            "string"
          ],
          "description": "Title of the article",
          "name": "title"
        },
        {
          "dataType": [
            "text"
          ],
          "description": "The content of the article",
          "name": "content"
        }
      ]
    }, {
      "class": "Author",
      "description": "The writer of an article",
      "properties": [
        {
          "dataType": [
              "string"
          ],
          "description": "Name of the author",
          "name": "name"
        },
        {
          "dataType": [
              "Article"
          ],
          "description": "Articles this author wrote",
          "name": "wroteArticles"
        },
        {
          "dataType": [
              "Publication"
          ],
          "description": "The publication this author writes for",
          "name": "writesFor"
        }
      ]
    }],
  "type": "thing"
  }
}
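Because a reference is only valid when its target class is defined, it can help to check the schema locally before uploading it. The sketch below is an illustration, not a Weaviate feature: find_unknown_references is a hypothetical helper, and it assumes that any dataType starting with a capital letter names a class and that a reference may target a class from either the things or the actions section.

```python
def find_unknown_references(schema: dict) -> list:
    """Return (class, property, target) tuples whose target class is not defined."""
    # Assumption: references may point to classes in either section.
    classes = schema["things"]["classes"] + schema["actions"]["classes"]
    known = {class_obj["class"] for class_obj in classes}
    problems = []
    for class_obj in classes:
        for prop in class_obj.get("properties", []):
            for data_type in prop["dataType"]:
                # Assumption: a capitalized dataType is a class reference.
                if data_type[:1].isupper() and data_type not in known:
                    problems.append((class_obj["class"], prop["name"], data_type))
    return problems


# A shortened schema with a deliberate typo: "Articles" instead of "Article".
schema = {
    "actions": {"classes": [], "type": "action"},
    "things": {
        "classes": [
            {"class": "Publication", "properties": [
                {"name": "name", "dataType": ["string"]},
                {"name": "hasArticles", "dataType": ["Articles"]},
            ]},
            {"class": "Article", "properties": [
                {"name": "title", "dataType": ["string"]},
            ]},
        ],
        "type": "thing",
    },
}

print(find_unknown_references(schema))
# [('Publication', 'hasArticles', 'Articles')]
```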

3. Upload the schema to Weaviate with the Python client.

import weaviate

client = weaviate.Client("http://localhost:8080")

schema = {
  "actions": {
    "classes": [],
    "type": "action"
  },
  "things": {
    "classes": [{
      "class": "Publication",
      "description": "A publication with an online source",
      "properties": [
        {
          "dataType": [
            "string"
          ],
          "description": "Name of the publication",
          "name": "name"
        },
        {
          "dataType": [
            "Article"
          ],
          "description": "The articles this publication has",
          "name": "hasArticles"
        },
        {
          "dataType": [
              "geoCoordinates"
          ],
          "description": "Geo location of the HQ",
          "name": "headquartersGeoLocation"
        }
      ]
    }, {
      "class": "Article",
      "description": "A written text, for example a news article or blog post",
      "properties": [
        {
          "dataType": [
            "string"
          ],
          "description": "Title of the article",
          "name": "title"
        },
        {
          "dataType": [
            "text"
          ],
          "description": "The content of the article",
          "name": "content"
        }
      ]
    }, {
      "class": "Author",
      "description": "The writer of an article",
      "properties": [
        {
          "dataType": [
              "string"
          ],
          "description": "Name of the author",
          "name": "name"
        },
        {
          "dataType": [
              "Article"
          ],
          "description": "Articles this author wrote",
          "name": "wroteArticles"
        },
        {
          "dataType": [
              "Publication"
          ],
          "description": "The publication this author writes for",
          "name": "writesFor"
        }
      ]
    }],
  "type": "thing"
  }
}

client.schema.create(schema)
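For larger schemas it may be more convenient to keep the JSON in a separate file rather than inline in the script. The following is a minimal sketch under that assumption: upload_schema_from_file is a hypothetical helper, and it only relies on the client.schema.create call shown above.

```python
import json


def upload_schema_from_file(client, path: str) -> dict:
    """Read a JSON schema file and upload it with client.schema.create."""
    with open(path, encoding="utf-8") as handle:
        schema = json.load(handle)
    # Fail early if the top-level sections used in this guide are missing.
    for kind in ("things", "actions"):
        if kind not in schema:
            raise ValueError(f"schema is missing the '{kind}' section")
    client.schema.create(schema)
    return schema
```

With a client created as in the example above, a call could look like upload_schema_from_file(client, "schema.json"), where schema.json is a placeholder file name.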

Creating your first schema (RESTful API, Python or JavaScript)

Currently, uploading a whole schema at once is only possible with the Python client. If you are not using Python, you need to upload the classes to Weaviate one by one. The schema from the previous example can be uploaded in the following steps:

1. Create the classes without references.

References to other classes can only be added if those classes already exist in the Weaviate schema. Therefore, we first create the classes with all properties except the references, and add the references in step 2.

Add a class Publication without the property hasArticles, and add this to a running Weaviate instance as follows:

import weaviate

client = weaviate.Client("http://localhost:8080")

class_obj = {
  "class": "Publication",
  "description": "A publication with an online source",
  "properties": [
    { 
      "dataType": [
        "string"
      ],
      "description": "Name of the publication",
      "name": "name"
    },
    {
      "dataType": [
          "geoCoordinates"
      ],
      "description": "Geo location of the HQ",
      "name": "headquartersGeoLocation"
    }
  ]
}

client.schema.create_class(class_obj)

Perform a similar request for the Article and Author classes.
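Instead of removing the reference properties by hand, the split can be automated. This is a sketch under assumptions: the helper names are hypothetical, a property counts as a reference when one of its dataType entries starts with a capital letter, and the only client call used is client.schema.create_class from the example above.

```python
import copy


def is_reference(prop: dict) -> bool:
    # Assumption: a capitalized dataType entry names a class, so the
    # property is a cross-reference.
    return any(data_type[:1].isupper() for data_type in prop["dataType"])


def split_class(class_obj: dict):
    """Return (class without reference properties, list of reference properties)."""
    props = class_obj.get("properties", [])
    stripped = dict(class_obj)
    stripped["properties"] = [copy.deepcopy(p) for p in props if not is_reference(p)]
    references = [copy.deepcopy(p) for p in props if is_reference(p)]
    return stripped, references


def create_classes_without_references(client, classes: list) -> dict:
    """Create each class with its non-reference properties only.

    Returns the deferred reference properties, keyed by class name.
    """
    deferred = {}
    for class_obj in classes:
        stripped, references = split_class(class_obj)
        client.schema.create_class(stripped)
        deferred[class_obj["class"]] = references
    return deferred


publication = {
    "class": "Publication",
    "description": "A publication with an online source",
    "properties": [
        {"dataType": ["string"], "description": "Name of the publication", "name": "name"},
        {"dataType": ["Article"], "description": "The articles this publication has", "name": "hasArticles"},
        {"dataType": ["geoCoordinates"], "description": "Geo location of the HQ", "name": "headquartersGeoLocation"},
    ],
}

stripped, references = split_class(publication)
print([p["name"] for p in stripped["properties"]])  # ['name', 'headquartersGeoLocation']
print([p["name"] for p in references])              # ['hasArticles']
```

The returned dictionary keeps the postponed reference properties together with the class they belong to, so they can be added once all classes exist.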

2. Add reference properties to the existing classes.

There are three classes in your Weaviate schema now, but they are not linked to each other with cross-references yet. Let’s add the reference from Publication to Article in the property hasArticles like this:

import weaviate

client = weaviate.Client("http://localhost:8080")

reference_property = {
  "dataType": [
    "Article"
  ],
  "description": "The articles this publication has",
  "name": "hasArticles"
}

client.schema.property.create("Publication", reference_property)

Repeat this action for the properties wroteArticles and writesFor of Author, which refer to Article and Publication respectively.
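All reference properties of this example can also be added in one loop. The sketch below is an illustration: add_reference_properties is a hypothetical helper that only uses the client.schema.property.create call shown above, and the property definitions are copied from the schema in this guide.

```python
def add_reference_properties(client, references_by_class: dict) -> int:
    """Add each reference property to its class and return how many were added."""
    added = 0
    for class_name, properties in references_by_class.items():
        for prop in properties:
            client.schema.property.create(class_name, prop)
            added += 1
    return added


# Reference properties from this guide, keyed by the class they belong to.
references_by_class = {
    "Publication": [
        {"dataType": ["Article"],
         "description": "The articles this publication has",
         "name": "hasArticles"},
    ],
    "Author": [
        {"dataType": ["Article"],
         "description": "Articles this author wrote",
         "name": "wroteArticles"},
        {"dataType": ["Publication"],
         "description": "The publication this author writes for",
         "name": "writesFor"},
    ],
}
```

With a client created as in the example above, add_reference_properties(client, references_by_class) would issue one request per reference property. All target classes must already exist in the schema when it runs.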

Next steps

More Resources

If you can’t find the answer to your question here, please have a look at:

  1. The Frequently Asked Questions.
  2. The knowledge base of old issues.
  3. Stack Overflow, for questions.
  4. GitHub, for issues.
  5. The Slack channel, to ask your question directly.