Introduction to Weaviate

Weaviate is an open-source knowledge graph with GraphQL and RESTful APIs, built on a word-vector storage mechanism called the Contextionary.

Video Tutorial

Do you prefer video over text or do you want more background information?

Why Weaviate?

We aim to allow anyone, anywhere, any time to create their own knowledge graph or knowledge network.

In almost any situation where you work with data, you store information related to something in the real world. This can be data about transactions, cars, airplanes, products; you name it. The challenge with current databases is that it is difficult for the software to grasp the context of the entity you refer to in your datasets. Do the characters “Apple” refer to the company or the fruit?

real world entities

The Weaviate knowledge graph aims to solve this problem. Every time you store data in the knowledge graph, Weaviate indexes it based on its linguistic context through a feature called the Contextionary. For example, when you store data about a Company called Apple, Weaviate automatically places it in the context of related concepts such as the iPhone.

If you want to learn how the Contextionary does this, you can read more about our Contextionary here. We don’t just want to store the data, but also the information and its context so that knowledge can be derived from it.

Because most data is related to something (e.g., Amsterdam is the capital of The Netherlands) we store not only the concept itself but also the relation to other concepts (e.g., “the city Amsterdam” to “the country The Netherlands”). This means that the data you add to a Weaviate instance creates a network of knowledge, better known as a graph.

why Weaviate is a knowledge graph

Features

Weaviate has four core features and a variety of additional features.

Core features

The four core features are:

weaviate knowledge graph USPs

  1. The Contextionary (c11y) is a vector index that stores all data objects based on their semantic meaning. This allows users not only to search and retrieve data directly, but also to search by concept.
  2. We believe that GraphQL combined with a RESTful API provides the best user experience for querying Weaviate.
  3. Weaviate can automatically build its own graph relations through conceptual classification.
  4. With Weaviate you can create a semantic Knowledge Network based on a P2P network of Weaviates.

Additional features

  • Weaviate is completely containerized with Docker and Kubernetes.
  • Weaviate scales to support super-large graph sizes.
  • Fast vector space querying.

Basic Terminology

Terminology | Description
Schema | In Weaviate, a schema is used to define the types of data you will be adding and querying. You can learn more about it here.
Semantic Kinds | Because of Weaviate's semantic nature, we make a distinction between semantic kinds. Weaviate distinguishes two kinds: Things and Actions. When creating a Weaviate schema, you need to state which semantic kind a data object belongs to.
Thing | A thing is a semantic kind, referring to an object (e.g., car, rocketship, product). The easiest way to think about Things is in the form of nouns.
Action | An action is a semantic kind, referring to an action (e.g., walking, dancing, buying). The easiest way to think about Actions is in the form of verbs.
Class | A class is a definition of a semantic kind, e.g., the Class Company or the Class Move. In Weaviate, classes can be recognized because they always have a capitalized first character. You can define as many classes as you like, with names you choose.
Property | All classes have properties. E.g., the class Company might have the property name. In Weaviate, properties can be recognized because they always have a lowercase first character.
Entity | An entity refers to something, often in the world around us. E.g., a Company with the name Apple refers to an entity with a relation to a Product with the name iPhone. Weaviate’s Contextionary tries to find as many entities in your data as possible.
Concept | Concepts are related to entities. Often you will use concepts to search in your datasets. If your dataset has data about an Actor with the name Arnold Schwarzenegger and an Actor with the name Al Pacino, the concepts Movie and Terminator will be more closely related to the former than to the latter.
Beacon | A beacon is a reference to a particular data object in Weaviate or inside the knowledge network; this data object in turn has a position in the Contextionary. It is often defined as follows: weaviate://{peerName}/{semanticKind}/{UUID}
Knowledge Network | A peer-to-peer (P2P) network of Weaviates
Fuzzy | Unlike most other data solutions, Weaviate uses fuzzy logic to interpret a query. The upside is that it might find answers to queries where a traditional data solution might not.
C11y | Abbreviation of Contextionary.
Weaviate Cluster | A managed Weaviate cluster
Weaviate Cluster Service (WCS) | A managed service that hosts Weaviate clusters on the SeMI Network
Weaviate Knowledge Network (WKN) | A network of Weaviates
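
The beacon format listed above, weaviate://{peerName}/{semanticKind}/{UUID}, can be taken apart with a few lines of Python. This is a minimal sketch and not part of Weaviate itself: the `Beacon` type, the `parse_beacon` helper, and the example values `localhost` and `things` are hypothetical names chosen for illustration.

```python
import re
import uuid
from typing import NamedTuple


class Beacon(NamedTuple):
    """The three parts of a beacon (hypothetical helper type)."""
    peer_name: str
    semantic_kind: str
    object_id: uuid.UUID


# Matches weaviate://{peerName}/{semanticKind}/{UUID}
_BEACON_RE = re.compile(r"^weaviate://([^/]+)/([^/]+)/([^/]+)$")


def parse_beacon(beacon: str) -> Beacon:
    """Split a beacon string into peer name, semantic kind, and UUID."""
    match = _BEACON_RE.match(beacon)
    if match is None:
        raise ValueError(f"not a beacon: {beacon!r}")
    peer_name, semantic_kind, raw_id = match.groups()
    # uuid.UUID raises ValueError if the last segment is not a valid UUID.
    return Beacon(peer_name, semantic_kind, uuid.UUID(raw_id))


# Example values are assumptions made for illustration only.
example = parse_beacon(
    "weaviate://localhost/things/6406759e-f6fb-47ba-a537-1a62728d2f55"
)
```

Validating the UUID segment up front means a malformed beacon fails at parse time rather than later during a lookup.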

About the Contextionary

The Contextionary (derived from "dictionary", abbreviated C11Y) gives context to the language used in your dataset; there is a separate Contextionary per language. At its root, the Contextionary is based on the Global Vectors for Word Representation (GloVe) concept. A Weaviate instance comes with an out-of-the-box Contextionary trained on Wikipedia and Wiktionary, so in principle you never have to create one manually. We aim to make the C11Y available for use cases in any domain, whether business-related, academic, or other.

The Contextionary doesn’t use a traditional storage and indexing mechanism; instead, it uses vector positions to place data in a 600-dimensional space. The pre-trained Contextionary that ships with Weaviate (you never have to do any training yourself) contains the contextual representations that allow Weaviate to store data based on its contextual meaning.

An empty Weaviate could be envisioned like this:

empty Weaviate

When using Weaviate’s RESTful API to add data, the Contextionary calculates the position in the vector space that represents the real-world entity.

The vector position of a data object is the centroid of its words, weighted by how often each word occurs in the original training text corpus (e.g., the word "the" is treated as less important than the word "apple").
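
The weighted centroid can be sketched in a few lines of Python. This is an illustration only: the exact weighting Weaviate applies is not given here, so the inverse-occurrence weight (1 / occurrences) is an assumption, as are the function name `weighted_centroid` and the tiny two-dimensional toy vectors.

```python
from typing import Dict, List


def weighted_centroid(
    words: List[str],
    vectors: Dict[str, List[float]],
    occurrences: Dict[str, int],
) -> List[float]:
    """Average the word vectors, giving frequent words a smaller weight.

    Assumption: weight = 1 / occurrences. The real weighting may differ.
    """
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    weight_sum = 0.0
    for word in words:
        if word not in vectors:
            continue  # words unknown to the vocabulary are skipped
        weight = 1.0 / occurrences[word]
        for i, value in enumerate(vectors[word]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("none of the words has a known vector")
    return [value / weight_sum for value in total]


# Toy data: "the" occurs four times as often as "apple" in the corpus.
toy_vectors = {"the": [0.0, 0.0], "apple": [1.0, 1.0]}
toy_occurrences = {"the": 4, "apple": 1}
position = weighted_centroid(["the", "apple"], toy_vectors, toy_occurrences)
```

With these toy numbers the result lies at roughly [0.8, 0.8], much closer to "apple" than the unweighted midpoint [0.5, 0.5] would be.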

how the Contextionary calculates a vector

When a new class object is created, it will be added to a Weaviate.

Weaviate with data

When using the GraphQL interface, you can target a thing or action directly, or find it by searching for a nearby concept. E.g., the company Apple from the previous illustration can be found by searching for the concept iphone.
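
Searching by a nearby concept can be pictured as a nearest-neighbor lookup in the vector space. The sketch below is not Weaviate's GraphQL API; it only illustrates the idea. The use of cosine similarity as the distance measure, the helper names, and the two-dimensional toy vectors are all assumptions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_objects(
    concept_vector: List[float],
    objects: Dict[str, List[float]],
    limit: int = 1,
) -> List[str]:
    """Return the names of the stored objects closest to the concept."""
    ranked = sorted(
        objects.items(),
        key=lambda item: cosine_similarity(concept_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:limit]]


# Toy positions (assumed): the company sits near "iphone", the fruit does not.
stored = {"Company: Apple": [0.9, 0.1], "Fruit: Apple": [0.1, 0.9]}
iphone = [1.0, 0.2]
closest = nearest_objects(iphone, stored)
```

Here `closest` contains the company rather than the fruit, because the company's position points in almost the same direction as the concept vector.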

About Classification

Because Weaviate converts every data object into a vector position based on its semantic meaning, data objects have a logical distance from each other. This allows Weaviate to perform a variety of automated classification tasks in near-realtime.

Example of a classification task

Inside the Weaviate below, three data objects are stored: a country and two cities.

Weaviate classification task, without relation

The country has a property called hasCapital, whose reference is unset. We can now ask Weaviate to connect the most likely candidate as the capital. Because Weaviate knows, through the schema, that the value of hasCapital must be a City, it can choose between Amsterdam and New York. Because Amsterdam is semantically closer to The Netherlands, a decision can be made.

Weaviate classification task, with relation

When creating automatic classification tasks, the user is able to define how certain Weaviate needs to be of the connection. During querying, the user can see if the relation was made automatically or manually.
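
The hasCapital example, including the certainty setting, can be sketched as follows. This is a toy illustration under stated assumptions rather than Weaviate's implementation: certainty is modeled here as cosine similarity, and `classify_reference`, the threshold values, and the two-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_reference(
    source_vector: List[float],
    candidates: Dict[str, List[float]],
    min_certainty: float,
) -> Optional[Tuple[str, float]]:
    """Pick the closest candidate, or None if it is not certain enough.

    `candidates` should already be restricted to the class the schema
    allows for the property (here: City for hasCapital).
    """
    best: Optional[Tuple[str, float]] = None
    for name, vector in candidates.items():
        score = cosine_similarity(source_vector, vector)
        if best is None or score > best[1]:
            best = (name, score)
    if best is None or best[1] < min_certainty:
        return None  # leave the reference unset
    return best


# Toy positions (assumed) for the country and the two cities.
netherlands = [0.8, 0.6]
cities = {"Amsterdam": [0.7, 0.7], "New York": [-0.6, 0.8]}
capital = classify_reference(netherlands, cities, min_certainty=0.9)
```

With a threshold of 0.9 the sketch selects Amsterdam; raising the threshold above Amsterdam's score leaves the reference unset, which mirrors letting the user decide how certain the connection must be.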

About Weaviate Knowledge Networks

Coming soon! Sign up for our newsletter to be informed about release dates.

Because Weaviate allows for fuzzy schema definitions (e.g., a “Company with the name Apple” is seen as semantically similar to a “Business with the identifier Apple Incorporated”), you can find beacons not only in your local Weaviate but also across a network of Weaviates. This allows for a completely decentralized network of knowledge graphs, also known as the knowledge network.

Miscellaneous

  • The Contextionary is limited to a single language per Weaviate instance (i.e., English Contextionary, Spanish Contextionary, etcetera).

Frequently Asked Questions

If you can’t find the answer to your question here, please use one of the following:

  1. The knowledge base of old issues.
  2. For questions: Stack Overflow.
  3. For issues: GitHub.