Philosophy behind Weaviate
Weaviate is an open-source, GraphQL and RESTful API-enabled, search graph based on a word vector storage mechanism called the Contextionary.
- Why Weaviate?
- Basic Terminology
- About Classification
- About Weaviate Knowledge Networks
- More resources
Demo and Overview
Concept and demo presented at FOSDEM 2020;
We aim to allow anyone, anywhere, any time to create their own search graph or knowledge network.
In almost any situation where you work with data, you store information related to something in the real world. This can be data about transactions, cars, airplanes, products; you name it. The challenge with current databases is that it is difficult for the software to grasp the context of the entity you refer to in your datasets. Do the characters “Apple” refer to the company or the fruit?
The Weaviate search graph aims to solve this problem. Every time you store data to the search graph, Weaviate indexes the data based on the linguistical context through a feature called The Contextionary. For example, when you store data about a Company called Apple, Weaviate automatically contextualizes the data related to an iPhone.
If you want to learn how the Contextionary does this, you can read more about our Contextionary here. We don’t just want to store the data, but also the information and its context so that knowledge can be derived from it.
Because most data is related to something (e.g., Amsterdam is the capital of The Netherlands) we store not only the concept itself but also the relation to other concepts (e.g., “the city Amsterdam” to “the country The Netherlands”). This means that the data you add to a Weaviate instance creates a network of knowledge, better known as a graph.
Weaviate has four core features and a variety of additional features.
Weaviate consists of four core features;
- The contextionary (c11y) is a vector index which stores all data object based on their semantic meaning. This allows users to now only directly search and retrieve data, but also to search for its concepts.
- We believe that GraphQL combined with a RESTful API, provides the best user experience to query Weaviate.
- Weaviate can automatically build its own graph relations through conceptual classification.
- With Weaviate you can create a semantic Knowledge Network based on a P2P network of Weaviates.
- Weaviate is completely containerized with Docker and Kubernetes.
- Weaviate scales to support super-large graph sizes.
- Fast vector space querying.
|Schema||In Weaviate, a schema is used to define the types of data you will be adding and querying. You can learn more about it here.|
|Semantic Kinds||Because of Weaviates semantic nature, we make a distinction in semantic kinds. Weaviate distinct two different kinds: Things and Actions. When creating a Weaviate Schema, you need to explain what Semantic Kind a data object entails.|
|Thing||A thing is a semantic kind, referring to an object (e.g., car, rocketship, product). The easiest way to think about Things is in the form of nouns.|
|Action||An action is a semantic kind, referring to an action (e.g., walking, dancing, buying). The easiest way to think about Things is in the form of verbs.|
|Class||A class is a definition of a semantic kind. E.g., the Class Company or the Class Move. In Weaviate, classes can be recognized because they always have a capitalized first character. You can set as many classes with a naming you choose.|
|Property||All classes have properties. E.g., the class Company might have the property name. In Weaviate, properties can be recognized because they always have a lowercase first character.|
|Entity||An entity refers to something -often- in the world around us. E.g., a Company with the name Apple refers to an entity with a relation to a Product with the name iPhone. Weaviate’s Contextionary tries to find as many entities in your data as possible.|
|Concept||Concepts are related to entities. Often you will use concepts to search in your datasets. If your dataset has data about An Actor with the name Arnold Schwarzenegger and an Actor with the name Al Pacino, the concepts Movie and Terminator will find a closer relation to the first actor rather than the latter.|
|Beacon||A beacon is a reference to a particular data object in Weaviate or inside the knowledge network, this data object in turn has a position in the contextionary. Often defined as follows: |
|Knowledge Network||A peer to peer (P2P) network of Weaviates|
|Fuzzy||Opposed to most other data solutions, Weaviate uses fuzzy logic to interpret a query. The upside of this is that it might find answers to queries where a traditional data solution migth not.|
|C11y||Abbreviation of Contextionary.|
|Weaviate Cluster||A managed Weaviate cluster|
|Weaviate Cluster Service (WCS)||A managed services that hosts Weaviate clusters on the SeMI Network|
|Weaviate Knowledge Network (WKN)||A network of Weaviates|
About the Contextionary
The Contextionary (derived from dictionary, aka
C11Y) gives context to the language used in your dataset (there is an individual Contextionary per language). At the root, the Contextionary is based on the Global Vectors for Word Representation concept. When running a Weaviate instance, it comes with an out of the box Contextionary which is trained on Wikipedia and the Wiktionary. In principle, you never have to create a manual Contextionary. We aim to make the C11Y available for use cases in any domain, regardless if they are business-related, academic or other.
The Contextionary doesn’t use a traditional storage and indexing mechanism, but it uses vector positions to place data into a 600-dimensional space. When you run a Weaviate, it comes with a pre-trained Contextionary (you never have to do any training yourself) that contains the contextual representation that allows Weaviate to store data based on its contextual meaning.
An empty Weaviate could be envisioned like this:
When using Weaviate’s RESTful API to add data, the Contextionary calculates the position in the vector space that represents the real-world entity.
The process from a data object to a vector position is calculated based on the centroid of the words weighted by the occurrences of the individual words in the original training text-corpus (e.g., the word
the is seen as less important than the word
When a new class object is created, it will be added to a Weaviate.
When using the GraphQL interface, you can target a thing or action directly, or by searching for a nearby concept. E.g., the
company Apple from the previous illustration, can be found by searching for the concept
Because Weaviate converts all data objects in a vector position based on their semantic meaning, data object get a logical distance from each other. This allows for a variety of automated classification tasks Weaviate can perform in near-realtime.
Example of a classification task
Inside the Weaviate below, there are three data objects stored, a country, and two cities.
The country has a property called
hasCapital of which the reference is unset. We can now request Weaviate to connect the most likely candidate as the capital. Because Weaviate -through the schema- knows that the value of
hasCapital must be a
City it can choose from both Amsterdam and New York. Because of the semantic relation of Amsterdam to The Netherlands, a decision can be made.
When creating automatic classification tasks, the user is able to define how certain Weaviate needs to be of the connection. During querying, the user can see if the relation was made automatically or manually.
About Weaviate Knowledge Networks
Coming soon! Sign up for our newsletter to be informed about release dates.
Because Weaviate allows for fuzzy schema definitions (e.g., a “Company with the name Apple” is seen as semantically similar to a “Business with the identifier Apple Incorporated.”) you can find beacons not only in your local Weaviate but also over a network of Weaviates. Allow creating a completely decentralized network of search graphs, aka the knowledge network.
- The Contextionary is limited to a single language per Weaviate instance (i.e., English Contextionary, Spanish Contextionary, etcetera).
If you can’t find the answer to your question here, please use the: