
Google Summer of Code 2020

05 February 2020

Three challenges for Google Summer of Code 2020 to help improve the open source project Weaviate!

Contribute to Weaviate this summer

Are you a university student looking for a challenge this summer? Apply for Google Summer of Code, learn about open source development, and contribute to Weaviate!

Read more about the programme and how to apply on the Summer of Code website. If you want to apply, don’t forget to apply both on that website and through our form.

About Weaviate

Weaviate is an open source project to create a smart knowledge graph infrastructure. Weaviate is written in Go, but can be used with any language or tool you are familiar with. Schema editing and data manipulation are done via RESTful API endpoints. Data in the graph can be queried via a user-friendly, custom-made GraphQL API endpoint. There is also a Python client for accessing your Weaviate instance.
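As a rough illustration of what querying over HTTP could look like from any language, here is a minimal Python sketch that wraps a GraphQL query in a POST request without sending it. The instance URL, the `/v1/graphql` path, and the class and property names in the query are assumptions made for this example and are not taken from this post, so check the Weaviate documentation for the actual endpoint and schema.

```python
import json
import urllib.request

# Assumption: the instance URL and the "/v1/graphql" path are illustrative only.
WEAVIATE_URL = "http://localhost:8080"

# Assumption: a hypothetical "Article" class with a "title" property.
EXAMPLE_QUERY = "{ Get { Things { Article { title } } } }"


def build_graphql_request(base_url, query):
    """Wrap a GraphQL query string in an HTTP POST request (not sent yet)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(WEAVIATE_URL, EXAMPLE_QUERY)

# Sending the request requires a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the request is plain JSON over HTTP, the same call can be made from JavaScript, Go, or a command-line tool.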

Weaviate on the Google Cloud Podcast

Weaviate on Google Cloud's Stackchat

How to register

We value initiative, creativity and motivation. To show these, describe in your submission how you would approach the challenge. Please also mention your previous projects (if any) and the technologies and tools you are comfortable with. In addition, don’t forget to mention what you want to learn during Summer of Code, both in terms of technology and other (soft) skills.

Don’t hesitate to contact us with questions about the challenges and the open source project Weaviate.

You can apply through our form, and don’t forget to apply on the GSoC website.

The mentors will be Laura and Micha.

Prerequisites

Depending on the challenge, different skills are preferred. This does not mean you need to be an expert in these technologies, but make sure your application shows strong motivation: which new skills you want to learn, why, and how.

1. Vector Search Engine Data Visualization

Weaviate currently has a “Playground”. In the Playground, the classes and their relations as defined in the schema are visualized. In addition, the Playground can be used to edit the schema. There is no visualization of the data in a Weaviate instance yet. The goal of this challenge is to design and develop a data visualization tool. This tool should allow users to see what data is present in the graph and what relations exist. Other nice-to-have features are interactivity, subset selection and the ability to add, change, or remove data.

Expected outcomes

The goal is to design and develop an interactive data visualization tool (front end) for Weaviate. How you design and develop it depends on your own ideas and skills, in combination with the common goals of the Weaviate open source project.

How to get started

You can already try out the vector search engine Weaviate, its Playground, the Python client and all API endpoints. This might inspire you with ideas for this challenge.

Preferred skills

JavaScript, HTML, CSS, Git

2. Smart Data Graph and Machine Learning

Weaviate currently supports automatic classification of relations between data objects via two methods. K Nearest Neighbour can be used if the user has some training data available, and classification based on purely contextual (vector) information can be used without any training data. There are many possibilities to use machine learning to improve the graph database.
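For readers new to K Nearest Neighbour, the sketch below shows the general idea in plain Python: a new vector receives the majority label of its k closest labelled vectors. The two-dimensional vectors and the labels are invented toy data, and this is not Weaviate's implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, labelled, k=3):
    """Return the majority label among the k labelled vectors nearest to query."""
    nearest = sorted(labelled, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data: (vector, label) pairs, invented for illustration.
training = [
    ((0.9, 0.1), "finance"),
    ((0.8, 0.2), "finance"),
    ((0.1, 0.9), "sports"),
    ((0.2, 0.8), "sports"),
    ((0.15, 0.85), "sports"),
]

print(knn_classify((0.85, 0.15), training, k=3))
```

The choice of k and of the distance function are exactly the kind of details an improved implementation could experiment with.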

2a. Contextionary

The Contextionary is a dictionary that gives context to the language used in a database. Words are represented by vectors in a high-dimensional space, which allows classification and makes the graph database ‘smart’. Read more about the Contextionary here.

The algorithm at the foundation of the Contextionary is open for improvement. If you have experience in Go and want to learn about applying machine learning methods to language and data representation, then this challenge could be something for you! We are open to new ideas, from improving the weighting features to features that enhance the Contextionary, such as adding unknown words or new languages. You can get inspired by the current open suggestions on GitHub.
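To make the idea of words as vectors concrete, the following sketch assigns a word to the candidate label whose vector points in the most similar direction, without using any training data. The three-dimensional vectors are invented for illustration (real word vectors have far more dimensions), and cosine similarity is used here as a common choice, not necessarily the measure the Contextionary uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
WORD_VECTORS = {
    "loan": (0.8, 0.2, 0.1),
    "goal": (0.1, 0.9, 0.2),
    "finance": (1.0, 0.0, 0.1),
    "sports": (0.0, 1.0, 0.1),
}


def closest_label(word, labels):
    """Pick the label whose vector is most similar to the word's vector."""
    return max(
        labels,
        key=lambda label: cosine_similarity(WORD_VECTORS[word], WORD_VECTORS[label]),
    )


print(closest_label("loan", ["finance", "sports"]))
print(closest_label("goal", ["finance", "sports"]))
```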

Expected outcomes

An implemented improvement to the Contextionary algorithm

Preferred skills

Go, Git, Machine Learning, Natural Language Processing, Mathematics

2b. Improved or new ML implementation

As mentioned, we currently use K-NN and contextual classification. There is a lot of room to improve or enhance these implementations, as well as for new ML features that make the graph even ‘smarter’. We are open to any ideas. Creativity and initiative are valued, and support is offered. If you have questions about possibilities or your application, don’t hesitate to contact Laura!

Expected outcomes

New ML algorithm

Preferred skills

Go, Git, Python, Machine Learning, Natural Language Processing, Mathematics

3. Implementing and exploring a use case & demo dataset

Weaviate can be used in many different scenarios, use cases and business cases. Examples are finance, medicine, retail, academia, news, transportation, etc. We have explored several use cases, of which the news publications use case has been made publicly available. Implementing a new use case would be interesting, and there are many publicly available datasets to choose from. Exploring a new use case means gathering a dataset, loading it into Weaviate, exploring the opportunities that arise from the already implemented classification methods, and visualizing the results.
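The loading step usually starts with turning raw records into objects and grouping them into batches. The sketch below does this for a small CSV sample. The `class`/`properties` object shape, the `Article` class name, and the sample rows are hypothetical placeholders, not Weaviate's actual import format.

```python
import csv
import io


def rows_to_objects(csv_text, class_name):
    """Turn CSV rows into dictionaries with a hypothetical class/properties shape."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"class": class_name, "properties": dict(row)} for row in reader]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Invented sample data standing in for a gathered dataset.
SAMPLE = "title,year\nFirst article,2019\nSecond article,2020\nThird article,2020\n"

objects = rows_to_objects(SAMPLE, "Article")
for batch in batches(objects, 2):
    # In a real import, each batch would be sent to the Weaviate instance here.
    print(len(batch), [obj["properties"]["title"] for obj in batch])
```

Batching keeps individual requests small, which matters once a dataset grows beyond a few thousand records.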

Possible outcomes are a full use case exploration and demo dataset, suggestions for new Weaviate features (and possibly an implementation of them), and a description and visualization of the use case exploration that other contributors or future users of Weaviate can use as an example.

Again, creativity and initiative are valued, and support is offered. If you have questions about possibilities or your application, don’t hesitate to contact Laura!

Expected outcomes

A full use case exploration and demo dataset, with a description and visualization

Preferred skills

Git, Python, Big Data

Tags
  • Google Summer of Code
  • Students