Google Summer of Code 2020
Three challenges for Google Summer of Code 2020 to help improve open source Weaviate!
Contribute to Weaviate this summer
Are you a university student looking for a challenge this summer? Apply for Google Summer of Code, learn about open source development, and contribute to Weaviate!
Weaviate is an open source project to create a smart knowledge graph infrastructure. Weaviate is written in Go, but can be used with any language or tool you are familiar with. Schema editing and data manipulation are done via RESTful API endpoints. Data in the graph can be queried via the user-friendly, custom-made GraphQL API endpoint. A Python client is also available for accessing your Weaviate instance.
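As a rough illustration of what querying data through a GraphQL endpoint involves, the sketch below builds a query and wraps it in an HTTP POST request without sending it. The URL, the `Get { Things { ... } }` query shape, and the `Article` class with its properties are assumptions made for this example; check the Weaviate documentation for the exact form your version expects.

```python
import json
import urllib.request

# Assumption: a local instance exposing GraphQL at this path; adjust to your setup.
GRAPHQL_URL = "http://localhost:8080/v1/graphql"


def build_get_query(class_name, properties):
    """Return a GraphQL query string asking for some properties of one class."""
    fields = " ".join(properties)
    # Assumed query shape for this sketch, not taken from the text above.
    return "{ Get { Things { %s { %s } } } }" % (class_name, fields)


def build_request(query, url=GRAPHQL_URL):
    """Wrap the query in a JSON POST request object (nothing is sent here)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "Article", "title" and "summary" are hypothetical schema names.
query = build_get_query("Article", ["title", "summary"])
request = build_request(query)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Because the request is only constructed, the snippet runs without a Weaviate instance. Sending it requires a running instance whose schema contains the class you ask for.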
Weaviate on the Google Cloud Podcast | Weaviate on Google Cloud's Stackchat
How to register
We value initiative, creativity and motivation. To show these, describe in your submission how you would approach the challenge. Please also mention your previous projects (if any) and which technologies and tools you are comfortable with. In addition, don’t forget to say what you want to learn during Google Summer of Code, both in terms of technology and other (soft) skills.
Don’t hesitate to contact us with questions about the challenges or the Weaviate open source project.
You can apply through our form, and don’t forget to apply on the GSoC website.
Depending on the challenge, different skills are preferred. This does not mean you should be an expert in these technologies, but make sure your application shows strong motivation: explain which new skills you want to learn, why, and how.
1. Search Graph Data Visualization
Weaviate currently has a “Playground”. In the Playground, the classes and their relations as defined in the schema are visualized. In addition, the Playground can be used to edit the schema. There is no visualization of the data in a Weaviate instance yet. The goal of this challenge is to design and develop a data visualization tool. This tool should allow users to see what data is present in the graph and what relations exist. Other nice-to-have features are interactivity, subset selection and the ability to add, change, or remove data.
The expected outcome is an interactive data visualization tool (front end) for Weaviate. How you design and develop it depends on your own ideas and skills, in combination with the common goals of the Weaviate open source project.
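One possible starting point for such a tool is to turn data objects into the node and edge lists that most graph visualization libraries accept. The sketch below assumes a simplified, hypothetical object format (a dict with `id`, `class` and `properties`, where a relation is a list of `{"ref": <id>}` entries); it is not the format returned by the Weaviate API.

```python
def to_graph(objects):
    """Split data objects into node and edge lists for a graph view."""
    nodes, edges = [], []
    for obj in objects:
        data = {}
        for name, value in obj["properties"].items():
            is_relation = (
                isinstance(value, list)
                and len(value) > 0
                and all(isinstance(v, dict) and "ref" in v for v in value)
            )
            if is_relation:
                # One edge per referenced object, labelled with the property name.
                for v in value:
                    edges.append(
                        {"source": obj["id"], "target": v["ref"], "label": name}
                    )
            else:
                data[name] = value
        nodes.append({"id": obj["id"], "class": obj["class"], "data": data})
    return nodes, edges


def select_classes(nodes, edges, classes):
    """Subset selection: keep nodes of the given classes and edges between them."""
    kept = [n for n in nodes if n["class"] in classes]
    ids = {n["id"] for n in kept}
    kept_edges = [e for e in edges if e["source"] in ids and e["target"] in ids]
    return kept, kept_edges


# Hypothetical sample data.
objects = [
    {"id": "a1", "class": "Article",
     "properties": {"title": "Graph news", "inPublication": [{"ref": "p1"}]}},
    {"id": "p1", "class": "Publication", "properties": {"name": "The Daily"}},
]
nodes, edges = to_graph(objects)
articles_only, article_edges = select_classes(nodes, edges, {"Article"})
```

Keeping the data as plain node and edge lists leaves the choice of rendering library open, and the subset selection mentioned above becomes a simple filter before rendering.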
How to get started
You can already try out the Search Graph Weaviate, its Playground, the Python client and all API endpoints. This might inspire you with ideas for this challenge.
2. Smart Data Graph and Machine Learning
Weaviate currently supports automatic classification of relations between data objects via two methods. K-Nearest Neighbours (K-NN) can be used if the user has some training data available, and classification based on purely contextual (vector) information can be used without any training data. There are many possibilities for using machine learning to improve the graph database.
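To make the difference between the two approaches concrete, here is a minimal sketch on toy two-dimensional vectors. It is not Weaviate's implementation: the function names, the use of cosine similarity, and the sample data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(vector, labelled, k=3):
    """With training data: majority vote among the k most similar labelled vectors."""
    ranked = sorted(
        labelled,
        key=lambda item: cosine_similarity(vector, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def contextual_classify(vector, candidates):
    """Without training data: pick the candidate whose own vector is closest."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(vector, candidates[name]),
    )


# Toy data invented for this example.
labelled = [
    ([0.9, 0.1], "finance"),
    ([0.8, 0.2], "finance"),
    ([0.1, 0.9], "sports"),
    ([0.2, 0.8], "sports"),
]
category_vectors = {"finance": [1.0, 0.0], "sports": [0.0, 1.0]}

knn_result = knn_classify([0.7, 0.3], labelled, k=3)
contextual_result = contextual_classify([0.7, 0.3], category_vectors)
```

In the K-NN case the answer comes from already labelled examples; in the contextual case it comes only from how close the object's vector is to the vector of each candidate category.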
The Contextionary is a dictionary that gives context to the language used in a database. Words are represented by vectors in a high-dimensional space, which allows classification and makes the graph database ‘smart’. Read more about the Contextionary here.
The algorithm at the foundation of the Contextionary is open for improvement. If you have experience in Go and want to learn about applying machine learning methods to language and data representation, then this challenge could be something for you! We are open to new ideas, from improving the weighting features to additions that enhance the Contextionary, such as support for unknown words or new languages. You can get inspired by the current open suggestions on GitHub.
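The idea of words as vectors can be sketched with a handful of made-up three-dimensional vectors. Everything below is an assumption for illustration: the vectors are invented, real word vectors have far more dimensions, and representing a phrase as a (weighted) mean of its word vectors is only one simple option, not a description of the Contextionary's actual algorithm.

```python
import math

# Invented toy vectors; not taken from the Contextionary.
WORD_VECTORS = {
    "bank": [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "river": [0.1, 0.9, 0.2],
    "goal": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(words, weights=None):
    """Weighted mean of the word vectors; equal weights unless given."""
    weights = weights or [1.0] * len(words)
    total = sum(weights)
    dims = len(WORD_VECTORS[words[0]])
    return [
        sum(w * WORD_VECTORS[word][i] for word, w in zip(words, weights)) / total
        for i in range(dims)
    ]


def nearest_words(vector, n=2):
    """The n known words whose vectors are most similar to `vector`."""
    ranked = sorted(
        WORD_VECTORS,
        key=lambda word: cosine_similarity(vector, WORD_VECTORS[word]),
        reverse=True,
    )
    return ranked[:n]


phrase_vector = centroid(["bank", "money"])
closest = nearest_words(phrase_vector, n=2)
```

Changing the `weights` argument shifts the phrase vector toward the more heavily weighted words, so the choice of weighting directly affects which words and objects end up close together.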
Expected outcome: an implemented improvement to the Contextionary algorithm
Preferred skills: Go, Git, Machine Learning, Natural Language Processing, Mathematics
2b. Improved or new ML implementation
As mentioned, we currently use K-NN and contextual classification. There is a lot of room for improving or extending these implementations, as well as for new ML features to make the graph even ‘smarter’. We are open to any ideas. Creativity and initiative are valued, and support is offered. If you have questions about possibilities or your application, don’t hesitate to contact Laura!
Expected outcome: a new ML algorithm
Preferred skills: Go, Git, Python, Machine Learning, Natural Language Processing, Mathematics
3. Implementing and exploring a use case & demo dataset
Weaviate can be used in many scenarios, use cases and business cases, for example in finance, medicine, retail, academia, news and transportation. We have explored several use cases, of which the news publications use case has been made publicly available. Implementing a new use case would be interesting, and many datasets are publicly available. Exploring a new use case means gathering a dataset, loading it into Weaviate, exploring the opportunities that arise from the classification methods already implemented, and visualizing the results.
Possible outcomes are a full use case exploration with a demo dataset, suggestions for new Weaviate features and possibly an implementation of them, and a description and visualization of the use case exploration that other contributors or future users of Weaviate can use as an example.
Again, creativity and initiative are valued, and support is offered. If you have questions about possibilities or your application, don’t hesitate to contact Laura!
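The "gather a dataset and load it" step could begin with something like the sketch below, which turns CSV rows into one dictionary per data object and groups them into batches. The payload shape (`class` plus `properties`), the `Article` class, and the column names are hypothetical; the actual import format depends on your schema and on the Weaviate version or client you use.

```python
import csv
import io


def rows_to_objects(csv_text, class_name):
    """Turn CSV rows into one dict per data object (hypothetical payload shape)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # csv yields every value as a string; convert types here if the schema needs it.
    return [{"class": class_name, "properties": dict(row)} for row in reader]


def batches(items, size):
    """Yield consecutive slices of at most `size` items, for batched imports."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Invented sample data standing in for a public dataset.
sample = "title,wordCount\nGraph news,120\nMarkets today,340\nCup final,95\n"
objects = rows_to_objects(sample, "Article")
chunks = list(batches(objects, 2))
```

Each chunk can then be handed to whichever import mechanism you choose, for example the REST API or the Python client.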
Git, Python, Big Dat