OpenAssistant - Safety Team

Contribute safety datasets to build a safety model for OpenAssistant (an open-source alternative to ChatGPT). The safety model will flag potentially malicious user requests and nudge OpenAssistant to respond appropriately.

Repository Video 📺️

The Problem

ChatGPT gave us a glimpse of the power and capabilities of Large Language Models (LLMs) and the impact these technologies will have on our lives. However, a world in which these revolutionary technologies are concentrated in the hands of a few giants is a dangerous one. Open-source models are going to be vital in creating truly democratized AI, making sure everybody has a say in how these technologies are developed and that their benefits are distributed equally. OpenAssistant is our humble attempt at that future.

One major issue with large models like ChatGPT is that they tend to generate dubious, unsound or even explicit results, because they have been trained on data from the internet. This means we have to incorporate safety mechanisms so that they are fit for general use. Making Large Language Models more aligned with what humans want, and safer for the general public, are areas still undergoing active research.

What the project is and how it works

Our goal for the hackathon was to create one such mechanism to make the Language Models our community builds safer. We built a safety pipeline that gently nudges the Large Language Model into generating outputs that are acceptable to the general public.

We do this by training a separate safety model on datasets of explicit and unethical content, a few of which we contributed during the hackathon. This safety model flags queries from the user when it thinks they might need intervention. It also generates what we call a “Rule of Thumb”: a general rule accepted by the wider public as appropriate. We then use these rules to prompt the Large Language Model so that it generates responses with them in mind.

query from user> "I think all white people are racists"
output from safety model> <cls>; __needs_caution__ <ctx> It's wrong to think all white people are racists.
# this is then passed to the Large Language Model to warn it about the query
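Concretely, the safety model's output can be split into its class label and Rule of Thumb and folded into the prompt for the main model. A minimal sketch follows; the tag format mirrors the example above, but the function names and prompt template are our own illustration, not the exact pipeline code:

```python
def parse_safety_output(output: str):
    """Split '<cls>; __label__ <ctx> rule' into (label, rule_of_thumb)."""
    cls_part, _, ctx_part = output.partition("<ctx>")
    label = cls_part.replace("<cls>;", "").strip()
    return label, ctx_part.strip()

def build_prompt(user_query: str, label: str, rule: str) -> str:
    """Prepend the Rule of Thumb when the safety model asks for caution."""
    if label == "__needs_caution__":
        return f"Keep this rule in mind: {rule}\nUser: {user_query}\nAssistant:"
    return f"User: {user_query}\nAssistant:"

label, rule = parse_safety_output(
    "<cls>; __needs_caution__ <ctx> It's wrong to think all white people are racists."
)
prompt = build_prompt("I think all white people are racists", label, rule)
print(prompt)
```

The Large Language Model then sees the Rule of Thumb as part of its context and can steer its response accordingly.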


Even though OpenAssistant is only a few months old, it has a lot of contributors working on every part of the stack. As part of the safety team for the project, our aim was to create an MVP of the safety pipeline, which meant we:

  1. Needed to build a data pipeline to scrape and clean data and create a dataset to train on.
  2. Needed to train a safety model with this and a few other datasets.

We are happy to report that we managed to complete both of these tasks. The relevant PRs can be found in #1967 and #1972.

The most difficult and time-consuming part of this process was scraping data from Reddit. We experimented with different approaches, such as calling the Reddit APIs directly and using the PRAW library. We ended up choosing PRAW, but this still had problems since Reddit rate-limits the requests we can make. We then transformed the dataset of questions found in explicit subreddits into “Rules of Thumb” about what not to do.
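The scraping step above could be sketched roughly as follows. This is an illustration only, not the code from the PRs: the rule-of-thumb templating, the subreddit name, and the one-second delay are placeholder assumptions, and real PRAW credentials would be required:

```python
import time

def question_to_rot(title: str) -> str:
    """Turn a scraped question title into a negative 'Rule of Thumb' seed."""
    title = title.strip().rstrip("?").strip()
    return f"It's wrong to ask: {title}."

def scrape_titles(submission_stream, limit=100, delay=1.0):
    """Collect submission titles from an iterable of PRAW submissions,
    sleeping between items as a crude guard against Reddit's rate limits."""
    titles = []
    for i, submission in enumerate(submission_stream):
        if i >= limit:
            break
        titles.append(submission.title)
        time.sleep(delay)
    return titles

def scrape_subreddit(client_id: str, client_secret: str, subreddit: str):
    """Entry point: requires `pip install praw` and real API credentials."""
    import praw  # imported here so the sketch runs without praw installed
    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent="safety-dataset-scraper",
    )
    stream = reddit.subreddit(subreddit).new(limit=None)
    for title in scrape_titles(stream, limit=10):
        print(question_to_rot(title))
```

Sleeping between requests only softens the rate-limit problem; this is part of why we plan to move to Pushshift for larger-scale collection.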

The next stage was training the model and building out the pipeline. We built the entire pipeline with the Hugging Face libraries, but since our community does not have the final Large Language Models yet, we used an early prototype to build the demo.
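A fine-tuning run with the Hugging Face libraries might look like the sketch below. This is a hedged outline, not our actual training script: the `t5-small` checkpoint, the `safety_dataset.json` path, the field names `query`/`label`/`rot`, and the hyperparameters are all placeholders. The target format mirrors the `<cls>`/`<ctx>` example shown earlier:

```python
def format_target(label: str, rot: str) -> str:
    """Serialize the class label and Rule of Thumb into the target string
    the safety model learns to generate."""
    return f"<cls>; {label} <ctx> {rot}"

def train_safety_model(data_file="safety_dataset.json"):
    """Fine-tune a small seq2seq model to emit '<cls>; label <ctx> rule'.
    Requires `pip install transformers datasets` and a GPU for sane speed."""
    from datasets import load_dataset
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Trainer,
                              TrainingArguments)

    checkpoint = "t5-small"  # placeholder; the real checkpoint may differ
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    dataset = load_dataset("json", data_files=data_file)["train"]

    def preprocess(example):
        enc = tokenizer(example["query"], truncation=True)
        enc["labels"] = tokenizer(
            format_target(example["label"], example["rot"]), truncation=True
        )["input_ids"]
        return enc

    tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="safety-model", num_train_epochs=3),
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
```

Generating the label and Rule of Thumb as one string keeps the pipeline simple: a single decode, then a split on the `<ctx>` tag, recovers both pieces.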

As to what is and isn't working: even though a basic pipeline is complete, there are a lot of improvements and experiments still to be done. On the data side, we are planning to use Pushshift to scale up the scraping efforts, in the hope of building a larger and more diverse dataset. On the model side, this is just an initial prototype; we still need to run more experiments to find even better models and improve the safety of Language Models in general. Training a multilingual pipeline is also in the works.

What's next - Long-term plans

We strongly believe in the vision of creating a suite of truly open-source Large Language Models and involving the wider community in their creation. So much so that earlier this year we decided to go full-time on this effort. Over the next few months, we hope to have a competent model similar to ChatGPT. You can find more info about the project at and join the Discord server.

Jithin James
Shahul Es

Project created by Jithin James

March 3, 2023