Cherava
Cherava, a web based scraper environment that focuses on simplicity.
Repository Video 📺️Cherava
An open source zero code webscraping automation tool.
This system intends to create alert systems for various tasks like university notice boards, ecommerce site price alerts, product alerts, etc.
Features that work
- Gui based scraper task creation with zero code for any website.
- Users can add workflows and it'll be saved into database based on user session.
- Automation of scraping task with cron job like scheduling.
- Notification system to alert user when the contents of a specific html selector changes over time using email.
Future proposed features
- Use a preview of website within the ui itself to pick the css selector.
- More notification provider options
- UI based or script engine features to further process the data that was received on scraping.
Working
- Workflows for scraping automation tasks are created using a browser session as id, future plans are to use proper authentication.
- Cheerio nodejs package is used to run the webscraping tasks, and a preview is generated for the url and css selector specified in the add workflow ui. The css selector needs to be taken from the website using Inspect Element.
- A workflow upon creation is added into the Postgresql database hosted on railway.app along with the notification recipient emails, and the interval for checking the updates on the website.
- Node-cron package is used for scheduling the scraping workflow.
- Nodemailer package then sends the email to the recipients when the workflow detects a change in the css selector's contents in the website vs the contents stored in the database during the first run when the worflow was added.
Timeline
- Initially the webscraper and the basic ui for adding a workflow was implemented.
- Next the workflow ui was connected to Postgresql db
- Next the Nodemailer notifier module was added
- Next cron job module was added
- Finally all of this was integrate together
Demo Video
To run locally
Clone the Repo
Frontend
cd frontend
npm i
npm run dev
Backend
cd backend
npm i
npm run dev
Env file format for frontend
Env file format for backend
Roshan R Chandar
Sudev Suresh Sreedevi
Ajay Krishna K V
Demo Video created at https://youtu.be/Eqarz4dFGnU
March 5, 2023
Project created by Roshan R Chandar
March 3, 2023