This full-stack SaaS application lets users visually build, manage, and schedule web scrapers through an AI-powered workflow builder. Users can create, modify, and delete workflows with an intuitive drag-and-drop interface, and the built-in AI assistance makes web scraping accessible to both technical and non-technical users.
- Visual Workflow Builder: Drag-and-drop interface to design scraping workflows effortlessly.
- AI Assistance: AI-powered suggestions for selectors, workflow optimization, and error handling.
- Credential Management: Securely manage login credentials for scraping protected websites (see the encryption sketch after this list).
- Scheduling System: Set up automatic scraping schedules for periodic data extraction.
- Workflow Management: Create, modify, delete, and duplicate workflows with ease.
- Data Export: Export scraped data in various formats (e.g., CSV, JSON).
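
The Credential Management feature above implies storing secrets at rest. As a minimal sketch of one way to do that (not the project's documented implementation), credentials could be encrypted with Node's built-in `crypto` module before being persisted; the `CREDENTIAL_ENCRYPTION_KEY` variable and helper names below are assumptions for illustration only:

```ts
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Assumed 32-byte key, hex-encoded, supplied via an environment variable.
const key = Buffer.from(process.env.CREDENTIAL_ENCRYPTION_KEY ?? "", "hex");

// Encrypt a credential with AES-256-GCM; the IV and auth tag are stored
// alongside the ciphertext so decryption is self-contained.
export function encryptCredential(plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, encrypted].map((part) => part.toString("hex")).join(":");
}

export function decryptCredential(payload: string): string {
  const [iv, tag, encrypted] = payload.split(":").map((part) => Buffer.from(part, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(encrypted), decipher.final()]).toString("utf8");
}
```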
- Server-Side Rendering (SSR) for optimized SEO and performance.
- API Routes to handle backend logic (a brief sketch follows this list).
- Dynamic Routing for user-specific workflows.
- Built-in Authentication using NextAuth.js for secure user sessions.
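
As a rough illustration of how an API route might combine these pieces, the sketch below checks the NextAuth session before touching user-specific data. The file path, `authOptions` location, and `prisma.workflow` model are assumptions, not the project's actual code:

```ts
// pages/api/workflows/index.ts (hypothetical route)
import type { NextApiRequest, NextApiResponse } from "next";
import { getServerSession } from "next-auth/next";
import { authOptions } from "../auth/[...nextauth]";   // assumed NextAuth config location
import { prisma } from "../../../lib/prisma";          // assumed shared Prisma client

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // Reject requests without a valid NextAuth session.
  const session = await getServerSession(req, res, authOptions);
  if (!session?.user?.email) {
    return res.status(401).json({ error: "Not authenticated" });
  }

  if (req.method === "GET") {
    // User-specific data: only return workflows owned by the signed-in user.
    const workflows = await prisma.workflow.findMany({
      where: { owner: { email: session.user.email } },
    });
    return res.status(200).json(workflows);
  }

  res.setHeader("Allow", ["GET"]);
  return res.status(405).end(`Method ${req.method} not allowed`);
}
```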
- Next.js: React framework for server-side rendering and API integration.
- Tailwind CSS: For a sleek and responsive UI.
- React Flow: To enable the drag-and-drop workflow builder.
- Node.js: Handles API logic and server-side operations.
- Express (optional): Additional middleware for complex backend logic.
- Prisma: Database ORM for managing workflow and user data.
- PostgreSQL: Relational database for scalable storage.
- OpenAI API: For intelligent suggestions in the workflow builder.
- NextAuth.js: For secure authentication and user management.
- Cron Jobs: For implementing the scheduling system (a minimal scheduler sketch follows this list).
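
To make the scheduling piece concrete, here is a minimal sketch of a cron-based runner. It assumes the `node-cron` package, a shared Prisma client, a `workflow` model with `enabled`/`nextRunAt` fields, and a `runWorkflow` executor — all illustrative names rather than the project's actual code:

```ts
import cron from "node-cron";
import { prisma } from "./lib/prisma";        // assumed shared Prisma client
import { runWorkflow } from "./lib/runner";   // hypothetical workflow executor

// Every minute, find workflows whose schedule is due and run them.
cron.schedule("* * * * *", async () => {
  const due = await prisma.workflow.findMany({
    where: { enabled: true, nextRunAt: { lte: new Date() } },
  });

  for (const workflow of due) {
    try {
      await runWorkflow(workflow);            // scrape according to the saved definition
    } catch (err) {
      console.error(`Workflow ${workflow.id} failed:`, err);
    }
  }
});
```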
- Node.js (v16 or later)
- PostgreSQL database
- API Key for OpenAI
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/ai-web-scraper-builder.git
  cd ai-web-scraper-builder
  ```

- Install dependencies

  ```bash
  npm install
  ```

- Set up environment variables: create a `.env` file in the root directory and add the following:

  ```env
  NEXT_PUBLIC_OPENAI_API_KEY=your_openai_api_key
  DATABASE_URL=postgresql://username:password@localhost:5432/yourdb
  NEXTAUTH_SECRET=your_nextauth_secret
  NEXTAUTH_URL=http://localhost:3000
  ```

- Run database migrations

  ```bash
  npx prisma migrate dev
  ```

- Start the development server

  ```bash
  npm run dev
  ```

  The app will be available at `http://localhost:3000`.
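
The migration step above assumes a Prisma schema with user and workflow models. As a rough illustration of how the app might then persist a drag-and-drop workflow definition — the `workflow` model and its fields are assumptions for illustration only — application code could look like this:

```ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Hypothetical example: store the React Flow nodes/edges as a JSON definition.
async function saveWorkflow(userId: string) {
  return prisma.workflow.create({
    data: {
      name: "Example scraper",
      ownerId: userId,
      definition: {
        nodes: [{ id: "1", type: "fetchPage", data: { url: "https://example.com" } }],
        edges: [],
      },
    },
  });
}
```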
- Sign up or log in to your account (sessions are handled by NextAuth.js).
- Drag and drop nodes to define scraping tasks.
- Use AI suggestions for selector optimization (a minimal API sketch follows this list).
- Securely store website login credentials if required.
- Use the scheduling feature to automate scraping tasks.
- Download scraped data in the desired format.
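
The AI suggestion step above could, for example, ask the OpenAI API to propose a CSS selector from an HTML snippet. Below is a minimal sketch using the official `openai` Node SDK; the helper name, prompt, and model choice are assumptions, not the project's actual implementation:

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hypothetical helper: ask the model for a concise CSS selector for a target field.
async function suggestSelector(htmlSnippet: string, target: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",   // assumed model; any chat-capable model would do
    messages: [
      {
        role: "system",
        content: "You suggest concise CSS selectors for web scraping. Reply with the selector only.",
      },
      { role: "user", content: `HTML:\n${htmlSnippet}\n\nField to extract: ${target}` },
    ],
  });
  return completion.choices[0]?.message?.content?.trim() ?? "";
}
```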
- Start development server: `npm run dev`
- Build for production: `npm run build`
- Run production server: `npm start`
- Lint code: `npm run lint`
- Format code: `npm run format`
- Add support for multi-step scraping workflows.
- Integrate more export formats (e.g., Google Sheets, Excel).
- Enhance AI capabilities for broader use cases.