System Diagrams

System Diagram

System Overview

  • Django monolith for the main web app (ported from FastAPI microservices), deployed on a DigitalOcean VM
  • Next.js SSG SPA for pregenerated and prebuilt content, deployed to the Cloudflare CDN
  • Python scripts using Llama, OpenAI, and Gemini models, FFmpeg, Selenium, etc. to generate content
  • GitLab, Docker, Actions, and a testing harness to facilitate CI/CD
  • LFS for Videos, Content, and Images

Job Generation

Job Data Pipeline

This pipeline generates job content by:

  • pulling data from OEWS, EP, CES, and CPS, the main economic studies carried out by the U.S. government
  • generating additional job information with LLMs
  • computing calculated fields
  • serializing to JSON for UI population
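
The last two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `JobRecord` schema, the `growth_pct` calculated field, and the sample numbers are all hypothetical, and the LLM step is represented by a plain string assumed to have been generated elsewhere.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class JobRecord:
    # Hypothetical schema; the real field names are not given in this document.
    title: str
    employment_base: int        # base-year employment figure
    employment_projected: int   # projected employment figure
    median_wage: float          # annual median wage figure
    description: str = ""       # placeholder for LLM-generated text
    growth_pct: float = 0.0     # calculated field, filled in below


def compute_fields(job: JobRecord) -> JobRecord:
    """Fill in calculated fields from the raw figures."""
    change = job.employment_projected - job.employment_base
    job.growth_pct = round(100 * change / job.employment_base, 1)
    return job


def serialize(jobs: list[JobRecord]) -> str:
    """Serialize records to a JSON array for the UI to consume."""
    return json.dumps([asdict(job) for job in jobs], indent=2)


# Made-up numbers, for illustration only.
jobs = [
    compute_fields(
        JobRecord("Example Analyst", 1000, 1100, 50000.0, "Generated summary.")
    )
]
payload = serialize(jobs)
```

Keeping calculated fields in a separate step from serialization means the JSON handed to the UI always reflects the same derived values, regardless of which source table a raw figure came from.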

Video Generation

Video Generation Pipeline

This pipeline generates videos by:
  • scraping and querying ground-truth data on famous people, inventions, etc.
  • generating a video script
  • segmenting the video script into logical 'slides'
  • generating a query for each slide and retrieving an image
  • converting text to speech
  • lining up and composing audio, images, text, and iconography
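
The segmentation and line-up steps above can be sketched as follows. This is a simplified illustration, not the pipeline's real implementation. It makes three assumptions: blank lines stand in for 'logical' slide boundaries, narration length is estimated from word count rather than measured from the synthesized audio, and the image query is just the first few words of the slide. `Slide`, `segment_script`, `estimate_duration`, and `build_timeline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Slide:
    text: str
    image_query: str
    start: float = 0.0      # seconds into the video
    duration: float = 0.0   # seconds


def segment_script(script: str) -> list[str]:
    """Split a script into slides on blank lines (a stand-in for logical segmentation)."""
    return [part.strip() for part in script.split("\n\n") if part.strip()]


def estimate_duration(text: str, words_per_minute: float = 150.0) -> float:
    """Rough narration length; a real pipeline would measure the TTS audio instead."""
    return len(text.split()) / words_per_minute * 60.0


def build_timeline(script: str) -> list[Slide]:
    """Assign each slide a start offset so audio and images can be lined up."""
    slides, cursor = [], 0.0
    for text in segment_script(script):
        duration = estimate_duration(text)
        # Placeholder query: the first five words of the slide.
        query = " ".join(text.split()[:5])
        slides.append(Slide(text, query, start=cursor, duration=duration))
        cursor += duration
    return slides


script = "Ada Lovelace was born in 1815.\n\nShe wrote notes on the Analytical Engine."
timeline = build_timeline(script)
```

Each `Slide` then carries everything a compositor such as FFmpeg would need for that segment: what to show, what to say, and when.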

Video Scraping

Video Curation Pipeline

This pipeline curates videos by:
  • getting videos for a generated query from the YouTube API
  • scraping transcripts for the videos
  • filtering videos using an ensemble of few-shot learning and embedding-based semantic similarity
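
One way the filtering step could combine its two signals is sketched below. This is an assumption-laden illustration, not the actual ensemble. It assumes the few-shot model returns a relevance score between 0 and 1 (`few_shot_score`), that query and transcript embeddings are already computed, and that the two signals are blended with a simple weighted average. The function names, the weight, and the threshold are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ensemble_score(query_vec, transcript_vec, few_shot_score, weight=0.5):
    """Weighted average of embedding similarity and the few-shot relevance score."""
    similarity = cosine_similarity(query_vec, transcript_vec)
    return weight * similarity + (1.0 - weight) * few_shot_score


def filter_videos(query_vec, candidates, threshold=0.6):
    """Return the ids of candidates whose blended score meets the threshold."""
    return [
        c["id"]
        for c in candidates
        if ensemble_score(query_vec, c["embedding"], c["few_shot_score"]) >= threshold
    ]


# Toy two-dimensional embeddings, for illustration only.
query_vec = [1.0, 0.0]
candidates = [
    {"id": "a", "embedding": [1.0, 0.0], "few_shot_score": 0.9},
    {"id": "b", "embedding": [0.0, 1.0], "few_shot_score": 0.8},
]
kept = filter_videos(query_vec, candidates)
```

Here video "b" is dropped despite a high few-shot score because its transcript embedding is orthogonal to the query, which is the kind of disagreement an ensemble is meant to catch.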