- Framework: Flask
  - flask_bcrypt
  - flask_wtf
  - flask_paginate
- Database: MongoDB
  - pymongo
  - mongo-connect (pyspark package)
- Styling: Tailwind CSS
  - @tailwindcss/forms
- Apache Spark: PySpark
- ml-latest-small - the small MovieLens dataset
- data.json - contains the scraped movie data written by to_json.py
graph LR
A[Movielens Dataset] --> |preprocessing| B(Training Model)
C[Experiment] --> |Tuning| B
B --> |Integration| D[Flask prototype]
- selection.py - Selects the imdbIds of movies from different genres and stores them in a dictionary, with the genre tag as the key and a list of imdbIds as the value.
- scrape.py - Uses bs4 and Requests to extract each movie's imdbId, title, year, poster, rating, summary, time, and genres, and returns them via json dumps.
- to_json.py - Appends the scraped movie data to the list in data.json, which starts out containing only an empty list "[]". Seeding the file with an empty list is a bit odd, but it works. Saving to JSON makes it possible to: add both movieId and imdbId to confirm that the two ids match; handle multiple genres per movie and ensure the genres field is stored as an array in the database; and select only certain data, e.g. only movies with a year below or above a given value.
- CF_ALS.ipynb - This notebook contains the ALS experiment with hyperparameter tuning, plus the data preprocessing used by both ALS and the web application. PySpark with mongo-connect handles inserting the preprocessed data into MongoDB.
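
The append-to-JSON step described for to_json.py could look roughly like the sketch below. It is not the project's actual code: the function names `append_movie` and `filter_by_year` and the sample records are hypothetical, and it assumes genres may arrive either as a comma-separated string or as a list. It also treats a missing or empty file as an empty list, which is one way to avoid seeding data.json with "[]" by hand.

```python
import json
import tempfile
from pathlib import Path


def append_movie(path, movie):
    """Append one scraped movie record to the JSON list stored at `path`."""
    path = Path(path)
    # Treat a missing or empty file as an empty list.
    if path.exists() and path.stat().st_size > 0:
        movies = json.loads(path.read_text(encoding="utf-8"))
    else:
        movies = []

    record = dict(movie)
    genres = record.get("genres", [])
    # Normalise genres to a list so every record has the same type.
    if isinstance(genres, str):
        genres = [g.strip() for g in genres.split(",") if g.strip()]
    record["genres"] = list(genres)

    movies.append(record)
    path.write_text(json.dumps(movies, indent=2), encoding="utf-8")
    return movies


def filter_by_year(movies, min_year=None, max_year=None):
    """Keep movies whose year lies within the optional inclusive bounds."""
    return [
        m for m in movies
        if (min_year is None or m["year"] >= min_year)
        and (max_year is None or m["year"] <= max_year)
    ]


# Demo with made-up records, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.json"
    append_movie(data_file, {
        "movieId": 1, "imdbId": "tt0000001", "title": "Example Movie A",
        "year": 1995, "genres": "Drama, Comedy",
    })
    movies = append_movie(data_file, {
        "movieId": 2, "imdbId": "tt0000002", "title": "Example Movie B",
        "year": 2010, "genres": ["Action"],
    })

recent = filter_by_year(movies, min_year=2000)
print([m["title"] for m in recent])
```

Keeping both movieId and imdbId on each record, as above, is what allows the two ids to be cross-checked later.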
The web application is in the web folder.
- Static folder - Contains the CSS config for Tailwind CSS, the JavaScript that handles the rating display, and other static files.
- Templates folder - Contains the HTML templates rendered with Jinja2
  - layouts folder - base templates for the shared layout (header and footer)
  - the other folders and files each handle the page that matches their own file or folder name.
The steps below assume the dependencies are installed and the config is in place:
- Node (for Tailwind CSS), Python, and Apache Spark are installed
- The MongoDB port is configured in web/app.py
pip install -r requirements.txt
cd web
npm install
npm run watch
python app.py
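
For the MongoDB port setting in web/app.py, one option is to read it from the environment instead of editing the file each time. This is only a sketch under assumptions: the helper `mongo_uri` and the variable names `MONGO_HOST` and `MONGO_PORT` are hypothetical and not part of this project. 27017 is MongoDB's default port.

```python
import os


def mongo_uri(default_host="localhost", default_port=27017):
    """Build a MongoDB connection URI from optional environment overrides."""
    # MONGO_HOST and MONGO_PORT are hypothetical names chosen for this sketch.
    host = os.environ.get("MONGO_HOST", default_host)
    port = int(os.environ.get("MONGO_PORT", default_port))
    return f"mongodb://{host}:{port}"


print(mongo_uri())
```

The resulting string can be passed to pymongo's MongoClient, which accepts a connection URI as its first argument.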