Movie Revenue & Rating Prediction

Machine Learning System  ·  Solo Project  ·  Python / Scikit-learn

A supervised machine learning system that predicts box office revenue and identifies optimal runtime ranges for maximizing audience ratings across film genres.

Overview

This system analyzes ~4,800 film records and provides two capabilities: (1) revenue prediction from budget, genre, and runtime, and (2) a runtime analysis that identifies, for each genre, the runtime ranges most strongly correlated with high audience ratings. Feature engineering and model selection were tuned to balance accuracy with interpretability.
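As a rough illustration of the revenue-prediction half, the sketch below trains a Random Forest on budget, genre, and runtime. It is a minimal example under stated assumptions, not the project's actual pipeline: the data is synthetic, and the column names (`budget`, `runtime`, `genre`, `revenue`), the one-hot encoding step, and the hyperparameters are all illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (NOT the project's dataset):
# revenue is simply a multiple of budget plus noise.
rng = np.random.default_rng(0)
n = 400
films = pd.DataFrame({
    "budget": rng.uniform(1e6, 2e8, n),
    "runtime": rng.integers(80, 180, n),
    "genre": rng.choice(["Action", "Comedy", "Drama"], n),
})
films["revenue"] = 2.5 * films["budget"] + rng.normal(0, 2e7, n)

features = ["budget", "runtime", "genre"]  # assumed column names
X_train, X_test, y_train, y_test = train_test_split(
    films[features], films["revenue"], test_size=0.25, random_state=0
)

# One-hot encode the categorical genre column; numeric columns pass through.
preprocess = ColumnTransformer(
    [("genre", OneHotEncoder(handle_unknown="ignore"), ["genre"])],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)

predictions = model.predict(X_test)
test_r2 = model.score(X_test, y_test)  # R² on the held-out split
```

Wrapping the encoder and the model in one `Pipeline` keeps the preprocessing fitted on the training split only, which avoids leaking test data into the encoder.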

Tech Stack

Python  ·  Pandas  ·  NumPy  ·  Scikit-learn  ·  Random Forest  ·  Gradient Boosting  ·  Jupyter Notebook  ·  Git

Model Performance

R² Score: 74.6% (Revenue Prediction, Random Forest)
RMSE (Root Mean Squared Error): $104M
Strongest Runtime Correlation: r = 0.468 (Action genre)
Film Records: ~4,800 (Kaggle Movies Dataset)
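
For reference, the two error metrics reported here can be computed as in the sketch below. The functions follow the standard definitions of R² and RMSE; the sample revenues are made up for illustration and are not drawn from the project's results.

```python
import math

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Made-up revenues in millions of dollars, for illustration only.
actual = [300.0, 50.0, 200.0, 700.0]
predicted = [250.0, 100.0, 200.0, 800.0]

print(f"R^2  = {r2_score(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.1f}M")
```

Because RMSE is in the same units as the target, it reads directly as a typical prediction error in dollars, while R² is unitless and describes the share of revenue variance the model explains.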

Runtime–Rating Correlations by Genre

Genre       Correlation (r)   Strength
Action      0.468             Strong
Thriller    0.459             Strong
Adventure   Moderate          Moderate
Drama       Moderate          Moderate
Horror      Moderate          Moderate
Comedy      Weak              Weak
Animation   Weak              Weak
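
A per-genre correlation table like this one can be produced with a pandas group-by, as in the minimal sketch below. It assumes one row per film with hypothetical columns `genre`, `runtime`, and `rating`, and uses Pearson's r. The toy values are invented and will not reproduce the figures in the table.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
films = pd.DataFrame({
    "genre":   ["Action"] * 4 + ["Comedy"] * 4,
    "runtime": [95, 110, 125, 140, 85, 95, 105, 115],
    "rating":  [5.8, 6.4, 6.9, 7.5, 6.5, 6.1, 6.6, 6.2],
})

def runtime_rating_corr(df):
    """Pearson correlation between runtime and rating within each genre."""
    return (
        df.groupby("genre")[["runtime", "rating"]]
          .apply(lambda g: g["runtime"].corr(g["rating"]))
          .sort_values(ascending=False)
    )

correlations = runtime_rating_corr(films)
print(correlations)
```

Sorting the result in descending order puts the genres with the strongest positive runtime-rating relationship first, mirroring the ordering of the table.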