A python implementation of the Data Viz Day visualization.

Find a file

Walker Waylon Scott 3441570a5a removed blur on images.		2017-08-08 11:11:50 -05:00
data	updated(etl)	2017-07-03 16:48:17 -05:00
notebooks	added a notebook template for suresh	2017-07-03 16:54:10 -05:00
src	removed blur on images.	2017-08-08 11:11:50 -05:00
wireframe	feat(wireframe) Wireframe mockups added.	2017-06-26 14:13:22 -05:00
.gitignore	Initial commit	2017-06-19 08:03:33 -05:00
LICENSE	Initial commit	2017-06-19 08:03:33 -05:00
Procfile	trying bash web.sh to run gunicorn.	2017-07-10 12:00:18 -05:00
README.md	implemented keyword api	2017-07-11 08:26:35 -05:00
requirements.txt	removed iplotter from requirements.txt	2017-08-01 18:18:19 -05:00
web.sh	added #!/bin/bash	2017-07-10 11:54:42 -05:00

README.md

pyDataVizDay

A python implementation of the Data Viz Day visualization. The visualization is built using a flask web framework to serve the web page. The visualization is currently running on heroku.

Developing the visualization

If you want to run the visualization yourself, and further develop it clone this repo and run the pyDataVizDay.py file.

git clone https://github.com/WaylonWalker/pyDataVizDay.git
python src//pyDataVizDay.py

Dataset

This project will be using the Kaggle IMDB 5000 Movie Dataset. It is a collection of 5000 movie metadata pulled from the IMDB website. It contains

first 5 rows of data

	color	director_name	num_critic_for_reviews	duration	director_facebook_likes	actor_3_facebook_likes	actor_2_name	actor_1_facebook_likes	gross	genres	actor_1_name	movie_title	num_voted_users	cast_total_facebook_likes	actor_3_name	facenumber_in_poster	plot_keywords	movie_imdb_link	num_user_for_reviews	language	country	content_rating	budget	title_year	actor_2_facebook_likes	imdb_score	aspect_ratio	movie_facebook_likes
0	Color	James Cameron	723	178	0	855	Joel David Moore	1000	7.60506e+08	Action/Adventure/Fantasy/Sci-Fi	CCH Pounder	Avatar	886204	4834	Wes Studi	0	avatar/future/marine/native/paraplegic	http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1	3054	English	USA	PG-13	2.37e+08	2009	936	7.9	1.78	33000
1	Color	Gore Verbinski	302	169	563	1000	Orlando Bloom	40000	3.09404e+08	Action/Adventure/Fantasy	Johnny Depp	Pirates of the Caribbean- At World's End	471220	48350	Jack Davenport	0	goddess/marriage ceremony/marriage proposal/pirate/singapore	http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1	1238	English	USA	PG-13	3e+08	2007	5000	7.1	2.35	0
2	Color	Sam Mendes	602	148	0	161	Rory Kinnear	11000	2.00074e+08	Action/Adventure/Thriller	Christoph Waltz	Spectre	275868	11700	Stephanie Sigman	1	bomb/espionage/sequel/spy/terrorist	http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1	994	English	UK	PG-13	2.45e+08	2015	393	6.8	2.35	85000
3	Color	Christopher Nolan	813	164	22000	23000	Christian Bale	27000	4.48131e+08	Action/Thriller	Tom Hardy	The Dark Knight Rises	1144337	106759	Joseph Gordon-Levitt	0	deception/imprisonment/lawlessness/police officer/terrorist plot	http://www.imdb.com/title/tt1345836/?ref_=fn_tt_tt_1	2701	English	USA	PG-13	2.5e+08	2012	23000	8.5	2.35	164000
4	nan	Doug Walker	nan	nan	131	nan	Rob Walker	131	nan	Documentary	Doug Walker	Star Wars: Episode VII - The Force Awakens	8	143	nan	0	nan	http://www.imdb.com/title/tt5289954/?ref_=fn_tt_tt_1	nan	nan	nan	nan	nan	nan	12	7.1	nan	0

Interactivity

The viz should include the following interactive features if time allows

Filters
Tooltips
Drill-down ability
Export the data
Animation

Modeling

The viz should include the following modeling techniques if time allows

Clustering
Trend and Forecasting
Regression & Correlation
Percentiles
Natural Language Processing
Reference Lines

Wireframe

This is the wireframe that the team has been given to replicate in python using any technique they wish to use.

Time Investment

5 min ETL
- initial creation(modified personal file)
5 min settings.py
- 5 minute (modified personal file)
20 min setup initial layout.html, index, and first two "story".html pages
- found decent bootstrap layout template online
- first time following the Flask Active button tutorial
- http://jinja.pocoo.org/docs/2.9/tricks/
10 min setup heroku repo and build environment
30 min debug initial heroku deployment
15 min added reveal slide template
45 min update etl
- split keyword/genre into rows
- define update method
- docstrings
- add keyword/genre to load method
1 hr implemented data.filter() method
30 min api parser/docs/keywords