Included in this repository are selected final projects completed during my Master's program at Boston University. In the spirit of academic integrity, I have scrubbed the course names and identifiers from the included files so that they will not be plagiarised by future students.
An analysis of graduate admissions data investigating whether an applicant's chance of admission can be predicted from the given inputs, using means-based statistical tests and multiple linear regression. Includes:
- Data
- RMarkdown file with full analysis code, and
- Final report
An end-to-end data warehousing and reporting project. Raw data and a backup of the final data warehouse can be accessed here. Includes:
- ERD
- A scripts directory that contains:
  - database construction queries (SQL)
  - shell scripts to automate the original data access from the Seas Around Us API, and
  - data processing scripts (as Jupyter notebooks)
- Final report and presentation, and
- Tableau workbook
An exploratory research project comparing several types of Feature Selection approaches across several Machine Learning algorithms. Includes:
- Data
- RMarkdown file with full analysis code, and
- Final report
A demonstration of graph analysis that can support underlying NLP approaches using H.P. Lovecraft's work The Dunwich Horror. Includes:
- The raw text data as an .Robj file
- RMarkdown file with full analysis code, and
- Final report
An implementation and example study of the Slope One algorithm in a distributed environment. The project uses large datasets from the GroupLens Project that were also used in the Netflix Prize competitions. Includes:
- A brief README
- A Jupyter notebook containing the full project and report, and
- A matching Python script with only the algorithm implementation for testing in a distributed system (e.g. GCP)
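
For readers unfamiliar with Slope One, the idea can be sketched in a few lines of plain Python. This is a minimal, single-machine illustration of the weighted variant on a toy dataset, not the distributed implementation in this project; the function names (`build_deviations`, `predict`) and the sample ratings are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_deviations(ratings):
    """Compute the average rating difference between every pair of items.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (devs, counts), where devs[i][j] is the mean of (r_i - r_j)
    over users who rated both items, and counts[i][j] is how many did.
    """
    devs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i == j:
                    continue
                devs[i][j] += r_i - r_j
                counts[i][j] += 1
    # Turn the summed differences into averages.
    for i in devs:
        for j in devs[i]:
            devs[i][j] /= counts[i][j]
    return devs, counts


def predict(user_ratings, item, devs, counts):
    """Weighted Slope One prediction of `item` for a single user.

    Returns None when no co-rated item is available to predict from.
    """
    numerator, denominator = 0.0, 0
    for j, r_j in user_ratings.items():
        if j == item:
            continue
        c = counts.get(item, {}).get(j, 0)
        if c:
            numerator += (devs[item][j] + r_j) * c
            denominator += c
    return numerator / denominator if denominator else None


# Toy data (hypothetical): three users, three items.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}

devs, counts = build_deviations(ratings)
lucy_a = predict(ratings["lucy"], "A", devs, counts)
print(round(lucy_a, 2))
```

Each prediction only needs the pairwise deviation and count tables, which are simple sums over users; that is what makes the algorithm a natural fit for a distributed setting.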