
arvindp25/Data-Science-portfolio


Data-Science-portfolio

Information about the languages, libraries, tools and techniques used


Language

Python

IDEs and tools

Jupyter Notebook
Google Colaboratory

Tableau


Libraries

NumPy for mathematical calculations and operations
>> https://numpy.org/
Pandas for loading data, manipulating DataFrames and preprocessing data with various methods
>> https://pandas.pydata.org/
Matplotlib and Seaborn for various plots, e.g. bar charts, scatter plots and box plots
>> https://matplotlib.org/
>> https://seaborn.pydata.org/
scikit-learn (sklearn) for various machine learning algorithms
>> https://scikit-learn.org/
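A minimal sketch of how these libraries divide the work, using hypothetical age/income data (not from any case study): NumPy generates and computes the arrays, Pandas holds and reshapes them in a DataFrame, and Matplotlib draws the three plot types mentioned above. Seaborn offers higher-level versions of the same plots (e.g. `seaborn.boxplot`); it is left out here so the sketch depends on Matplotlib alone.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: numerical calculations on arrays (hypothetical toy data)
rng = np.random.default_rng(0)
ages = rng.integers(18, 70, size=100)          # ages 18..69
incomes = ages * 1000 + rng.normal(0, 5000, size=100)

# Pandas: hold the data in a DataFrame and derive a binned column
df = pd.DataFrame({"age": ages, "income": incomes})
df["age_group"] = pd.cut(df["age"], bins=[17, 30, 50, 70],
                         labels=["18-30", "31-50", "51-70"])
counts = df["age_group"].value_counts().sort_index()  # rows per age group

# Matplotlib: bar chart, scatter plot and box plot side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].bar(list(counts.index.astype(str)), counts.to_numpy())
axes[0].set_title("Bar chart: rows per age group")
axes[1].scatter(df["age"], df["income"], s=10)
axes[1].set_title("Scatter plot: age vs income")
axes[2].boxplot(df["income"].to_numpy())
axes[2].set_title("Box plot: income")
fig.tight_layout()
```

Call `fig.savefig("eda_plots.png")` (file name hypothetical) to write the figure to disk.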

Algorithms

Linear Regression >> sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Logistic Regression >> sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Decision Trees >> sklearn documentation: https://scikit-learn.org/stable/modules/tree.html
Random Forest >> sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
AdaBoost
XGBoost >> xgboost documentation: https://xgboost.readthedocs.io/en/latest/
K-means >> sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
Hierarchical clustering >> sklearn documentation: https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering
Principal Component Analysis (PCA) >> sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
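The scikit-learn algorithms listed above share the same `fit` / `predict` interface. The sketch below is a minimal illustration on synthetic data from `make_regression`, `make_classification` and `make_blobs` (stand-ins, not the case-study datasets); hyperparameters such as `max_depth=4` and `n_clusters=3` are arbitrary assumptions, and scores are computed on the training data only to show the API, not to evaluate the models. XGBoost is left out because it needs the separate `xgboost` package, whose `XGBClassifier` follows the same fit/predict pattern. Hierarchical clustering is shown with `AgglomerativeClustering`, scikit-learn's bottom-up implementation.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for real case-study datasets (assumption)
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)
X_blobs, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)

# Regression: score() returns R^2 for regressors
lin = LinearRegression().fit(X_reg, y_reg)
print("Linear Regression R^2:", round(lin.score(X_reg, y_reg), 3))

# Classifiers: same fit/score calls; score() returns accuracy
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}
train_acc = {}
for name, model in classifiers.items():
    model.fit(X_clf, y_clf)
    train_acc[name] = model.score(X_clf, y_clf)  # training accuracy only
    print(f"{name} training accuracy: {train_acc[name]:.3f}")

# Unsupervised: K-means and bottom-up (Ward) hierarchical clustering
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_blobs)
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_blobs)

# PCA: project the 5 features onto 2 principal components
pca = PCA(n_components=2).fit(X_blobs)
X_2d = pca.transform(X_blobs)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)
```

Swapping one model for another only changes the constructor line, which is what makes comparing several algorithms in one case study cheap.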

Steps followed in the case studies


i. Loading data with Pandas
ii. Preprocessing data
a. Checking for null values and imputing them
b. Checking for duplicates
c. Handling categorical columns that have many categories
d. Understanding the data through exploratory data analysis (EDA)
iii. Splitting the data 70/30 into training and test sets
iv. Scaling data wherever required
v. Model building
vi. Model evaluation with various metrics
a. For regression: RMSE
b. For classification: accuracy score, or AUC-ROC score (for imbalanced data)

vii. Finalizing the model and drawing conclusions in business terms
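The steps above can be sketched end to end as follows. This is a minimal sketch, not the author's notebook code: the synthetic DataFrame and its columns (`income`, `age`, `city`, `spend`, `churn`), median imputation, the 5% rare-category threshold and one-hot encoding are all assumptions chosen for illustration. The `churn` label is random noise, so an AUC-ROC near 0.5 is expected here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# i. Load data: a synthetic frame stands in for pd.read_csv(...) (assumption)
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "age": rng.integers(18, 70, n).astype(float),
    "city": rng.choice(["A", "B", "C", "D", "E", "F", "G"], n,
                       p=[0.40, 0.30, 0.15, 0.05, 0.04, 0.03, 0.03]),
})
df.loc[rng.choice(n, 25, replace=False), "age"] = np.nan   # inject missing values
df["spend"] = 0.1 * df["income"] + rng.normal(0, 1_000, n)  # regression target
df["churn"] = (rng.random(n) < 0.2).astype(int)             # imbalanced binary target
df = pd.concat([df, df.head(10)], ignore_index=True)        # inject duplicate rows

# ii.a Check null values and impute (median for the numeric column)
print(df.isnull().sum())
df["age"] = df["age"].fillna(df["age"].median())

# ii.b Check and drop duplicate rows
print("duplicate rows:", df.duplicated().sum())
df = df.drop_duplicates()

# ii.c Collapse rare categories (< 5% of rows) into "Other", then one-hot encode
freq = df["city"].value_counts(normalize=True)
rare = freq[freq < 0.05].index
df["city"] = df["city"].where(~df["city"].isin(rare), "Other")
df = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=float)

# ii.d Quick EDA summary (plots would also go here)
print(df.describe())

# iii. 70/30 train/test split, stratified on the imbalanced label
X = df.drop(columns=["spend", "churn"])
X_train, X_test, yr_train, yr_test, yc_train, yc_test = train_test_split(
    X, df["spend"], df["churn"], test_size=0.3, random_state=0, stratify=df["churn"])

# iv. Scale features: fit on the training split only to avoid leakage
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# v. Build one regression and one classification model
reg = LinearRegression().fit(X_train_s, yr_train)
clf = LogisticRegression(max_iter=1000).fit(X_train_s, yc_train)

# vi.a Regression metric: RMSE (square root of MSE)
rmse = np.sqrt(mean_squared_error(yr_test, reg.predict(X_test_s)))
# vi.b Classification metrics: accuracy, plus AUC-ROC for imbalanced data
acc = accuracy_score(yc_test, clf.predict(X_test_s))
auc = roc_auc_score(yc_test, clf.predict_proba(X_test_s)[:, 1])
print(f"RMSE={rmse:.1f}  accuracy={acc:.3f}  AUC-ROC={auc:.3f}")
```

On a 20% positive rate, predicting "no churn" for everyone already scores about 80% accuracy, which is why AUC-ROC, computed from predicted probabilities rather than hard labels, is the more informative metric for imbalanced data.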


What you can find in the future (as of 28/12/20)

Data analysis with Tableau
Implementation of various time-series models, e.g. Holt-Winters, ARIMA, SARIMA, SARIMAX
Implementation of deep learning algorithms: MLP, CNN, RNN

About

Here you can find various work done by me.
