Saturday, 3 October 2020

Python

Regression vs Classification - The most significant difference between regression and classification is that regression predicts a continuous quantity, while classification predicts discrete class labels.

  • Regression
    • Regression is a process of finding the correlations between dependent and independent variables. It helps in predicting continuous variables, such as market trends or house prices.
    • The task of the Regression algorithm is to find the mapping function to map the input variable(x) to the continuous output variable(y).
    • Types of Regression
      • Simple Linear Regression
      • Multiple Linear Regression
      • Polynomial Regression
      • Support Vector Regression
      • Decision Tree Regression
      • Random Forest Regression 
    • How to check accuracy
      • R Squared (measures the goodness of fit of the best-fit line)
      • Adjusted R Squared -> penalizes the addition of attributes that are not correlated with the target, i.e. that do not improve the fit (see the R Squared / Adjusted R Squared sections below)
  • Classification
    • Classification is the process of finding a function that divides the dataset into classes based on different parameters. A computer program is trained on the training dataset and, based on that training, categorizes data into the different classes.
    • In other words, it finds a model or function that separates the data into multiple categorical classes, i.e. discrete values. Data is labelled according to the parameters given in input, and those labels are then predicted for new data.
    • The derived mapping function can be expressed in the form of “IF-THEN” rules. Classification deals with problems where the data can be divided into binary or multiple discrete labels.
    • Types of ML Classification Algorithms:
      • Logistic Regression
      • K-Nearest Neighbours
      • Support Vector Machines
      • Kernel SVM
      • Naïve Bayes
      • Decision Tree Classification
      • Random Forest Classification
    • How to check accuracy (a short sketch follows this list)
      • confusion matrix
      • accuracy score
      • recall value (true positive rate)
      • precision value
      • F1 score
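A minimal sketch of both kinds of task, using scikit-learn with synthetic data; the dataset sizes and model choices below are illustrative, not from the original notes:

  from sklearn.datasets import make_regression, make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.linear_model import LinearRegression, LogisticRegression
  from sklearn.metrics import (r2_score, confusion_matrix, accuracy_score,
                               precision_score, recall_score, f1_score)

  # Regression: continuous target, checked with R squared
  X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  reg = LinearRegression().fit(X_train, y_train)
  print("R squared:", r2_score(y_test, reg.predict(X_test)))

  # Classification: discrete target, checked with the confusion matrix and related scores
  X, y = make_classification(n_samples=200, n_features=4, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  clf = LogisticRegression().fit(X_train, y_train)
  y_pred = clf.predict(X_test)
  print(confusion_matrix(y_test, y_pred))                # rows: actual, columns: predicted
  print("accuracy :", accuracy_score(y_test, y_pred))    # (TP + TN) / total
  print("precision:", precision_score(y_test, y_pred))   # TP / (TP + FP)
  print("recall   :", recall_score(y_test, y_pred))      # TP / (TP + FN) = true positive rate
  print("F1 score :", f1_score(y_test, y_pred))          # harmonic mean of precision and recall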

Difference between Regression and Classification

  • Output: in Regression, the output variable must be continuous or a real value; in Classification, the output variable must be a discrete value.
  • Task: the regression algorithm maps the input value (x) to a continuous output variable (y); the classification algorithm maps the input value (x) to a discrete output variable (y).
  • Data: Regression algorithms are used with continuous data; Classification algorithms are used with discrete data.
  • Fit: in Regression, we try to find the best-fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which divides the dataset into different classes.
  • Examples: Regression algorithms solve problems such as weather prediction and house price prediction; Classification algorithms solve problems such as spam email identification, speech recognition, and identification of cancer cells.
  • Subtypes: Regression can be further divided into Linear and Non-linear Regression; Classification can be divided into Binary and Multi-class classifiers.

 

Lambda Functions(Anonymous functions)

  • addition = lambda a, b: a + b
  • addition(12, 14)
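Beyond calling it directly as above, a lambda is most often passed inline to another function; a small sketch with a made-up list:

  # Sort a list of (name, score) pairs by the score, using a lambda as the key function
  scores = [("anu", 72), ("ben", 91), ("cara", 85)]
  print(sorted(scores, key=lambda pair: pair[1]))
  # [('anu', 72), ('cara', 85), ('ben', 91)]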

pyforest

  • lazily imports the common Python data science libraries (pandas, numpy, matplotlib, seaborn, etc.), so the actual import happens only when a library is first used
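pyforest is a third-party package; a minimal sketch of the idea, assuming it is installed (pip install pyforest) and that the active_imports() helper is available in the installed version:

  from pyforest import *                  # exposes lazy names such as pd, np, plt, sns

  df = pd.DataFrame({"x": [1, 2, 3]})     # pandas is actually imported only at this point
  print(active_imports())                 # shows which imports have been triggered so far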

Univariate, Bivariate & Multivariate

  • Univariate - analysis based on a single feature/variable at a time
  • Bivariate - analysis of exactly two features, usually to study the relationship between them
  • Multivariate - analysis involving more than two features at once (see the sketch below)
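A loose sketch of what each level of analysis can look like with pandas; the DataFrame and column names are made up:

  import pandas as pd

  df = pd.DataFrame({
      "age":    [23, 31, 45, 52],
      "income": [30000, 45000, 61000, 72000],
      "spend":  [1200, 1800, 2500, 2600],
  })

  print(df["age"].describe())           # univariate: one feature at a time
  print(df["age"].corr(df["income"]))   # bivariate: relationship between two features
  print(df.corr())                      # multivariate: all pairwise relationships at once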

Bias means error on the training data set.

Variance means error on the test data set.

Overfitting - low error on the training data set but high error on the test data set; low bias & high variance.

Underfitting - high error on both the training and test data sets; high bias & low variance.
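A rough sketch of how this shows up in practice, using synthetic data and scikit-learn; the polynomial degrees and noise level are illustrative choices, not from the original notes:

  import numpy as np
  from sklearn.preprocessing import PolynomialFeatures
  from sklearn.linear_model import LinearRegression
  from sklearn.pipeline import make_pipeline
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import mean_squared_error

  rng = np.random.RandomState(0)
  X = rng.uniform(0, 1, size=(30, 1))
  y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  for degree in (1, 15):   # degree 1 tends to underfit this data, degree 15 tends to overfit
      model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
      model.fit(X_train, y_train)
      train_err = mean_squared_error(y_train, model.predict(X_train))
      test_err = mean_squared_error(y_test, model.predict(X_test))
      print(degree, round(train_err, 3), round(test_err, 3))   # compare train vs test error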

Multicollinearity - occurs when two or more independent variables are strongly correlated with each other, which makes the individual regression coefficients unstable and hard to interpret.
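One simple way to spot it is a correlation matrix of the independent variables; a sketch with made-up column names:

  import pandas as pd

  # Hypothetical feature matrix; 'size_sqft' and 'size_sqm' are almost perfectly correlated
  df = pd.DataFrame({
      "size_sqft": [500, 750, 1000, 1250],
      "size_sqm":  [46.5, 69.7, 92.9, 116.1],
      "age_years": [10, 3, 25, 7],
  })
  print(df.corr())   # correlations close to +/-1 between features signal multicollinearity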

R Squared

  • measures how much of the variance in the dependent variable is explained by the model, i.e. the goodness of fit of the best-fit line

Adjusted R Squared
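A small sketch of both quantities, assuming n observations and p predictors; the helper function and sample values below are illustrative:

  from sklearn.metrics import r2_score

  def adjusted_r2(y_true, y_pred, n_features):
      # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
      r2 = r2_score(y_true, y_pred)
      n = len(y_true)
      return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

  y_true = [3.0, 5.0, 7.0, 9.0]
  y_pred = [2.8, 5.3, 6.9, 9.2]
  print(r2_score(y_true, y_pred), adjusted_r2(y_true, y_pred, n_features=1))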

 Hypothesis Testing

  • Statistics is about data, often huge amounts of it; it becomes useful only when one analyzes it and draws conclusions from it.
  • Hypothesis testing is how we arrive at such interpretations and conclusions.
  • It evaluates two mutually exclusive statements about a population using sample data.
  • Steps of hypothesis testing
    • Make an initial assumption (H0); this is called the NULL hypothesis.
    • Collect data (the evidence) to REJECT or FAIL TO REJECT the null hypothesis, as in the sketch below.
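A small sketch of those two steps with a one-sample t-test from scipy; the sample values, the assumed population mean of 50, and the 0.05 significance level are all illustrative:

  from scipy import stats

  # H0 (null hypothesis): the population mean is 50
  sample = [51.2, 49.8, 50.6, 52.1, 48.9, 51.5, 50.3, 49.7]

  t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

  alpha = 0.05
  if p_value < alpha:
      print("Reject H0")          # the evidence is strong enough to reject the null hypothesis
  else:
      print("Fail to reject H0")  # the evidence is not strong enough to reject it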



 
