Posts

Showing posts from February, 2021

Naive Bayes Classifier

Naive Bayes is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as 'Naive'. A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods. Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c). Look at the equation below:

P(c|x) = P(x|c) P(c) / P(x)

Above, P(c|x) is the posterior probability of class (c, target)
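The Bayes-rule computation above can be sketched in a few lines of Python. The fruit counts below are made-up numbers for illustration, not data from the post:

```python
# A minimal sketch of the posterior computation P(c|x) = P(x|c) * P(c) / P(x).

def posterior(p_x_given_c, p_c, p_x):
    """Bayes' rule: posterior = likelihood * prior / evidence."""
    return p_x_given_c * p_c / p_x

# Hypothetical training counts: 100 fruits total, 40 are apples,
# 30 of those apples are red, and 45 fruits overall are red.
p_c = 40 / 100          # P(apple), the prior
p_x = 45 / 100          # P(red), the evidence
p_x_given_c = 30 / 40   # P(red | apple), the likelihood

print(posterior(p_x_given_c, p_c, p_x))  # P(apple | red) ≈ 0.667
```

A full Naive Bayes classifier multiplies one such likelihood per feature (red, round, 3 inches), which is exactly where the independence assumption comes in.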

K-nearest neighbors (KNN)

The K-nearest neighbors (KNN) algorithm is a type of supervised ML algorithm which can be used for both classification and regression predictive problems. However, it is mainly used for classification problems in industry. The following two properties define KNN well:

Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase and uses all of the data for training while classifying.

Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it doesn't assume anything about the underlying data.

Working of the KNN Algorithm

The K-nearest neighbors (KNN) algorithm uses 'feature similarity' to predict the values of new data points, which means that a new data point will be assigned a value based on how closely it matches the points in the training set. We can understand its working with the help of the following steps −

Step 1 − For implementing any algorithm, we need a dataset.
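The 'feature similarity' idea can be sketched directly: find the k closest training points and take a majority vote. The toy 2-D points below are hypothetical, chosen only to show the mechanics:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of ((features...), label) pairs."""
    # Sort the whole training set by Euclidean distance to the query point
    # (this is the "lazy" part: all work happens at prediction time).
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two well-separated hypothetical clusters
train = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
         ((6, 6), 'B'), ((6, 7), 'B'), ((7, 6), 'B')]
print(knn_predict(train, (2, 2)))  # 'A' — the query sits in the first cluster
```

Note that nothing is fitted in advance and no distributional assumptions are made, which is the non-parametric, lazy behavior described above.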

Bias Vs Variance

Bias vs. Variance Tradeoff

Bias is a metric used to evaluate a machine learning model's ability to learn from the training data. A model with high bias will therefore not perform well on either the training data or the test data presented to it. Models with low bias, conversely, learn well from the presented training data. Variance is a metric used to evaluate the ability of the trained model to generalize to some test dataset. More broadly, it also represents how similar the results from a model will be if it were fed different data from the same process. Models with low bias (which learn the training data well) often have high variance (and therefore an inability to generalize to new data); this phenomenon of high model variance despite low model bias is referred to as overfitting.

How to Prevent Overfitting

Cross-validation

Cross-validation is a powerful preventative measure against overfitting. The idea is clever:

Machine Learning: Training, Testing, Evaluation

Evaluation metrics are tied to machine learning tasks. There are different metrics for the tasks of classification, regression, ranking, clustering, topic modelling, etc. Some metrics, such as precision and recall, are useful for multiple tasks. Classification, regression, and ranking are examples of supervised learning, which constitutes the majority of machine learning applications.

Classification Metrics

Classification is about predicting class labels given input data. In binary classification there are two possible output classes; in multiclass classification there are more than two. I'll focus on binary classification here, but all of the metrics can be extended to the multiclass scenario. An example of binary classification is spam detection, where the input data could include the email text and metadata (sender, sending time), and the output label is either "spam" or "not spam."
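Precision and recall for the spam example can be computed directly from true and predicted labels. The label vectors below are invented for illustration (1 = spam, 0 = not spam):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical spam-detector output on six emails
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.666..., 0.666...)
```

Precision answers "of the emails flagged as spam, how many really were?", while recall answers "of the real spam, how much did we catch?" — which is why the two are reported together.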

Logistic Regression

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Some examples of classification problems are: email spam or not spam, online transaction fraud or not fraud, tumor malignant or benign. Logistic regression transforms its output using the logistic sigmoid function to return a probability value.

What are the types of logistic regression?

1. Binary classification (e.g. tumor malignant or benign)
2. Multi-class classification (e.g. cat, dog, or monkey)
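The sigmoid transformation can be sketched as follows; the weights and bias here are hypothetical values standing in for a previously trained model, not learned parameters:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real score into a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Binary logistic-regression prediction: linear score -> sigmoid -> class."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return (1 if p >= threshold else 0), p

# Illustrative weights/bias, e.g. a toy malignant-vs-benign decision
label, prob = predict([2.0, 1.5], weights=[0.8, -0.4], bias=-0.2)
print(label, round(prob, 3))  # 1 0.69
```

Because the sigmoid output is a probability, the cutoff (0.5 here) can be moved to trade precision against recall, tying this back to the classification metrics above.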