Audio Classification Using a Deep Neural Network

  • Audio files were loaded with librosa rather than scipy.
  • By default, librosa.load converts audio to mono and resamples it to 22,050 Hz, whereas scipy.io.wavfile.read keeps the file’s native sample rate and channel layout.
  • librosa.feature.mfcc (mel-frequency cepstral coefficients) is used to summarize the frequency distribution of each audio clip.
  • MFCCs help in analyzing both the frequency and time characteristics of the source audio.
  • A Sequential model was built with the Adam optimizer.
  • Dropout (to prevent overfitting) and batch normalization were applied to each layer.
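
The feature extraction and model described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the choice of 40 coefficients, averaging the MFCCs over time, the layer sizes, the dropout rate of 0.3, the loss function, and the use of tensorflow.keras are all assumptions.

```python
import numpy as np


def summarize_mfcc(mfcc):
    """Average an (n_mfcc, n_frames) MFCC matrix over time (assumed summary)."""
    return np.mean(mfcc, axis=1)


def load_mfcc_features(path, n_mfcc=40):
    """Load one audio file and return a fixed-length MFCC feature vector."""
    import librosa  # imported lazily: heavy optional dependency

    # librosa.load defaults to sr=22050 and mono=True.
    signal, sample_rate = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return summarize_mfcc(mfcc)


def build_model(n_features, n_classes):
    """Sequential classifier with batch normalization and dropout per hidden layer."""
    from tensorflow.keras import layers, models  # imported lazily as well

    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu"),  # hypothetical layer size
        layers.BatchNormalization(),
        layers.Dropout(0.3),                   # hypothetical dropout rate
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

Averaging over the time axis gives every clip the same feature length regardless of duration, which is one common way to feed MFCCs to a dense network.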

Movie Recommendation System

  • Tags, genres, average rating, and number of reviewers from different datasets are combined on movieID.
  • NaNs in the tags are filled with the genres of the movie.
  • For preprocessing, NLTK WordNet lemmatization was applied and duplicate tags were removed.
  • For visualization, a seaborn joint plot shows the number of reviewers against the average rating, and a word cloud shows the most frequently occurring words.
  • CountVectorizer is used to get the frequency count of each term in the tags.
  • Finally, movies are recommended using cosine similarity between their feature vectors, where 1 indicates high similarity and 0 indicates low similarity.
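
The last two steps can be illustrated with a small sketch using scikit-learn's CountVectorizer and cosine_similarity. The titles, tag strings, and the recommend helper are hypothetical placeholders, not data or code from the project.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: one space-separated tag string per movie.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
tags = [
    "animation children comedy toy",
    "adventure children fantasy",
    "action crime thriller",
    "animation children comedy toy sequel",
]

# Rows are movies, columns are term counts.
counts = CountVectorizer().fit_transform(tags)

# similarity[i, j] is the cosine similarity between movies i and j.
similarity = cosine_similarity(counts)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given title."""
    idx = titles.index(title)
    ranked = similarity[idx].argsort()[::-1]  # most similar first
    return [titles[i] for i in ranked if i != idx][:top_n]


print(recommend("Movie A"))
```

Because count vectors are non-negative, the similarity stays between 0 (no shared terms) and 1 (identical term proportions).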

Heart Failure

  • The dataset contains personal and health information for various candidates.
  • A heatmap was used to visualize the correlation between features and determine which features to keep.
  • Features are scaled to the range 0 to 1 using the sklearn library.
  • The classes are balanced by oversampling class 1, since about 67% of the targets were 0.
  • K-fold cross-validation was used for model selection to identify the model with the most consistent performance.
  • Ensemble models such as Random Forest and XGBoost performed consistently better.
  • Learning curves were used to compare the training and test scores.
  • Recall is used as the evaluation metric, as it is better suited to this scenario.
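
A rough sketch of the scaling, oversampling, and cross-validated recall steps is shown below. It runs on synthetic data from make_classification as a stand-in for the real dataset, and the helper names, model settings, and fold count are assumptions. Fitting the scaler and oversampling inside each training fold is a design choice of this sketch (it keeps duplicated rows out of the validation fold) and may differ from what the project did.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import resample

# Synthetic stand-in: roughly 67% class 0 and 33% class 1.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.67, 0.33], random_state=0
)


def oversample_minority(X_train, y_train, seed=0):
    """Resample class-1 rows with replacement to match the class-0 count."""
    X_maj, X_min = X_train[y_train == 0], X_train[y_train == 1]
    X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=seed)
    X_bal = np.vstack([X_maj, X_min_up])
    y_bal = np.concatenate(
        [np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)]
    )
    return X_bal, y_bal


def cv_recall(model, X, y, n_splits=5):
    """Per-fold recall; scaling and oversampling only see the training fold."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        scaler = MinMaxScaler().fit(X[train_idx])
        X_train, y_train = oversample_minority(
            scaler.transform(X[train_idx]), y[train_idx]
        )
        model.fit(X_train, y_train)
        predictions = model.predict(scaler.transform(X[test_idx]))
        scores.append(recall_score(y[test_idx], predictions))
    return np.array(scores)


models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {name: cv_recall(model, X, y) for name, model in models.items()}
for name, scores in results.items():
    print(f"{name}: mean recall {scores.mean():.3f}, std {scores.std():.3f}")
```

A low standard deviation across folds is one way to read "consistent performance" when comparing candidate models.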

Fraudulent Transaction Detection

  • The dataset consists of a mixture of integer, float, and object feature types.
  • MinMaxScaler is used to scale the integer data between 0 and 1, whereas the categorical data are converted into integers.
  • The dataset was heavily imbalanced, so the SMOTE technique was used to balance it.
  • A random forest model was trained before and after the feature processing, and its accuracy and runtime were compared.
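
To illustrate the idea behind SMOTE, here is a simplified NumPy sketch: each synthetic row is an interpolation between a minority-class row and one of its nearest minority-class neighbors. The function name, parameters, and toy data are hypothetical, and this is not the implementation the project used. In practice a library class such as imblearn.over_sampling.SMOTE with its fit_resample method would typically be used instead.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by neighbor interpolation.

    Simplified illustration; requires at least two minority rows.
    """
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority rows.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(distances, np.inf)  # a row is not its own neighbor
    neighbors = np.argsort(distances, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                     # random minority rows
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                              # factor in [0, 1)

    # Each new point lies on the segment between a row and its neighbor.
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])


# Hypothetical minority-class rows, already scaled to [0, 1].
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.shape)
```

Unlike plain duplication of minority rows, the interpolated points are new samples, which is why SMOTE is often preferred for heavily imbalanced data.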