Audio Classification Using a Deep Neural Network
- This project works with audio files using librosa instead of scipy.
- librosa.load converts audio to mono and resamples it to 22050 Hz by default, whereas scipy.io.wavfile.read returns the file at its native sample rate and channel layout.
- Using librosa.feature.mfcc (mel-frequency cepstral coefficients), we can summarize the frequency distribution of the audio.
- MFCCs help us analyze the frequency and time characteristics of the source audio.
- A Sequential model was built and trained with the Adam optimizer.
- Dropout (to prevent overfitting) and batch normalization were used on each layer.
Movie Recommendation System
- Tags, genres, average rating, and number of reviewers from different datasets were combined on movieID.
- NaNs in the tags were filled with the genres of the movie.
- For preprocessing, NLTK WordNet lemmatization was applied and duplicate tags were removed.
- For visualization, a seaborn joint plot was used to show how the number of reviewers relates to the average rating, and a word cloud was used to show the most frequently occurring words.
- CountVectorizer was used to get the term frequency counts for each movie's tags.
- Finally, movies are recommended based on the cosine similarity between their feature vectors, where 1 indicates high similarity and 0 indicates low similarity.
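
A small sketch of this pipeline is shown below, assuming pandas and scikit-learn. The two toy tables, their titles, and the recommend helper are made up for illustration, and the lemmatization and plotting steps are left out to keep the example short.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data (hypothetical), keyed by movieID as in the project.
movies = pd.DataFrame({
    "movieID": [1, 2, 3, 4],
    "title": ["Space Quest", "Star Voyage", "Love Notes", "Paris Romance"],
    "genres": ["Sci-Fi|Adventure", "Sci-Fi|Action", "Romance|Drama", "Romance|Comedy"],
})
tags = pd.DataFrame({
    "movieID": [1, 1, 1, 3, 3],
    "tag": ["space", "sci-fi", "space", "love", "romance"],
})

# Remove duplicate tags, then collapse to one tag string per movie.
tags = tags.drop_duplicates()
tag_text = tags.groupby("movieID")["tag"].agg(" ".join).reset_index()

# Combine on movieID; movies without tags fall back to their genres.
df = movies.merge(tag_text, on="movieID", how="left")
df["tag"] = df["tag"].fillna(df["genres"].str.replace("|", " ", regex=False))

# Term counts per movie, then pairwise cosine similarity (0 = unrelated, 1 = identical).
counts = CountVectorizer().fit_transform(df["tag"])
sim = cosine_similarity(counts)


def recommend(title, k=2):
    """Return the k titles most similar to `title` (hypothetical helper)."""
    idx = df.index[df["title"] == title][0]
    order = sim[idx].argsort()[::-1]  # most similar first
    return [df.loc[i, "title"] for i in order if i != idx][:k]


print(recommend("Space Quest", k=1))
```

Because count vectors are non-negative, the cosine similarity here always falls between 0 and 1, which matches the scale described above.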
Heart Failure
- The dataset contains personal and health information for various candidates.
- A heatmap was used to visualize the correlation between features and to decide which features to keep.
- Scaling was done with the sklearn library to bring the values into the range 0 to 1.
- The classes were balanced by oversampling class 1, as about 67% of the targets were 0.
- K-fold cross-validation was used for model selection, to find the model with the most consistent performance.
- Ensemble models such as Random Forest and XGBoost performed consistently better.
- Learning curves were plotted to compare the training and test scores.
- Recall was used as the evaluation metric, as it is better suited to this scenario, where missing a positive case (a false negative) is the costlier error.
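
The scaling, oversampling, and cross-validation steps could look roughly like the sketch below. It is an illustration under stated assumptions: the data is synthetic (generated with make_classification at roughly a 67/33 class split), the logistic regression baseline is added only for comparison, and XGBoost is left out so that the example depends on scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import resample

# Synthetic stand-in for the heart failure data: about 67% class 0.
X, y = make_classification(
    n_samples=300, n_features=12, weights=[0.67, 0.33], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale to [0, 1], fitting the scaler on the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Oversample class 1 (with replacement) until both classes have equal counts.
minority = X_train[y_train == 1]
n_extra = int((y_train == 0).sum() - (y_train == 1).sum())
extra = resample(minority, replace=True, n_samples=n_extra, random_state=0)
X_bal = np.vstack([X_train, extra])
y_bal = np.concatenate([y_train, np.ones(n_extra, dtype=y_train.dtype)])

# Compare models by recall across folds. Duplicated rows can land in both the
# training and validation folds, so these scores are optimistic.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
scores = {
    name: cross_val_score(model, X_bal, y_bal, cv=cv, scoring="recall")
    for name, model in models.items()
}
for name, s in scores.items():
    print(f"{name}: mean recall {s.mean():.3f}, std {s.std():.3f}")
```

A low standard deviation across folds is one way to read "consistent performance"; the held-out X_test and y_test are kept aside for a final recall check on data that was never oversampled.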
Fraudulent Transaction Detection
- The dataset consists of a mixture of integer, float, and object-type features.
- MinMaxScaler was used to scale the integer data to between 0 and 1, and the categorical data were converted into integer data.
- The dataset was heavily imbalanced, so the SMOTE technique was used to balance it.
- A random forest model was trained before and after using feature, and the accuracy and runtime were compared.