Audio classification using Deep Neural Network

This project works with audio files using librosa rather than scipy.
By default, librosa converts audio to mono and resamples it to 22050 Hz, whereas scipy.io.wavfile reads a file at its native sample rate without resampling.
librosa.feature.mfcc computes mel-frequency cepstral coefficients (MFCCs), which summarize the frequency distribution of the signal.
MFCCs help in analyzing both the frequency and the time characteristics of the source audio.
A Sequential model was built and trained with the Adam optimizer.
Dropout (to reduce overfitting) and batch normalization were applied at each layer.
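
To make the two regularization steps concrete, here is a minimal pure-Python sketch of what batch normalization and dropout do to a layer's activations during training. It is an illustration only, not the project's model code: the function names, the toy batch, and the 0.5 dropout rate are assumptions, and a real Sequential model would use the framework's built-in layers instead.

```python
import math
import random


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of a batch to zero mean and unit variance,
    then apply the scale (gamma) and shift (beta)."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [
        sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)
    ]
    return [
        [
            gamma * (v - m) / math.sqrt(var + eps) + beta
            for v, m, var in zip(row, means, variances)
        ]
        for row in batch
    ]


def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and rescale
    the survivors so the expected sum stays the same. Requires rate < 1."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in activations]


# Hypothetical toy batch: 3 samples, 2 features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch)

# Hypothetical activations, dropped at an assumed rate of 0.5.
dropped = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
```

After normalization both feature columns are on the same scale, and dropout is only applied during training; at inference time the activations pass through unchanged.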

Tags, genres, average rating, and number of reviewers from different datasets are combined on movieID.
Missing tags (NaNs) are filled with the genres of the movie.
For preprocessing, NLTK's WordNet lemmatizer was applied and duplicate tags were removed.
For visualization, a seaborn joint plot shows the number of reviewers against the average rating, and a word cloud shows the most frequently occurring words.
CountVectorizer is used to count how often each term occurs in a movie's tags.
Finally, movies are recommended by the cosine similarity between their feature vectors, where 1 indicates high similarity and 0 indicates low similarity.
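
The counting and similarity steps above can be sketched with the standard library alone. This is a simplified stand-in for the vectorizer and similarity function used in the project, not its actual code: the movie titles, tag strings, and the helper names `count_vector`, `cosine_similarity`, and `recommend` are all hypothetical.

```python
import math
from collections import Counter


def count_vector(text):
    """Count how often each whitespace-separated token occurs."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tag string is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical tag strings keyed by movie title.
movies = {
    "Movie A": "space adventure robot",
    "Movie B": "space adventure alien",
    "Movie C": "romance comedy",
}
vectors = {title: count_vector(tags) for title, tags in movies.items()}


def recommend(title, top_n=2):
    """Rank the other movies by cosine similarity to the given one."""
    scores = [
        (other, cosine_similarity(vectors[title], vectors[other]))
        for other in vectors
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


recommendations = recommend("Movie A")
```

Because count vectors have no negative entries, the score always falls between 0 (no shared terms) and 1 (identical term proportions), which matches the scale described above.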

The dataset contains personal and health information for various candidates.
A heatmap was used to visualize the correlation between features and to decide which features to keep.
Scaling was done with the sklearn library to bring the values into the range 0 to 1.
Because about 67% of the targets were 0, the data was balanced by oversampling class 1.
K-fold cross-validation was used for model selection, to identify the model with the most consistent performance.
Ensemble models such as Random Forest and XGBoost performed consistently better.
Learning curves were plotted to compare the train and test scores.
Recall was used as the evaluation metric, since in a health setting missing a positive case is typically costlier than raising a false alarm.
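
The balancing and evaluation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the toy rows, labels, and predictions are made up, and the helpers `oversample_minority` and `recall` are hypothetical names.

```python
import random


def oversample_minority(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    deficit = majority_count - len(minority_rows)
    if not minority_rows or deficit <= 0:
        return list(rows), list(labels)  # nothing to balance
    extra = [rng.choice(minority_rows) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * len(extra)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data with roughly a 67/33 class split (6 zeros, 3 ones).
rows = [[i] for i in range(9)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
balanced_rows, balanced_labels = oversample_minority(rows, labels)

# Hypothetical predictions: 3 of the 4 actual positives are found.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
score = recall(y_true, y_pred)
```

One design point worth noting: oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.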

The dataset consists of a mixture of integer, float, and object features.
MinMaxScaler is used to scale the integer data to the range 0 to 1, while the categorical data are converted into integers.
The dataset was heavily imbalanced, so SMOTE was used to balance it.
A random forest model was trained before and after feature selection, and the accuracy and runtime were compared.
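
The preprocessing steps above can be sketched in plain Python to show what each one computes. This is a simplified illustration, not the project's code: the sample values and the helper names are hypothetical, and `smote_interpolate` shows only the interpolation at the core of SMOTE. A full implementation also picks the neighbor from among the k nearest minority-class samples.

```python
import random


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no range
    return [(v - lo) / (hi - lo) for v in values]


def encode_categories(values):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def smote_interpolate(sample, neighbor, rng):
    """Create a synthetic point on the line segment between a minority
    sample and one of its minority-class neighbors."""
    gap = rng.random()  # random fraction in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


# Hypothetical integer column.
scaled = min_max_scale([20, 35, 50])

# Hypothetical categorical column.
codes, mapping = encode_categories(["red", "blue", "red", "green"])

# Hypothetical pair of minority-class samples.
synthetic = smote_interpolate([1.0, 1.0], [3.0, 5.0], random.Random(0))
```

Unlike plain duplication of minority rows, the interpolation step produces new points between existing ones, which is why SMOTE is often preferred for heavily imbalanced data.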