Projects

Effects of Health Indicators on Diabetes

This project analyzes a diverse health dataset to identify key factors influencing diabetes onset. We plan to explore the dataset and evaluate the significance of each factor. Using PCA and SVD, we will reduce the dimensionality of the data while retaining critical information. We will then fit logistic regression and SVM models and compare their predictive performance. Our research seeks to offer valuable insights for improved diabetes management and prevention strategies.

Predicting Cryptocurrency Prices with Time Series Concepts and Machine Learning Algorithms

This study assesses the predictive power of various statistical and machine learning models in the context of cryptocurrency price forecasting. Focusing on BTC (Bitcoin), BNB (Binance Coin), ETH (Ethereum), XRP, and DOGE (Dogecoin), we analyze Yahoo Finance daily closing price data from July 2020 to November 2023. The research examines ARIMA, VAR, ARMA-GARCH, Lasso Regression, and LSTM models, highlighting their capacity to capture the unique characteristics of cryptocurrency price dynamics. Additionally, we apply the Kruskal-Wallis test to determine the significance of seasonal trends in these markets. Our objective is to identify which models most accurately predict cryptocurrency prices in a highly volatile market. In particular, we are interested in the comparison between classical time series models and machine learning methods.
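As an illustration of the Kruskal-Wallis test mentioned above, here is a minimal sketch of the H statistic in plain Python. It assumes the observations have already been split into groups (for example, daily returns grouped by month; the study does not state how seasons were defined, so that grouping is hypothetical). Tied values receive average ranks, but the tie-correction factor is omitted for brevity. The function name `kruskal_wallis_h` is ours, not taken from the project.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (no tie correction).

    `groups` is a list of samples, each a list of numbers.
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted(
        (value, g) for g, sample in enumerate(groups) for value in sample
    )
    n_total = len(pooled)

    # Assign 1-based ranks; tied values share the average of their ranks.
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j

    # H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1)
    total = sum(r * r / len(sample) for r, sample in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Toy numbers for illustration only, not data from the study.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Under the null hypothesis of no group differences, H is approximately chi-squared distributed with (number of groups - 1) degrees of freedom, so a large H suggests that at least one group differs from the others.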

The key to the Best GF Pizza dough

This project aims to test various combinations of factors in a gluten-free (GF) pizza dough recipe, with the ultimate goal of identifying the recipe that yields the best GF dough. The project adopts an experimental design incorporating several two-level factors, such as flour blend (two blends), rising time, sugar content, and cooking method.
The experimental design follows a factorial approach, systematically varying key factors to identify their individual and interactive effects on the sensory attributes of the GF pizza dough. The response variables include taste and texture.
Statistical methods, such as analysis of variance (ANOVA) and regression analysis, will be employed to identify significant factors and their optimal levels for achieving the desired pizza dough characteristics.
The outcomes of this study will contribute valuable insights to the formulation of the best GF pizza dough.
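The factorial approach described above can be sketched as follows. This is a minimal illustration, assuming a full (not fractional) two-level design over the four factors named in the text, with levels coded as -1 and +1. The factor identifiers and the helper names `full_factorial` and `main_effect` are hypothetical, and the responses in the usage example are synthetic, not experimental results.

```python
from itertools import product

# Factor names follow the text; the identifiers themselves are assumptions.
FACTORS = ["flour_blend", "rising_time", "sugar_content", "cooking_method"]


def full_factorial(factors):
    """Return every run of a two-level full factorial design, coded -1/+1."""
    return [
        dict(zip(factors, levels))
        for levels in product((-1, 1), repeat=len(factors))
    ]


def main_effect(runs, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


runs = full_factorial(FACTORS)  # 2**4 = 16 runs

# Synthetic taste scores, used only to show how main_effect behaves.
scores = [5 + 2 * r["flour_blend"] - r["sugar_content"] for r in runs]
flour_effect = main_effect(runs, scores, "flour_blend")
```

Because every factor appears at each level in exactly half of the runs, the main effect of one factor is estimated independently of the others; interaction effects can be estimated the same way using the product of two coded columns.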

Airbnb in Boston price prediction

We want to predict the price of Airbnb listings in Boston, MA, based on host and listing attributes such as latitude and longitude, neighborhood, room type, minimum/maximum number of nights, number of reviews, last review date, reviews per month, availability, and number of host listings. We will analyze the predictor variables and assess their significance in the regression model. Furthermore, we will compare mean prices across Airbnb listings in different neighborhoods of Boston. Example uses of this model include helping the company optimize host earnings, offer more competitive pricing, and identify the key features that determine price.

Predicting the location and type of defects in steel manufacturing

The goal of the study was to predict the location and type of defects found in steel manufacturing. The study aimed to segment defects of each class (the classes were predefined by Severstal) in each image, where an image may have no defects, a defect of a single class, or defects of multiple classes. To increase the size of the training data, the study implemented image augmentation with the Albumentations library and compared the results with and without augmentation. The study tested U-Net models with different backbones to determine the most effective model for this task, and used the mean Dice coefficient, among other metrics, to evaluate model performance. Overall, the study aimed to improve the accuracy of detecting and classifying defects in steel manufacturing, which is an important problem in the manufacturing industry.
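For reference, the mean Dice coefficient used as the evaluation metric can be sketched as below. This is a minimal plain-Python version, assuming binary masks given as nested lists of 0/1 and assuming that two empty masks score 1.0 (a common convention for images without defects; the study does not state which convention it used). The function names are ours.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    denom = sum(flat_pred) + sum(flat_target)
    if denom == 0:
        # Assumed convention: both masks empty counts as a perfect match.
        return 1.0
    return 2.0 * intersection / denom


def mean_dice(pairs):
    """Average Dice over (predicted mask, true mask) pairs."""
    return sum(dice_coefficient(p, t) for p, t in pairs) / len(pairs)


# Toy 2x2 masks for illustration only.
overlap = dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

In a multi-class setting such as this one, the pairs would typically be formed per image and per defect class, so that an image with no defect of a given class contributes an empty-versus-empty comparison.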

Customer segmentation

This project involves customer segmentation using grocery categories. We aim to categorize customers based on their preferences and behaviors in the grocery domain. This enables us to tailor strategies for more personalized and effective customer engagement.

Lookalike Model for brands

This project focuses on developing a lookalike model for brands, specifically designed to identify customers for brand campaigns. The model has demonstrated significant success, generating substantial value by enhancing campaign effectiveness and increasing response rates.

BERT Fake vs Real News classification Using PyTorch

Our project focuses on distinguishing real news articles from fake ones. We use BERT, a pretrained language model available through the Hugging Face Transformers library and known for its strong language understanding. To adapt BERT to fake news detection, we fine-tune it on a fake news dataset.

Bank Reviews Topic Modeling

We use advanced methods in natural language processing (NLP) and machine learning to classify bank reviews. By applying topic modeling to a large dataset of 500,000+ bank customer reviews, we successfully identified important themes. This approach improves the classification of reviews based on their content.

Sentiment analysis on customer reviews

This project focuses on sentiment analysis for customer reviews. Using advanced natural language processing and machine learning, we aim to analyze and understand the sentiments expressed in customer feedback, providing valuable insights for business improvement.