In this blog post, we will look at crime data for the city of San Francisco. The dataset contains all crimes in San Francisco from 2018 to 2020. You can download the data here. Since the dataset is very large (more than 330,000 crimes), we will work with only a small part of it in this post.
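As a minimal sketch of how such a subset could be drawn, assuming the downloaded file is a CSV (the file name below is a hypothetical placeholder), one could load the data with pandas and take a reproducible random sample:

```python
import pandas as pd

# Load the full crime dataset (hypothetical file name; adjust to the downloaded file).
df = pd.read_csv("sf_crime_2018_2020.csv")

# Keep a reproducible random subset so the analysis stays fast.
df_small = df.sample(n=500, random_state=0)
print(df_small.shape)
```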
Continue reading
In this blog post, we will explore neighborhoods in New York City using the Foursquare API. We will retrieve the most common venue categories in each neighborhood and then group the neighborhoods into clusters using the k-means clustering algorithm.
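A rough sketch of the clustering step, assuming the venue data has already been retrieved and one-hot encoded into a table of venue-category frequencies per neighborhood (the neighborhood names and category columns below are hypothetical):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-neighborhood venue-category frequencies.
venue_freq = pd.DataFrame(
    {"Cafe": [0.4, 0.1, 0.5], "Park": [0.1, 0.6, 0.2], "Gym": [0.5, 0.3, 0.3]},
    index=["Astoria", "Harlem", "Chelsea"],
)

# Group neighborhoods into k clusters based on their venue profiles.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(venue_freq)
venue_freq["Cluster"] = kmeans.labels_
print(venue_freq)
```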
Continue reading
Imagine that you have a customer dataset, and you are interested in exploring the behavior of your customers using their historical data.
Customer segmentation is the practice of partitioning a customer base into groups of individuals that have similar characteristics. It is a significant strategy because it lets a business target these specific groups of customers and allocate marketing resources effectively.
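A minimal sketch of how such a segmentation could be done with k-means (the customer features below are hypothetical; the values are scaled first so that no single feature dominates the distance calculation):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical customer features: [age, annual_income, years_as_customer]
X = np.array([
    [25, 35000, 1],
    [40, 80000, 8],
    [31, 42000, 2],
    [55, 95000, 15],
    [22, 30000, 1],
    [47, 88000, 10],
])

# Standardize so that income does not dominate the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Partition customers into two segments.
segments = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X_scaled)
print(segments)
```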
Continue reading
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
Although it is one of the simplest clustering models, it is widely used in many data science applications.
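A small illustration of that idea with scikit-learn on synthetic data; each observation is assigned to the cluster whose mean (centroid) is nearest:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate synthetic 2-D data with three well-separated groups.
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Fit k-means with k = 3; labels_ holds each point's nearest-mean cluster.
km = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X)
print(km.cluster_centers_)   # the k cluster means (prototypes)
print(km.labels_[:10])       # cluster assignment of the first 10 points
```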
Continue reading
Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications: Drug A, Drug B, Drug C, Drug X, or Drug Y.
Part of your job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The features of this dataset are each patient's Age, Sex, Blood Pressure, and Cholesterol, and the target is the drug that each patient responded to.
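A hedged sketch of what such a model could look like, using a decision tree as one possible classifier (the encoded values below are hypothetical; categorical features such as Sex, Blood Pressure, and Cholesterol would first need to be encoded numerically):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already-encoded patient records:
# [age, sex (0/1), blood pressure (0=low, 1=normal, 2=high), cholesterol (0=normal, 1=high)]
X = np.array([
    [23, 0, 2, 1],
    [47, 1, 0, 1],
    [61, 0, 1, 0],
    [35, 1, 2, 0],
    [52, 0, 1, 1],
])
y = np.array(["Drug A", "Drug C", "Drug Y", "Drug A", "Drug X"])

# Fit a decision tree and predict the drug for a new patient.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=4).fit(X, y)
print(clf.predict([[40, 1, 1, 0]]))
```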
Continue reading
Imagine a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. If demographic data can be used to predict group membership, the company can customize offers for individual prospective customers. This is a classification problem: given a dataset with predefined labels, we need to build a model that can predict the class of a new or unknown case.
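A minimal sketch of such a classifier, using k-nearest neighbors as one possible choice (the demographic features and group labels below are hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical demographic features: [age, income, years_at_address]
X = np.array([
    [33, 45000, 2],
    [51, 72000, 9],
    [28, 39000, 1],
    [60, 88000, 20],
    [45, 64000, 6],
    [37, 50000, 3],
])
y = np.array([1, 3, 1, 4, 3, 2])   # customer service-usage group (1-4)

# Scale the features, then fit a k-nearest-neighbors classifier.
scaler = StandardScaler().fit(X)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)

# Predict the group of a new prospective customer.
new_customer = [[42, 58000, 4]]
print(knn.predict(scaler.transform(new_customer)))
```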
Continue reading
In this blog post, we will use scikit-learn to implement different types of linear regression on our dataset. We will split the data into training and testing sets, create a model using the training set, evaluate the model on the test set, and finally use the model to predict an unknown value.
The dataset contains fuel consumption and carbon dioxide emission figures for all cars offered for retail sale in Canada in 2019. You can download the dataset here.
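A compressed sketch of that workflow, assuming the dataset has been loaded into a pandas DataFrame with columns along the lines of `ENGINESIZE` and `CO2EMISSIONS` (the file and column names below are hypothetical; adjust them to the actual file):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the fuel-consumption dataset (hypothetical file and column names).
df = pd.read_csv("FuelConsumption2019.csv")
X = df[["ENGINESIZE"]].values
y = df["CO2EMISSIONS"].values

# Split into training and testing sets, then fit on the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the test set and predict an unseen value.
print("R^2 on test set:", r2_score(y_test, model.predict(X_test)))
print("Predicted CO2 for a 3.5 L engine:", model.predict([[3.5]]))
```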
Continue reading
In this blog post, we will analyze China's GDP growth from 1960 to 2019. If the data shows a curved trend, then linear regression will not produce very accurate results compared to non-linear regression.
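As an illustration of the kind of non-linear fit this calls for, here is a sketch using SciPy's `curve_fit` with a logistic (sigmoid) function on synthetic data standing in for the GDP series (the actual post would fit the real GDP values by year):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, beta1, beta2):
    """Logistic curve; a common choice for growth that starts slowly and then accelerates."""
    return 1.0 / (1.0 + np.exp(-beta1 * (x - beta2)))

# Synthetic stand-in for normalized year/GDP values (use the real data instead).
x = np.linspace(0, 1, 60)                      # years 1960-2019, scaled to [0, 1]
y = sigmoid(x, 10.0, 0.7) + np.random.normal(0, 0.02, x.size)

# Fit the sigmoid parameters to the (noisy) data.
params, _ = curve_fit(sigmoid, x, y, p0=[5.0, 0.5])
print("Fitted beta1, beta2:", params)
```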
Continue reading
A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map. It provides an easy way to visualize how a measurement varies across a geographic area, or to show the level of variability within a region.
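A hedged sketch of building such a map with Folium (the GeoJSON file, region codes, and values below are hypothetical placeholders):

```python
import folium
import pandas as pd

# Hypothetical data: one statistical value per region, keyed by a region code.
data = pd.DataFrame({"region": ["A", "B", "C"], "value": [12.5, 48.0, 30.2]})

# Base map centered on an arbitrary location.
m = folium.Map(location=[37.77, -122.42], zoom_start=6)

# Shade each region in proportion to its value (requires a matching GeoJSON boundaries file).
folium.Choropleth(
    geo_data="regions.geojson",          # hypothetical boundaries file
    data=data,
    columns=["region", "value"],
    key_on="feature.properties.region",  # must match a property in the GeoJSON
    fill_color="YlOrRd",
    legend_name="Statistical variable",
).add_to(m)

m.save("choropleth.html")
```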
Continue reading
Folium is a powerful Python library that helps you create several types of Leaflet maps. Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. These maps are also interactive, so you can zoom into any region of interest regardless of the initial zoom level.
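A minimal example of creating an interactive Leaflet map with Folium (the coordinates are just an arbitrary illustration):

```python
import folium

# Create a map centered on San Francisco with an initial zoom level;
# the rendered map can still be zoomed and panned interactively.
m = folium.Map(location=[37.7749, -122.4194], zoom_start=12)

# Add a marker with a popup label.
folium.Marker([37.7749, -122.4194], popup="San Francisco").add_to(m)

# Save to an HTML file that can be opened in a browser.
m.save("san_francisco_map.html")
```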
Continue reading