Imagine you are moving to London, UK. It’s a major metropolitan city, a financial hub, a famous tourist destination, and home to around 9 million people. But as with every big city, crime is a concern, and you would like to live in a neighborhood that is both safe and popular. In this post, we’ll use the London Crime data and the Foursquare API to find the neighborhood that best fits those needs.
The London Crime data consists of more than 13 million rows containing counts of criminal reports by month, LSOA (Lower Super Output Area), borough, and major/minor category. You can download the data here.
About the data:
- lsoa_code: code for Lower Super Output Area in Greater London.
- borough: Common name for London borough.
- major_category: High-level categorization of crime.
- minor_category: Low-level categorization of crime within a major category.
- value: Monthly reported count of crimes of the given category in the given LSOA.
- year: Year of reported counts, 2008-2016
- month: Month of reported counts, 1-12
# import libraries
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import requests # library to handle requests
from bs4 import BeautifulSoup # library for web scraping
#!conda install -c conda-forge geocoder --yes
import geocoder
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
# libraries for displaying images
from IPython.display import Image
from IPython.core.display import HTML
# function for flattening a JSON response into a pandas dataframe
# (json_normalize lives in the top-level pandas namespace since pandas 1.0)
from pandas import json_normalize
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
print('Libraries imported.')
Libraries imported.
Define Foursquare credentials.
CLIENT_ID = '**********'
CLIENT_SECRET = '**********'
VERSION = '20191219' # Foursquare API version date, in YYYYMMDD format
# limit the number of venues returned by the foursquare API
LIMIT = 50
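Foursquare's version parameter is a date written as YYYYMMDD, so it is worth checking that the string is a real calendar date before sending any requests. The sketch below is only an illustration; the helper name `is_valid_version` is hypothetical and not part of any Foursquare client.

```python
from datetime import datetime

def is_valid_version(version: str) -> bool:
    """Return True if `version` is a real calendar date written as YYYYMMDD."""
    try:
        datetime.strptime(version, '%Y%m%d')
    except ValueError:
        return False
    return len(version) == 8

print(is_valid_version('20191219'))  # 19 December 2019
print(is_valid_version('20191912'))  # month 19 does not exist
```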
Read the dataset into a pandas dataframe.
df = pd.read_csv('london_crime_by_lsoa.csv')
df.head()
| | lsoa_code | borough | major_category | minor_category | value | year | month |
---|---|---|---|---|---|---|---|
0 | E01001116 | Croydon | Burglary | Burglary in Other Buildings | 0 | 2016 | 11 |
1 | E01001646 | Greenwich | Violence Against the Person | Other violence | 0 | 2016 | 11 |
2 | E01000677 | Bromley | Violence Against the Person | Other violence | 0 | 2015 | 5 |
3 | E01003774 | Redbridge | Burglary | Burglary in Other Buildings | 0 | 2016 | 3 |
4 | E01004563 | Wandsworth | Robbery | Personal Property | 0 | 2008 | 6 |
# dimensions of the dataframe
df.shape
(13490604, 7)
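With more than 13 million rows, the column dtypes have a noticeable effect on memory use. The following is a rough sketch, not what the original notebook did: it reads a two-row in-memory stand-in for the CSV (the real file is not needed to try it) and stores the repeated text columns as categories and the counts as narrower integers.

```python
import io
import pandas as pd

# A tiny in-memory stand-in for london_crime_by_lsoa.csv with the same columns.
sample_csv = io.StringIO(
    "lsoa_code,borough,major_category,minor_category,value,year,month\n"
    "E01001116,Croydon,Burglary,Burglary in Other Buildings,0,2016,11\n"
    "E01004177,Sutton,Theft and Handling,Theft/Taking of Pedal Cycle,1,2016,8\n"
)

# Repeated strings are stored once as categories; small integers use narrower types.
dtypes = {
    'lsoa_code': 'category',
    'borough': 'category',
    'major_category': 'category',
    'minor_category': 'category',
    'value': 'int32',
    'year': 'int16',
    'month': 'int8',
}
sample = pd.read_csv(sample_csv, dtype=dtypes)
print(sample.dtypes)
```

The same `dtype` mapping could be passed when reading the full file. Categorical columns behave slightly differently in some operations, so the rest of the post keeps the default dtypes.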
Preprocessing the data
# remove all rows with a crime count of zero
df = df[df.value != 0]
# reset the index and drop the previous index
df = df.reset_index(drop=True)
df.head()
| | lsoa_code | borough | major_category | minor_category | value | year | month |
---|---|---|---|---|---|---|---|
0 | E01004177 | Sutton | Theft and Handling | Theft/Taking of Pedal Cycle | 1 | 2016 | 8 |
1 | E01000086 | Barking and Dagenham | Theft and Handling | Other Theft Person | 1 | 2009 | 5 |
2 | E01001301 | Ealing | Theft and Handling | Other Theft Person | 2 | 2012 | 1 |
3 | E01001794 | Hackney | Violence Against the Person | Harassment | 1 | 2013 | 2 |
4 | E01000733 | Bromley | Criminal Damage | Criminal Damage To Motor Vehicle | 1 | 2016 | 4 |
# new dimensions of the dataframe
df.shape
(3419099, 7)
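Dropping the zero rows shrinks the dataframe by roughly three quarters without changing any totals, because a zero adds nothing to a sum. A toy illustration with made-up rows:

```python
import pandas as pd

# Made-up rows; only the zero/non-zero pattern matters here.
toy = pd.DataFrame({'borough': ['Croydon', 'Croydon', 'Sutton'],
                    'value': [0, 2, 1]})
nonzero = toy[toy['value'] != 0].reset_index(drop=True)

# Fewer rows, same total number of crimes.
print(len(toy), len(nonzero))
print(toy['value'].sum() == nonzero['value'].sum())
```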
Change the column names.
df.columns = ['LSOA_Code', 'Borough', 'Major_Category', 'Minor_Category', 'No_of_Crimes', 'Year', 'Month']
df.head(2)
| | LSOA_Code | Borough | Major_Category | Minor_Category | No_of_Crimes | Year | Month |
---|---|---|---|---|---|---|---|
0 | E01004177 | Sutton | Theft and Handling | Theft/Taking of Pedal Cycle | 1 | 2016 | 8 |
1 | E01000086 | Barking and Dagenham | Theft and Handling | Other Theft Person | 1 | 2009 | 5 |
# dataset information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3419099 entries, 0 to 3419098
Data columns (total 7 columns):
# Column Dtype
--- ------ -----
0 LSOA_Code object
1 Borough object
2 Major_Category object
3 Minor_Category object
4 No_of_Crimes int64
5 Year int64
6 Month int64
dtypes: int64(3), object(4)
memory usage: 182.6+ MB
How many non-zero crime records does each Borough have?
df['Borough'].value_counts()
Lambeth 152784
Croydon 147203
Southwark 144362
Ealing 140006
Newham 137275
Brent 129925
Lewisham 128232
Barnet 127194
Tower Hamlets 120099
Wandsworth 118995
Enfield 117953
Hackney 116521
Haringey 116315
Waltham Forest 114603
Camden 112029
Islington 111755
Hillingdon 110614
Westminster 110070
Bromley 109855
Hounslow 106561
Redbridge 105932
Greenwich 104654
Hammersmith and Fulham 92084
Barking and Dagenham 86849
Havering 82288
Kensington and Chelsea 81295
Harrow 73993
Bexley 73948
Merton 73661
Sutton 62776
Richmond upon Thames 61857
Kingston upon Thames 46846
City of London 565
Name: Borough, dtype: int64
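The Boroughs of Lambeth, Croydon, Southwark and Ealing have the most non-zero records from 2008 to 2016. Note that value_counts() counts rows rather than summing No_of_Crimes, so these figures are numbers of monthly reports, not total numbers of crimes.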
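Because each row carries its own monthly count in No_of_Crimes, the total number of crimes per borough comes from summing that column, not from counting rows. A small sketch with made-up numbers shows how the two can rank boroughs differently:

```python
import pandas as pd

# Toy rows in the renamed schema; the numbers are made up for illustration.
toy = pd.DataFrame({
    'Borough': ['Westminster', 'Westminster', 'Lambeth', 'Lambeth', 'Lambeth'],
    'No_of_Crimes': [40, 25, 3, 2, 1],
})

record_counts = toy['Borough'].value_counts()                 # rows per borough
crime_totals = toy.groupby('Borough')['No_of_Crimes'].sum()   # crimes per borough

print(record_counts)
print(crime_totals.sort_values(ascending=False))
```

On the real data, `df.groupby('Borough')['No_of_Crimes'].sum()` should give the same per-borough totals as the pivot table built later in this post.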
How many non-zero records fall under each major category?
df['Major_Category'].value_counts()
Theft and Handling 1136994
Violence Against the Person 894859
Criminal Damage 466268
Burglary 441209
Drugs 231894
Robbery 163549
Other Notifiable Offences 80569
Fraud or Forgery 2682
Sexual Offences 1075
Name: Major_Category, dtype: int64
Pivot the table to view the number of crimes for each major category in each Borough.
London_crime = pd.pivot_table(df, values=['No_of_Crimes'],
index=['Borough'],
columns=['Major_Category'],
aggfunc=np.sum, fill_value=0)
London_crime.head()
| | No_of_Crimes | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|
Major_Category | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offences | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person |
Borough | |||||||||
Barking and Dagenham | 18103 | 18888 | 9188 | 205 | 2819 | 6105 | 49 | 50999 | 43091 |
Barnet | 36981 | 21024 | 9796 | 175 | 2953 | 7374 | 38 | 87285 | 46565 |
Bexley | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | 22 | 40071 | 30037 |
Brent | 28923 | 20569 | 25978 | 157 | 3711 | 12473 | 39 | 72523 | 63178 |
Bromley | 27135 | 24039 | 8942 | 196 | 2637 | 4868 | 31 | 69742 | 46759 |
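To see what the pivot is doing, here is the same call on four made-up rows. It is only a sketch: `values` is passed as a plain string instead of a list, which gives single-level column names.

```python
import pandas as pd

# Made-up rows in the same long format as the crime data.
toy = pd.DataFrame({
    'Borough': ['Barnet', 'Barnet', 'Barnet', 'Bexley'],
    'Major_Category': ['Burglary', 'Burglary', 'Drugs', 'Drugs'],
    'No_of_Crimes': [2, 3, 4, 5],
})

# One row per borough, one column per category, cells summed, gaps filled with 0.
wide = pd.pivot_table(toy, values='No_of_Crimes', index='Borough',
                      columns='Major_Category', aggfunc='sum', fill_value=0)
print(wide)
```

The two Barnet burglary rows collapse into a single cell, and Bexley gets a 0 for Burglary because it has no such rows.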
# reset the index
London_crime.reset_index(inplace=True)
# total crimes per Borough
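# numeric_only=True skips the text column 'Borough' when summing across each row
London_crime['Total'] = London_crime.sum(axis=1, numeric_only=True)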
London_crime.head()
| | Borough | No_of_Crimes | | | | | | | | | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
| Major_Category | | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offences | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person | |
0 | Barking and Dagenham | 18103 | 18888 | 9188 | 205 | 2819 | 6105 | 49 | 50999 | 43091 | 149447 |
1 | Barnet | 36981 | 21024 | 9796 | 175 | 2953 | 7374 | 38 | 87285 | 46565 | 212191 |
2 | Bexley | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | 22 | 40071 | 30037 | 114136 |
3 | Brent | 28923 | 20569 | 25978 | 157 | 3711 | 12473 | 39 | 72523 | 63178 | 227551 |
4 | Bromley | 27135 | 24039 | 8942 | 196 | 2637 | 4868 | 31 | 69742 | 46759 | 184349 |
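Flatten the two-level column index so that the columns are easier to rename and merge.
# join the two levels with a space and strip the trailing space left on 'Borough' and 'Total'
London_crime.columns = London_crime.columns.map(' '.join).str.strip()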
London_crime.head()
| | Borough | No_of_Crimes Burglary | No_of_Crimes Criminal Damage | No_of_Crimes Drugs | No_of_Crimes Fraud or Forgery | No_of_Crimes Other Notifiable Offences | No_of_Crimes Robbery | No_of_Crimes Sexual Offences | No_of_Crimes Theft and Handling | No_of_Crimes Violence Against the Person | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Barking and Dagenham | 18103 | 18888 | 9188 | 205 | 2819 | 6105 | 49 | 50999 | 43091 | 149447 |
1 | Barnet | 36981 | 21024 | 9796 | 175 | 2953 | 7374 | 38 | 87285 | 46565 | 212191 |
2 | Bexley | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | 22 | 40071 | 30037 | 114136 |
3 | Brent | 28923 | 20569 | 25978 | 157 | 3711 | 12473 | 39 | 72523 | 63178 | 227551 |
4 | Bromley | 27135 | 24039 | 8942 | 196 | 2637 | 4868 | 31 | 69742 | 46759 | 184349 |
Let’s rename the columns to make them easier to read.
London_crime.columns = ['Borough', 'Burglary', 'Criminal Damage', 'Drugs', 'Fraud or Forgery', 'Other Notifiable Offenses',
'Robbery', 'Sexual Offences', 'Theft and Handling', 'Violence Against the Person', 'Total']
London_crime
| | Borough | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offenses | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Barking and Dagenham | 18103 | 18888 | 9188 | 205 | 2819 | 6105 | 49 | 50999 | 43091 | 149447 |
1 | Barnet | 36981 | 21024 | 9796 | 175 | 2953 | 7374 | 38 | 87285 | 46565 | 212191 |
2 | Bexley | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | 22 | 40071 | 30037 | 114136 |
3 | Brent | 28923 | 20569 | 25978 | 157 | 3711 | 12473 | 39 | 72523 | 63178 | 227551 |
4 | Bromley | 27135 | 24039 | 8942 | 196 | 2637 | 4868 | 31 | 69742 | 46759 | 184349 |
5 | Camden | 27939 | 18482 | 21816 | 123 | 3857 | 9286 | 36 | 140596 | 53012 | 275147 |
6 | City of London | 15 | 16 | 33 | 0 | 17 | 24 | 0 | 561 | 114 | 780 |
7 | Croydon | 33376 | 31218 | 19162 | 270 | 4340 | 12645 | 55 | 91437 | 67791 | 260294 |
8 | Ealing | 30831 | 25613 | 18591 | 175 | 4406 | 9568 | 52 | 93834 | 68492 | 251562 |
9 | Enfield | 30213 | 22487 | 13251 | 132 | 3293 | 9059 | 38 | 70371 | 45036 | 193880 |
10 | Greenwich | 20966 | 22755 | 10836 | 107 | 3598 | 5430 | 56 | 64923 | 52897 | 181568 |
11 | Hackney | 21450 | 17327 | 18144 | 143 | 3332 | 8975 | 46 | 91118 | 56584 | 217119 |
12 | Hammersmith and Fulham | 17010 | 14595 | 15492 | 91 | 3352 | 5279 | 45 | 86381 | 43014 | 185259 |
13 | Haringey | 28213 | 22272 | 14563 | 207 | 2971 | 10084 | 40 | 83979 | 50943 | 213272 |
14 | Harrow | 19630 | 12724 | 7122 | 92 | 1998 | 4242 | 27 | 40800 | 30213 | 116848 |
15 | Havering | 21302 | 17252 | 8171 | 179 | 2358 | 3089 | 19 | 52609 | 33968 | 138947 |
16 | Hillingdon | 26056 | 24485 | 11413 | 223 | 6504 | 5663 | 44 | 80028 | 55264 | 209680 |
17 | Hounslow | 21026 | 21407 | 13722 | 183 | 3963 | 4847 | 40 | 70180 | 51404 | 186772 |
18 | Islington | 22207 | 18354 | 16553 | 85 | 3675 | 8736 | 40 | 107661 | 52975 | 230286 |
19 | Kensington and Chelsea | 14980 | 9839 | 14573 | 85 | 2203 | 4744 | 24 | 95963 | 29570 | 171981 |
20 | Kingston upon Thames | 10131 | 10610 | 5682 | 65 | 1332 | 1702 | 18 | 38226 | 21540 | 89306 |
21 | Lambeth | 30199 | 26136 | 25083 | 137 | 4520 | 18408 | 70 | 114899 | 72726 | 292178 |
22 | Lewisham | 24871 | 24810 | 16825 | 262 | 3809 | 10455 | 71 | 70382 | 63652 | 215137 |
23 | Merton | 16485 | 14339 | 6651 | 111 | 1571 | 4021 | 26 | 44128 | 28322 | 115654 |
24 | Newham | 25356 | 24177 | 18389 | 323 | 4456 | 16913 | 43 | 106146 | 66221 | 262024 |
25 | Redbridge | 26735 | 17543 | 15736 | 284 | 2619 | 7688 | 31 | 71496 | 41430 | 183562 |
26 | Richmond upon Thames | 16097 | 11722 | 4707 | 37 | 1420 | 1590 | 26 | 40858 | 20314 | 96771 |
27 | Southwark | 27980 | 24450 | 27381 | 321 | 4696 | 16153 | 40 | 109432 | 68356 | 278809 |
28 | Sutton | 13207 | 14474 | 4586 | 57 | 1393 | 2308 | 20 | 39533 | 25409 | 100987 |
29 | Tower Hamlets | 21510 | 21593 | 23408 | 124 | 4268 | 10050 | 47 | 87620 | 59993 | 228613 |
30 | Waltham Forest | 25565 | 20459 | 14101 | 236 | 3040 | 10606 | 34 | 77940 | 51898 | 203879 |
31 | Wandsworth | 25533 | 19630 | 9493 | 161 | 3091 | 8398 | 47 | 92523 | 45865 | 204741 |
32 | Westminster | 29295 | 20405 | 34031 | 273 | 6148 | 15752 | 59 | 277617 | 71448 | 455028 |
Scraping data from the web
Let’s scrape additional information about the different Boroughs in London from the “List of London boroughs” Wikipedia page.
We’ll use the Beautiful Soup library to scrape the latitude and longitude coordinates of the boroughs in London.
# getting data from internet
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_London_boroughs'
raw_wikipedia_page = requests.get(wikipedia_link).text
# parse the page with Beautiful Soup (the page is HTML, so use an HTML parser)
soup = BeautifulSoup(raw_wikipedia_page, 'html.parser')
print(soup.prettify())
Note: I am not including the extracted data from the HTML page since it will take up too much space in this post.
Extract the raw table inside the webpage.
table = soup.find_all('table', {'class':'wikitable sortable'})
print(table)
Note: I am not including the extracted data from the table since it will take up too much space in this post.
Convert the table into a dataframe.
London_table = pd.read_html(str(table[0]), index_col=None, header=0)[0]
London_table.head()
| | Borough | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map |
---|---|---|---|---|---|---|---|---|---|---|
0 | Barking and Dagenham [note 1] | NaN | NaN | Barking and Dagenham London Borough Council | Labour | Town Hall, 1 Town Square | 13.93 | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E | 25 |
1 | Barnet | NaN | NaN | Barnet London Borough Council | Conservative | North London Business Park, Oakleigh Road South | 33.49 | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W | 31 |
2 | Bexley | NaN | NaN | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E | 23 |
3 | Brent | NaN | NaN | Brent London Borough Council | Labour | Brent Civic Centre, Engineers Way | 16.70 | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W | 12 |
4 | Bromley | NaN | NaN | Bromley London Borough Council | Conservative | Civic Centre, Stockwell Close | 57.97 | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E | 20 |
The page has a second table that holds one more entry, the City of London.
# read the second table
London_table1 = pd.read_html(str(table[1]), index_col=None, header=0)[0]
# rename the columns to match the previous table
London_table1.columns = ['Borough', 'Inner', 'Status', 'Local authority', 'Political control', 'Headquarters',
'Area (sq mi)', 'Population (2013 est)[1]', 'Co-ordinates', 'Nr. in map']
# view the table
London_table1
| | Borough | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map |
---|---|---|---|---|---|---|---|---|---|---|
0 | City of London | ([note 5] | Sui generis;City;Ceremonial county | Corporation of London;Inner Temple;Middle Temple | ? | Guildhall | 1.12 | 7000 | 51°30′56″N 0°05′32″W / 51.5155°N 0.0922°W | 1 |
Let’s concatenate ‘London_table’ and ‘London_table1’. With ignore_index=True, the combined dataframe gets one continuous index.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
London_table = pd.concat([London_table, London_table1], ignore_index=True)
# check the last rows of the data set
London_table.tail()
| | Borough | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map |
---|---|---|---|---|---|---|---|---|---|---|
28 | Tower Hamlets | NaN | NaN | Tower Hamlets London Borough Council | Labour | Town Hall, Mulberry Place, 5 Clove Crescent | 7.63 | 272890 | 51°30′36″N 0°00′21″W / 51.5099°N 0.0059°W | 8 |
29 | Waltham Forest | NaN | NaN | Waltham Forest London Borough Council | Labour | Waltham Forest Town Hall, Forest Road | 14.99 | 265797 | 51°35′27″N 0°00′48″W / 51.5908°N 0.0134°W | 28 |
30 | Wandsworth | NaN | NaN | Wandsworth London Borough Council | Conservative | The Town Hall, Wandsworth High Street | 13.23 | 310516 | 51°27′24″N 0°11′28″W / 51.4567°N 0.1910°W | 5 |
31 | Westminster | NaN | City | Westminster City Council | Conservative | Westminster City Hall, 64 Victoria Street | 8.29 | 226841 | 51°29′50″N 0°08′14″W / 51.4973°N 0.1372°W | 2 |
32 | City of London | ([note 5] | Sui generis;City;Ceremonial county | Corporation of London;Inner Temple;Middle Temple | ? | Guildhall | 1.12 | 7000 | 51°30′56″N 0°05′32″W / 51.5155°N 0.0922°W | 1 |
We’ll remove the unnecessary strings in the dataset.
# strip the footnote text 'note 1' ... 'note 5' in one pass (the empty brackets are dealt with below)
London_table = London_table.replace(r'note \d', '', regex=True)
London_table.head()
| | Borough | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map |
---|---|---|---|---|---|---|---|---|---|---|
0 | Barking and Dagenham [] | NaN | NaN | Barking and Dagenham London Borough Council | Labour | Town Hall, 1 Town Square | 13.93 | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E | 25 |
1 | Barnet | NaN | NaN | Barnet London Borough Council | Conservative | North London Business Park, Oakleigh Road South | 33.49 | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W | 31 |
2 | Bexley | NaN | NaN | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E | 23 |
3 | Brent | NaN | NaN | Brent London Borough Council | Labour | Brent Civic Centre, Engineers Way | 16.70 | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W | 12 |
4 | Bromley | NaN | NaN | Bromley London Borough Council | Conservative | Civic Centre, Stockwell Close | 57.97 | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E | 20 |
# type of the dataframe
type(London_table)
pandas.core.frame.DataFrame
# shape of the dataframe
London_table.shape
(33, 10)
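The cleanup above removes the note text but leaves empty brackets behind, as in 'Barking and Dagenham []'. One alternative, sketched here on a few sample names, is a single regular expression that removes the whole marker together with the space in front of it. The '[note 2]' on Greenwich is made up for the example; only the Barking and Dagenham marker is taken from the table above.

```python
import pandas as pd

# Sample borough names; the Greenwich footnote number is illustrative only.
names = pd.Series(['Barking and Dagenham [note 1]', 'Barnet', 'Greenwich [note 2]'])

# Drop any "[note N]" marker together with the whitespace in front of it.
cleaned = names.str.replace(r'\s*\[note \d+\]', '', regex=True).str.strip()
print(cleaned.tolist())
```

Applied to the Borough column, this would make the manual name fixes in the next step unnecessary.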
Check if the Borough in both the dataframes match.
set(df.Borough) - set(London_table.Borough)
{'Barking and Dagenham', 'Greenwich', 'Hammersmith and Fulham'}
These 3 Boroughs don’t match because their scraped names still carry the leftover ‘[]’ brackets.
Let’s find the index of the 3 Boroughs that do not match.
print("The index of first borough is",London_table.index[London_table['Borough'] == 'Barking and Dagenham []'].tolist())
print("The index of second borough is",London_table.index[London_table['Borough'] == 'Greenwich []'].tolist())
print("The index of third borough is",London_table.index[London_table['Borough'] == 'Hammersmith and Fulham []'].tolist())
The index of first borough is [0]
The index of second borough is [9]
The index of third borough is [11]
Change the Borough names to match the other data frame.
London_table.iloc[0,0] = 'Barking and Dagenham'
London_table.iloc[9,0] = 'Greenwich'
London_table.iloc[11,0] = 'Hammersmith and Fulham'
set(df.Borough) - set(London_table.Borough)
set()
The Borough names in both dataframes match.
Now, we combine both the dataframes together.
Ld_crime = pd.merge(London_crime, London_table, on='Borough')
Ld_crime.head()
| | Borough | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offenses | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person | Total | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Barking and Dagenham | 18103 | 18888 | 9188 | 205 | 2819 | 6105 | 49 | 50999 | 43091 | 149447 | NaN | NaN | Barking and Dagenham London Borough Council | Labour | Town Hall, 1 Town Square | 13.93 | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E | 25 |
1 | Barnet | 36981 | 21024 | 9796 | 175 | 2953 | 7374 | 38 | 87285 | 46565 | 212191 | NaN | NaN | Barnet London Borough Council | Conservative | North London Business Park, Oakleigh Road South | 33.49 | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W | 31 |
2 | Bexley | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | 22 | 40071 | 30037 | 114136 | NaN | NaN | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E | 23 |
3 | Brent | 28923 | 20569 | 25978 | 157 | 3711 | 12473 | 39 | 72523 | 63178 | 227551 | NaN | NaN | Brent London Borough Council | Labour | Brent Civic Centre, Engineers Way | 16.70 | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W | 12 |
4 | Bromley | 27135 | 24039 | 8942 | 196 | 2637 | 4868 | 31 | 69742 | 46759 | 184349 | NaN | NaN | Bromley London Borough Council | Conservative | Civic Centre, Stockwell Close | 57.97 | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E | 20 |
# shape of the dataframe
Ld_crime.shape
(33, 20)
# check if the names of Boroughs in both the dataframes match
set(df.Borough) - set(Ld_crime.Borough)
set()
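The set difference only reports names that are missing on one side. As a complementary check, an outer merge with `indicator=True` lists unmatched keys from both dataframes at once. The sketch below uses two tiny frames built from values shown earlier, with one deliberately uncleaned name:

```python
import pandas as pd

# Two small frames; 'Greenwich []' mimics a name that was not cleaned.
crime = pd.DataFrame({'Borough': ['Barnet', 'Greenwich'],
                      'Total': [212191, 181568]})
info = pd.DataFrame({'Borough': ['Barnet', 'Greenwich []'],
                     'Population': [369088, 264008]})

# The _merge column says whether each key came from the left, the right, or both.
check = pd.merge(crime, info, on='Borough', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['Borough', '_merge']]
print(unmatched)
```

An empty `unmatched` frame means every borough was found in both tables.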
Rearrange the Columns.
# list the column names of the dataframe
list(Ld_crime)
['Borough',
'Burglary',
'Criminal Damage',
'Drugs',
'Fraud or Forgery',
'Other Notifiable Offenses',
'Robbery',
'Sexual Offences',
'Theft and Handling',
'Violence Against the Person',
'Total',
'Inner',
'Status',
'Local authority',
'Political control',
'Headquarters',
'Area (sq mi)',
'Population (2013 est)[1]',
'Co-ordinates',
'Nr. in map']
# rename the Population column
Ld_crime = Ld_crime.rename(columns = {'Population (2013 est)[1]':'Population'})
# reorder the columns; the names must match the merged dataframe exactly
# ('Sexual Offences' is spelled with a 'c'), otherwise reindex adds an all-NaN column
columnsTitles = ['Borough', 'Local authority', 'Political control', 'Headquarters', 'Area (sq mi)', 'Population', 'Co-ordinates',
'Burglary', 'Criminal Damage', 'Drugs', 'Fraud or Forgery', 'Other Notifiable Offenses', 'Robbery', 'Sexual Offences',
'Theft and Handling', 'Violence Against the Person', 'Total']
Ld_crime = Ld_crime.reindex(columns=columnsTitles)
Ld_crime
| | Borough | Local authority | Political control | Headquarters | Area (sq mi) | Population | Co-ordinates | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offenses | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Barking and Dagenham | Barking and Dagenham London Borough Council | Labour | Town Hall, 1 Town Square | 13.93 | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E | 18103 | 18888 | 9188 | 205 | 2819 | 6105 | 49 | 50999 | 43091 | 149447 |
1 | Barnet | Barnet London Borough Council | Conservative | North London Business Park, Oakleigh Road South | 33.49 | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W | 36981 | 21024 | 9796 | 175 | 2953 | 7374 | 38 | 87285 | 46565 | 212191 |
2 | Bexley | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | 22 | 40071 | 30037 | 114136 |
3 | Brent | Brent London Borough Council | Labour | Brent Civic Centre, Engineers Way | 16.70 | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W | 28923 | 20569 | 25978 | 157 | 3711 | 12473 | 39 | 72523 | 63178 | 227551 |
4 | Bromley | Bromley London Borough Council | Conservative | Civic Centre, Stockwell Close | 57.97 | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E | 27135 | 24039 | 8942 | 196 | 2637 | 4868 | 31 | 69742 | 46759 | 184349 |
5 | Camden | Camden London Borough Council | Labour | Camden Town Hall, Judd Street | 8.40 | 229719 | 51°31′44″N 0°07′32″W / 51.5290°N 0.1255°W | 27939 | 18482 | 21816 | 123 | 3857 | 9286 | 36 | 140596 | 53012 | 275147 |
6 | City of London | Corporation of London;Inner Temple;Middle Temple | ? | Guildhall | 1.12 | 7000 | 51°30′56″N 0°05′32″W / 51.5155°N 0.0922°W | 15 | 16 | 33 | 0 | 17 | 24 | 0 | 561 | 114 | 780 |
7 | Croydon | Croydon London Borough Council | Labour | Bernard Weatherill House, Mint Walk | 33.41 | 372752 | 51°22′17″N 0°05′52″W / 51.3714°N 0.0977°W | 33376 | 31218 | 19162 | 270 | 4340 | 12645 | 55 | 91437 | 67791 | 260294 |
8 | Ealing | Ealing London Borough Council | Labour | Perceval House, 14-16 Uxbridge Road | 21.44 | 342494 | 51°30′47″N 0°18′32″W / 51.5130°N 0.3089°W | 30831 | 25613 | 18591 | 175 | 4406 | 9568 | 52 | 93834 | 68492 | 251562 |
9 | Enfield | Enfield London Borough Council | Labour | Civic Centre, Silver Street | 31.74 | 320524 | 51°39′14″N 0°04′48″W / 51.6538°N 0.0799°W | 30213 | 22487 | 13251 | 132 | 3293 | 9059 | 38 | 70371 | 45036 | 193880 |
10 | Greenwich | Greenwich London Borough Council | Labour | Woolwich Town Hall, Wellington Street | 18.28 | 264008 | 51°29′21″N 0°03′53″E / 51.4892°N 0.0648°E | 20966 | 22755 | 10836 | 107 | 3598 | 5430 | 56 | 64923 | 52897 | 181568 |
11 | Hackney | Hackney London Borough Council | Labour | Hackney Town Hall, Mare Street | 7.36 | 257379 | 51°32′42″N 0°03′19″W / 51.5450°N 0.0553°W | 21450 | 17327 | 18144 | 143 | 3332 | 8975 | 46 | 91118 | 56584 | 217119 |
12 | Hammersmith and Fulham | Hammersmith and Fulham London Borough Council | Labour | Town Hall, King Street | 6.33 | 178685 | 51°29′34″N 0°14′02″W / 51.4927°N 0.2339°W | 17010 | 14595 | 15492 | 91 | 3352 | 5279 | 45 | 86381 | 43014 | 185259 |
13 | Haringey | Haringey London Borough Council | Labour | Civic Centre, High Road | 11.42 | 263386 | 51°36′00″N 0°06′43″W / 51.6000°N 0.1119°W | 28213 | 22272 | 14563 | 207 | 2971 | 10084 | 40 | 83979 | 50943 | 213272 |
14 | Harrow | Harrow London Borough Council | Labour | Civic Centre, Station Road | 19.49 | 243372 | 51°35′23″N 0°20′05″W / 51.5898°N 0.3346°W | 19630 | 12724 | 7122 | 92 | 1998 | 4242 | 27 | 40800 | 30213 | 116848 |
15 | Havering | Havering London Borough Council | Conservative (council NOC) | Town Hall, Main Road | 43.35 | 242080 | 51°34′52″N 0°11′01″E / 51.5812°N 0.1837°E | 21302 | 17252 | 8171 | 179 | 2358 | 3089 | 19 | 52609 | 33968 | 138947 |
16 | Hillingdon | Hillingdon London Borough Council | Conservative | Civic Centre, High Street | 44.67 | 286806 | 51°32′39″N 0°28′34″W / 51.5441°N 0.4760°W | 26056 | 24485 | 11413 | 223 | 6504 | 5663 | 44 | 80028 | 55264 | 209680 |
17 | Hounslow | Hounslow London Borough Council | Labour | Hounslow House, 7 Bath Road | 21.61 | 262407 | 51°28′29″N 0°22′05″W / 51.4746°N 0.3680°W | 21026 | 21407 | 13722 | 183 | 3963 | 4847 | 40 | 70180 | 51404 | 186772 |
18 | Islington | Islington London Borough Council | Labour | Municipal Offices, 222 Upper Street | 5.74 | 215667 | 51°32′30″N 0°06′08″W / 51.5416°N 0.1022°W | 22207 | 18354 | 16553 | 85 | 3675 | 8736 | 40 | 107661 | 52975 | 230286 |
19 | Kensington and Chelsea | Kensington and Chelsea London Borough Council | Conservative | The Town Hall, Hornton Street | 4.68 | 155594 | 51°30′07″N 0°11′41″W / 51.5020°N 0.1947°W | 14980 | 9839 | 14573 | 85 | 2203 | 4744 | 24 | 95963 | 29570 | 171981 |
20 | Kingston upon Thames | Kingston upon Thames London Borough Council | Liberal Democrat | Guildhall, High Street | 14.38 | 166793 | 51°24′31″N 0°18′23″W / 51.4085°N 0.3064°W | 10131 | 10610 | 5682 | 65 | 1332 | 1702 | 18 | 38226 | 21540 | 89306 |
21 | Lambeth | Lambeth London Borough Council | Labour | Lambeth Town Hall, Brixton Hill | 10.36 | 314242 | 51°27′39″N 0°06′59″W / 51.4607°N 0.1163°W | 30199 | 26136 | 25083 | 137 | 4520 | 18408 | 70 | 114899 | 72726 | 292178 |
22 | Lewisham | Lewisham London Borough Council | Labour | Town Hall, 1 Catford Road | 13.57 | 286180 | 51°26′43″N 0°01′15″W / 51.4452°N 0.0209°W | 24871 | 24810 | 16825 | 262 | 3809 | 10455 | 71 | 70382 | 63652 | 215137 |
23 | Merton | Merton London Borough Council | Labour | Civic Centre, London Road | 14.52 | 203223 | 51°24′05″N 0°11′45″W / 51.4014°N 0.1958°W | 16485 | 14339 | 6651 | 111 | 1571 | 4021 | 26 | 44128 | 28322 | 115654 |
24 | Newham | Newham London Borough Council | Labour | Newham Dockside, 1000 Dockside Road | 13.98 | 318227 | 51°30′28″N 0°02′49″E / 51.5077°N 0.0469°E | 25356 | 24177 | 18389 | 323 | 4456 | 16913 | 43 | 106146 | 66221 | 262024 |
25 | Redbridge | Redbridge London Borough Council | Labour | Town Hall, 128-142 High Road | 21.78 | 288272 | 51°33′32″N 0°04′27″E / 51.5590°N 0.0741°E | 26735 | 17543 | 15736 | 284 | 2619 | 7688 | 31 | 71496 | 41430 | 183562 |
26 | Richmond upon Thames | Richmond upon Thames London Borough Council | Liberal Democrat | Civic Centre, 44 York Street | 22.17 | 191365 | 51°26′52″N 0°19′34″W / 51.4479°N 0.3260°W | 16097 | 11722 | 4707 | 37 | 1420 | 1590 | 26 | 40858 | 20314 | 96771 |
27 | Southwark | Southwark London Borough Council | Labour | 160 Tooley Street | 11.14 | 298464 | 51°30′13″N 0°04′49″W / 51.5035°N 0.0804°W | 27980 | 24450 | 27381 | 321 | 4696 | 16153 | 40 | 109432 | 68356 | 278809 |
28 | Sutton | Sutton London Borough Council | Liberal Democrat | Civic Offices, St Nicholas Way | 16.93 | 195914 | 51°21′42″N 0°11′40″W / 51.3618°N 0.1945°W | 13207 | 14474 | 4586 | 57 | 1393 | 2308 | 20 | 39533 | 25409 | 100987 |
29 | Tower Hamlets | Tower Hamlets London Borough Council | Labour | Town Hall, Mulberry Place, 5 Clove Crescent | 7.63 | 272890 | 51°30′36″N 0°00′21″W / 51.5099°N 0.0059°W | 21510 | 21593 | 23408 | 124 | 4268 | 10050 | 47 | 87620 | 59993 | 228613 |
30 | Waltham Forest | Waltham Forest London Borough Council | Labour | Waltham Forest Town Hall, Forest Road | 14.99 | 265797 | 51°35′27″N 0°00′48″W / 51.5908°N 0.0134°W | 25565 | 20459 | 14101 | 236 | 3040 | 10606 | 34 | 77940 | 51898 | 203879 |
31 | Wandsworth | Wandsworth London Borough Council | Conservative | The Town Hall, Wandsworth High Street | 13.23 | 310516 | 51°27′24″N 0°11′28″W / 51.4567°N 0.1910°W | 25533 | 19630 | 9493 | 161 | 3091 | 8398 | 47 | 92523 | 45865 | 204741 |
32 | Westminster | Westminster City Council | Conservative | Westminster City Hall, 64 Victoria Street | 8.29 | 226841 | 51°29′50″N 0°08′14″W / 51.4973°N 0.1372°W | 29295 | 20405 | 34031 | 273 | 6148 | 15752 | 59 | 277617 | 71448 | 455028 |
# shape of the dataframe
Ld_crime.shape
(33, 17)
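The Co-ordinates column holds each location twice, once in degrees/minutes/seconds and once in decimal degrees with a compass letter. To plot the boroughs or query venues, signed decimal values are needed. Here is a minimal sketch of one way to extract them; the function name `parse_coordinates` is hypothetical, and it assumes the decimal part always looks like the strings in the table above.

```python
import re

def parse_coordinates(text):
    """Extract signed decimal (latitude, longitude) from a Wikipedia-style string."""
    # Match the decimal part after the slash, e.g. "51.5607°N 0.1557°E".
    match = re.search(r'([\d.]+)°([NS])\s+([\d.]+)°([EW])', text)
    if match is None:
        raise ValueError(f'no decimal coordinates found in {text!r}')
    lat, ns, lon, ew = match.groups()
    # South and West are negative in signed decimal degrees.
    latitude = float(lat) if ns == 'N' else -float(lat)
    longitude = float(lon) if ew == 'E' else -float(lon)
    return latitude, longitude

print(parse_coordinates('51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E'))  # (51.5607, 0.1557)
print(parse_coordinates('51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W'))  # (51.6252, -0.1517)
```

Scraped text can contain invisible characters between the numbers, so on the live page the pattern may need adjusting.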
Exploratory Data Analysis
# descriptive statistics of the data
London_crime.describe()
| | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offenses | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person | Total |
---|---|---|---|---|---|---|---|---|---|---|
count | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 |
mean | 22857.363636 | 19119.333333 | 14265.606061 | 161.363636 | 3222.696970 | 7844.636364 | 38.575758 | 80662.454545 | 47214.575758 | 195386.606061 |
std | 7452.366846 | 5942.903618 | 7544.259564 | 81.603775 | 1362.107294 | 4677.643075 | 15.139002 | 45155.624776 | 17226.165191 | 79148.057551 |
min | 15.000000 | 16.000000 | 33.000000 | 0.000000 | 17.000000 | 24.000000 | 0.000000 | 561.000000 | 114.000000 | 780.000000 |
25% | 18103.000000 | 17244.000000 | 8942.000000 | 106.000000 | 2358.000000 | 4744.000000 | 27.000000 | 52609.000000 | 33968.000000 | 149447.000000 |
50% | 24871.000000 | 20405.000000 | 14101.000000 | 157.000000 | 3293.000000 | 7688.000000 | 40.000000 | 77940.000000 | 50943.000000 | 203879.000000 |
75% | 27980.000000 | 22755.000000 | 18389.000000 | 207.000000 | 3963.000000 | 10084.000000 | 47.000000 | 92523.000000 | 59993.000000 | 228613.000000 |
max | 36981.000000 | 31218.000000 | 34031.000000 | 323.000000 | 6504.000000 | 18408.000000 | 71.000000 | 277617.000000 | 72726.000000 | 455028.000000 |
# import libraries for plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
mpl.style.use('ggplot')
Check if the column names are strings.
Ld_crime.columns = list(map(str, Ld_crime.columns))
# check the column labels type
all(isinstance(column, str) for column in Ld_crime.columns)
True
Let’s sort by total crimes in descending order to see the 5 boroughs with the highest number of crimes.
Ld_crime.sort_values(['Total'], ascending=False, axis=0, inplace=True)
df_top5 = Ld_crime.head()
df_top5
| | Borough | Local authority | Political control | Headquarters | Area (sq mi) | Population | Co-ordinates | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offenses | Robbery | Sexual Offences | Theft and Handling | Violence Against the Person | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
32 | Westminster | Westminster City Council | Conservative | Westminster City Hall, 64 Victoria Street | 8.29 | 226841 | 51°29′50″N 0°08′14″W / 51.4973°N 0.1372°W | 29295 | 20405 | 34031 | 273 | 6148 | 15752 | 59 | 277617 | 71448 | 455028 |
21 | Lambeth | Lambeth London Borough Council | Labour | Lambeth Town Hall, Brixton Hill | 10.36 | 314242 | 51°27′39″N 0°06′59″W / 51.4607°N 0.1163°W | 30199 | 26136 | 25083 | 137 | 4520 | 18408 | 70 | 114899 | 72726 | 292178 |
27 | Southwark | Southwark London Borough Council | Labour | 160 Tooley Street | 11.14 | 298464 | 51°30′13″N 0°04′49″W / 51.5035°N 0.0804°W | 27980 | 24450 | 27381 | 321 | 4696 | 16153 | 40 | 109432 | 68356 | 278809 |
5 | Camden | Camden London Borough Council | Labour | Camden Town Hall, Judd Street | 8.40 | 229719 | 51°31′44″N 0°07′32″W / 51.5290°N 0.1255°W | 27939 | 18482 | 21816 | 123 | 3857 | 9286 | 36 | 140596 | 53012 | 275147 |
24 | Newham | Newham London Borough Council | Labour | Newham Dockside, 1000 Dockside Road | 13.98 | 318227 | 51°30′28″N 0°02′49″E / 51.5077°N 0.0469°E | 25356 | 24177 | 18389 | 323 | 4456 | 16913 | 43 | 106146 | 66221 | 262024 |
Let’s visualize these 5 boroughs.
df_tt = df_top5[['Borough','Total']]
df_tt.set_index('Borough',inplace = True)
ax = df_tt.plot(kind='bar', figsize=(10, 6), rot=0)
ax.set_ylabel('Number of Crimes')
ax.set_xlabel('Borough')
ax.set_title('London Boroughs with the Highest no. of crime')
# label each bar with its value
for p in ax.patches:
    ax.annotate(np.round(p.get_height(), decimals=2),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center',
                va='center',
                xytext=(0, 10),
                textcoords='offset points',
                fontsize=14
                )
plt.show()
Okay. Now we know which places you need to stay away from.
Now, let’s sort the total crimes in ascending order to see the five boroughs with the lowest number of crimes.
Ld_crime.sort_values(['Total'], ascending=True, axis=0, inplace=True)
df_bot5 = Ld_crime.head()
df_bot5
Borough | Local authority | Political control | Headquarters | Area (sq mi) | Population | Co-ordinates | Burglary | Criminal Damage | Drugs | Fraud or Forgery | Other Notifiable Offenses | Robbery | Sexual Offenses | Theft and Handling | Violence Against the Person | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | City of London | Corporation of London;Inner Temple;Middle Temple | ? | Guildhall | 1.12 | 7000 | 51°30′56″N 0°05′32″W / 51.5155°N 0.0922°W | 15 | 16 | 33 | 0 | 17 | 24 | NaN | 561 | 114 | 780 |
20 | Kingston upon Thames | Kingston upon Thames London Borough Council | Liberal Democrat | Guildhall, High Street | 14.38 | 166793 | 51°24′31″N 0°18′23″W / 51.4085°N 0.3064°W | 10131 | 10610 | 5682 | 65 | 1332 | 1702 | NaN | 38226 | 21540 | 89306 |
26 | Richmond upon Thames | Richmond upon Thames London Borough Council | Liberal Democrat | Civic Centre, 44 York Street | 22.17 | 191365 | 51°26′52″N 0°19′34″W / 51.4479°N 0.3260°W | 16097 | 11722 | 4707 | 37 | 1420 | 1590 | NaN | 40858 | 20314 | 96771 |
28 | Sutton | Sutton London Borough Council | Liberal Democrat | Civic Offices, St Nicholas Way | 16.93 | 195914 | 51°21′42″N 0°11′40″W / 51.3618°N 0.1945°W | 13207 | 14474 | 4586 | 57 | 1393 | 2308 | NaN | 39533 | 25409 | 100987 |
2 | Bexley | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E | 14973 | 17244 | 7346 | 106 | 1999 | 2338 | NaN | 40071 | 30037 | 114136 |
Let’s visualize these 5 boroughs.
df_bt = df_bot5[['Borough','Total']]
df_bt.set_index('Borough',inplace = True)
ax = df_bt.plot(kind='bar', figsize=(10, 6), rot=0)
ax.set_ylabel('Number of Crimes')
ax.set_xlabel('Borough')
ax.set_title('London Boroughs with the least no. of crime')
# label each bar with its value
for p in ax.patches:
    ax.annotate(np.round(p.get_height(), decimals=2),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center',
                va='center',
                xytext=(0, 10),
                textcoords='offset points',
                fontsize=14
                )
plt.show()
The borough City of London has the lowest crime recorded over the years. Let’s look into its details.
df_col = df_bot5[df_bot5['Borough'] == 'City of London']
df_col = df_col[['Borough','Total','Area (sq mi)','Population']]
df_col
Borough | Total | Area (sq mi) | Population | |
---|---|---|---|---|
6 | City of London | 780 | 1.12 | 7000 |
According to the London Boroughs Wikipedia page, the City of London is the 33rd principal division of Greater London, but it is not a London borough. You also realise that living in this area would be very expensive and you’re not looking to spend most of your income on rent.
So let’s focus on the next safest borough i.e. Kingston upon Thames, just to keep our options open.
Visualize different types of crimes in the borough ‘Kingston upon Thames’.
df_bc1 = df_bot5[df_bot5['Borough'] == 'Kingston upon Thames']
df_bc = df_bc1[['Borough', 'Burglary', 'Criminal Damage', 'Drugs', 'Fraud or Forgery', 'Other Notifiable Offenses',
'Robbery', 'Sexual Offenses', 'Theft and Handling', 'Violence Against the Person']]
df_bc.set_index('Borough', inplace=True)
ax = df_bc.plot(kind='bar', figsize=(10, 6), rot=0)
ax.set_ylabel('Number of Crimes')
ax.set_xlabel('Borough')
ax.set_title('Crimes in Kingston upon Thames')
# label each bar with its value
for p in ax.patches:
    ax.annotate(np.round(p.get_height(), decimals=2),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center',
                va='center',
                xytext=(0, 10),
                textcoords='offset points',
                fontsize=14
                )
plt.show()
This borough is a good candidate: after the City of London, it has the lowest total number of recorded crimes (89,306) of all the boroughs in our data.
Dataset of the Neighborhood
The list of Neighborhoods in the Royal Borough of Kingston upon Thames can be found here.
Neighborhood = ['Berrylands','Canbury','Chessington','Coombe','Kingston upon Thames','Kingston Vale',
'Malden Rushett','Motspur Park','New Malden','Norbiton','Old Malden','Surbiton','Tolworth']
Borough = ['Kingston upon Thames','Kingston upon Thames','Kingston upon Thames','Kingston upon Thames','Kingston upon Thames',
'Kingston upon Thames','Kingston upon Thames','Kingston upon Thames','Kingston upon Thames','Kingston upon Thames',
'Kingston upon Thames','Kingston upon Thames','Kingston upon Thames']
Latitude = ['','','','','','','','','','','','','']
Longitude = ['','','','','','','','','','','','','']
df_neigh = {'Neighborhood':Neighborhood, 'Borough':Borough, 'Latitude':Latitude, 'Longitude':Longitude}
kut_neigh = pd.DataFrame(data=df_neigh, columns=['Neighborhood', 'Borough', 'Latitude', 'Longitude'], index=None)
kut_neigh
Neighborhood | Borough | Latitude | Longitude | |
---|---|---|---|---|
0 | Berrylands | Kingston upon Thames | ||
1 | Canbury | Kingston upon Thames | ||
2 | Chessington | Kingston upon Thames | ||
3 | Coombe | Kingston upon Thames | ||
4 | Kingston upon Thames | Kingston upon Thames | ||
5 | Kingston Vale | Kingston upon Thames | ||
6 | Malden Rushett | Kingston upon Thames | ||
7 | Motspur Park | Kingston upon Thames | ||
8 | New Malden | Kingston upon Thames | ||
9 | Norbiton | Kingston upon Thames | ||
10 | Old Malden | Kingston upon Thames | ||
11 | Surbiton | Kingston upon Thames | ||
12 | Tolworth | Kingston upon Thames |
Find the co-ordinates of each neighborhood in the Kingston upon Thames borough.
Latitude = []
Longitude = []
for i in range(len(Neighborhood)):
    address = '{}, London, United Kingdom'.format(Neighborhood[i])
    geolocator = Nominatim(user_agent='London_agent')
    location = geolocator.geocode(address)
    Latitude.append(location.latitude)
    Longitude.append(location.longitude)
print(Latitude, Longitude)
[51.3937811, 51.41749865, 51.358336, 51.4194499, 51.4096275, 51.43185, 51.3410523, 51.3909852, 51.4053347, 51.4099994, 51.382484, 51.3937557, 51.3788758] [-0.2848024, -0.30555280504926163, -0.2986216, -0.2653985, -0.3062621, -0.2581379, -0.3190757, -0.2488979, -0.2634066, -0.2873963, -0.2590897, -0.3033105, -0.2828604]
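The loop above assumes that every address resolves to a location. A geocoder can return `None` when it finds no match, in which case `location.latitude` would raise an error. The following is a minimal defensive sketch, not the code used in this post: `geocode_all` and `FakeLocation` are hypothetical names, and the function accepts any callable so that it can be tried offline.

```python
from collections import namedtuple


def geocode_all(names, geocode, suffix=', London, United Kingdom'):
    """Return {name: (lat, lng)}; unresolved names map to (None, None)."""
    coords = {}
    for name in names:
        location = geocode(name + suffix)
        if location is None:
            # no match found: record a gap instead of raising AttributeError
            coords[name] = (None, None)
        else:
            coords[name] = (location.latitude, location.longitude)
    return coords


# Offline stand-in for a geocoder (hypothetical), seeded with one
# coordinate pair taken from the output above.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
known = {'Berrylands, London, United Kingdom': FakeLocation(51.3937811, -0.2848024)}

demo = geocode_all(['Berrylands', 'Nowhere'], known.get)
print(demo)
```

With geopy, the same function could be called as `geocode_all(Neighborhood, geolocator.geocode)`.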
df_neigh = {'Neighborhood':Neighborhood, 'Borough':Borough, 'Latitude':Latitude, 'Longitude':Longitude}
kut_neigh = pd.DataFrame(data=df_neigh, columns=['Neighborhood', 'Borough', 'Latitude', 'Longitude'], index=None)
kut_neigh
Neighborhood | Borough | Latitude | Longitude | |
---|---|---|---|---|
0 | Berrylands | Kingston upon Thames | 51.393781 | -0.284802 |
1 | Canbury | Kingston upon Thames | 51.417499 | -0.305553 |
2 | Chessington | Kingston upon Thames | 51.358336 | -0.298622 |
3 | Coombe | Kingston upon Thames | 51.419450 | -0.265398 |
4 | Kingston upon Thames | Kingston upon Thames | 51.409627 | -0.306262 |
5 | Kingston Vale | Kingston upon Thames | 51.431850 | -0.258138 |
6 | Malden Rushett | Kingston upon Thames | 51.341052 | -0.319076 |
7 | Motspur Park | Kingston upon Thames | 51.390985 | -0.248898 |
8 | New Malden | Kingston upon Thames | 51.405335 | -0.263407 |
9 | Norbiton | Kingston upon Thames | 51.409999 | -0.287396 |
10 | Old Malden | Kingston upon Thames | 51.382484 | -0.259090 |
11 | Surbiton | Kingston upon Thames | 51.393756 | -0.303310 |
12 | Tolworth | Kingston upon Thames | 51.378876 | -0.282860 |
Let’s get the co-ordinates of Berrylands, which is the center neighborhood of the Kingston upon Thames borough.
address = 'Berrylands, London, United Kingdom'
geolocator = Nominatim(user_agent='ld_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical co-ordinates of Berrylands, London are {}, {}.'.format(latitude, longitude))
The geographical co-ordinates of Berrylands, London are 51.3937811, -0.2848024.
Let’s visualize the neighborhoods of the Kingston upon Thames borough.
# create map of London using latitude and longitude values
map_lon = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map
for lat, lng, borough, neighborhood in zip(kut_neigh['Latitude'], kut_neigh['Longitude'],
                                           kut_neigh['Borough'], kut_neigh['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True,
                        fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_lon)
map_lon
Modeling
- Find all the venues within a 500 meter radius of each neighborhood.
- Perform one hot encoding on the venues data.
- Group the venues by the neighborhood and calculate their mean.
- Perform a k-means clustering.
Create a function to extract the venues from each Neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
kut_venues= getNearbyVenues(names=kut_neigh['Neighborhood'],
latitudes=kut_neigh['Latitude'],
longitudes=kut_neigh['Longitude'])
Berrylands
Canbury
Chessington
Coombe
Kingston upon Thames
Kingston Vale
Malden Rushett
Motspur Park
New Malden
Norbiton
Old Malden
Surbiton
Tolworth
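The list comprehension inside `getNearbyVenues` reads `categories[0]`, so it assumes every venue carries at least one category. The sketch below shows one way to flatten the returned items while tolerating a venue with no category. `extract_venue_rows` is a hypothetical helper, and the sample items only mimic the fields that the function above reads; they are not a real API response.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten venue items into 7-tuples matching the dataframe columns."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to None when the venue has no category attached
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample_items = [
    {'venue': {'name': 'Alexandra Park',
               'location': {'lat': 51.394230, 'lng': -0.281206},
               'categories': [{'name': 'Park'}]}},
    # hypothetical venue without a category, for illustration only
    {'venue': {'name': 'Uncategorized place',
               'location': {'lat': 51.39, 'lng': -0.28},
               'categories': []}},
]

rows = extract_venue_rows('Berrylands', 51.393781, -0.284802, sample_items)
for row in rows:
    print(row)
```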
print(kut_venues.shape)
kut_venues.head()
(171, 7)
Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | |
---|---|---|---|---|---|---|---|
0 | Berrylands | 51.393781 | -0.284802 | Surbiton Racket & Fitness Club | 51.392676 | -0.290224 | Gym / Fitness Center |
1 | Berrylands | 51.393781 | -0.284802 | Alexandra Park | 51.394230 | -0.281206 | Park |
2 | Berrylands | 51.393781 | -0.284802 | K2 Bus Stop | 51.392302 | -0.281534 | Bus Stop |
3 | Canbury | 51.417499 | -0.305553 | Canbury Gardens | 51.417409 | -0.305300 | Park |
4 | Canbury | 51.417499 | -0.305553 | The Grey Horse | 51.414192 | -0.300759 | Pub |
kut_venues.groupby('Neighborhood').count()
Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | |
---|---|---|---|---|---|---|
Neighborhood | ||||||
Berrylands | 3 | 3 | 3 | 3 | 3 | 3 |
Canbury | 14 | 14 | 14 | 14 | 14 | 14 |
Coombe | 1 | 1 | 1 | 1 | 1 | 1 |
Kingston Vale | 4 | 4 | 4 | 4 | 4 | 4 |
Kingston upon Thames | 50 | 50 | 50 | 50 | 50 | 50 |
Malden Rushett | 4 | 4 | 4 | 4 | 4 | 4 |
Motspur Park | 4 | 4 | 4 | 4 | 4 | 4 |
New Malden | 8 | 8 | 8 | 8 | 8 | 8 |
Norbiton | 28 | 28 | 28 | 28 | 28 | 28 |
Old Malden | 3 | 3 | 3 | 3 | 3 | 3 |
Surbiton | 33 | 33 | 33 | 33 | 33 | 33 |
Tolworth | 19 | 19 | 19 | 19 | 19 | 19 |
print('There are {} unique categories.'.format(len(kut_venues['Venue Category'].unique())))
There are 72 unique categories.
One hot encoding
# one hot encoding
kut_onehot = pd.get_dummies(kut_venues[['Venue Category']], prefix='', prefix_sep='')
# add neighborhood column back to the dataframe
kut_onehot['Neighborhood'] = kut_venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = [kut_onehot.columns[-1]] + list(kut_onehot.columns[:-1])
kut_onehot = kut_onehot[fixed_columns]
kut_onehot.head()
Neighborhood | Asian Restaurant | Athletics & Sports | Auto Garage | Bakery | Bar | Beer Bar | Bistro | Bookstore | Bowling Alley | ... | Spa | Stationery Store | Supermarket | Sushi Restaurant | Tea Room | Thai Restaurant | Theater | Train Station | Turkish Restaurant | Wine Shop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Berrylands | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | Berrylands | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | Berrylands | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | Canbury | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | Canbury | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 73 columns
Group the rows by neighborhood and take the mean of the frequency of occurrence of each category.
kut_grouped = kut_onehot.groupby('Neighborhood').mean().reset_index()
kut_grouped
Neighborhood | Asian Restaurant | Athletics & Sports | Auto Garage | Bakery | Bar | Beer Bar | Bistro | Bookstore | Bowling Alley | ... | Spa | Stationery Store | Supermarket | Sushi Restaurant | Tea Room | Thai Restaurant | Theater | Train Station | Turkish Restaurant | Wine Shop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Berrylands | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
1 | Canbury | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.071429 | 0.00 | 0.071429 | 0.000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
2 | Coombe | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 1.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
3 | Kingston Vale | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.250000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
4 | Kingston upon Thames | 0.02 | 0.000000 | 0.000000 | 0.020000 | 0.000000 | 0.02 | 0.000000 | 0.02 | 0.000000 | ... | 0.000000 | 0.02 | 0.020000 | 0.040 | 0.000000 | 0.040000 | 0.02 | 0.000000 | 0.02 | 0.000000 |
5 | Malden Rushett | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
6 | Motspur Park | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
7 | New Malden | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.125000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.125000 | 0.125 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 |
8 | Norbiton | 0.00 | 0.035714 | 0.035714 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.035714 | 0.000 | 0.000000 | 0.035714 | 0.00 | 0.000000 | 0.00 | 0.035714 |
9 | Old Malden | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 0.000000 | 0.000000 | 0.00 | 0.333333 | 0.00 | 0.000000 |
10 | Surbiton | 0.00 | 0.000000 | 0.000000 | 0.030303 | 0.030303 | 0.00 | 0.030303 | 0.00 | 0.000000 | ... | 0.000000 | 0.00 | 0.030303 | 0.000 | 0.030303 | 0.030303 | 0.00 | 0.030303 | 0.00 | 0.000000 |
11 | Tolworth | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.052632 | ... | 0.000000 | 0.00 | 0.000000 | 0.000 | 0.000000 | 0.052632 | 0.00 | 0.052632 | 0.00 | 0.000000 |
12 rows × 73 columns
# dimensions of the dataframe
kut_grouped.shape
(12, 73)
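Taking the mean of one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. As a rough illustration using only the standard library (not the pandas code above), the sketch below recomputes those shares for the first five rows of `kut_venues` shown earlier. Only the Berrylands rows are complete here, so only its figures match the table.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs from the first five rows of kut_venues
venues = [
    ('Berrylands', 'Gym / Fitness Center'),
    ('Berrylands', 'Park'),
    ('Berrylands', 'Bus Stop'),
    ('Canbury', 'Park'),
    ('Canbury', 'Pub'),
]

# count categories per neighborhood
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# divide each count by the neighborhood's total number of venues
frequencies = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}

print(round(frequencies['Berrylands']['Park'], 2))  # 0.33
```

Berrylands has three venues in three different categories, so each category gets a share of one third, which is the 0.33 that appears in the per-neighborhood listing.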
num_top_venues = 5
for hood in kut_grouped['Neighborhood']:
    print('----'+hood+'----')
    # transpose the neighborhood's row into (venue, freq) pairs
    temp = kut_grouped[kut_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
----Berrylands----
venue freq
0 Gym / Fitness Center 0.33
1 Park 0.33
2 Bus Stop 0.33
3 Portuguese Restaurant 0.00
4 Plaza 0.00
----Canbury----
venue freq
0 Pub 0.29
1 Plaza 0.07
2 Park 0.07
3 Hotel 0.07
4 Indian Restaurant 0.07
----Coombe----
venue freq
0 Tea Room 1.0
1 Asian Restaurant 0.0
2 Market 0.0
3 Platform 0.0
4 Pizza Place 0.0
----Kingston Vale----
venue freq
0 Grocery Store 0.25
1 Bar 0.25
2 Sandwich Place 0.25
3 Soccer Field 0.25
4 Asian Restaurant 0.00
----Kingston upon Thames----
venue freq
0 Coffee Shop 0.12
1 Café 0.08
2 Department Store 0.06
3 Thai Restaurant 0.04
4 Clothing Store 0.04
----Malden Rushett----
venue freq
0 Convenience Store 0.25
1 Restaurant 0.25
2 Garden Center 0.25
3 Pub 0.25
4 Park 0.00
----Motspur Park----
venue freq
0 Gym 0.25
1 Restaurant 0.25
2 Park 0.25
3 Soccer Field 0.25
4 Mexican Restaurant 0.00
----New Malden----
venue freq
0 Gym 0.12
1 Indian Restaurant 0.12
2 Bar 0.12
3 Gastropub 0.12
4 Korean Restaurant 0.12
----Norbiton----
venue freq
0 Indian Restaurant 0.11
1 Italian Restaurant 0.07
2 Food 0.07
3 Pub 0.07
4 Wine Shop 0.04
----Old Malden----
venue freq
0 Pub 0.33
1 Food 0.33
2 Train Station 0.33
3 Platform 0.00
4 Pizza Place 0.00
----Surbiton----
venue freq
0 Coffee Shop 0.18
1 Pub 0.12
2 Italian Restaurant 0.06
3 Pharmacy 0.06
4 Grocery Store 0.06
----Tolworth----
venue freq
0 Grocery Store 0.16
1 Restaurant 0.11
2 Indian Restaurant 0.05
3 Bus Stop 0.05
4 Discount Store 0.05
Create a dataframe of the venues.
First, create a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
Create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to the number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 1st, 2nd and 3rd, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = kut_grouped['Neighborhood']
for ind in np.arange(kut_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(kut_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()
Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Berrylands | Gym / Fitness Center | Park | Bus Stop | Wine Shop | Fast Food Restaurant | Deli / Bodega | Department Store | Discount Store | Electronics Store | Farmers Market |
1 | Canbury | Pub | Shop & Service | Spa | Plaza | Café | Indian Restaurant | Hotel | Park | Supermarket | Gym / Fitness Center |
2 | Coombe | Tea Room | Wine Shop | Fast Food Restaurant | Cosmetics Shop | Deli / Bodega | Department Store | Discount Store | Electronics Store | Farmers Market | Fish & Chips Shop |
3 | Kingston Vale | Grocery Store | Bar | Sandwich Place | Soccer Field | Furniture / Home Store | Garden Center | Fried Chicken Joint | French Restaurant | Food | Fish & Chips Shop |
4 | Kingston upon Thames | Coffee Shop | Café | Department Store | Thai Restaurant | Sushi Restaurant | Burger Joint | Pub | Clothing Store | Italian Restaurant | Asian Restaurant |
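The `try`/`except` approach to the column suffixes works for the ten columns used here, but it would label a 21st column as "21th". If more columns were ever needed, a small helper along these lines could generate the suffix instead; `ordinal` is a hypothetical name and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens all take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], '|', columns[-1])
```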
Cluster similar neighborhoods together using k-means clustering
# import k-means
from sklearn.cluster import KMeans
# set the number of clusters
kclusters = 5
kut_grouped_clustering = kut_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kut_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
array([3, 1, 0, 2, 1, 1, 2, 1, 1, 4])
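To make the clustering step less of a black box, here is a bare-bones sketch of the two steps k-means alternates between: assigning each point to its nearest centroid, then moving each centroid to the mean of its members. This is not scikit-learn's implementation, and the toy two-dimensional vectors are made up for illustration; in the post, each point is a neighborhood's row of venue-category frequencies.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# hypothetical frequency vectors: two pairs of similar "neighborhoods"
points = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1]
```

The label numbers themselves carry no meaning; they only indicate which rows were grouped together.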
# add clustering labels
neighborhoods_venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)
kut_merged = kut_neigh
# merge kut_grouped with kut_neigh to add latitude/longitude for each neighborhood
kut_merged = kut_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
kut_merged.head()
Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Berrylands | Kingston upon Thames | 51.393781 | -0.284802 | 3.0 | Gym / Fitness Center | Park | Bus Stop | Wine Shop | Fast Food Restaurant | Deli / Bodega | Department Store | Discount Store | Electronics Store | Farmers Market |
1 | Canbury | Kingston upon Thames | 51.417499 | -0.305553 | 1.0 | Pub | Shop & Service | Spa | Plaza | Café | Indian Restaurant | Hotel | Park | Supermarket | Gym / Fitness Center |
2 | Chessington | Kingston upon Thames | 51.358336 | -0.298622 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Coombe | Kingston upon Thames | 51.419450 | -0.265398 | 0.0 | Tea Room | Wine Shop | Fast Food Restaurant | Cosmetics Shop | Deli / Bodega | Department Store | Discount Store | Electronics Store | Farmers Market | Fish & Chips Shop |
4 | Kingston upon Thames | Kingston upon Thames | 51.409627 | -0.306262 | 1.0 | Coffee Shop | Café | Department Store | Thai Restaurant | Sushi Restaurant | Burger Joint | Pub | Clothing Store | Italian Restaurant | Asian Restaurant |
kut_merged.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Neighborhood 13 non-null object
1 Borough 13 non-null object
2 Latitude 13 non-null float64
3 Longitude 13 non-null float64
4 Cluster Labels 12 non-null float64
5 1st Most Common Venue 12 non-null object
6 2nd Most Common Venue 12 non-null object
7 3rd Most Common Venue 12 non-null object
8 4th Most Common Venue 12 non-null object
9 5th Most Common Venue 12 non-null object
10 6th Most Common Venue 12 non-null object
11 7th Most Common Venue 12 non-null object
12 8th Most Common Venue 12 non-null object
13 9th Most Common Venue 12 non-null object
14 10th Most Common Venue 12 non-null object
dtypes: float64(3), object(12)
memory usage: 1.6+ KB
# drop the rows with NaN value
kut_merged.dropna(inplace=True)
kut_merged.shape
(12, 15)
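The row removed by `dropna` is Chessington: it does not appear in the per-neighborhood venue counts, which suggests that no venues were returned within its 500 meter radius, so it has no cluster label. A quick way to spot such gaps before merging is to compare the two lists of names. The sketch below hard-codes both lists from the tables above rather than reading them from the dataframes.

```python
# all neighborhoods in the borough (from the Neighborhood list)
all_neighborhoods = ['Berrylands', 'Canbury', 'Chessington', 'Coombe',
                     'Kingston upon Thames', 'Kingston Vale', 'Malden Rushett',
                     'Motspur Park', 'New Malden', 'Norbiton', 'Old Malden',
                     'Surbiton', 'Tolworth']

# neighborhoods that appear in the grouped venue counts
with_venues = ['Berrylands', 'Canbury', 'Coombe', 'Kingston Vale',
               'Kingston upon Thames', 'Malden Rushett', 'Motspur Park',
               'New Malden', 'Norbiton', 'Old Malden', 'Surbiton', 'Tolworth']

seen = set(with_venues)
missing = [n for n in all_neighborhoods if n not in seen]
print(missing)  # ['Chessington']
```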
kut_merged['Cluster Labels'] = kut_merged['Cluster Labels'].astype(int)
kut_merged.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12 entries, 0 to 12
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Neighborhood 12 non-null object
1 Borough 12 non-null object
2 Latitude 12 non-null float64
3 Longitude 12 non-null float64
4 Cluster Labels 12 non-null int32
5 1st Most Common Venue 12 non-null object
6 2nd Most Common Venue 12 non-null object
7 3rd Most Common Venue 12 non-null object
8 4th Most Common Venue 12 non-null object
9 5th Most Common Venue 12 non-null object
10 6th Most Common Venue 12 non-null object
11 7th Most Common Venue 12 non-null object
12 8th Most Common Venue 12 non-null object
13 9th Most Common Venue 12 non-null object
14 10th Most Common Venue 12 non-null object
dtypes: float64(2), int32(1), object(12)
memory usage: 1.5+ KB
Visualize the clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.5)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kut_merged['Latitude'], kut_merged['Longitude'], kut_merged['Neighborhood'], kut_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.5).add_to(map_clusters)
map_clusters
Each cluster is color-coded for ease of presentation. The largest group, six neighborhoods, falls in the purple cluster, which is Cluster 1. The blue cluster, which is Cluster 2, consists of three neighborhoods. The remaining three neighborhoods each form a cluster of their own: red (Cluster 0), green (Cluster 3), and yellow (Cluster 4).
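The colors follow from the `rainbow[cluster-1]` lookup in the map code: a label of 0 becomes index -1, which Python reads as the last item of the list. The sketch below reproduces that lookup with plain color names. The names are only rough stand-ins for the five hex values in `rainbow`, assumed here to run from purple to red.

```python
# approximate names for the five rainbow colors, in list order (assumption)
palette = ['purple', 'blue', 'green', 'yellow', 'red']
kclusters = 5

# same indexing as rainbow[cluster-1]; cluster 0 wraps around to the last color
color_for_cluster = {cluster: palette[cluster - 1] for cluster in range(kclusters)}
print(color_for_cluster)
# {0: 'red', 1: 'purple', 2: 'blue', 3: 'green', 4: 'yellow'}
```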
Analysis
Analyze each of the clusters to identify the characteristics of each cluster and the neighborhoods in them.
Examine the first cluster.
kut_merged[kut_merged['Cluster Labels'] == 0]
Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | Coombe | Kingston upon Thames | 51.41945 | -0.265398 | 0 | Tea Room | Wine Shop | Fast Food Restaurant | Cosmetics Shop | Deli / Bodega | Department Store | Discount Store | Electronics Store | Farmers Market | Fish & Chips Shop |
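Cluster 0 has only one neighborhood in it, Coombe. Only one venue was returned for Coombe, a Tea Room, so the remaining columns (Wine Shop, Fast Food Restaurant, and so on) are zero-frequency ties rather than venues actually found there.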
Examine the second cluster.
kut_merged[kut_merged['Cluster Labels'] == 1]
Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Canbury | Kingston upon Thames | 51.417499 | -0.305553 | 1 | Pub | Shop & Service | Spa | Plaza | Café | Indian Restaurant | Hotel | Park | Supermarket | Gym / Fitness Center |
4 | Kingston upon Thames | Kingston upon Thames | 51.409627 | -0.306262 | 1 | Coffee Shop | Café | Department Store | Thai Restaurant | Sushi Restaurant | Burger Joint | Pub | Clothing Store | Italian Restaurant | Asian Restaurant |
6 | Malden Rushett | Kingston upon Thames | 51.341052 | -0.319076 | 1 | Convenience Store | Pub | Garden Center | Restaurant | Farmers Market | Cosmetics Shop | Deli / Bodega | Department Store | Discount Store | Electronics Store |
8 | New Malden | Kingston upon Thames | 51.405335 | -0.263407 | 1 | Indian Restaurant | Korean Restaurant | Gastropub | Gym | Bar | Sushi Restaurant | Supermarket | Chinese Restaurant | Department Store | Discount Store |
9 | Norbiton | Kingston upon Thames | 51.409999 | -0.287396 | 1 | Indian Restaurant | Pub | Italian Restaurant | Food | Hardware Store | Pizza Place | Pharmacy | Japanese Restaurant | Hotel | Wine Shop |
11 | Surbiton | Kingston upon Thames | 51.393756 | -0.303310 | 1 | Coffee Shop | Pub | Grocery Store | Italian Restaurant | Pharmacy | Breakfast Spot | Gastropub | Fast Food Restaurant | Farmers Market | Gym / Fitness Center |
Cluster 1 has six neighborhoods in it, the most of any cluster. After examining these neighborhoods, we can see that the most common venues are Restaurants, Coffee Shops, Cafés, Convenience Stores, Department Stores, Grocery Stores, Pubs, Shops & Services, and Spas. There are also Gyms and other Stores around. This seems to be a great cluster to live in.
Examine the third cluster.
kut_merged[kut_merged['Cluster Labels'] == 2]
Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Kingston Vale | Kingston upon Thames | 51.431850 | -0.258138 | 2 | Grocery Store | Bar | Sandwich Place | Soccer Field | Furniture / Home Store | Garden Center | Fried Chicken Joint | French Restaurant | Food | Fish & Chips Shop |
7 | Motspur Park | Kingston upon Thames | 51.390985 | -0.248898 | 2 | Soccer Field | Gym | Park | Restaurant | Farmers Market | Cosmetics Shop | Deli / Bodega | Department Store | Discount Store | Electronics Store |
12 | Tolworth | Kingston upon Thames | 51.378876 | -0.282860 | 2 | Grocery Store | Restaurant | Discount Store | Pharmacy | Pizza Place | Furniture / Home Store | Italian Restaurant | Bus Stop | Indian Restaurant | Hotel |
Cluster 2 has three neighborhoods in it. The most common venues are Grocery Stores, Soccer Fields, Bars, Restaurants, Gyms, and Parks.
Examine the fourth cluster.
kut_merged[kut_merged['Cluster Labels'] == 3]
Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Berrylands | Kingston upon Thames | 51.393781 | -0.284802 | 3 | Gym / Fitness Center | Park | Bus Stop | Wine Shop | Fast Food Restaurant | Deli / Bodega | Department Store | Discount Store | Electronics Store | Farmers Market |
Cluster 3 has only one neighborhood in it. The most common venues are Gyms, Parks, and Bus stops.
Examine the fifth cluster.
kut_merged[kut_merged['Cluster Labels'] == 4]
Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | Old Malden | Kingston upon Thames | 51.382484 | -0.25909 | 4 | Train Station | Pub | Food | Wine Shop | Farmers Market | Cosmetics Shop | Deli / Bodega | Department Store | Discount Store | Electronics Store |
Cluster 4 has only one neighborhood in it. The most common venues are Train Stations, Pubs, and Food Joints.
Results
The aim of this project is to help people who want to relocate to the safest borough in London. Expats can choose the neighborhoods to which they want to relocate based on the most common venues in them. For example, if a person is looking for a neighborhood with good connectivity and public transportation, we can see that Clusters 3 and 4 have Bus Stops and Train Stations, respectively, among their most common venues. If a person is looking for a neighborhood with stores and restaurants in close proximity, then the neighborhoods in Cluster 1 are suitable. For a family, I feel that the neighborhoods in Cluster 2 are more suitable due to common venues such as Parks, Gym/Fitness centers, Bus Stops, Restaurants, Grocery Stores and Soccer Fields.
Conclusion
This project helps a person get a better understanding of the neighborhoods with respect to the most common venues in each of them. It is always helpful to make use of technology to stay one step ahead, i.e. to find out more about places before moving into a neighborhood. We have taken only safety as the primary concern to shortlist the London borough. Future work could take other factors, such as the cost of living in each area, into consideration to shortlist the boroughs based on both safety and a predefined budget.