Understanding spatial trends in the locations of Tokyo convenience stores
When strolling around Tokyo you'll often pass numerous convenience stores, locally known as "konbini", which makes sense given that there are over 56,000 convenience stores in Japan. Often there will be different chains of convenience store located very close to one another; it isn't unusual to see stores around the corner from each other or on opposite sides of the street. Given Tokyo's population density, it's understandable for competing businesses to be pushed closer together, but could there be any relationships between which chains of convenience stores are found near one another?
The goal will be to collect location data for numerous convenience store chains in a Tokyo neighbourhood, to understand whether there are any relationships between which chains are co-located with one another. Doing this will require:
- The ability to query the locations of different convenience stores in Tokyo, in order to retrieve each store's name and location
- Finding which convenience stores are co-located with one another within a pre-defined radius
- Using the data on co-located stores to derive association rules
- Plotting and visualising the results for inspection
Let’s start!
For our use case we want to find convenience stores in Tokyo, so first we'll need to do a little homework on the common store chains. A quick Google search tells me that the main chains are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.
Now that we know what we're looking for, let's go to OSMnx: a great Python package for searching data in OpenStreetMap (OSM). According to OSM's schema, we should be able to find the store name in either the 'brand:en' or 'brand' field.
We can start by importing some useful libraries for getting our data, and defining a function that returns a table of locations for a given convenience store chain within a specified area:
import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.

    Parameters:
        place (str): a location (e.g., 'Tokyo, Japan')
        tags (dict): OSM tag key (e.g., 'brand:en') and value (e.g., the chain name)

    Returns:
        results (DataFrame): table of the entity name with its longitude and latitude
    '''
    gdf = osmnx.geocode_to_gdf(place)
    # Get the bounding box of the gdf
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0, 3], bounding.iloc[0, 1], bounding.iloc[0, 2], bounding.iloc[0, 0]
    location = gdf.geometry.unary_union
    # Find the points within the area polygon
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    point = point.set_crs(crs=4326, allow_override=True)
    point = point[point.geometry.within(location)]
    # Make sure we are dealing with points
    point['geometry'] = point['geometry'].apply(lambda x: x.centroid if type(x) == Polygon else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']
    results = pd.DataFrame({'name': list(point['name']),
                            'longitude': list(point['geometry'].x),
                            'latitude': list(point['geometry'].y)})
    results['name'] = list(tags.values())[0]
    return results
convenience_stores = point_finder(place='Shinjuku, Tokyo',
                                  tags={'brand:en': ' '})  # pass each chain name here, e.g. '7-Eleven'
We can pass each convenience store chain name and combine the results into a single table of store name, longitude and latitude. For our use case we will focus on the Shinjuku neighbourhood in Tokyo, and see what the abundance of each chain looks like.
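As a minimal sketch of that combining step (reusing the point_finder function above; the 'brand:en' strings are assumptions that should be checked against the actual OSM tags in the area):

# Assumed 'brand:en' values; verify these against the OSM data for your area
chains = ['7-Eleven', 'FamilyMart', 'Lawson', 'Ministop',
          'Daily Yamazaki', 'NewDays']

# Query each chain separately and stack the results into one table
convenience_stores = pd.concat(
    [point_finder(place='Shinjuku, Tokyo', tags={'brand:en': chain})
     for chain in chains],
    ignore_index=True)

# Frequency of each chain in the combined table
print(convenience_stores['name'].value_counts())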
FamilyMart and 7-Eleven clearly dominate in store frequency, but how does this look spatially? Plotting geospatial data is pretty simple with Kepler.gl, which includes a nice interface for creating visualisations that can be saved as HTML objects or rendered directly in Jupyter notebooks.
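For instance, a map can be built in a few lines with the keplergl package (a minimal sketch; the dataset name 'stores' and the output filename are arbitrary choices here):

from keplergl import KeplerGl

# Build an interactive map and add the store locations as a layer
store_map = KeplerGl(height=600)
store_map.add_data(data=convenience_stores, name='stores')

# Save as a standalone HTML file, or display store_map directly in a notebook
store_map.save_to_html(file_name='tokyo_convenience_stores.html')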
Now that we have our data, the next step will be to find the nearest neighbours for each convenience store. To do this, we will be using Scikit-learn's BallTree class to find the names of the nearest convenience stores within a two-minute walking radius. We aren't interested in how many stores are considered nearest neighbours, only in which convenience store chains fall within the defined radius.
from sklearn.neighbors import BallTree
from collections import Counter

# Convert locations to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a ball tree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours within a 2 minute walking radius
# (~168 m at ~1.4 m/s, expressed as a fraction of the Earth's radius)
is_within, distances = tree.query_radius(locations_radians, r=168/6371000,
                                         count_only=False, return_distance=True)

# Replace the neighbour indices with store names, dropping each store's own index
df = pd.DataFrame(is_within)
df.columns = ['indices']
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]

# Create temporary index column
convenience_stores = convenience_stores.reset_index()
# Set temporary index column as index
convenience_stores = convenience_stores.set_index('index')
# Create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()
# Replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))

# Append back to the original df
convenience_stores['neighbours'] = df['indices']

# Flag stores that have no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])

# Count each chain's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]
If we wanted to improve the accuracy of this work, we could replace the haversine distance measure with something more accurate (e.g., walking times calculated using networkx), but we'll keep things simple.
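A hedged sketch of what that alternative could look like, pulling the walking network with OSMnx and measuring shortest-path distances with networkx (the place string, walking speed and function name below are illustrative, not part of the original analysis):

# Illustrative: network-based walking distance between two coordinates
walk_graph = osmnx.graph_from_place('Shinjuku, Tokyo', network_type='walk')

def walking_distance_m(lat1, lon1, lat2, lon2):
    '''Shortest-path walking distance in metres between two points.'''
    node_a = osmnx.distance.nearest_nodes(walk_graph, X=lon1, Y=lat1)
    node_b = osmnx.distance.nearest_nodes(walk_graph, X=lon2, Y=lat2)
    return nx.shortest_path_length(walk_graph, node_a, node_b, weight='length')

# Two stores could then count as neighbours when
# walking_distance_m(...) <= 168  (about 2 minutes at ~1.4 m/s)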
This will give us a DataFrame where each row corresponds to a location, with a binary count of which convenience store chains are nearby:
We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules with the Apriori algorithm. We set a minimum support of 5%, so that we examine only the rules relating to frequent occurrences in our dataset (i.e., frequently co-located convenience store chains). We use the 'lift' metric when deriving rules; lift is the ratio of the proportion of locations that contain both the antecedent and consequent, relative to the expected support under the assumption of independence.
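Concretely, for a rule A → C over our set of locations:

lift(A → C) = support(A ∪ C) / (support(A) × support(C))

A lift above 1 means the two chains co-occur more often than we would expect if their locations were independent.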
from mlxtend.frequent_patterns import association_rules, apriori

# Calculate frequent itemsets with apriori (which expects a one-hot/boolean table)
frequent_set = apriori(output_df.astype(bool), min_support=0.05, use_colnames=True)
# Create rules
rules = association_rules(frequent_set, metric='lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)
This gives us the following results table:
We will now interpret these association rules to draw some high-level takeaways. To make sense of the table it helps to read up on how the support, confidence and lift metrics are defined. Okay, back to the table.
Support tells us how often different convenience store chains are actually found together. From it we can say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift over 1 indicates that the presence of the antecedent increases the likelihood of the consequent occurring, suggesting that the locations of the two chains are partially dependent. Meanwhile, the association between 7-Eleven and Lawson shows a higher lift but with a lower confidence.
Daily Yamazaki has a low support, near our cutoff, and shows a weak relationship with the location of FamilyMart, indicated by a lift slightly above 1.
Other rules relate to combinations of convenience stores. For example, when a 7-Eleven and a FamilyMart are already co-located, a high lift value of 1.42 suggests a strong association with Lawson.
If we had simply stopped at finding the nearest neighbours for each store location, we would not have been able to determine anything about the relationships between these stores.
One example of why geospatial association rules can be insightful for businesses is in identifying new store locations. If a convenience store chain is opening a new location, association rules can help to identify which stores are likely to co-occur with it.
The value in this becomes clear when tailoring marketing campaigns and pricing strategies, since it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, as demonstrated by the association rules, it would make sense for both of these chains to pay closer attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.
In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was done by extracting data from OpenStreetMap, finding nearest-neighbour convenience store chains, visualising the data on maps, and creating association rules with the Apriori algorithm.
Thanks for reading!