If you want to open a coffee shop, you will face many difficulties, and choosing the best location is one of them. It is no secret that the best place for a new coffee shop is where there are many people and few places to get coffee. There are many location data providers, but how do you avoid getting lost in all that information? Can machine learning help with this non-trivial task?
This project aims to predict the best place for a coffee shop based on location data. We’ll try to give a business owner the following insights:
- Get a visual representation of how venues are distributed in the selected city.
- See how venues group into clusters around particularly popular places.
- Determine the centers of these clusters.
- Calculate how many coffee shops each cluster already contains.
Coffee shop owners and entrepreneurs who want to open their own coffee shop are the natural audience for the results of this project.
2. Data source
In this document we will use Foursquare location data. This data describes places and venues: their geographical location, category, working hours, full address, and so on. Given a location as geographical coordinates (latitude and longitude values), one can determine what types of venues exist within a defined radius of it.
Using the Foursquare API, we can search for specific types of venues or stores around a given location. For a given location, we can also tell how many venues of each category exist and how other people have reviewed each surrounding venue.
As input for the model, we selected the geographical coordinates of venues in the city of Brest within a radius of 3000 meters of its center. This distance covers the entire historical center and the surrounding areas that are potentially interesting for placing coffee shops.
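A search like this can be sketched as a single request URL. The snippet below is only an illustration: it assumes the legacy Foursquare v2 `venues/explore` endpoint (the article does not say which endpoint was used), and the coordinates, credentials, version date, and the helper name `build_explore_url` are placeholders of ours. It builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" search.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, radius_m, client_id, client_secret,
                      version="20180605", limit=50):
    """Return a request URL for venues within radius_m meters of (lat, lon)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date in YYYYMMDD form
        "ll": f"{lat},{lon}",  # "latitude,longitude"
        "radius": radius_m,    # search radius in meters
        "limit": limit,        # maximum number of results
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder coordinates and credentials, for illustration only.
url = build_explore_url(52.09, 23.69, 3000, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The 3000 meter radius from the text goes into the `radius` parameter; everything else would need to be replaced with real values.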
With these parameters we obtained information about 54 leisure venues. This is a fairly small amount of data, since Foursquare is not very popular in Belarus. Nevertheless, this study can serve as additional information when starting a new business.
3. Exploratory Data Analysis
When we visualize the venues on a map, no regularities in their distribution are obvious. Such a visualization tells us nothing about the venue categories or the centers around which the venues are placed.
Our task now is to find the centers of venue concentration and count the existing coffee shops around them. A large number of venues combined with few or no coffee shops indicates a potentially advantageous location. But first we’ll create a feature dataset with one row for each venue and the columns Venue Latitude and Venue Longitude:
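A minimal sketch of this step, assuming the venues are held in a pandas DataFrame. The sample rows, the `Venue` and `Venue Category` columns, and the variable names are hypothetical; only the two coordinate column names come from the text.

```python
import pandas as pd

# Hypothetical sample rows; real rows would come from the Foursquare response.
venues = pd.DataFrame({
    "Venue": ["Cafe A", "Museum B", "Gym C", "Coffee D"],
    "Venue Category": ["Café", "Museum", "Gym", "Coffee Shop"],
    "Venue Latitude": [52.094, 52.091, 52.101, 52.097],
    "Venue Longitude": [23.688, 23.695, 23.702, 23.684],
})

# Keep only the two coordinate columns: one row per venue.
features = venues[["Venue Latitude", "Venue Longitude"]].copy()
print(features)
```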
Now we fit the K-means model to this dataset. K-Means is a partitioning clustering method: it divides the data into K non-overlapping subsets, or clusters, without any internal cluster structure or labels. We will take K=10, since Brest is not a big city; this gives us fairly small clusters with a high venue density. As a result, for each venue we get its cluster label, and for each cluster we get its center:
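Here is a sketch of the fitting step, assuming scikit-learn's `KMeans`. The 54 points are synthetic stand-ins generated at random, not the real venue coordinates, and `n_init` and `random_state` are our choices for reproducibility rather than settings taken from the study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 54 venue coordinates (not real data).
rng = np.random.default_rng(0)
coords = np.column_stack([
    rng.uniform(52.07, 52.12, size=54),  # latitudes
    rng.uniform(23.65, 23.73, size=54),  # longitudes
])

# K=10 as in the text; a fixed random_state keeps the run repeatable.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)   # one cluster label per venue
centers = kmeans.cluster_centers_     # one (lat, lon) center per cluster

print(labels[:5])
print(centers.shape)
```

K-means measures plain Euclidean distance on the latitude and longitude values, which is only an approximation of distance on the ground. For an area a few kilometers across this is usually acceptable.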
Now that we have fitted the KMeans model and generated the labels and cluster centers, let’s plot them and see what the clusters look like.
This time the map looks more informative. The clusters have become apparent, and the cluster centers are marked with circles. Now we will calculate the total number of venues in each cluster, the number of coffee shops, and the coffee shop to venue ratio, which tells us the level of competition:
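The calculation can be sketched with plain Python counters. The venue list and the category names treated as coffee shops are hypothetical; in the real study they would come from the cluster labels and the Foursquare categories.

```python
from collections import Counter

# Hypothetical (cluster label, venue category) pairs, for illustration only.
venues = [
    (0, "Museum"), (0, "Coffee Shop"), (0, "Park"), (0, "Theater"),
    (1, "Coffee Shop"), (1, "Café"),
    (2, "Gym"), (2, "Hotel"), (2, "Restaurant"),
]
COFFEE_CATEGORIES = {"Coffee Shop", "Café"}  # assumed category names

# Count all venues and coffee shops per cluster.
total = Counter(label for label, _ in venues)
coffee = Counter(label for label, cat in venues if cat in COFFEE_CATEGORIES)

# Ratio of coffee shops to all venues: lower means less competition.
report = {
    label: {
        "venues": total[label],
        "coffee_shops": coffee[label],
        "ratio": coffee[label] / total[label],
    }
    for label in sorted(total)
}

for label, row in report.items():
    print(f"cluster {label}: {row['venues']} venues, "
          f"{row['coffee_shops']} coffee shops, ratio {row['ratio']:.2f}")
```

In this toy data, cluster 0 has the most venues with a ratio of 0.25, and cluster 2 has no coffee shops at all, so both would be worth a closer look.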
As a result, we print a summary report. This report, together with the cluster center locations, is a good starting point for an entrepreneur who wants to open a new coffee shop and is looking for the best place for it.
As noted above, Brest is not a big city, and the Foursquare data alone is not enough for a full analysis before opening a new cafe. However, the results did help identify the most promising location. According to this study, the best cluster for a cafe is cluster 0, which has the largest number of venues and a relatively low density of coffee shops. Clusters 2 and 4, with a low density of coffee shops and thematically suitable venues, may also be promising.
5. Future directions
Of course, this is not a full analysis. The Foursquare dataset offers much more. We could additionally take user ratings into account to gauge the strength of competitors. We could also analyze the clusters and determine their thematic character, such as sports or culture. Let us leave this for future versions.
In this study, I analyzed location data that could be helpful for those who want to open their first coffee shop, as well as for those who already own a coffee shop and want to keep up with the competition. Thank you!