In many location based applications such as google maps, user data would most certainly contain their location represented by latitude and longitude. To enable quality services and accurate analytics, we need to gather this information as precisely as possible.
For example, the LocationManager class in the android platform provides latitude and longitude accurate upto six decimal places.This level of accuracy is sufficient for almost all location-enabled services. If we are storing this in a database, it makes sense to store all the six decimal places.
However, a problem arises when we try to try to group data for visualization. If the database is big enough, its almost impossible to achieve any meaningful grouping by using the co-ordinate data as-is.
Here’s an example that occurs in the project I am working on. The Google Map has a marker for every report received by the server and groups it if there are any reports with the same latitude and longitude (all six decimal places of both the latitude and longitude have to be SAME )
The SQL query was
SELECT count(*), Latitude, Longitude FROM `geodata` WHERE 1 GROUP BY Latitude, Longitude
On further zooming in….
let’s zoom in further..
There were only five records in this example but as you can see, the grouping was ineffective due to the high amount of precision
So how do we fix this
Well, there are several solutions with their advantages and disadvantages. You can decide on which solution to implement depending on your needs
Solution 1: Reduce precision in the input data
This is probably the simplest and dirtiest way around this problem. Just take the first three decimal points and ignore all trailing data.This can be used when your application does not depend heavily on the user location
Advantage: Quick, no computation or disk storage cost
Disadvantage: Loss in data, which can have serious consequences
Solution 2: Create two extra columns which will have the reduced precision co-ordinates
This is a straight-forward way to have a meaningful grouping, have a column say lat_red_prec which will have the reduced precision latitude and similarly for longitude and group using these rather than using the actual ones.
Advantage: Simple, straight-forward and if you’re running analytics which require only lesser precison this will come handy in that
Disadvantage: Two extra columns, if you’re having a huge number of records, the extra storage required may cause latencies while running the query
Solution 3: Group by using the ROUND function
The ROUND() function takes a double value and rounds it off to the nearest integer or rounds it off to the required number of decimal places.
Your modified query would be
SELECT count(*), Latitude, Longitude FROM `geodata` WHERE 1 GROUP BY ROUND(Latitude,3),ROUND(Longitude,3)
This rounds off the data to three decimal places, which is exactly what we want.We can now either save the results of this query to a XML file which can be accessed by the map to plot the markers or execute it whenever a user requests the page.
Advantage: No data loss or redundant data
Disadvantage: If we are saving into a XML file, then we would have to have a cron job to run the query periodically. Hence, the map will not be always up to date. If we are running the query for every request, they will a overhead due to rounding off and hence may reduce the service quality
Irrespective of the solution used, final result should be:
As you can see, we have data loss and hence the marker has changed its position but only by a small amount. Depending upon whether this deviation is tolerable, you can set the parameters to the ROUND( ) function.
Happy coding 🙂