Grouping geodata – Problems and Solutions

In many location based applications such as google maps, user data would most certainly contain their location represented by latitude and longitude. To enable quality services and accurate analytics, we need to gather this information as precisely as possible.

For example, the LocationManager class in the android platform provides latitude and longitude accurate upto six decimal places.This level of accuracy is sufficient for almost all location-enabled services. If we are storing this in a database, it makes sense to store all the six decimal places.

However, a problem arises when we try to try to group data for visualization. If the database is big enough, its almost impossible to achieve any meaningful grouping by using the co-ordinate data as-is.

Here’s an example that occurs in the project I am working on. The Google Map has a marker for every report received by the server and groups it if there are any reports with the same latitude and longitude (all six decimal places of both the latitude and longitude have to be SAME )

The  SQL query was
SELECT count(*), Latitude, Longitude FROM `geodata` WHERE 1 GROUP BY Latitude, Longitude

The result:

ex1

 

On further zooming in….

ex3Oh look there are actually two reports!

let’s zoom in further..

ex2Surprise!! There are actually three reports

There were only five records in this example but as you can see, the grouping was ineffective due to the high amount of precision

So how do we fix this

Well, there are several solutions with their advantages and disadvantages. You can decide on which solution to implement depending on your needs

Solution 1: Reduce precision in the input data

This is probably the simplest and dirtiest way around this problem. Just take the first three decimal points and ignore all trailing data.This can be used when your application does not depend heavily on the user location

Advantage: Quick, no computation or disk storage cost

Disadvantage: Loss in data, which can have serious consequences

Solution 2: Create two extra columns which will have the reduced precision co-ordinates

This is a straight-forward way to have a meaningful grouping, have a column say lat_red_prec which will have the reduced precision latitude and similarly for longitude and group using these rather than using the actual ones.

Advantage: Simple, straight-forward and if you’re running analytics which require only lesser precison this will come handy in that

Disadvantage: Two extra columns, if you’re having a huge number of records, the extra storage required may cause latencies while running the query

Solution 3: Group by using the ROUND function

The ROUND() function takes a double value and rounds it off to the nearest integer or rounds it off to the required number of decimal places.
Your modified query would be

SELECT count(*), Latitude, Longitude FROM `geodata` WHERE 1 GROUP BY ROUND(Latitude,3),ROUND(Longitude,3)

This rounds off the data to three decimal places, which is exactly what we want.We can now either save the results of this query to a XML file which can be accessed by the map to plot the markers or execute it whenever a user requests the page.

Advantage: No data loss or redundant data

Disadvantage: If we are saving into a XML file, then we would have to have a cron job to run the query periodically. Hence, the map will not be always up to date. If we are running the query for every request, they will a overhead due to rounding off and hence may reduce the service quality
Irrespective of the solution used, final result should be:
exx

As you can see, we have data loss and hence the marker has changed its position but only by a small amount. Depending upon whether this deviation is tolerable, you can set the parameters to the ROUND( ) function.

Happy coding 🙂

 

 

Advertisements

Published by

gokul

Hi!, My name is Gokula Krishnan, you can (and I prefer) call me gokul. I'm a third year Computer Science major from BITS Pilani. I'm interested in Technology, Theoretical Computer Science and Discrete Mathematics. A FOSS enthusiast, I'm one of the founders of the BITS Firefox community. I'm currently working on Big Data Analytics, Machine learning and UNIX shell programming. My not-so-geeky hobbies include playing volleyball and football and origami

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s