sklearn / GeographicSpectralClustering / 0.1.4

Spectral clustering for geographic (lat/long) data. The input is JSON encoding a Python dictionary with the following keys:
  • "data" - whose value is a list of lat/long pairs
  • "numClusters" - whose value is an integer denoting the number of clusters that the data will be partitioned into.
The output is an ordered list containing the cluster label of each point, in the same order as the input points.
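
As an illustration of the input format described above, here is a minimal sketch that builds the JSON in Python. The coordinates are made-up sample values, and the variable names `payload` and `request_body` are hypothetical; only the keys "data" and "numClusters" come from this README.

```python
import json

# Hypothetical sample points as [lat, long] pairs (illustrative values only).
payload = {
    "data": [
        [37.77, -122.42],
        [37.78, -122.41],
        [40.71, -74.01],
        [40.73, -74.00],
    ],
    # Number of clusters to partition the points into.
    "numClusters": 2,
}

# Serialize the dictionary to the JSON string that would be sent as input.
request_body = json.dumps(payload)
print(request_body)
```

For this input, the output would be a list of four labels, one per point. The label numbers themselves carry no meaning beyond identifying which points share a cluster.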

We use inverse distance (in km, as calculated by the Haversine formula) as the similarity measure, so points that are close together are more similar. Any points within about a meter of each other are treated as the same point. We cannot guarantee the accuracy of Haversine distances for points that are very close together, so be careful. The advantage of spectral clustering is that, unlike K-means, it does not depend on cluster centers, and so it can resolve clusters that are naturally non-convex. This is based on scikit-learn's spectral clustering implementation. Read more about spectral clustering here.
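
The approach described above can be sketched roughly as follows. This is not the algorithm's actual source: the function names, the Earth radius of 6371 km, and the choice to clamp distances to a 0.001 km floor (one way to treat points within about a meter as identical) are all assumptions made for illustration. The clustering step uses scikit-learn's `SpectralClustering` with `affinity="precomputed"`, and is imported lazily so the distance helpers run without scikit-learn installed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, long) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def inverse_distance_affinity(points, min_km=0.001):
    """Similarity matrix of 1/distance, with distances floored at min_km.

    The floor (about one meter) is an assumption; it keeps the similarity
    finite for coincident or nearly coincident points.
    """
    n = len(points)
    return [
        [1.0 / max(haversine_km(points[i], points[j]), min_km) for j in range(n)]
        for i in range(n)
    ]


def cluster(points, num_clusters):
    """Label each point using scikit-learn on the precomputed affinity."""
    import numpy as np
    from sklearn.cluster import SpectralClustering

    affinity = np.asarray(inverse_distance_affinity(points))
    model = SpectralClustering(n_clusters=num_clusters, affinity="precomputed")
    return model.fit_predict(affinity).tolist()
```

Calling `cluster(payload_points, 2)` on a list of lat/long pairs would return one label per point. Because the similarity is a plain inverse distance, very close pairs dominate the affinity matrix, which is one reason to be careful with nearly coincident points.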