site

source files for beau's website
git clone https://git.beauhilton.com/site.git
Log | Files | Refs

geocheatcode.md (5643B)


      1 
      2 # geocheatcode
      3 
      4 Here is background and code
      5 for a trick I use to get
      6 Google to give me best-in-class guesses 
      7 for latitude and longitude,
      8 despite goofy and/or downright bad location searches.
      9 
     10 ## Map all the things
     11 
     12 I love maps.
     13 
     14 Several of my projects involve mapping things at scale.
     15 
     16 When you want to map a few things,
     17 you type searches into Google Maps
     18 and get addresses and/or latitudes and longitudes
     19 quickly and reliably.
     20 
     21 But what if you'd like to map 90,000 things
     22 whose locations you don't yet know?
     23 
     24 [Google](https://developers.google.com/maps) 
     25 and 
     26 [OpenStreetMap](https://www.openstreetmap.org/), 
     27 as well as others,
     28 provide mapping services
     29 you can call programmatically from your software.
     30 You send in some query, 
     31 such as "VUMC Internal Medicine,"
     32 and they return information
     33 relevant to that query,
     34 such as street address and 
     35 latitude and longitude.
     36 Up to a certain number of queries per day or hour, 
     37 the services are free, 
     38 and since my work is academic,
     39 rather than real-time mapping for some
     40 for-profit app, 
     41 I am happy to send in small batches
     42 to stay under the limits in the free tier.
     43 
     44 I've used these services to make large maps,
     45 and they work pretty well. 
     46 
     47 *Pretty* well.
     48 
     49 ## But mapping is hard
     50 
     51 Problems with these services:
     52 
     53 1. they expected well-formed and reasonable queries
     54 2. if they didn't know the answer, the guesses were often wildly off, or they would refuse to guess at all
     55 
     56 If I'm mapping 90,000 things,
     57 I'm going to write some code 
     58 to go through each of those 90,000 things
     59 and ask the mapping services 
     60 to kindly tell me what I want to know.
     61 Though I write sanitation code to clean up the 90,000 things,
     62 I'm not going to quality check each of those 90,000 things.
     63 Sometimes things among the 90,000 things are kinda nuts
     64 (misspelled, inclusive of extraneous data, oddly formatted),
     65 in idiosyncratic ways that are impossible to completely cover,
     66 no matter how much code I write to catch the weird cases.
     67 
     68 I would like a solution that is fairly tolerant of weirdnesses,
     69 and makes good guesses.
     70 
     71 ## Google is really good at search
     72 
     73 I noticed that when I manually typed things 
     74 into the Google Maps search bar,
     75 it forgave a myriad of sins
     76 and did a great job centering the map on its best guess.
     77 When I copied and pasted some of the weird things among the 90,000
     78 into the Google Maps search bar
     79 (the same things that made the 
     80 official mapping services - including Google's - 
     81 go all Poltergeist),
     82 *voila!*, the right answer appeared,
     83 success rates nearing 100%.
     84 
     85 I thought there must be a way to repeat this process with code,
     86 in a scalable way.
     87 
     88 Turns out there is, and it's easy.
     89 
     90 ## `geocheatcode.py`
     91 
     92 ```python
     93 
     94 from requests_html import HTMLSession
     95 
     96 session = HTMLSession()
     97 
     98 
     99 def google_lat_lon(query: str):
    100 
    101     url = "https://www.google.com/maps/search/?api=1"
    102     params = {}
    103     params["query"] = query
    104 
    105     r = session.get(url, params=params)
    106 
    107     reg = "APP_INITIALIZATION_STATE=[[[{}]"
    108     res = r.html.search(reg)[0]
    109     lat = res.split(",")[2]
    110     lon = res.split(",")[1]
    111 
    112     return lat, lon
    113 
    114 
    115 extraneous = """ something something
    116                  the earth is banana shaped
    117                  latitude and longitude 
    118                  wouldn't you like to know, maybe """
    119 
    120 relevant = """ Vanderbilt University Medical Center 
    121                Internal Medicine """
    122 
    123 query = extraneous + relevant
    124 
    125 lat, lon = google_lat_lon(query)
    126 
    127 print( 
    128        "Hello. "
    129        "My name is Google. "
    130        "I am really good at guessing what you meant. "
    131       f"Your query was '{query}'. "
    132        "Here are the coordinates you probably wanted. "
    133       f"The latitude is {lat}, and the longitude is {lon}. "
    134        "Don't believe me? "
    135        "Here it is again, "
    136        "in a format you can paste into the search bar: \n"
    137       f"{lat}, {lon} \n"
    138        "Told ya. "
    139 )
    140 
    141 ```
    142 
    143 Despite having all that extra junk in the query,
    144 this returns the right answer. 
    145 Because Google is many things good and evil,
    146 but of these one is certain: 
    147 Google is *really* good at search.
    148 
    149 ## How does the code work?
    150 
    151 If you inspect the source HTML
    152 on the Google Maps website 
    153 after you search for something
    154 and it centers the map on its best guess, 
    155 and you scroll way on down (or Ctrl-F search for it)
    156 you'll find `APP_INITIALIZATION_STATE`, which contains
    157 latitude and longitude for the place the map centered on.
    158 
    159 - [example search](https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine)
    160 - [example source](view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/) (you have to copy and paste this link into a new tab manually, clicking won't work)
    161 
    162 I use the lovely 
    163 [`requests-html`](https://docs.python-requests.org/projects/requests-html/en/latest/) 
    164 Python library
    165 to send the query to Google,
    166 receive the response,
    167 and search through the response for the part I want to extract.
    168 Then I use a little standard Python 
    169 to parse the extracted part and save the important bits.
    170 
    171 ## With great power...
    172 
    173 Don't go crazy with this. 
    174 
    175 The trick is good for 
    176 leisurely automation 
    177 of location retrieval
    178 when you have squirrelly queries.
    179 
    180 If you need real-time mapping of many things,
    181 you don't want this solution.
    182 Use the actual APIs, 
    183 and work instead on formatting the queries properly
    184 before sending them to Google/OSM.
    185 
    186 Also, if you try to query too much/too quickly,
    187 Google will shut you out after a little while.
    188 Put a few seconds of delay between each request 
    189 and run it overnight and/or in automated batches.
    190 
    191 ## Know a better way?
    192 
    193 I'd love to know. Drop me a line.