# geocheatcode

<time id="post-date">2022-04-22</time>

<p id="post-excerpt">
Here is background and code
for a trick I use to get
Google to give me best-in-class guesses
for latitude and longitude,
despite goofy and/or downright bad location searches.
</p>

## Map all the things

I love maps.

Several of my projects involve mapping things at scale.

When you want to map a few things,
you type searches into Google Maps
and get addresses and/or latitudes and longitudes
quickly and reliably.

But what if you'd like to map 90,000 things
whose locations you don't yet know?

[Google](https://developers.google.com/maps)
and
[OpenStreetMap](https://www.openstreetmap.org/),
as well as others,
provide mapping services
you can call programmatically from your software.
You send in some query,
such as "VUMC Internal Medicine,"
and they return information
relevant to that query,
such as street address and
latitude and longitude.
Up to a certain number of queries per day or hour,
the services are free,
and since my work is academic,
rather than real-time mapping for some
for-profit app,
I am happy to send in small batches
to stay under the limits in the free tier.
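
For a feel of what "send in a query, get coordinates back" looks like, here is a minimal sketch against OpenStreetMap's Nominatim search endpoint. The function names and the sample response are my own illustration, not code from any of my projects, and the network call itself is left out (Nominatim's usage policy asks for an identifying User-Agent and no more than one request per second).

```python
import json
from urllib.parse import urlencode

# Public Nominatim search endpoint (OpenStreetMap's geocoder).
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query: str) -> str:
    """Build a search URL asking for the single best match as JSON."""
    params = {"q": query, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def first_lat_lon(response_text: str):
    """Pull (lat, lon) out of a JSON response body, or None if there is no match."""
    hits = json.loads(response_text)
    if not hits:
        return None  # the service declined to guess
    # Nominatim returns coordinates as strings, so convert them.
    return float(hits[0]["lat"]), float(hits[0]["lon"])


# Hand-written sample in the shape Nominatim returns (values are made up).
sample = '[{"lat": "36.1", "lon": "-86.8", "display_name": "Example Place"}]'
print(build_search_url("VUMC Internal Medicine"))
print(first_lat_lon(sample))
```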

I've used these services to make large maps,
and they work pretty well.

*Pretty* well.

## But mapping is hard

Problems with these services:

1. they expect well-formed and reasonable queries
2. if they don't know the answer, their guesses are often wildly off, or they refuse to guess at all

If I'm mapping 90,000 things,
I'm going to write some code
to go through each of those 90,000 things
and ask the mapping services
to kindly tell me what I want to know.
Though I write sanitization code to clean up the 90,000 things,
I'm not going to quality check each of those 90,000 things.
Sometimes things among the 90,000 things are kinda nuts
(misspelled, inclusive of extraneous data, oddly formatted),
in idiosyncratic ways that are impossible to completely cover,
no matter how much code I write to catch the weird cases.
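
The kind of cleanup I mean looks roughly like the sketch below. `clean_query` is a hypothetical helper written for this post, and the rules in it (Unicode normalization, whitespace collapsing, trimming stray punctuation) are assumptions about what is worth catching, not an exhaustive list.

```python
import re
import unicodedata


def clean_query(raw: str) -> str:
    """Normalize one free-text location string before sending it anywhere."""
    # Fold compatibility characters (e.g. non-breaking spaces) into plain ones.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse tabs, newlines, and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Drop any remaining control or other non-printing characters.
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    # Trim leftover separators from the ends.
    return text.strip(" ,;")


print(clean_query("  Vanderbilt   University\n Medical Center ,"))
```

Code like this handles the predictable mess. It does nothing for misspellings or extra words, which is the gap the rest of this post is about.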

I would like a solution that is fairly tolerant of weirdnesses,
and makes good guesses.

## Google is really good at search

I noticed that when I manually typed things
into the Google Maps search bar,
it forgave a myriad of sins
and did a great job centering the map on its best guess.
When I copied and pasted some of the weird things among the 90,000
into the Google Maps search bar
(the same things that made the
official mapping services - including Google's -
go all Poltergeist),
*voilà!*, the right answer appeared,
success rates nearing 100%.

I thought there must be a way to repeat this process with code,
in a scalable way.

Turns out there is, and it's easy.

## `geocheatcode.py`

```python
from requests_html import HTMLSession

# One shared session for all lookups.
session = HTMLSession()


def google_lat_lon(query: str):
    """Return Google Maps' best-guess (lat, lon) for a free-text query, as strings."""

    url = "https://www.google.com/maps/search/?api=1"
    params = {"query": query}

    r = session.get(url, params=params)

    # Capture the first bracketed, comma-separated group that follows
    # APP_INITIALIZATION_STATE= in the page source.
    reg = "APP_INITIALIZATION_STATE=[[[{}]"
    found = r.html.search(reg)
    if found is None:
        raise ValueError(f"no APP_INITIALIZATION_STATE found for query: {query!r}")

    # In the captured group, index 1 is the longitude and index 2 is the latitude.
    parts = found[0].split(",")
    lat = parts[2]
    lon = parts[1]

    return lat, lon


extraneous = """ something something
                 the earth is banana shaped
                 latitude and longitude
                 wouldn't you like to know, maybe """

relevant = """ Vanderbilt University Medical Center
               Internal Medicine """

query = extraneous + relevant

lat, lon = google_lat_lon(query)

print(
    "Hello. "
    "My name is Google. "
    "I am really good at guessing what you meant. "
    f"Your query was '{query}'. "
    "Here are the coordinates you probably wanted. "
    f"The latitude is {lat}, and the longitude is {lon}. "
    "Don't believe me? "
    "Here it is again, "
    "in a format you can paste into the search bar: \n"
    f"{lat}, {lon} \n"
    "Told ya. "
)
```

Despite having all that extra junk in the query,
this returns the right answer.
Because Google is many things good and evil,
but of these one is certain:
Google is *really* good at search.

## How does the code work?

If you inspect the source HTML
on the Google Maps website
after you search for something
and it centers the map on its best guess,
and you scroll way on down (or Ctrl-F search for it),
you'll find `APP_INITIALIZATION_STATE`, which contains
latitude and longitude for the place the map centered on.

- [example search](https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine)
- [example source](view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/) (you have to copy and paste this link into a new tab manually; clicking won't work)

I use the lovely
[`requests-html`](https://docs.python-requests.org/projects/requests-html/en/latest/)
Python library
to send the query to Google,
receive the response,
and search through the response for the part I want to extract.
Then I use a little standard Python
to parse the extracted part and save the important bits.
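
If you already have the page source as a string, the extraction step can also be sketched with nothing but the standard library. This is an alternative to the `requests-html` search, not what `geocheatcode.py` does; `extract_lat_lon` is a name I made up, and the sample fragment is invented to mimic the shape of the real thing.

```python
import re

# Matches the first bracketed group after APP_INITIALIZATION_STATE=[[[
STATE_RE = re.compile(r"APP_INITIALIZATION_STATE=\[\[\[([^\]]*)\]")


def extract_lat_lon(page_source: str):
    """Return (lat, lon) as floats, or None if the marker is missing or malformed."""
    match = STATE_RE.search(page_source)
    if match is None:
        return None
    parts = match.group(1).split(",")
    if len(parts) < 3:
        return None
    try:
        # Same positions geocheatcode.py uses: index 1 is longitude, index 2 is latitude.
        return float(parts[2]), float(parts[1])
    except ValueError:
        return None


# Made-up fragment shaped like the page source (numbers are placeholders).
fragment = "...;APP_INITIALIZATION_STATE=[[[1000.5,-86.8,36.1],[0,0,0]];..."
print(extract_lat_lon(fragment))
```

Returning `None` instead of raising makes it easier to log the misses and move on when you are working through a long list.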

## With great power...

Don't go crazy with this.

The trick is good for
leisurely automation
of location retrieval
when you have squirrelly queries.

If you need real-time mapping of many things,
you don't want this solution.
Use the actual APIs,
and work instead on formatting the queries properly
before sending them to Google/OSM.

Also, if you try to query too much/too quickly,
Google will shut you out after a little while.
Put a few seconds of delay between requests
and run it overnight and/or in automated batches.
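
A leisurely batch loop might look something like this sketch. `geocode_slowly` and its parameters are hypothetical, the three-second default is just one reading of "a few seconds", and the lookup function is passed in so you can hand it `google_lat_lon` or anything else with the same shape.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLon = Tuple[str, str]


def geocode_slowly(
    queries: Iterable[str],
    lookup: Callable[[str], LatLon],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[LatLon]]:
    """Look up each query one at a time, pausing between requests."""
    results: Dict[str, Optional[LatLon]] = {}
    for i, query in enumerate(queries):
        if i > 0:
            sleep(delay_seconds)  # pause between requests, not before the first
        try:
            results[query] = lookup(query)
        except Exception:
            results[query] = None  # record the failure and keep going
    return results


# Stand-in lookup so the sketch runs without touching the network.
def pretend_lookup(query: str) -> LatLon:
    return ("0.0", "0.0")


print(geocode_slowly(["first place", "second place"], pretend_lookup, delay_seconds=0))
```

For a real run, something like `geocode_slowly(my_queries, google_lat_lon)` would do, with the results written to disk between batches so an overnight job can pick up where it left off.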

## Know a better way?

I'd love to know. Drop me a line.