# geocheatcode

<time id="post-date">2022-04-22</time>

<p id="post-excerpt">
Here is background and code
for a trick I use to get
Google to give me best-in-class guesses
for latitude and longitude,
despite goofy and/or downright bad location searches.
</p>

## Map all the things

I love maps.

Several of my projects involve mapping things at scale.

When you want to map a few things,
you type searches into Google Maps
and get addresses and/or latitudes and longitudes
quickly and reliably.

But what if you'd like to map 90,000 things
whose locations you don't yet know?

[Google](https://developers.google.com/maps)
and
[OpenStreetMap](https://www.openstreetmap.org/),
as well as others,
provide mapping services
you can call programmatically from your software.
You send in some query,
such as "VUMC Internal Medicine,"
and they return information
relevant to that query,
such as street address and
latitude and longitude.
Up to a certain number of queries per day or hour,
the services are free,
and since my work is academic,
rather than real-time mapping for some
for-profit app,
I am happy to send in small batches
to stay under the limits in the free tier.

I've used these services to make large maps,
and they work pretty well.

*Pretty* well.

## But mapping is hard

Problems with these services:

1. they expect well-formed and reasonable queries
2. if they don't know the answer, their guesses are often wildly off, or they refuse to guess at all

If I'm mapping 90,000 things,
I'm going to write some code
to go through each of those 90,000 things
and ask the mapping services
to kindly tell me what I want to know.
Though I write sanitization code to clean up the 90,000 things,
I'm not going to quality check each of those 90,000 things.
Sometimes things among the 90,000 things are kinda nuts
(misspelled, inclusive of extraneous data, oddly formatted),
in idiosyncratic ways that are impossible to completely cover,
no matter how much code I write to catch the weird cases.

I would like a solution that is fairly tolerant of weirdnesses,
and makes good guesses.

## Google is really good at search

I noticed that when I manually typed things
into the Google Maps search bar,
it forgave a myriad of sins
and did a great job centering the map on its best guess.
When I copied and pasted some of the weird things among the 90,000
into the Google Maps search bar
(the same things that made the
official mapping services - including Google's -
go all Poltergeist),
*voilà!*, the right answer appeared,
with success rates nearing 100%.

I thought there must be a way to repeat this process with code,
in a scalable way.

Turns out there is, and it's easy.

## `geocheatcode.py`

```python
from requests_html import HTMLSession

# One shared session, so connections are reused across lookups.
session = HTMLSession()


def google_lat_lon(query: str):
    """Return Google Maps' best-guess (lat, lon) for a query, as strings."""
    url = "https://www.google.com/maps/search/?api=1"
    params = {"query": query}

    r = session.get(url, params=params)

    # Capture the first bracketed group inside APP_INITIALIZATION_STATE.
    reg = "APP_INITIALIZATION_STATE=[[[{}]"
    res = r.html.search(reg)[0]

    # Within that group, longitude is at index 1 and latitude at index 2.
    parts = res.split(",")
    lat = parts[2]
    lon = parts[1]

    return lat, lon


extraneous = """ something something
the earth is banana shaped
latitude and longitude
wouldn't you like to know, maybe """

relevant = """ Vanderbilt University Medical Center
Internal Medicine """

query = extraneous + relevant

lat, lon = google_lat_lon(query)

print(
    "Hello. "
    "My name is Google. "
    "I am really good at guessing what you meant. "
    f"Your query was '{query}'. "
    "Here are the coordinates you probably wanted. "
    f"The latitude is {lat}, and the longitude is {lon}. "
    "Don't believe me? "
    "Here it is again, "
    "in a format you can paste into the search bar: \n"
    f"{lat}, {lon} \n"
    "Told ya. "
)
```

Despite having all that extra junk in the query,
this returns the right answer.
Because Google is many things good and evil,
but of these one is certain:
Google is *really* good at search.

## How does the code work?

If you inspect the source HTML
on the Google Maps website
after you search for something
and it centers the map on its best guess,
and you scroll way on down (or Ctrl-F search for it),
you'll find `APP_INITIALIZATION_STATE`, which contains
latitude and longitude for the place the map centered on.

- [example search](https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine)
- [example source](view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/) (you have to copy and paste this link into a new tab manually; clicking won't work)

I use the lovely
[`requests-html`](https://docs.python-requests.org/projects/requests-html/en/latest/)
Python library
to send the query to Google,
receive the response,
and search through the response for the part I want to extract.
Then I use a little standard Python
to parse the extracted part and save the important bits.

## With great power...

Don't go crazy with this.

The trick is good for
leisurely automation
of location retrieval
when you have squirrelly queries.

If you need real-time mapping of many things,
you don't want this solution.
Use the actual APIs,
and work instead on formatting the queries properly
before sending them to Google/OSM.

Also, if you try to query too much or too quickly,
Google will shut you out after a little while.
Put a few seconds of delay between each request
and run it overnight and/or in automated batches.

## Know a better way?

I'd love to know. Drop me a line.