geocheatcode.md (5643B)
1 2 # geocheatcode 3 4 Here is background and code 5 for a trick I use to get 6 Google to give me best-in-class guesses 7 for latitude and longitude, 8 despite goofy and/or downright bad location searches. 9 10 ## Map all the things 11 12 I love maps. 13 14 Several of my projects involve mapping things at scale. 15 16 When you want to map a few things, 17 you type searches into Google Maps 18 and get addresses and/or latitudes and longitudes 19 quickly and reliably. 20 21 But what if you'd like to map 90,000 things 22 whose locations you don't yet know? 23 24 [Google](https://developers.google.com/maps) 25 and 26 [OpenStreetMap](https://www.openstreetmap.org/), 27 as well as others, 28 provide mapping services 29 you can call programmatically from your software. 30 You send in some query, 31 such as "VUMC Internal Medicine," 32 and they return information 33 relevant to that query, 34 such as street address and 35 latitude and longitude. 36 Up to a certain number of queries per day or hour, 37 the services are free, 38 and since my work is academic, 39 rather than real-time mapping for some 40 for-profit app, 41 I am happy to send in small batches 42 to stay under the limits in the free tier. 43 44 I've used these services to make large maps, 45 and they work pretty well. 46 47 *Pretty* well. 48 49 ## But mapping is hard 50 51 Problems with these services: 52 53 1. they expected well-formed and reasonable queries 54 2. if they didn't know the answer, the guesses were often wildly off, or they would refuse to guess at all 55 56 If I'm mapping 90,000 things, 57 I'm going to write some code 58 to go through each of those 90,000 things 59 and ask the mapping services 60 to kindly tell me what I want to know. 61 Though I write sanitation code to clean up the 90,000 things, 62 I'm not going to quality check each of those 90,000 things. 63 Sometimes things among the 90,000 things are kinda nuts 64 (misspelled, inclusive of extraneous data, oddly formatted), 65 in idiosyncratic ways that are impossible to completely cover, 66 no matter how much code I write to catch the weird cases. 67 68 I would like a solution that is fairly tolerant of weirdnesses, 69 and makes good guesses. 70 71 ## Google is really good at search 72 73 I noticed that when I manually typed things 74 into the Google Maps search bar, 75 it forgave a myriad of sins 76 and did a great job centering the map on its best guess. 77 When I copied and pasted some of the weird things among the 90,000 78 into the Google Maps search bar 79 (the same things that made the 80 official mapping services - including Google's - 81 go all Poltergeist), 82 *voila!*, the right answer appeared, 83 success rates nearing 100%. 84 85 I thought there must be a way to repeat this process with code, 86 in a scalable way. 87 88 Turns out there is, and it's easy. 89 90 ## `geocheatcode.py` 91 92 ```python 93 94 from requests_html import HTMLSession 95 96 session = HTMLSession() 97 98 99 def google_lat_lon(query: str): 100 101 url = "https://www.google.com/maps/search/?api=1" 102 params = {} 103 params["query"] = query 104 105 r = session.get(url, params=params) 106 107 reg = "APP_INITIALIZATION_STATE=[[[{}]" 108 res = r.html.search(reg)[0] 109 lat = res.split(",")[2] 110 lon = res.split(",")[1] 111 112 return lat, lon 113 114 115 extraneous = """ something something 116 the earth is banana shaped 117 latitude and longitude 118 wouldn't you like to know, maybe """ 119 120 relevant = """ Vanderbilt University Medical Center 121 Internal Medicine """ 122 123 query = extraneous + relevant 124 125 lat, lon = google_lat_lon(query) 126 127 print( 128 "Hello. " 129 "My name is Google. " 130 "I am really good at guessing what you meant. " 131 f"Your query was '{query}'. " 132 "Here are the coordinates you probably wanted. " 133 f"The latitude is {lat}, and the longitude is {lon}. " 134 "Don't believe me? " 135 "Here it is again, " 136 "in a format you can paste into the search bar: \n" 137 f"{lat}, {lon} \n" 138 "Told ya. " 139 ) 140 141 ``` 142 143 Despite having all that extra junk in the query, 144 this returns the right answer. 145 Because Google is many things good and evil, 146 but of these one is certain: 147 Google is *really* good at search. 148 149 ## How does the code work? 150 151 If you inspect the source HTML 152 on the Google Maps website 153 after you search for something 154 and it centers the map on its best guess, 155 and you scroll way on down (or Ctrl-F search for it) 156 you'll find `APP_INITIALIZATION_STATE`, which contains 157 latitude and longitude for the place the map centered on. 158 159 - [example search](https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine) 160 - [example source](view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/) (you have to copy and paste this link into a new tab manually, clicking won't work) 161 162 I use the lovely 163 [`requests-html`](https://docs.python-requests.org/projects/requests-html/en/latest/) 164 Python library 165 to send the query to Google, 166 receive the response, 167 and search through the response for the part I want to extract. 168 Then I use a little standard Python 169 to parse the extracted part and save the important bits. 170 171 ## With great power... 172 173 Don't go crazy with this. 174 175 The trick is good for 176 leisurely automation 177 of location retrieval 178 when you have squirrelly queries. 179 180 If you need real-time mapping of many things, 181 you don't want this solution. 182 Use the actual APIs, 183 and work instead on formatting the queries properly 184 before sending them to Google/OSM. 185 186 Also, if you try to query too much/too quickly, 187 Google will shut you out after a little while. 188 Put a few seconds of delay between each request 189 and run it overnight and/or in automated batches. 190 191 ## Know a better way? 192 193 I'd love to know. Drop me a line.