<!DOCTYPE html>
<html lang="en">
<head>
<link rel="stylesheet" href="/style.css" type="text/css">
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🏕️</text></svg>">
<title></title>
</head>
<body>
<div id="page-wrapper">
<div id="header" role="banner">
<header class="banner">
<div id="banner-text">
<span class="banner-title"><a href="/">beauhilton</a></span>
</div>
</header>
<nav>
<a href="/about">about</a>
<a href="/now">now</a>
<a href="/thanks">thanks</a>
<a class="nav-active" href="/posts">posts</a>
<a href="https://notes.beauhilton.com">notes</a>
<a href="https://talks.beauhilton.com">talks</a>
<a href="https://git.beauhilton.com">git</a>
<a href="/contact">contact</a>
<a href="/atom.xml">rss</a>
</nav>
</div>
<main>
<h1>
geocheatcode
</h1>
<p>
<time id="post-date">2022-04-22</time>
</p>
<p id="post-excerpt">
Here is background and code
for a trick I use to get
Google to give me best-in-class guesses
for latitude and longitude,
despite goofy and/or downright bad location searches.
</p>
<h2>
Map all the things
</h2>
<p>
I love maps.
</p>
<p>
Several of my projects involve mapping things at scale.
</p>
<p>
When you want to map a few things, you type searches into Google Maps
and get addresses and/or latitudes and longitudes quickly and
reliably.
</p>
<p>
But what if you’d like to map 90,000 things whose locations you don’t
yet know?
</p>
<p>
<a href="https://developers.google.com/maps">Google</a> and <a href="https://www.openstreetmap.org/">OpenStreetMap</a>, as well as
others, provide mapping services you can call programmatically from your
software. You send in some query, such as “VUMC Internal Medicine,” and
they return information relevant to that query, such as street address
and latitude and longitude. Up to a certain number of queries per day or
hour, the services are free, and since my work is academic, rather than
real-time mapping for some for-profit app, I am happy to send in small
batches to stay under the limits in the free tier.
</p>
<p>
I’ve used these services to make large maps, and they work pretty
well.
</p>
<p>
<em>Pretty</em> well.
</p>
<h2>
But mapping is hard
</h2>
<p>
Problems with these services:
</p>
<ol type="1">
<li>
they expect well-formed and reasonable queries
</li>
<li>
if they don’t know the answer, their guesses are often wildly off,
or they refuse to guess at all
</li>
</ol>
<p>
If I’m mapping 90,000 things, I’m going to write some code to go
through each of those 90,000 things and ask the mapping services to
kindly tell me what I want to know. Though I write sanitization code to
clean up the 90,000 things, I’m not going to quality check each of those
90,000 things. Sometimes things among the 90,000 things are kinda nuts
(misspelled, padded with extraneous data, oddly formatted), in
idiosyncratic ways that are impossible to completely cover, no matter
how much code I write to catch the weird cases.
</p>
<p>
I would like a solution that is fairly tolerant of weirdnesses, and
makes good guesses.
</p>
<h2>
Google is really good at search
</h2>
<p>
I noticed that when I manually typed things into the Google Maps
search bar, it forgave a myriad of sins and did a great job centering
the map on its best guess. When I copied and pasted some of the weird
things among the 90,000 into the Google Maps search bar (the same things
that made the official mapping services - including Google’s - go all
Poltergeist), <em>voila!</em>, the right answer appeared, with success rates
nearing 100%.
</p>
<p>
I thought there must be a way to repeat this process with code, in a
scalable way.
</p>
<p>
Turns out there is, and it’s easy.
</p>
<h2>
<code>geocheatcode.py</code>
</h2>
<pre tabindex="0"><code class="language-python">
<span class="hl kwa">from</span> requests_html <span class="hl kwa">import</span> HTMLSession

session <span class="hl opt">=</span> <span class="hl kwd">HTMLSession</span><span class="hl opt">()</span>


<span class="hl kwa">def</span> <span class="hl kwd">google_lat_lon</span><span class="hl opt">(</span>query<span class="hl opt">:</span> <span class="hl kwb">str</span><span class="hl opt">):</span>

    url <span class="hl opt">=</span> <span class="hl sng">"https://www.google.com/maps/search/?api=1"</span>
    params <span class="hl opt">= {}</span>
    params<span class="hl opt">[</span><span class="hl sng">"query"</span><span class="hl opt">] =</span> query

    <span class="hl slc"># requests URL-encodes the query and appends it to the URL.</span>
    r <span class="hl opt">=</span> session<span class="hl opt">.</span><span class="hl kwd">get</span><span class="hl opt">(</span>url<span class="hl opt">,</span> params<span class="hl opt">=</span>params<span class="hl opt">)</span>

    <span class="hl slc"># Capture the text between "[[[" and the first "]".</span>
    reg <span class="hl opt">=</span> <span class="hl sng">"APP_INITIALIZATION_STATE=[[[{}]"</span>
    res <span class="hl opt">=</span> r<span class="hl opt">.</span>html<span class="hl opt">.</span><span class="hl kwd">search</span><span class="hl opt">(</span>reg<span class="hl opt">)[</span><span class="hl num">0</span><span class="hl opt">]</span>
    <span class="hl slc"># In the captured text, longitude (index 1) comes before latitude (index 2).</span>
    lat <span class="hl opt">=</span> res<span class="hl opt">.</span><span class="hl kwd">split</span><span class="hl opt">(</span><span class="hl sng">","</span><span class="hl opt">)[</span><span class="hl num">2</span><span class="hl opt">]</span>
    lon <span class="hl opt">=</span> res<span class="hl opt">.</span><span class="hl kwd">split</span><span class="hl opt">(</span><span class="hl sng">","</span><span class="hl opt">)[</span><span class="hl num">1</span><span class="hl opt">]</span>

    <span class="hl kwa">return</span> lat<span class="hl opt">,</span> lon


extraneous <span class="hl opt">=</span> <span class="hl sng">""" something something</span>
<span class="hl sng">                 the earth is banana shaped</span>
<span class="hl sng">                 latitude and longitude </span>
<span class="hl sng">                 wouldn't you like to know, maybe """</span>

relevant <span class="hl opt">=</span> <span class="hl sng">""" Vanderbilt University Medical Center </span>
<span class="hl sng">               Internal Medicine """</span>

query <span class="hl opt">=</span> extraneous <span class="hl opt">+</span> relevant

lat<span class="hl opt">,</span> lon <span class="hl opt">=</span> <span class="hl kwd">google_lat_lon</span><span class="hl opt">(</span>query<span class="hl opt">)</span>

<span class="hl kwa">print</span><span class="hl opt">(</span>
    <span class="hl sng">"Hello. "</span>
    <span class="hl sng">"My name is Google. "</span>
    <span class="hl sng">"I am really good at guessing what you meant. "</span>
    f<span class="hl sng">"Your query was '</span><span class="hl ipl">{query}</span><span class="hl sng">'. "</span>
    <span class="hl sng">"Here are the coordinates you probably wanted. "</span>
    f<span class="hl sng">"The latitude is</span> <span class="hl ipl">{lat}</span><span class="hl sng">, and the longitude is</span> <span class="hl ipl">{lon}</span><span class="hl sng">. "</span>
    <span class="hl sng">"Don't believe me? "</span>
    <span class="hl sng">"Here it is again, "</span>
    <span class="hl sng">"in a format you can paste into the search bar:</span> <span class="hl esc">\n</span><span class="hl sng">"</span>
    f<span class="hl sng">"</span><span class="hl ipl">{lat}</span><span class="hl sng">,</span> <span class="hl ipl">{lon}</span> <span class="hl sng"></span><span class="hl esc">\n</span><span class="hl sng">"</span>
    <span class="hl sng">"Told ya. "</span>
<span class="hl opt">)</span>
</code></pre>
<p>
Despite having all that extra junk in the query, this returns the
right answer. Because Google is many things good and evil, but of these
one is certain: Google is <em>really</em> good at search.
</p>
<h2>
How does the code work?
</h2>
<p>
If you inspect the source HTML on the Google Maps website after you
search for something and it centers the map on its best guess, and you
scroll way on down (or Ctrl-F search for it) you’ll find
<code>APP_INITIALIZATION_STATE</code>, which contains latitude and
longitude for the place the map centered on.
</p>
<ul>
<li>
<a href="https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine">example
search</a>
</li>
<li>
<a href="view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/">example
source</a> (you have to copy and paste this link into a new tab
manually, clicking won’t work)
</li>
</ul>
<p>
I use the lovely <a href="https://docs.python-requests.org/projects/requests-html/en/latest/"><code>requests-html</code></a>
Python library to send the query to Google, receive the response, and
search through the response for the part I want to extract. Then I use a
little standard Python to parse the extracted part and save the
important bits.
</p>
<h2>
With great power…
</h2>
<p>
Don’t go crazy with this.
</p>
<p>
The trick is good for leisurely automation of location retrieval when
you have squirrelly queries.
</p>
<p>
If you need real-time mapping of many things, you don’t want this
solution. Use the actual APIs, and work instead on formatting the
queries properly before sending them to Google/OSM.
</p>
<p>
Also, if you try to query too much/too quickly, Google will shut you
out after a little while. Put a few seconds of delay between each
request and run it overnight and/or in automated batches.
</p>
<h2>
Know a better way?
</h2>
<p>
I’d love to know. Drop me a line.
</p>
</main>
<div id="footnotes"></div>
<footer></footer>
</div>
</body>
</html>