<!DOCTYPE html>
<html lang="en">
<head>
<link rel="stylesheet" href="/style.css" type="text/css">
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="icon" href="data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'%3E%3Cstyle%3E %23m %7B opacity:0; %7D%0A@media (prefers-color-scheme: dark) %7B %23m %7B opacity:1; %7D %23e %7B opacity:0 %7D%0A%7D %3C/style%3E%3Ctext id='m' y='.9em' font-size='90'%3E🏕️%3C/text%3E%3Ctext id='e' y='.9em' font-size='90'%3E🌞%3C/text%3E%3C/svg%3E">
<title></title>
</head>
<body>
<div id="page-wrapper">
<div id="header" role="banner">
<header class="banner">
<div id="banner-text">
<span class="banner-title"><a href="/">beauhilton</a></span>
</div>
</header>
<nav>
<a href="/about">about</a>
<a href="/now">now</a>
<a class="nav-active" href="/posts">posts</a>
<a href="https://notes.beauhilton.com">notes</a>
<a href="https://talks.beauhilton.com">talks</a>
<a href="https://git.beauhilton.com">git</a>
<a href="/contact">contact</a>
<a href="/feed.xml">rss</a>
</nav>
</div>
<main>
<h1>
geocheatcode
</h1>
<p>
<time id="post-date">2022-04-22</time>
</p>
<p id="post-excerpt">
Here is background and code
for a trick I use to get
Google to give me best-in-class guesses
for latitude and longitude,
despite goofy and/or downright bad location searches.
</p>
<h2>
Map all the things
</h2>
<p>
I love maps.
</p>
<p>
Several of my projects involve mapping things at scale.
</p>
<p>
When you want to map a few things, you type searches into Google Maps
and get addresses and/or latitudes and longitudes quickly and
reliably.
</p>
<p>
But what if you’d like to map 90,000 things whose locations you don’t
yet know?
</p>
<p>
<a href="https://developers.google.com/maps">Google</a> and <a href="https://www.openstreetmap.org/">OpenStreetMap</a>, as well as
others, provide mapping services you can call programmatically from your
software. You send in some query, such as “VUMC Internal Medicine,” and
they return information relevant to that query, such as street address
and latitude and longitude. Up to a certain number of queries per day or
hour, the services are free, and since my work is academic, rather than
real-time mapping for some for-profit app, I am happy to send in small
batches to stay under the limits in the free tier.
</p>
<p>
I’ve used these services to make large maps, and they work pretty
well.
</p>
<p>
<em>Pretty</em> well.
</p>
<h2>
But mapping is hard
</h2>
<p>
Problems with these services:
</p>
<ol type="1">
<li>
they expect well-formed and reasonable queries
</li>
<li>
if they don’t know the answer, their guesses are often wildly off,
or they refuse to guess at all
</li>
</ol>
<p>
If I’m mapping 90,000 things, I’m going to write some code to go
through each of those 90,000 things and ask the mapping services to
kindly tell me what I want to know. Though I write sanitization code to
clean up the 90,000 things, I’m not going to quality check each of those
90,000 things. Sometimes things among the 90,000 things are kinda nuts
(misspelled, padded with extraneous data, oddly formatted), in
idiosyncratic ways that are impossible to completely cover, no matter
how much code I write to catch the weird cases.
</p>
<p>
I would like a solution that is fairly tolerant of weirdnesses and
makes good guesses.
</p>
<h2>
Google is really good at search
</h2>
<p>
I noticed that when I manually typed things into the Google Maps
search bar, it forgave a myriad of sins and did a great job centering
the map on its best guess. When I copied and pasted some of the weird
things among the 90,000 into the Google Maps search bar (the same things
that made the official mapping services - including Google’s - go all
Poltergeist), <em>voilà!</em> The right answer appeared, with success rates
nearing 100%.
</p>
<p>
I thought there must be a way to repeat this process with code, in a
scalable way.
</p>
<p>
Turns out there is, and it’s easy.
</p>
<h2>
<code>geocheatcode.py</code>
</h2>
<pre tabindex="0"><code class="language-python">
<span class="hl kwa">from</span> requests_html <span class="hl kwa">import</span> HTMLSession

session <span class="hl opt">=</span> <span class="hl kwd">HTMLSession</span><span class="hl opt">()</span>


<span class="hl kwa">def</span> <span class="hl kwd">google_lat_lon</span><span class="hl opt">(</span>query<span class="hl opt">:</span> <span class="hl kwb">str</span><span class="hl opt">):</span>

    url <span class="hl opt">=</span> <span class="hl sng">"https://www.google.com/maps/search/?api=1"</span>
    params <span class="hl opt">= {}</span>
    params<span class="hl opt">[</span><span class="hl sng">"query"</span><span class="hl opt">] =</span> query

    r <span class="hl opt">=</span> session<span class="hl opt">.</span><span class="hl kwd">get</span><span class="hl opt">(</span>url<span class="hl opt">,</span> params<span class="hl opt">=</span>params<span class="hl opt">)</span>

    <span class="hl slc"># Capture the text just inside the opening [[[ of APP_INITIALIZATION_STATE.</span>
    reg <span class="hl opt">=</span> <span class="hl sng">"APP_INITIALIZATION_STATE=[[[{}]"</span>
    res <span class="hl opt">=</span> r<span class="hl opt">.</span>html<span class="hl opt">.</span><span class="hl kwd">search</span><span class="hl opt">(</span>reg<span class="hl opt">)[</span><span class="hl num">0</span><span class="hl opt">]</span>
    <span class="hl slc"># The captured text is comma-separated: longitude is field 1, latitude is field 2.</span>
    lat <span class="hl opt">=</span> res<span class="hl opt">.</span><span class="hl kwd">split</span><span class="hl opt">(</span><span class="hl sng">","</span><span class="hl opt">)[</span><span class="hl num">2</span><span class="hl opt">]</span>
    lon <span class="hl opt">=</span> res<span class="hl opt">.</span><span class="hl kwd">split</span><span class="hl opt">(</span><span class="hl sng">","</span><span class="hl opt">)[</span><span class="hl num">1</span><span class="hl opt">]</span>

    <span class="hl kwa">return</span> lat<span class="hl opt">,</span> lon


extraneous <span class="hl opt">=</span> <span class="hl sng">""" something something</span>
<span class="hl sng"> the earth is banana shaped</span>
<span class="hl sng"> latitude and longitude </span>
<span class="hl sng"> wouldn't you like to know, maybe """</span>

relevant <span class="hl opt">=</span> <span class="hl sng">""" Vanderbilt University Medical Center </span>
<span class="hl sng"> Internal Medicine """</span>

query <span class="hl opt">=</span> extraneous <span class="hl opt">+</span> relevant

lat<span class="hl opt">,</span> lon <span class="hl opt">=</span> <span class="hl kwd">google_lat_lon</span><span class="hl opt">(</span>query<span class="hl opt">)</span>

<span class="hl kwa">print</span><span class="hl opt">(</span>
    <span class="hl sng">"Hello. "</span>
    <span class="hl sng">"My name is Google. "</span>
    <span class="hl sng">"I am really good at guessing what you meant. "</span>
    f<span class="hl sng">"Your query was '</span><span class="hl ipl">{query}</span><span class="hl sng">'. "</span>
    <span class="hl sng">"Here are the coordinates you probably wanted. "</span>
    f<span class="hl sng">"The latitude is</span> <span class="hl ipl">{lat}</span><span class="hl sng">, and the longitude is</span> <span class="hl ipl">{lon}</span><span class="hl sng">. "</span>
    <span class="hl sng">"Don't believe me? "</span>
    <span class="hl sng">"Here it is again, "</span>
    <span class="hl sng">"in a format you can paste into the search bar:</span> <span class="hl esc">\n</span><span class="hl sng">"</span>
    f<span class="hl sng">"</span><span class="hl ipl">{lat}</span><span class="hl sng">,</span> <span class="hl ipl">{lon}</span> <span class="hl sng"></span><span class="hl esc">\n</span><span class="hl sng">"</span>
    <span class="hl sng">"Told ya. "</span>
<span class="hl opt">)</span>
</code></pre>
<p>
Despite having all that extra junk in the query, this returns the
right answer. Because Google is many things good and evil, but of these
one is certain: Google is <em>really</em> good at search.
</p>
<h2>
How does the code work?
</h2>
<p>
If you inspect the source HTML on the Google Maps website after you
search for something and it centers the map on its best guess, and you
scroll way on down (or Ctrl-F search for it) you’ll find
<code>APP_INITIALIZATION_STATE</code>, which contains latitude and
longitude for the place the map centered on.
</p>
<ul>
<li>
<a href="https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine">example
search</a>
</li>
<li>
<a href="view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/">example
source</a> (you have to copy and paste this link into a new tab
manually, clicking won’t work)
</li>
</ul>
<p>
I use the lovely <a href="https://docs.python-requests.org/projects/requests-html/en/latest/"><code>requests-html</code></a>
Python library to send the query to Google, receive the response, and
search through the response for the part I want to extract. Then I use a
little standard Python to parse the extracted part and save the
important bits.
</p>
<h2>
With great power…
</h2>
<p>
Don’t go crazy with this.
</p>
<p>
The trick is good for leisurely automation of location retrieval when
you have squirrelly queries.
</p>
<p>
If you need real-time mapping of many things, you don’t want this
solution. Use the actual APIs, and work instead on formatting the
queries properly before sending them to Google/OSM.
</p>
<p>
Also, if you try to query too much/too quickly, Google will shut you
out after a little while. Put a few seconds of delay between each
request and run it overnight and/or in automated batches.
</p>
<h2>
Know a better way?
</h2>
<p>
I’d love to know. Drop me a line.
</p>
</main>
<div id="footnotes"></div>
<footer></footer>
</div>
</body>
</html>