site

files for beauhilton.com
git clone https://git.beauhilton.com/site.git

index.html (10523B)


      1 <!DOCTYPE html>
      2 <html lang="en">
      3  <head>
      4   <link rel="stylesheet" href="/style.css" type="text/css">
      5   <meta charset="utf-8">
      6   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      7   <meta name="viewport" content="width=device-width, initial-scale=1.0">
      9   <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🏕️</text></svg>">
     10   <title>geocheatcode</title>
     11  </head>
     12  <body>
     13   <div id="page-wrapper">
     14    <div id="header" role="banner">
     15     <header class="banner">
     16      <div id="banner-text">
     17       <span class="banner-title"><a href="/">beauhilton</a></span>
     18      </div>
     19     </header>
     20     <nav>
     21      <a href="/about">about</a>
     22 <a href="/now">now</a>
     23 <a href="/thanks">thanks</a>
     24 <a class="nav-active" href="/posts">posts</a>
     25 <a href="https://notes.beauhilton.com">notes</a>
     26 <a href="https://talks.beauhilton.com">talks</a>
     27 <a href="https://git.beauhilton.com">git</a>
     28 <a href="/contact">contact</a>
     29 <a href="/atom.xml">rss</a>
     30     </nav>
     31    </div>
     32    <main>
     33     <h1>
     34      geocheatcode
     35     </h1>
     36     <p>
     37      <time id="post-date">2022-04-22</time>
     38     </p>
     39     <p id="post-excerpt">
     40      Here is background and code
     41 for a trick I use to get
     42 Google to give me best-in-class guesses 
     43 for latitude and longitude,
     44 despite goofy and/or downright bad location searches.
     45     </p>
     46     <h2>
     47      Map all the things
     48     </h2>
     49     <p>
     50      I love maps.
     51     </p>
     52     <p>
     53      Several of my projects involve mapping things at scale.
     54     </p>
     55     <p>
     56      When you want to map a few things, you type searches into Google Maps
     57 and get addresses and/or latitudes and longitudes quickly and
     58 reliably.
     59     </p>
     60     <p>
     61      But what if you’d like to map 90,000 things whose locations you don’t
     62 yet know?
     63     </p>
     64     <p>
     65      <a href="https://developers.google.com/maps">Google</a> and <a href="https://www.openstreetmap.org/">OpenStreetMap</a>, as well as
     66 others, provide mapping services you can call programmatically from your
     67 software. You send in some query, such as “VUMC Internal Medicine,” and
     68 they return information relevant to that query, such as street address
     69 and latitude and longitude. Up to a certain number of queries per day or
     70 hour, the services are free, and since my work is academic, rather than
     71 real-time mapping for some for-profit app, I am happy to send in small
     72 batches to stay under the limits in the free tier.
     73     </p>
     74     <p>
     75      I’ve used these services to make large maps, and they work pretty
     76 well.
     77     </p>
     78     <p>
     79      <em>Pretty</em> well.
     80     </p>
     81     <h2>
     82      But mapping is hard
     83     </h2>
     84     <p>
     85      Problems with these services:
     86     </p>
     87     <ol type="1">
     88      <li>
     89       they expect well-formed and reasonable queries
     90      </li>
     91      <li>
     92       if they don’t know the answer, their guesses are often wildly off,
     93 or they refuse to guess at all
     94      </li>
     95     </ol>
     96     <p>
     97      If I’m mapping 90,000 things, I’m going to write some code to go
     98 through each of those 90,000 things and ask the mapping services to
     99 kindly tell me what I want to know. Though I write sanitization code to
    100 clean up the 90,000 things, I’m not going to quality check each of those
    101 90,000 things. Sometimes things among the 90,000 things are kinda nuts
    102 (misspelled, padded with extraneous data, oddly formatted), in
    103 idiosyncratic ways that are impossible to completely cover, no matter
    104 how much code I write to catch the weird cases.
    105     </p>
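    <p>
     A minimal sketch of what that kind of cleanup code can look like.
The helper name <code>clean_query</code> and the specific rules
(fold odd Unicode, swap control characters for spaces, collapse whitespace)
are illustrative assumptions, not the code behind the maps mentioned above.
    </p>

```python
import unicodedata


def clean_query(raw: str) -> str:
    """Normalize one free-text location string before sending it anywhere."""
    # Fold compatibility characters (full-width letters, non-breaking spaces).
    text = unicodedata.normalize("NFKC", raw)
    # Replace control characters such as tabs and newlines with plain spaces.
    text = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch for ch in text
    )
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


print(clean_query("  Vanderbilt University\tMedical Center \n   Internal Medicine "))
```

    <p>
     Rules like these catch the boring problems. They do nothing for
misspellings or stray extra words, which is the gap the rest of this post
is about.
    </p>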
    106     <p>
    107      I would like a solution that is fairly tolerant of weirdnesses, and
    108 makes good guesses.
    109     </p>
    110     <h2>
    111      Google is really good at search
    112     </h2>
    113     <p>
    114      I noticed that when I manually typed things into the Google Maps
    115 search bar, it forgave a myriad of sins and did a great job centering
    116 the map on its best guess. When I copied and pasted some of the weird
    117 things among the 90,000 into the Google Maps search bar (the same things
    118 that made the official mapping services - including Google’s - go all
    119 Poltergeist), <em>voilà!</em>, the right answer appeared, with success rates
    120 nearing 100%.
    121     </p>
    122     <p>
    123      I thought there must be a way to repeat this process with code, in a
    124 scalable way.
    125     </p>
    126     <p>
    127      Turns out there is, and it’s easy.
    128     </p>
    129     <h2>
    130      <code>geocheatcode.py</code>
    131     </h2>
    132     <pre tabindex="0"><code class="language-python">
    133 <span class="hl kwa">from</span> requests_html <span class="hl kwa">import</span> HTMLSession
    134 
    135 session <span class="hl opt">=</span> <span class="hl kwd">HTMLSession</span><span class="hl opt">()</span>
    136 
    137 
    138 <span class="hl kwa">def</span> <span class="hl kwd">google_lat_lon</span><span class="hl opt">(</span>query<span class="hl opt">:</span> <span class="hl kwb">str</span><span class="hl opt">):</span>
    139 
    140     url <span class="hl opt">=</span> <span class="hl sng">"https://www.google.com/maps/search/?api=1"</span>
    141     params <span class="hl opt">= {}</span>
    142     params<span class="hl opt">[</span><span class="hl sng">"query"</span><span class="hl opt">] =</span> query
    143 
    144     r <span class="hl opt">=</span> session<span class="hl opt">.</span><span class="hl kwd">get</span><span class="hl opt">(</span>url<span class="hl opt">,</span> params<span class="hl opt">=</span>params<span class="hl opt">)</span>
    145 
    146     reg <span class="hl opt">=</span> <span class="hl sng">"APP_INITIALIZATION_STATE=[[[{}]"</span>  <span class="hl slc"># {} captures the first bracketed group</span>
    147     res <span class="hl opt">=</span> r<span class="hl opt">.</span>html<span class="hl opt">.</span><span class="hl kwd">search</span><span class="hl opt">(</span>reg<span class="hl opt">)[</span><span class="hl num">0</span><span class="hl opt">]</span>
    148     lat <span class="hl opt">=</span> res<span class="hl opt">.</span><span class="hl kwd">split</span><span class="hl opt">(</span><span class="hl sng">","</span><span class="hl opt">)[</span><span class="hl num">2</span><span class="hl opt">]</span>  <span class="hl slc"># third value is the latitude (still a string)</span>
    149     lon <span class="hl opt">=</span> res<span class="hl opt">.</span><span class="hl kwd">split</span><span class="hl opt">(</span><span class="hl sng">","</span><span class="hl opt">)[</span><span class="hl num">1</span><span class="hl opt">]</span>  <span class="hl slc"># second value is the longitude (still a string)</span>
    150 
    151     <span class="hl kwa">return</span> lat<span class="hl opt">,</span> lon
    152 
    153 
    154 extraneous <span class="hl opt">=</span> <span class="hl sng">""" something something</span>
    155 <span class="hl sng">                 the earth is banana shaped</span>
    156 <span class="hl sng">                 latitude and longitude </span>
    157 <span class="hl sng">                 wouldn't you like to know, maybe """</span>
    158 
    159 relevant <span class="hl opt">=</span> <span class="hl sng">""" Vanderbilt University Medical Center </span>
    160 <span class="hl sng">               Internal Medicine """</span>
    161 
    162 query <span class="hl opt">=</span> extraneous <span class="hl opt">+</span> relevant
    163 
    164 lat<span class="hl opt">,</span> lon <span class="hl opt">=</span> <span class="hl kwd">google_lat_lon</span><span class="hl opt">(</span>query<span class="hl opt">)</span>
    165 
    166 <span class="hl kwa">print</span><span class="hl opt">(</span> 
    167        <span class="hl sng">"Hello. "</span>
    168        <span class="hl sng">"My name is Google. "</span>
    169        <span class="hl sng">"I am really good at guessing what you meant. "</span>
    170       f<span class="hl sng">"Your query was '</span><span class="hl ipl">{query}</span><span class="hl sng">'. "</span>
    171        <span class="hl sng">"Here are the coordinates you probably wanted. "</span>
    172       f<span class="hl sng">"The latitude is</span> <span class="hl ipl">{lat}</span><span class="hl sng">, and the longitude is</span> <span class="hl ipl">{lon}</span><span class="hl sng">. "</span>
    173        <span class="hl sng">"Don't believe me? "</span>
    174        <span class="hl sng">"Here it is again, "</span>
    175        <span class="hl sng">"in a format you can paste into the search bar:</span> <span class="hl esc">\n</span><span class="hl sng">"</span>
    176       f<span class="hl sng">"</span><span class="hl ipl">{lat}</span><span class="hl sng">,</span> <span class="hl ipl">{lon}</span> <span class="hl sng"></span><span class="hl esc">\n</span><span class="hl sng">"</span>
    177        <span class="hl sng">"Told ya. "</span>
    178 <span class="hl opt">)</span>
    179 </code></pre>
    180     <p>
    181      Despite having all that extra junk in the query, this returns the
    182 right answer. Because Google is many things good and evil, but of these
    183 one is certain: Google is <em>really</em> good at search.
    184     </p>
    185     <h2>
    186      How does the code work?
    187     </h2>
    188     <p>
    189      If you inspect the source HTML on the Google Maps website after you
    190 search for something and it centers the map on its best guess, and you
    191 scroll way on down (or Ctrl-F search for it) you’ll find
    192 <code>APP_INITIALIZATION_STATE</code>, which contains latitude and
    193 longitude for the place the map centered on.
    194     </p>
    195     <ul>
    196      <li>
    197       <a href="https://www.google.com/maps?q=something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine">example
    198 search</a>
    199      </li>
    200      <li>
    201       <a href="view-source:https://www.google.com/maps/search/something+whose+latitude+and+longitude+you+would+like+to+know,+maybe+VUMC+Internal+Medicine/">example
    202 source</a> (you have to copy and paste this link into a new tab
    203 manually, clicking won’t work)
    204      </li>
    205     </ul>
    206     <p>
    207      I use the lovely <a href="https://docs.python-requests.org/projects/requests-html/en/latest/"><code>requests-html</code></a>
    208 Python library to send the query to Google, receive the response, and
    209 search through the response for the part I want to extract. Then I use a
    210 little standard Python to parse the extracted part and save the
    211 important bits.
    212     </p>
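    <p>
     The extraction step can also be sketched with only the standard
library, which makes it easy to try without touching the network. The
<code>sample</code> string below is made up for illustration (the real page
is far longer, and its layout can change without notice), and
<code>extract_lat_lon</code> is a hypothetical helper name. Following the
code above, the first number in the group is skipped, the second is treated
as longitude, and the third as latitude.
    </p>

```python
import re

# Hypothetical, shortened stand-in for the real page source.
sample = 'window.APP_INITIALIZATION_STATE=[[[3000.5,-86.8027,36.1425],[0,0,0],[1024,768],13.1],'

# Capture everything between "[[[" and the first closing bracket.
PATTERN = re.compile(r"APP_INITIALIZATION_STATE=\[\[\[([^\]]*)\]")


def extract_lat_lon(page_source: str):
    """Return (lat, lon) as floats, or None if the marker is missing or odd."""
    match = PATTERN.search(page_source)
    if match is None:
        return None
    parts = match.group(1).split(",")
    if len(parts) != 3:
        return None
    try:
        # Same order as geocheatcode.py: index 1 is longitude, index 2 is latitude.
        _, lon, lat = (float(p) for p in parts)
    except ValueError:
        return None
    return lat, lon


print(extract_lat_lon(sample))
print(extract_lat_lon("no marker here"))
```

    <p>
     Returning <code>None</code> instead of raising is a design choice for
batch work: a page that comes back without the marker becomes one missing
row rather than a crashed overnight run.
    </p>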
    213     <h2>
    214      With great power…
    215     </h2>
    216     <p>
    217      Don’t go crazy with this.
    218     </p>
    219     <p>
    220      The trick is good for leisurely automation of location retrieval when
    221 you have squirrelly queries.
    222     </p>
    223     <p>
    224      If you need real-time mapping of many things, you don’t want this
    225 solution. Use the actual APIs, and work instead on formatting the
    226 queries properly before sending them to Google/OSM.
    227     </p>
    228     <p>
    229      Also, if you try to query too much/too quickly, Google will shut you
    230 out after a little while. Put a few seconds of delay between each
    231 request and run it overnight and/or in automated batches.
    232     </p>
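    <p>
     One way to build in that delay, as a minimal sketch. The helper name
<code>geocode_slowly</code>, the three-to-six-second pause, and the choice to
record failures as <code>None</code> are all assumptions;
<code>lookup</code> stands for whatever function does the actual request,
such as <code>google_lat_lon</code> above.
    </p>

```python
import random
import time


def geocode_slowly(queries, lookup, min_delay=3.0, max_delay=6.0, sleep=time.sleep):
    """Call lookup(query) for each query, pausing between requests.

    Failures are stored as None so one bad query does not stop the batch.
    """
    results = {}
    for i, query in enumerate(queries):
        if i:  # no need to wait before the very first request
            sleep(random.uniform(min_delay, max_delay))
        try:
            results[query] = lookup(query)
        except Exception:
            results[query] = None
    return results


# Stand-in lookup so the demo runs offline; it is not a real geocoder.
def fake_lookup(query):
    if not query:
        raise ValueError("empty query")
    return ("36.14", "-86.80")


# The sleep is replaced with a no-op here so the demo finishes instantly.
print(geocode_slowly(["VUMC Internal Medicine", ""], fake_lookup, sleep=lambda s: None))
```

    <p>
     For a real run, pass the actual lookup function and leave
<code>sleep</code> at its default. The randomized pause is just a way to
avoid hammering the server at a fixed rhythm.
    </p>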
    233     <h2>
    234      Know a better way?
    235     </h2>
    236     <p>
    237      I’d love to know. Drop me a line.
    238     </p>
    239    </main>
    240    <div id="footnotes"></div>
    241    <footer></footer>
    242   </div>
    243  </body>
    244 </html>