site

files for beauhilton.com
git clone https://git.beauhilton.com/site.git

til-llm-colbertv2.md (998B)


# Playing with ColBERTv2 Embeddings and Retrieval

<time id="post-date">2024-05-09</time>

<p id="post-excerpt">
  There are a lot of embedding models out there for LLMs.
  ColBERTv2 is a neat one.
  Here are some thoughts and code examples.
</p>

## ColBERTv2

The way you shove data into any embedding model can make a difference,
and ColBERT is no exception.
I started off just giving it an HTML file
with the entirety of a website ([vimbook's print-site one-pager](https://www.vim-book.org/print_page/)).
This had a bunch of junk that wasn't needed,
which occasionally affected the

[sqlite-utils insert-files](https://sqlite-utils.datasette.io/en/stable/cli.html#id43)
[RAGatouille](https://github.com/bclavie/RAGatouille)
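
One way to drop that kind of junk before indexing is to pull only the visible text out of the HTML first.
This is a minimal sketch using Python's standard-library `html.parser`, not the code I actually ran;
`SKIP_TAGS`, `TextExtractor`, and `html_to_text` are hypothetical names,
and the list of tags to skip is an assumption that would need tuning per site.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as junk here (an assumption, adjust per site).
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside at least one skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when we are outside every skipped tag.
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The cleaned text could then be handed to whatever indexer is in use
(for example RAGatouille's `RAGPretrainedModel.index`); that step is not shown here.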

Multiline script example:

```sh
# enable multilib - see link below
paru # make sure things are up to date generally
paru -S android-tools android-sdk-build-tools # includes adb and other goodies
reboot
```

Image example: ![Source selection](/images/ncmpcpp-mopidy-selector.png)