Private search: Take a walk on the client side
Client-side search done right
We tend to take keyword search for granted. Email, documents, text messages — as long as we remember a word or phrase that shows up in the content we’re looking for, we expect any modern software to find it for us instantly.
But as with many things we take for granted (like password resets), keyword search comes with a cost.
In the case of search, the usual price we pay is our privacy. It’s pretty straightforward for a platform to help us find something we’re looking for by keyword … so long as we give them complete access to every word in every one of our documents.
For Skiff, that approach is not an option. We end-to-end encrypt all data stored on Skiff so that we could never access it even if we wanted to.
If you ask us which of your documents the word “callipygian” shows up in, we’re not going to be able to tell you. We can’t see your documents, so how could we?
But leaving out keyword search isn’t an option either: search is a must-have for any collaboration platform, and its absence would be a dealbreaker. So, as with our password-reset process, Skiff had to rethink keyword search from the ground up and design it in a way that respects and preserves your privacy.
The Standard Approach: The Server-Side Search Index
The standard approach to keyword search — the one used by other collaboration platforms like Notion and Google Docs — is to create and store an inverted index that encompasses every one of the documents you store on the platform.
An inverted index is a list of every word that appears in any of your documents, in which each word corresponds to a list of all the documents that include that word. This data structure allows for fast and efficient keyword searches, and it’s the same basic design that underlies most search engines, including Google.
After a platform creates an inverted index for all of your documents, it stores the index on its server. Then, when you run a search query for “callipygian,” it simply looks up your query in the index and returns the list of documents you might be looking for.
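To make the data structure concrete, here is a minimal sketch of an inverted index in TypeScript. It is an illustration only, not the code any particular platform runs: the names tokenize, buildIndex, and search, the sample documents, and the deliberately naive tokenizer are all assumptions made for this example.

```typescript
// Each lowercase word maps to the set of document IDs that contain it.
type InvertedIndex = Map<string, Set<string>>;

// Naive tokenizer (an assumption for this sketch): lowercase the text
// and keep runs of ASCII letters and digits.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Build the index by recording, for every word, which documents use it.
function buildIndex(docs: Record<string, string>): InvertedIndex {
  const index: InvertedIndex = new Map<string, Set<string>>();
  for (const [docId, text] of Object.entries(docs)) {
    for (const word of tokenize(text)) {
      let ids = index.get(word);
      if (ids === undefined) {
        ids = new Set<string>();
        index.set(word, ids);
      }
      ids.add(docId);
    }
  }
  return index;
}

// A keyword search is a single lookup: word in, document IDs out.
function search(index: InvertedIndex, word: string): string[] {
  const ids = index.get(word.toLowerCase());
  return ids === undefined ? [] : Array.from(ids).sort();
}

// Hypothetical sample documents.
const exampleIndex = buildIndex({
  "doc-1": "A callipygian statue stood in the hall.",
  "doc-2": "Meeting notes about the statue budget.",
});

console.log(search(exampleIndex, "callipygian"));
console.log(search(exampleIndex, "statue"));
```

Note that whoever holds exampleIndex can read off every word in every document, which is exactly why where the index lives matters.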
The key thing to note is that this approach only works if you’re willing to allow the platform to store and regularly access an exhaustive list of every word that shows up in every document you have access to. For Skiff, that’s a non-starter.
The Skiff Approach: Take A Walk on the Client Side
When you use Skiff, there’s only one place where your data is ever unencrypted: on your own device.
That means that — in order to let you carry out keyword searches — Skiff needs to give you the tools to run fast and efficient searches locally on your own device (a.k.a. “client side”).
So that’s exactly what we do. The process works like this:
1. The first time you log into Skiff and decrypt your documents on your own device, your Skiff client (the software that’s running on your machine) creates a local search index using our custom search algorithm, Trawler, and stores it in your browser. (Trawler is Skiff’s open-source, client-side search algorithm, based on FlexSearch.)
2. Once the client-side index has been created, Trawler parses your documents and adds them to it.
3. After the index has been created (and while you’re logged into Skiff), your Skiff client regularly checks to see whether any of the documents you have access to have been updated. If they have, it re-indexes them to reflect the changes.
4. While you’re using Skiff, any keyword searches you run query your own local index. Skiff’s search experience is as fast and convenient as on any other platform, with the added benefit that your privacy remains intact.
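The four steps above can be sketched in a few dozen lines of TypeScript. This is a simplified illustration under stated assumptions, not Trawler's or FlexSearch's actual API: the DecryptedDoc shape, the LocalSearchIndex class, and its sync and search methods are hypothetical names, and persisting the index in the browser is left out.

```typescript
// Hypothetical shape of a document after it has been decrypted on the client.
interface DecryptedDoc {
  id: string;
  text: string;
  updatedAt: number; // last-modified timestamp in milliseconds
}

class LocalSearchIndex {
  // word -> IDs of the documents that contain it
  private postings = new Map<string, Set<string>>();
  // doc ID -> the words indexed for it and the version they came from
  private indexed = new Map<string, { words: string[]; updatedAt: number }>();

  // Index a document, or re-index it if it changed since the last sync.
  // Returns true if the index was modified, false if it was already current.
  sync(doc: DecryptedDoc): boolean {
    const previous = this.indexed.get(doc.id);
    if (previous !== undefined && previous.updatedAt >= doc.updatedAt) {
      return false;
    }
    // Drop entries from the stale version before adding the new ones.
    if (previous !== undefined) {
      for (const word of previous.words) {
        this.postings.get(word)?.delete(doc.id);
      }
    }
    const tokens = doc.text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
    const words = Array.from(new Set(tokens));
    for (const word of words) {
      let ids = this.postings.get(word);
      if (ids === undefined) {
        ids = new Set<string>();
        this.postings.set(word, ids);
      }
      ids.add(doc.id);
    }
    this.indexed.set(doc.id, { words, updatedAt: doc.updatedAt });
    return true;
  }

  // Queries run entirely against the in-memory index on the device.
  search(word: string): string[] {
    const ids = this.postings.get(word.toLowerCase());
    return ids === undefined ? [] : Array.from(ids).sort();
  }
}

// Usage: index a document, then sync an edited version of it.
const localIndex = new LocalSearchIndex();
localIndex.sync({ id: "doc-1", text: "Quarterly budget draft", updatedAt: 1 });
localIndex.sync({ id: "doc-1", text: "Quarterly roadmap draft", updatedAt: 2 });
console.log(localIndex.search("roadmap"));
```

In this sketch the plaintext words never leave the LocalSearchIndex instance, so the only thing a server would need to supply is the encrypted documents themselves.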
Thanks to this design, Skiff users get all the benefits of snappy, efficient keyword search without having to turn over the keys to their data.
The takeaway: When you use keyword search on other platforms, they rummage through your files to find what you’re looking for. When you use it on Skiff, we give you the algorithm to get the job done, but let you carry out the search on your own device, in complete privacy.