Index Build Tools
NOTE: This page was updated on 9/19/16 to reflect significant changes in the index build tools. After many months of hard work, we kind of, sort of have a document ingestion pipeline that seems to...
Getting started with NativeJIT
NativeJIT is a just-in-time compiler that handles expressions involving C data structures. It was originally developed in Bing, with the goal of being able to compile search query matching and search...
Stream Configuration
BitFunnel models each document as a set of streams, each of which consists of a sequence of terms corresponding to the words and phrases that make up the document. Real world documents are usually...
A Small Query Language
A challenge in bringing BitFunnel to open source is providing functionality that was previously supplied by portions of Bing upstream of BitFunnel. BitFunnel was designed as a library that takes, as...
BitFunnel performance estimation
Hi! I’m going to talk about two things today. First, I’m going to talk about one way to think about performance. That is, one way you can reason about performance. Second,...
Sample Data
I’ve been trying to make it really easy to get started with BitFunnel, but we still have a ways to go. From the beginning we put a lot of effort into ensuring our code would build and run on Linux,...
Searching for Primes
What do prime numbers have to do with BitFunnel? It turns out we use them to test our matching engine. One of the challenges in bringing up a new search engine is figuring out how to test it. If you...
All's Well That Ends Well
We’ve been having some stability problems of late. In our rush to get some minimal version of the document ingestion pipeline up and running, we created a number of tools for gathering corpus...
When will BitFunnel be usable?
How long should we expect this project to take? In theory, we should have a relatively easy time guessing how long this project will take because this project is a half-port-half-rewrite whose aim to...
Debugging an SEH Crash
Here’s a video showing how I debugged a read access violation that was caused by an earlier buffer overflow. This sort of problem can sometimes be hard to track down, but in this case, a data...
How do we make onboarding to BitFunnel easier?
I’ve been working on BitFunnel for roughly six months now. If I look at how I’ve used that time, my guess is that I’ve taken about a month of Mike’s time. If you look at the progress we’ve made, I...
BitFunnel Glossary
To get a high level overview of the algorithm, please see this talk transcript. This glossary is incomplete and needs a lot of work! While our plan is to fill out the whole thing, that will probably...
Wikipedia as test corpus for BitFunnel
Wikipedia is a great test corpus for search engines. It is free and easy to obtain, it carries a license appropriate for research, and at ~59GB uncompressed, it is large, but not too large to fit on a...
Row Table Analysis
I spent the weekend implementing code to analyze bit densities in the rows and columns of the row tables. This tool should help us determine whether the row tables are configured correctly. A good row...
Debugging Bit Densities
Things are starting to get exciting in the Land of BitFunnel. We’re now at the point where we can ingest a significant fraction of Wikipedia and run millions of queries, all without crashing – and we...