Daniel Tunkelang is one of the instructors of the co:rise Search Fundamentals track. He has been working in search for more than two decades, and is currently a full-time consultant; previously, he held roles at Google and LinkedIn and co-founded the search company Endeca, which was acquired by Oracle in 2011.
Dmitry Kan has 16 years of experience developing search engines for startups and multinational technology giants. Earlier this year, he completed the Search with Machine Learning course on co:rise to uplevel his skills. He is currently Principal AI Scientist at Silo AI and a Senior Product Manager at TomTom, a company focused on map search and navigation.
Daniel and Dmitry recently hosted a Q&A on search for the co:rise community, and we’re excited to share highlights from their conversation here on our blog. The excerpts below have been edited and condensed for clarity.
Dmitry: I saw a LinkedIn post from you recently where you said in a little bit of a sad tone, “Not everyone shares my passion for search, but I suspect that many would be more excited about search if they understood it better.” What was going through your mind when you wrote that?
Daniel: Well, a lot of people seem to think search is “done,” in that all the big problems are solved. But that’s not true at all. There are still so many opportunities, including improving search by applying the latest developments in machine learning. Search is also a place where you can make a huge impact on the way people interact with machines. So my hope is that more people start to see that, and get excited about what they can contribute to the field.
Dmitry: That’s a great segue to my next question—why do you think we need machine learning in search today?
Daniel: To start with, the main thing machine learning does well is optimization, which means it can replace a lot of the hand-tuning that’s been involved in building search systems up to now. It can also make that tuning much more sophisticated, because it allows you to work with more variables than you could possibly keep track of when you’re doing things by hand. And then machine learning makes it possible to solve much more difficult search problems, especially in areas like query and content understanding.
Dmitry: Let’s talk about those areas for a minute. When people mention machine learning in a search context, they’re often referring to ranking functions and determining relevance. But there are also problems upstream from ranking—like query and content understanding. What’s your view on the benefit machine learning can bring to bear on those problems?
Daniel: Let’s start by thinking about what ranking does. It takes a search query and all the potential results, and it uses a function to score those results based on their relevance to the query.
A different way to approach this problem is to say, “I have a query, and I'm going to start by trying to represent that query as usefully as possible, before I even look at any documents that I might serve up as results. Also, I have documents, and I’m going to try to represent those as well as I can before I see any queries.” Then those representations can do some of the work for us. For example, if the query and the documents are both mapped to the same set of categories, we can start by retrieving all the documents that are in the same categories as the query.
Machine learning is how we get those representations. It's how we turn the query into a more useful representation, and how we turn the content into a more useful representation. That allows us to be a lot more targeted with the content we actually consider and rank, and in my experience it leads to far better results.
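To make the idea concrete, here is a minimal sketch of category-based retrieval as Daniel describes it. The categories, documents, and the keyword-rule classifier are all hypothetical stand-ins; in a real system, the query and content classifiers would be learned models, which is exactly where the machine learning comes in.

```python
# Hypothetical sketch: query and content understanding via shared categories.
# In production, learned classifiers assign categories to queries and
# documents; here a toy keyword rule stands in for the query classifier.

def categorize_query(query):
    """Stand-in for a learned query classifier."""
    categories = set()
    if "shoe" in query or "sneaker" in query:
        categories.add("footwear")
    if "running" in query:
        categories.add("sports")
    return categories

# Toy document collection, with categories assumed to come from a
# (hypothetical) content-understanding pipeline run at indexing time.
documents = [
    {"id": 1, "title": "Trail running sneakers", "categories": {"footwear", "sports"}},
    {"id": 2, "title": "Leather office shoes", "categories": {"footwear"}},
    {"id": 3, "title": "Yoga mats", "categories": {"sports"}},
    {"id": 4, "title": "Cast iron pan", "categories": {"kitchen"}},
]

def retrieve_by_category(query, docs):
    """Keep only documents sharing at least one category with the query."""
    query_cats = categorize_query(query)
    return [d for d in docs if d["categories"] & query_cats]

candidates = retrieve_by_category("running shoes", documents)
print([d["id"] for d in candidates])  # prints [1, 2, 3] — the pan is filtered out
```

The point of the sketch is the shape of the pipeline: the representations are computed before retrieval, so ranking only has to score a targeted candidate set rather than the whole collection.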
Dmitry: I'm going to ask you my favorite question. What drives you to continue focusing on search? And what drives you to teach these topics and encourage others to get into this field?
Daniel: Of all the things we do with technology, I believe search is the one that most puts us—human beings—front and center. So much of what you see in machine learning and AI is being done to us—feeds, recommendations, advertisements. But search starts with people expressing what they want. My hope for the future is that machines will help us, but that has to start with us expressing our intent. So that’s why I find search so exciting.
As for why I teach search—it comes back to what you asked in the beginning. I think the need for people who can build great search experiences is not met by the supply of people with those skills. And there are so many people out there who could make a big impact in this field if they get just a little bit of a push. If you have a basic knowledge of coding, and you learn a few fundamentals, you can do wonders with the tooling that's out there. So I'm excited that I can be a part of equipping and empowering the next generation of search engineers.
Dmitry: Let's take some questions from the audience. First—is there a specific book you’d recommend for someone getting started with search and machine learning?
Daniel: There's a lot that's been written on learning to rank. I think Chris Manning's book, Introduction to Information Retrieval, discusses it. What’s harder to find is good writing on query and content understanding. There’s one book, Query Understanding for Search Engines, that’s really more a collection of essays. You can also check out my blogs at queryunderstanding.com and contentunderstanding.com for a survey of techniques.
Dmitry: Awesome. The next one I'm going to take is, what are your recommendations for integrating information retrieval—for example, retrieving documents with question answering, or returning answers within a context?
Daniel: Great—so question answering is really exciting. Basically, it’s the idea that you can retrieve information from a document, instead of just retrieving the document. A lot of it starts from something called passage retrieval. You see this when you search for something on Google—the answer to your question is extracted and pulled up to the top of the results as a “search snippet.”
Passage retrieval has gotten a lot more sophisticated in just the last few years. Not that long ago, Google might have retrieved a passage that contained the exact words you used, maybe with a little bit of variation. But now it's more likely using a vector-based approach to find passages that are similar in the vector space.
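A tiny sketch of what "similar in the vector space" means in practice: embed the query and each passage, then rank passages by cosine similarity. The `embed` function below is a deliberately crude bag-of-words stand-in so the example stays self-contained; real systems use a learned embedding model, and the vocabulary and passages here are invented for illustration.

```python
import math

# Hypothetical sketch of vector-based passage retrieval. A learned embedding
# model would normally produce the vectors; embed() is a toy stand-in that
# counts matches against a tiny fixed vocabulary.

def embed(text, vocab=("run", "shoe", "marathon", "recipe", "pasta")):
    """Map text to a vector of prefix-match counts per vocabulary term."""
    words = text.lower().split()
    return [sum(w.startswith(term) for w in words) for term in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

passages = [
    "Marathon runners often replace shoes every 500 km.",
    "This pasta recipe takes twenty minutes.",
    "Choosing running shoes depends on your gait.",
]

def top_passage(query):
    """Return the passage whose embedding is most similar to the query's."""
    q = embed(query)
    return max(passages, key=lambda p: cosine(q, embed(p)))

print(top_passage("how often to replace running shoes"))
# prints "Choosing running shoes depends on your gait."
```

Notice that the winning passage shares no exact phrase with the query beyond "running shoes"—the similarity comes from the vector representation, which is the shift Daniel describes from exact-word matching to matching in an embedding space.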
The next stage of this, which we haven’t cracked yet, would be to have a search engine that can really understand your question and synthesize an answer from different pieces of content. That’s really going beyond a search engine to what we’d probably call an answer engine. Like I said, we’re not there yet, but it’s exciting to think about.
Dmitry: I've enjoyed this conversation so much. Thanks to everyone for asking your questions, and thanks Daniel for answering them brilliantly as you usually do.
My final note is—if you’re curious about anything we discussed today, I highly recommend that you take a course on search and then start experimenting with what you learn. Daniel and Grant Ingersoll are teaching two classes on co:rise in June. The Search Fundamentals class is a two-week class intended for people with no background in search. The Search with Machine Learning class builds on that and starts two weeks later, so you can take both. These classes are available to anyone in the world, and Daniel and Grant do an amazing job answering questions and making it a really fun, hands-on experience. You can register for one or both courses at corise.com/#search-track.