Gojomo

2006-02-11
Meredith L. Patterson, Query By Example @ CodeCon 2006, 4:45pm Saturday

Continuing prejudicial CodeCon session previews:

Meredith L. Patterson: Query By Example, 4:45pm Saturday @ CodeCon 2006

Query By Example brings supervised machine learning into the realm of SQL in order to provide intuitive, qualitative queries. Within your query, you list a few examples which are LIKE the kind of rows you're looking for, and a few more which are NOT LIKE the kind of rows you're looking for, and using a fast, flexible machine learning algorithm, the database will automatically find rows which are similar to what you're interested in.

At the moment, QBE only works on real-valued data, i.e., integers and floats. Future releases will address text data and perhaps even binary data, but for the moment, this is a limitation of the system. (Don't worry. There's a lot of real-valued data out there.)

This was a graduate student project that got sponsored by Google's Summer of Code and has added the above-described new capability to PostgreSQL. Patterson gave a great presentation about analyzing (and even making!) DNA at last year's CodeCon, and Query By Example looks useful for a lot of common data-mining operations, such as the recommendation systems of online retailers and content aggregators.

It's apparently based on a support vector machine, which looks worth the effort to truly understand (though I don't yet). It's something about calculating the best dividing plane between two sets of example coordinates in a multi-dimensional vector space, then using that plane to classify other coordinates.
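To make the dividing-plane idea concrete, here is a minimal sketch of a linear classifier trained with a hinge-loss update, in plain Python. It is not Patterson's code, and it assumes nothing about how QBE is implemented inside PostgreSQL; the function names, the toy points, and the learning-rate and regularization values are all made up for illustration. A real SVM solver finds the maximum-margin plane exactly, while this loop only approximates it.

```python
def train_linear_svm(positives, negatives, epochs=200, lr=0.1, reg=0.01):
    """Fit a plane w.x + b = 0 separating the two example sets.

    Illustrative hinge-loss subgradient descent, not a production SVM solver.
    """
    data = [(x, 1.0) for x in positives] + [(x, -1.0) for x in negatives]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink w a little, which favors a wide margin.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Example is on the wrong side or too close: nudge the plane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return True if x falls on the 'LIKE' side of the plane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hypothetical example rows with two real-valued columns each.
like_rows = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0)]
not_like_rows = [(-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]

w, b = train_linear_svm(like_rows, not_like_rows)
print(classify(w, b, (4.0, 4.0)))    # a row resembling the LIKE examples
print(classify(w, b, (-4.0, -4.0)))  # a row resembling the NOT LIKE examples
```

Once the plane is learned from the handful of examples, every other row is classified by checking which side of it the row lands on.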

I would guess that when eventually generalized to text, the technique would view the presence or absence of any interesting term as an independent dimension with a binary coordinate.
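Here is a small sketch of what that guess would look like in Python: each term in a fixed vocabulary becomes one dimension, and a document's coordinate in that dimension is 1 if the term appears and 0 if it doesn't. This is only my speculation made concrete, not anything QBE does today; the `term_vector` helper and the three-word vocabulary are hypothetical.

```python
def term_vector(text, vocabulary):
    """Map text to a binary vector: 1 where the vocabulary term is present."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


# Hypothetical vocabulary of "interesting" terms, one dimension per term.
vocabulary = ["sql", "dna", "svm"]

vector = term_vector("SVM meets SQL", vocabulary)
print(vector)  # [1, 0, 1]
```

Vectors like these could then be handed to the same kind of dividing-plane classifier used for real-valued columns.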

Update (8:35pm Saturday): I almost forgot to mention: this is another problematic project name. (That makes 6 of the first 10 presentations that fall short of my standards for effective names.) There are already concepts that go by "Query By Example" in the SQL and full-text realms. Now, the name may make more sense for a system, like the one presented, where exact rows (or documents) are used as the 'query' -- as opposed to the prior meanings where you provide some fragmentary field values to match. But, squatter's rights count, and I think Patterson's QBE might be underestimated by SQL-heads who see "Query by Example" and think of the previous 'example values' meaning rather than 'example items'.

Which would be too bad, because this is a very neat capability backed by very interesting and general algorithms. Reading more about the support vector machine approach, I see that it can be used, among other things, for training a web search engine based on implicit user feedback. I strongly suspect such feedback has become even more important than the link structure of the web in commercial search engine operation.

During the Q&A period, the last question actually made the presenter cry -- but in a good way. CodeCon program chair Len Sassaman asked presenter Meredith Patterson, whom he met at last year's conference, to marry him. She accepted. Not something you often see at a technical conference, but CodeCon has always been a bit special.
