Gojomo

2006-02-13
TinyURLs are evil URLs

TinyURLs are awful, for usability, for stability of reference, and for browsing safety. Please don't use them in wikis, email, or anywhere.

The problems:

  1. TinyURLs are opaque, hiding their ultimate destination from users and software. This might be a minor inconvenience, except...
  2. TinyURLs can be, and often are, used to send people to spam or malware sites. One recent study found a significant number of webpages try to exploit browser flaws to compromise your computer's security. It's dangerous and rude to suggest someone visit an obfuscated TinyURL. And because of (1), above, even known problem sites can't be programmatically blacklisted by, for example, wiki software trying to block abusive links (though see the sketch below).
  3. TinyURLs introduce a dependency on a third-party service that could go away or be completely compromised in the future. Not only could that TinyURL you see today send you to a porn spam malware site -- but someday in the future, a takeover of the TinyURL domain could send every TinyURL ever used to porn spam malware sites.
Note that the policy of TinyURL to disable "spam" URLs does not remedy (2); in fact it introduces another problem, the potential for third parties to disable your TinyURLs by reusing them in spam -- triggering a takedown of your URL through no fault of your own.
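
For software that wants to accept short links anyway, the obvious workaround is to expand them server-side and check the true destination before trusting them. A minimal sketch, using only the Python standard library (the blocklist host is hypothetical):

```python
# Expand a short URL by following its redirect, so wiki or mail software can
# check the real destination against a blocklist before accepting the link.
import urllib.parse
import urllib.request

BLOCKLIST = {"evil.example.com"}  # hypothetical set of known-bad hosts

def expand(short_url, timeout=10):
    """Follow the shortener's redirect and return the final destination URL."""
    req = urllib.request.Request(short_url, method="HEAD")
    # Note: some shorteners only answer GET; fall back to a GET if HEAD fails.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()

def acceptable(short_url):
    host = urllib.parse.urlsplit(expand(short_url)).hostname or ""
    return host not in BLOCKLIST
```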

(A TinyURL competitor, SnipURL, also lets the creator of a short URL modify it in the future -- making it possible to popularize a SnipURL, have it pass manual review, and then change to a problem URL at any arbitrary time in the future.)

There is very little information at the sites of TinyURL or its competitors to assess the long-term stability and trustworthiness of these services -- but even if the service were run by a large, venerable institution with an impeccable reputation, most of these problems would remain. And other problems would arise, as venerable institutions are subject to social and political pressures that could make them more willing to, for example, censor or redirect certain short URLs. (If the Chinese Communist Party were to demand that a TinyURL to a dissident page be remapped to a state propaganda page, the same large, stable institutions most likely to provide long-lived servers are also the most likely to comply with authoritarian change requests.)

URLs should be naked, as endowed by their creators with the inalienable rights of meaningfulness, transparency, and stability. Friends don't let friends use TinyURLs.



2006-02-12
Daniel S. Wilkerson, Elsa/Oink/Cqual++ @ CodeCon 2006, 4:45pm Sunday

And the very last prejudicial CodeCon session preview:

Daniel S. Wilkerson: Elsa/Oink/Cqual++, (scheduled for but starting much later than) 4:45pm Sunday @ CodeCon 2006

Daniel Wilkerson hooked Elsa [a C/C++ parser] and Cqual [a type-based analysis tool that provides a lightweight, practical mechanism for specifying and checking properties of C programs] together to make Cqual++. It resides in a kind of super-project called Oink which is designed to allow multiple backends for Elsa to cooperate (the only example of which presently is Cqual++). For example, the dataflow analysis is pretty generic and other dataflow-based C++ analyses could be written using it and added to Oink.

The major thing you can do with a cqual-style dataflow analysis is you can find bad dataflow bugs...

Nothing to speculate about here. Entry included only for completeness' sake.


Adam Souzis, Rhizome @ CodeCon 2006, 4pm Sunday

Continuing prejudicial CodeCon session previews:

Adam Souzis: Rhizome, (scheduled for) 4pm Sunday @ CodeCon 2006

Rhizome is an open source project written in Python which consists of a stack of components:

At the top is Rhizome Wiki, a wiki-like content management system that lets users create structured data with explicit semantics in the same way they create pages in a wiki.

Rhizome Wiki runs on top of Raccoon, a simple application server that uses an RDF model as its data store. Raccoon presents a uniform and purely semantic environment for applications. This enables the creation of applications that are easily migrated and distributed and that are resistant to change.

Raccoon uses two novel technologies for getting RDF in and out of the system: First, "shredding", which extracts RDF out of content such as HTML, wiki text, and various microcontent formats. Second, RxPath, a deterministic mapping between RDF's abstract syntax and the XPath data model which lets developers treat RDF as regular XML and allows them to use standard XML technologies such as XPath, XSLT, and Schematron to query, transform, present and validate RDF data.
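
I don't know RxPath's actual mapping, but the general move -- lay RDF triples out as an XML tree so ordinary XPath-style queries can reach them -- is easy to illustrate with a toy example (the triples and element names here are my own invention, not RxPath's schema):

```python
# Toy illustration: serialize a few RDF-ish triples into a simple XML tree,
# then query it with ElementTree's limited XPath support.
import xml.etree.ElementTree as ET

triples = [
    ("http://example.org/page1", "dc:title",   "Home"),
    ("http://example.org/page1", "dc:creator", "adam"),
    ("http://example.org/page2", "dc:creator", "adam"),
]

root = ET.Element("graph")
for s, p, o in triples:
    subj = ET.SubElement(root, "subject", uri=s)
    pred = ET.SubElement(subj, "predicate", name=p)
    pred.text = o

# "Which subjects have dc:creator 'adam'?"
for subj in root.findall("./subject"):
    if any(pred.text == "adam"
           for pred in subj.findall("./predicate[@name='dc:creator']")):
        print(subj.get("uri"))
# -> http://example.org/page1
#    http://example.org/page2
```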

I love wikis. I can't wait to see effective ways to allow wiki-style editing of (slightly) more-structured data than the free-form text for which wikis are known. But anything RDF tends to make my eyes glaze over. Even when there's something RDF-ish that's interesting, it's couched in terminology and indirection that hides the interesting parts.

So it's the aspects of Rhizome that insulate users from RDF -- extracting RDF automatically from familiar syntaxes, making RDF more amenable to usual XML operations -- that I look forward to seeing in this presentation. I'd be tickled pink if the demo showed a plausible interface for unsophisticated users -- for example, enthusiasts in non-technical fields -- to generate useful RDF about their fields... but I'm not holding my breath.


Nathaniel Smith, Monotone @ CodeCon 2006, 3:15pm Sunday

Continuing prejudicial CodeCon session previews:

Nathaniel Smith: Monotone, (scheduled for) 3:15pm Sunday @ CodeCon 2006

monotone is a free distributed version control system. it provides a simple, single-file transactional version store, with fully disconnected operation and an efficient peer-to-peer synchronization protocol. it understands history-sensitive merging, lightweight branches, integrated code review and 3rd party testing. it uses cryptographic version naming and client-side RSA certificates. it has good internationalization support, has no external dependencies, runs on linux, solaris, OSX, windows, and other unixes, and is licensed under the GNU GPL.
I'm a secure-hash-for-content-identification fetishist, so I really like Monotone's approach of naming all versions by their SHA1 hash. Also, despite my experience being almost exclusively with centralized version control like CVS, MS SourceSafe, and SVN, I find systems like Monotone with no inherent central server or canonical version intellectually appealing.
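
The core of that naming scheme is trivial to sketch -- this is just the content-addressing idea, not Monotone's actual storage format:

```python
# Name each revision by the SHA-1 of its content: identical content always
# gets the identical name, on any machine, with no central authority.
import hashlib

store = {}  # version id -> content

def commit(content: bytes) -> str:
    vid = hashlib.sha1(content).hexdigest()  # 40 hex digits derived from content alone
    store[vid] = content
    return vid

rev = commit(b"int main() { return 0; }\n")
assert store[rev] == b"int main() { return 0; }\n"
print(rev)  # the same bytes committed anywhere yield the same version name
```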

But, since decentralized systems can be harder to explain via a childishly simple model, and can obscure the focal points for casual understanding and contribution, I doubt they will often be the right choice for projects that seek a wide audience. Still, I'd love to see some vociferous advocacy of the Monotone model in the presentation -- and to learn whether any of the tools, especially the visual interfaces, help tame the complexity of a centerless system.

The Monotone docs claim their internal 'netsync' protocol is far more efficient than rsync or Unison, so I'm curious if it is a separable piece applicable elsewhere.


Davies, Newman, O'Connor, & Tam: Deme @ CodeCon 2006, 1:15pm Sunday

Continuing prejudicial CodeCon session previews:

Todd Davies, Benjamin Newman, Brendan O'Connor, Aaron Tam: Deme, 1:15pm Sunday @ CodeCon 2006

Deme is being developed for groups of people who want to make decisions democratically and to do at least some of their organizational work without having to meet face-to-face. It provides the functionality of message boards and email lists for discussion, integrated with tools for collaborative writing, item-structured and document-centered commentary, straw polling and decision making, and storing and displaying group information. It is intended to be a flexible platform, supporting various styles of group interaction: dialogue and debate, cooperation and management, consensus and voting…
This is an area of longstanding interest to me; creating online spaces that could reach convergence on a consensus, with the help of discussion and ranking widgetry, was a big theme at the founding of Activerse (1996-1999), though we eventually went in the direction of a decentralized buddy list.

The complementary-and-contrasting yin-and-yang of software for enabling online groups is:

(1) Technical decisions do influence the character of a community and the results it produces, including whether it is welcoming of newcomers, maintains an institutional memory, converges or diverges on major issues, fans or suppresses 'flaming' communication, and so forth. The Arrow Impossibility Theorem demonstrates that every voting system exhibits 'artifacts' where idealized goals cannot be met simultaneously; larger collaborative systems, which go beyond voting to include other forms of interaction, both magnify the potential for artifacts, and offer the chance for norms of behavior and other feedback to ameliorate any untoward gaming.
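
A toy illustration of the kind of artifact Arrow's result formalizes -- three voters with perfectly reasonable individual rankings produce a cyclic majority preference, so "what the group wants" isn't even well defined:

```python
# Condorcet cycle: pairwise majority voting with three voters and three options.
from itertools import permutations

ballots = [
    ["A", "B", "C"],   # voter 1 prefers A > B > C
    ["B", "C", "A"],   # voter 2 prefers B > C > A
    ["C", "A", "B"],   # voter 3 prefers C > A > B
]

def prefers(ballot, x, y):
    return ballot.index(x) < ballot.index(y)

for x, y in permutations("ABC", 2):
    if sum(prefers(b, x, y) for b in ballots) > len(ballots) / 2:
        print(f"a majority prefers {x} over {y}")
# -> a majority prefers A over B, B over C, and C over A: a cycle.
```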

But...

(2) Programmers consistently overestimate the importance of technical measures, over-engineering in anticipation of problems that may never arise, and inappropriately reaching for technical solutions to whatever problems do come up. In Wikipedia founder Jimmy Wales' stump speech, he likens this tendency to caging restaurant-goers to protect them from each other's steak knives.

Solutions like Deme have to be conscious of the influence of technical design, but not enthralled by its possibilities. The proper balance is difficult to determine -- and is certainly different for different communities.

Deme has four presenters listed, each tackling a different aspect of their mission or technology. I predict they run long and get truncated.

An aspect I don't see mentioned in their materials is the transparent management of organizational resources -- especially finances. I've long wanted to see someone develop what would essentially be an open, multiuser, web-based QuickBooks for distributed organizations. Every cent inbound and outbound could be tracked, linked to the group's goals and decisions, and mapped back to the contributors who made things happen. This would also open new avenues for voting on priorities by contributing or allocating contributions.
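
A hypothetical sketch of the data model I have in mind -- every name and number here is invented for illustration:

```python
# Each ledger entry is tied to the group decision that authorized it and to
# the contributors involved, so spending can be rolled up per decision.
from dataclasses import dataclass, field

@dataclass
class LedgerEntry:
    amount_cents: int          # positive = inbound, negative = outbound
    description: str
    decision_id: str           # the group decision that authorized this entry
    contributors: list = field(default_factory=list)

ledger = [
    LedgerEntry(50_000, "member dues", "2006-02-budget", ["alice", "bob"]),
    LedgerEntry(-12_000, "hosting bill", "2006-02-budget"),
]

def spent_under(decision_id):
    return -sum(e.amount_cents for e in ledger
                if e.decision_id == decision_id and e.amount_cents < 0)

print(spent_under("2006-02-budget") / 100)   # 120.0 dollars spent so far
```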

The Deme team might find it very helpful to review Christopher Allen's series of articles on Collective Choice.


Quinn Weaver, Dido @ CodeCon 2006, 12:30pm Sunday

Continuing prejudicial CodeCon session previews:

Quinn Weaver: Dido, 12:30pm Sunday @ CodeCon 2006

Dido grew out of a frustration with the open-source telephony platform Asterisk's dialplan system... I quickly realized that what I wanted to do--reordering menu options in voice menu, by popularity--was impossible in Asterisk... I ended up creating Dido, a radically new system that makes use of declarative XML templates, interspersed with Perl code that generates more XML. The result is a programming model that mimics the way dynamic Web pages are written...
Very little descriptive info on the project website, but from the above I'm guessing the essence is: make call-in voice-response systems as easy to author as dynamic web pages, without the arbitrary limitations imposed by previous offerings. The code is available for download, and "[t]he audience will be able to call into the demo system during the talk, using their cell phones, and traverse it independently" -- which is exactly what a demo of this functionality should include.
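
The motivating example -- reordering menu options by popularity -- is simple enough to sketch in plain Python (Dido itself uses XML templates plus Perl; the option names and counts below are invented):

```python
# Rank voice-menu options by how often callers have picked them, then
# renumber the prompts accordingly.
pick_counts = {"billing": 412, "support": 958, "hours": 131}   # made-up stats

menu_options = ["support", "billing", "hours"]

def render_menu(options, counts):
    ranked = sorted(options, key=lambda name: counts.get(name, 0), reverse=True)
    return [f"For {name}, press {i}." for i, name in enumerate(ranked, 1)]

for prompt in render_menu(menu_options, pick_counts):
    print(prompt)
# -> For support, press 1. / For billing, press 2. / For hours, press 3.
```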

For what it's worth, there's nothing wrong with this project name! No obvious name collisions in the tech domain nor prominent untoward connotations in other meanings of 'dido'.


2006-02-11
Meredith L. Patterson, Query By Example @ CodeCon 2006, 4:45pm Saturday

Continuing prejudicial CodeCon session previews:

Meredith L. Patterson: Query By Example, 4:45pm Saturday @ CodeCon 2006

Query By Example brings supervised machine learning into the realm of SQL in order to provide intuitive, qualitative queries. Within your query, you list a few examples which are LIKE the kind of rows you're looking for, and a few more which are NOT LIKE the kind of rows you're looking for, and using a fast, flexible machine learning algorithm, the database will automatically find rows which are similar to what you're interested in.

At the moment, QBE only works on real-valued data, i.e., integers and floats. Future releases will address text data and perhaps even binary data, but for the moment, this is a limitation of the system. (Don't worry. There's a lot of real-valued data out there.)

This was a graduate student project that got sponsored by Google's Summer of Code and has added the above-described new capability to PostgreSQL. Patterson gave a great presentation about analyzing (and even making!) DNA at last year's CodeCon, and Query By Example looks useful for a lot of common datamining operations, such as the recommendation systems of online retailers and content-aggregators.

It's apparently based on a support vector machine, which looks worth the effort to truly understand (though I don't yet). It's something about calculating the best dividing plane between two sets of example coordinates in a multi-dimensional vector space, then using that plane to classify other coordinates.

I would guess that when eventually generalized to text, the technique would view the presence or absence of any interesting term as an independent dimension with a binary coordinate.
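
For a concrete feel of the classification step, here's the same idea in miniature using scikit-learn's SVM -- this is not QBE's PostgreSQL integration, and the rows are invented:

```python
# Train on rows marked LIKE (1) and NOT LIKE (0), then classify the rest.
from sklearn import svm

examples = [[9.99, 4.5], [12.50, 4.8], [89.00, 2.1], [75.00, 1.9]]  # (price, rating)
labels   = [1, 1, 0, 0]            # 1 = LIKE these rows, 0 = NOT LIKE

clf = svm.SVC(kernel="linear")     # finds the separating hyperplane
clf.fit(examples, labels)

candidates = [[11.00, 4.6], [99.00, 1.5]]
print(clf.predict(candidates))     # expected roughly: [1 0]
```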

Update (8:35pm Saturday): I almost forgot to mention: this is another problematic project name. (That makes 6 of the first 10 presentations that fall short of my standards for effective names.) There are already concepts that go by "Query By Example" in the SQL and full-text realms. Now, the name may make more sense for a system, like the one presented, where exact rows (or documents) are used as the 'query' -- as opposed to the prior meanings where you provide some fragmentary field values to match. But, squatter's rights count, and I think Patterson's QBE might be underestimated by SQL-heads who see "Query by Example" and think of the previous 'example values' meaning rather than 'example items'.

Which would be too bad, because this is a very neat capability backed by very interesting and general algorithms. Reading more about the support vector machine approach, I see that it can be used, among other things, for training a web search engine based on implicit user feedback. I strongly suspect such feedback has become even more important than the link structure of the web in commercial search engine operation.

During the Q&A period, the last question actually made the presenter cry -- but in a good way. CodeCon program chair Len Sassaman asked presenter Meredith Patterson, whom he met at last year's conference, to marry him. She accepted. Not something you often see at a technical conference, but CodeCon has always been a bit special.


Michael J. Freedman, OASIS (Overlay Anycast Service InfraStructure) @ CodeCon 2006, 4pm Saturday

Continuing prejudicial CodeCon session previews:

Michael J. Freedman: OASIS (Overlay Anycast Service InfraStructure), 4pm Saturday @ Codecon 2006

OASIS (Overlay Anycast Service InfraStructure) is a locality-aware server selection infrastructure. At a high level, OASIS allows a service to register a list of servers, and then, for any client IP address, answers the question, "Which server should the client contact?" Server selection is primarily optimized for network locality, but also incorporates factors like liveness and, optionally, load. OASIS might, for instance, be used by CGI scripts to redirect clients to an appropriate download site for large files. It could be used by IP anycast proxies to locate servers. Currently, in addition to a simple web interface, we have implemented a DNS redirector that performs server selection upon hostname lookups, thus supporting a wide range of unmodified legacy client applications.
Their homepage demo is apparently supposed to mark my location with a circle on the embedded map -- it's not working for me here from CodeCon, even after typing in my external IP address.

From the overview, it's clear that the reference solution OASIS has in mind for comparison is probing a new IP address from multiple sites on-demand at the moment they want to know the right server. But, they report this solution has problems: latency and too much redundant traffic probing IPs that are near each other. So OASIS is constantly mapping network blocks in the background, and remembering and updating its results for lags and geographic location guesses. Because this effort is ongoing, amortized over time and over diverse applications, the costs in latency and traffic are less than the naive on-demand solution. In fact, as more applications share the same location architecture, the marginal cost for each new one drops.
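
A hypothetical sketch of the request-time half of such a system, once the background mapping exists -- the prefixes, coordinates, and server names are all invented:

```python
# Look up the client's network prefix in a pre-computed location table, then
# return the closest registered server.
import math

prefix_location = {            # /16 prefix -> rough (lat, lon), built offline
    "18.7":  (42.36, -71.09),
    "128.9": (33.93, -118.40),
}

servers = {"bos1": (42.36, -71.06), "lax1": (34.05, -118.24)}

def nearest_server(client_ip):
    prefix = ".".join(client_ip.split(".")[:2])
    loc = prefix_location.get(prefix)
    if loc is None:
        return next(iter(servers))                 # no data: pick any server
    return min(servers.items(), key=lambda s: math.dist(s[1], loc))[0]

print(nearest_server("18.7.22.69"))   # -> 'bos1'
```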

There would seem to be some overlap with the constant, distributed net-mapping done by the DIMES project based at Tel Aviv University.

There's already an OASIS abbreviation in use in technical/internet circles: "a non-profit, international consortium that creates interoperable industry specifications based on public standards such as XML and SGML." So yet again at CodeCon, this is a bad project name on uniqueness grounds alone. (Not to mention: how is a worldwide always-on active map of the Internet anything like a remote desert oasis?) Do we need a boot camp that teaches engineers to pick evocative, unique project names?


David Barrett, iGlance @ CodeCon 2006, 3:15pm Saturday

Continuing prejudicial CodeCon session previews:

David Barrett: iGlance, 3:15pm Saturday @ Codecon 2006

Basically iGlance tries to recreate the advantages of "being there", but remotely, over the internet. In essence I asked "What's so great about being physically present? Well, I can see you, I can talk to you, I can see your computer, I can use your computer... Heck, I can do all that online!" With iGlance, you can continue using the same "social tools" you've refined for when physically present, but from any internet connection. Peeking over a cubicle wall is replaced with glancing at your buddy list. Yelling across the room is push-to-talk. Screen sharing is asking you to sit at my keyboard. Everything is oriented around this central metaphor.
More deja vu for me. This yearning to recreate the benefits of physical co-presence online, by reusing the familiar metaphors of co-presence, was also what animated my Austin-based Internet startup Activerse (Web 1.0 era, 1996-1999).

It appears iGlance is an evolution of classic net collaboration tools, integrated, updated to assume VOIP (via push-to-talk) and webcams, and open-sourced. It's not clear how colleagues' current IP addresses are found -- is there a dependency on a lookup service @ quinthar.com? To the extent it includes nice features and a unifying philosophy, I would expect its winning techniques to be eventually adopted by the big IM networks and software packages.


Hansen & Thiede, Djinni @ CodeCon 2006, 1:15pm Saturday

Continuing prejudicial CodeCon session previews:

Robert J. Hansen, Tristan D. Thiede: Djinni, 1:15pm Saturday @ Codecon 2006

Djinni is an extensible, heavily documented framework for the efficient approximation of problems generally thought to be unsolvable [in polynomial time]. It doesn't give you the optimal solution, but generally gives you very close to it, and in a very reasonable time frame.
Djinni is apparently based on "a new approximation algorithm for NP-complete problems" by Drs. Jeff Ohlmann and Barrett Thomas of the University of Iowa. However, I can't find in the Djinni docs a simple explanation of what their insight was, in contrast to what came before: how it's different, how it might be better. An expert might be able to recognize the novelty by parsing the Djinni User Guide, but I can't.
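
Since I can't summarize the Ohlmann/Thomas insight, here's only a generic example of the trade-off such frameworks package up: a 2-opt local search that finds a near-optimal traveling-salesman tour quickly, with no guarantee of optimality. This is emphatically not Djinni's algorithm, just the flavor of heuristic approximation:

```python
# 2-opt local search: repeatedly reverse tour segments while that shortens the tour.
import math, random

cities = [(random.random(), random.random()) for _ in range(30)]

def tour_length(order):
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def two_opt(order):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j][::-1] + order[j:]
                if tour_length(candidate) < tour_length(order):
                    order, improved = candidate, True
    return order

tour = two_opt(list(range(len(cities))))
print(round(tour_length(tour), 3))   # near-optimal, found in well under a second
```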

The writing in the User Guide (which is actually more of a tutorial walkthrough applying Djinni to Boggle) suggests Mr. Hansen tends a bit towards discursive flourishes. (Example: "Bad or out-of-date documentation is something no programmer should tolerate. Far better there be no documentation at all than significantly out-of-date documentation."). This could make his presentation entertaining... or awkward as whimsical asides fall flat. We'll see.

Update (8:13pm Saturday): The presenter reported that the Ohlmann/Thomas approach is quantifiably better than another popular solution, whose name I neglected to note, which was also denigrated as being comparatively complex. The whimsical tendencies of the presenters did get laughs... some genuine, some uncomfortable.


Wilkerson & McPeak, Delta @ CodeCon 2006 12:30pm Saturday

Continuing prejudicial CodeCon session previews:

Daniel S. Wilkerson, Scott McPeak: Delta, 12:30pm Saturday @ CodeCon 2006

Delta assists you in minimizing "interesting" files subject to a test of their interestingness. A common such situation is when attempting to isolate a small failure-inducing substring of a large input that causes your program to exhibit a bug.
'Delta' has a number of glowing testimonials from the gcc team and users on their project page. The two big wins there seem to be: (1) when a giant body of proprietary code triggers a gcc bug, minimizing it via delta makes it concisely reportable, where it couldn't be effectively reported at all otherwise; (2) people often submit larger-than-minimal trigger cases, and delta is used to (automatically always?) minimize these.

Very neat. There must be other applications, too. Considering yesterday's theme of dissecting novel malware, perhaps this could be used to discover the minimal effective countermeasures against a new threat? (Apply all draconian countermeasures. Loosen progressively until finding smallest set that works against the threat.) Or bioinformatics? One can easily imagine real heredity, mutation, and immunology working this way at the microscopic level to explore the survivability possibility space.

I'm most curious to see the actual efficient 'winnowing' process used by the tool. (I hope there's a visualization or animation of some sort.) Is it completely systematic, guaranteed to find the smallest trigger case reachable by dropping lines? That seems impossible, given the size of input files: a thousand-line input has 2^1000 possible line subsets, too many to search exhaustively. How many trials does it run, and how fast, in a typical minimization (independent of the user-supplied interestingness predicate)? Perhaps there's a random element -- so that repeated reruns (with different seeds) or a willingness to wait longer could discover different/better minimal cases.

Update (Saturday 1:26pm): The search for lines that are droppable is dumb and deterministic -- essentially a binary search for logarithmically-sized subranges that can be eliminated. It's thus somewhat sensitive to the alignment of content with respect to the subranges. Its performance is thus improved a lot by a 'flattening' preprocessing which manages to group units of the input (eg C code blocks) into a single line. Wilkerson suggested a randomization element might help.
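
For readers who haven't seen the technique, a minimal sketch of this style of minimization -- try dropping contiguous chunks, keep any reduction the interestingness test still accepts, and shrink the chunk size when stuck. This is the general delta-debugging idea, not Delta's exact implementation:

```python
def minimize(lines, interesting):
    """Greedy chunk-dropping minimization of an 'interesting' list of lines."""
    chunk = len(lines) // 2 or 1
    while chunk >= 1:
        i, shrunk = 0, False
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and interesting(candidate):
                lines, shrunk = candidate, True   # keep the smaller input
            else:
                i += chunk                        # that chunk was needed
        chunk = chunk if shrunk else chunk // 2
    return lines

# Toy predicate: the "bug" triggers whenever both marker lines are present.
trigger = lambda ls: "int x = 1;" in ls and "return x / 0;" in ls
big_input = [f"line {n}" for n in range(100)] + ["int x = 1;", "return x / 0;"]
print(minimize(big_input, trigger))   # -> ['int x = 1;', 'return x / 0;']
```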

This project, too, deserves a better name. (Am I obsessed with names or what?) Something like 'occamizer'. It's available and descriptive.



2006-02-10
Joe Stewart, The Reusable Unknown Malware Analysis Net (Truman) @ CodeCon 2006, 4:45pm Friday

Continuing prejudicial CodeCon session previews:

Joe Stewart: The Reusable Unknown Malware Analysis Net (Truman), 4:45pm Friday @ CodeCon 2006

Truman can be used to build a "sandnet", a tool for analyzing malware in an environment that is isolated, yet provides a virtual Internet for the malware to interact with. It runs on native hardware, therefore it is not stymied by malware which can detect VMWare and other VMs. The major stumbling block to not using VMs is the difficulty involved with repeatedly imaging machines for re-use. Truman automates this process, leaving the researcher with only minimal work to do in order to get an initial analysis of a piece of malware.
It's nice for this to be the same day, but it really should have been adjacent to the SiteAdvisor talk -- perhaps even just before it, thus setting the technical stage for the overall SiteAdvisor process.

That a Linux boot image and set of (Perl!) scripts could reconstruct the entire running state of a Windows system, and switch over to it, without a usual full-fledged virtualization layer, seems impressive. But in a way, isn't that what the 'hibernate' feature of (for example) Windows XP accomplishes on resume? Could you do this more simply by just wrangling the 'hibernate' image of a Windows system around? (Is this in fact what Truman does? I can't tell -- my preview powers fail me, because of the thin level of description at the project home page.)

Certainly having free open-source utility boot images and scripts for these tasks is nice, and there might be other testing scenarios, besides catching malware, where this approach beats virtualization. (The SiteAdvisor presenter mentioned that licensing costs of virtualization systems were an issue at scale.)


Harwood and Jacobs, Localhost @ CodeCon 2006 4pm Friday

Continuing prejudicial CodeCon session previews:

Aaron Harwood & Thomas Jacobs: Localhost, 4pm Friday @ CodeCon 2006

Localhost is a program that lets you access a shared, world-wide file system through your web browser. This file system is maintained in a fully decentralized way by all of the computers running Localhost. The program uses BitTorrent technology, and P2P Distributed Hashtable technology called Kademlia. (Localhost is a modification of Azureus.)
Great idea. Another bad name.

Localhost creates a wiki-like global virtual hierarchical filesystem "containing" torrents. Wiki-like, anyone can edit any node of the directory tree. Un-wiki-like, all versions and branches coexist as long as anyone has viewed (and keeps caching while running the client) that version. People newly browsing to a node apparently get the "most popular" version. (Most popular globally or locally, I wonder?)
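
My guess at the underlying mapping, with the Kademlia DHT faked by a dict and every name invented:

```python
# Key a directory path to a DHT entry listing competing versions of that node,
# each with a popularity count; new visitors get the most-viewed version.
import hashlib

dht = {}   # stand-in for the DHT: key -> list of [version_hash, view_count]

def key_for(path: str) -> str:
    return hashlib.sha1(path.encode("utf-8")).hexdigest()

def publish(path, version_hash):
    dht.setdefault(key_for(path), []).append([version_hash, 0])

def browse(path):
    """Return the most-viewed version of a node, as a new visitor would see it."""
    versions = dht.get(key_for(path), [])
    if not versions:
        return None
    versions.sort(key=lambda v: v[1], reverse=True)
    versions[0][1] += 1        # popularity breeds popularity...
    return versions[0][0]

publish("/music/creative-commons/", "torrent-aaa")
publish("/music/creative-commons/", "torrent-bbb")
print(browse("/music/creative-commons/"))   # -> 'torrent-aaa' (first published wins the tie)
```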

In functionality, this is very similar to an idea we kicked around at Bitzi in early 2001. We called it "Chaotegories", and it was intended as a bigger, messier, overlapping-visions way to categorize media files in a vaguely DMOZ-like way. There'd be a "majoritarian" view of the tree -- summed from everyone's preferences -- but your personal moves of items and categories to and fro would be persistent for you, and perhaps for others in declared agreement with you, so that's what you and they would see. (Consistent with the idea of Bitzi being a canonical reference site, we would have hosted the tree -- allowing mirroring -- unlike in Localhost, where it lives in the network.)

(I think I had several 'chaotegories' domain names for a while... the problem with reserving a marginal obscure name, then letting it lapse, is that some speculators/squatters specifically grab names on expiration. So if you want to then come back to it -- your earlier registration is worse than if you'd never registered it at all. Chaotegories.com is held by a squatter now.)

I suspect Localhost would need some dampener to prevent a "popularity breeds popularity" effect, where the fact that most people never look past the top versions gives those versions self-reinforcing dominance. I look forward to seeing the interface for choosing versions "below" the top one. It looks susceptible to intentional pollution.

The natural extension would be to allow multiple signed roots to which only "leagues" of coordinating, loosely-trusted users can make changes. (Perhaps, everyone sharing some secret would have mutual edit rights for a certain rooted subtree.) Leagues would compete, merge, split, and so forth. (This variant is a lot like a completely P2P Wikipedia idea I've kicked around with a few people.)

A neat feature would be: given a target torrent, what are all the different paths that lead to it (with weights)? But that raises the question: would a nonhierarchical del.icio.us-style tagging system be more flexible and appropriate here? I know that idea will be tested soon.


Vyzovitis & Mirkin: VidTorrent/Peers @ Codecon 2006 3:15pm Friday

Continuing prejudicial CodeCon session previews:

Vyzovitis & Mirkin: VidTorrent/Peers, 3:15pm Friday @ Codecon 2006

VidTorrent is a protocol for global scale cooperative real-time streaming over the Internet. At a high level, VidTorrent is a peer-to-peer protocol that builds an adaptive overlay mesh suitable for real-time streaming. The protocol works by aggressively probing for bandwidth and minimizing latency.
As their page seems to admit with an aside "about the name," VidTorrent is a name likely to confuse users. It's not based on BitTorrent or associated with BitTorrent, Inc. I'm all for vaguely evocative names -- like the "-ster" names after Napster and Friendster, the dropped-vowel names after Flickr, and the doubled vowels of Google and Yahoo. This is too much like BitTorrent without any official relationship.

(I think the bad naming extends to the hosting research group, "Viral Communications". They say "Viral systems work by putting the intelligence at the edge and using terminals and radios as cooperative, agile, scalable elements that build networks opportunistically" -- but that seems to match neither the classic biological sense of "viral" nor the newer marketing/adoption/memetics sense. But that's neither here nor there.)

The technology looks better than the naming. The exact details appear to be hidden in the code, or even worse, PDFs of past presentations -- so perhaps this will be one of those rare demos that communicates more than the project website? (They did win a "Best demo" award at the January IEEE Consumer Communications and Networking Conference.)

My biggest question would be: is "live" (or nearly so) streaming important anymore? It is if you're trying to replicate broadcast TV, but "liveness" is turning out to be as much an accident of the implementation technology as an inherent quality. Digital satellite lag, content-censorship bleep delays, Tivos, pay-per-show downloads, the real BitTorrent, and lots of other trends suggest people don't care that much about time shifts of seconds, minutes, hours, or days. Only people in the same room need to see something in complete sync -- and even watching major events via Tivo, while a TV in the next room watches the same event 'live', hasn't disrupted enjoyment in my recent experience.

So any "real-time" streaming solutions, no matter how cool the technology, may be optimizing the wrong thing and fighting the last war. Ship the shows, not the seconds, I say. Perhaps VidTorrent will also have useful tech for this more practical approach.

Update (4:31pm Friday): The most interesting idea was grouping overlay-net peers into trees matched by upstream bandwidth, as a way to manage asymmetry and heterogeneity of capabilities. As expected, they consider ('real-time') 'streaming' a distinct and important problem as compared to discrete-file-based p2p distribution, rather than just an arbitrary artifact of current media habits that can be discarded. I wondered, but did not get to ask, whether they have benchmarked their system against file-centric p2p -- say, BitTorrent -- to see how total throughput compares on similar constellations of machines.


Tom Pinckney, SiteAdvisor @ CodeCon 2006 1:15pm Friday

To continue my series of prejudicial CodeCon session previews:

Tom Pinckney: SiteAdvisor, 1:15pm Friday @ Codecon 2006

[W]e built an army of robot testers which click around the net looking for Web forms, downloads, exploits, pop-ups, etc. We automatically download, install and test every program in a fresh virtual machine. We submit unique e-mail addresses on forms so we can track any resulting spam. We run kernel hooks that look for new processes or executables that may indicate an exploit. A workflow system routes the test to a human operator if the bots detect an error.
Awesome idea. SiteAdvisor may want to collaborate with 4:45 presenter Joe Stewart (Truman), who claims "malware is increasingly able to detect the presence of virtual machines".

SiteAdvisor recently got funding from, among others, Google. I wonder, will they warn users about the spyware-like aspects of Google's toolbar, desktop search, and other services? The funding is almost certainly a way to help inoculate Google against such accusations, by helping to draw a bright line in the sand, about where Google wants the line to be, with themselves on the "right" side and many of their competitors on the "wrong". (See also from May 2004: Google's self-inoculation.) (Correction 11pm Friday: I misread their press release -- SiteAdvisor's investors "include early investors in" Google, not Google itself. In my defense, a Google investment, to help Google down-rank evil sites and inoculate Google's own desktop offerings from criticism, would make a lot of sense!)

Update (2:29pm Friday): Saw the latter half of this presentation. Mostly as expected, and impressive. I neglected to realize that their own advisor toolbar necessarily collects URL histories like other spyware-ish toolbars. Brad Templeton suggested SiteAdvisor use technical measures to let toolbar users check sites without revealing their own visit history. Pinckney suggested SiteAdvisor had considered this, but he didn't seem too dedicated/open to the idea of fixing this issue. Essentially, "worry about giving your details to those other sites -- don't worry about us!" Should fit right in with Google's agenda.
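
One shape Templeton's suggestion could take -- my speculation, not anything SiteAdvisor described: the toolbar sends only a short hash prefix of the visited URL, the server returns every rating matching that prefix, and the client picks out its own URL locally, so the server never learns exactly which page was visited:

```python
# Hash-prefix lookup: the server sees a 6-hex-character prefix, not the URL.
import hashlib

def url_hash(url):
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

def client_query(url, server_lookup, prefix_len=6):
    h = url_hash(url)
    candidates = server_lookup(h[:prefix_len])     # server sees only the prefix
    return candidates.get(h, "unknown")

# Hypothetical server-side table keyed by full hash, queried by prefix.
RATINGS = {url_hash("http://badsite.example/"): "red"}
lookup = lambda prefix: {h: r for h, r in RATINGS.items() if h.startswith(prefix)}

print(client_query("http://badsite.example/", lookup))   # -> red
print(client_query("http://example.com/", lookup))       # -> unknown
```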


Lance James, Daylight Fraud Prevention @ CodeCon 2006, 12:30pm Friday

To begin my series of prejudicial CodeCon session previews:

Lance James (Secure Science): Daylight Fraud Prevention, 12:30pm Friday @ CodeCon 2006

Daylight Fraud-Prevention* (DFP) is a suite of technologies offering a powerful proactive defense against scammers and online criminals...

[using] ...a combination of detection, identification, prevention, and tracking methodologies.

The DFP description and website are long on highfalutin' terminology but short on details; here's what DFP appears to do, in plainer language:

  1. DFP rejects image requests without acceptable Referer-header info, so that phishers can't easily use your own hosted images against you. (A minimal sketch of this check appears after this list.)

    One possible phisher countermeasure might be to get a target's browser to visit the source page in an unnoticed way, loading its cache with the images using the right referrer -- then reusing the images (unless the source site cripples image caching) such that the use generates no hits against the source site.

    More likely, phishers would just download the images and host them at the same fly-by-night sites (or hijacked machines) used for their other hosting. Which brings us to...

  2. DFP watermarks every outgoing image to be uniquely trackable and correlatable with the original requestor -- so that if it's later seen on a phishing attempt, the perpetrator's IP address on the original fetch can be retrieved.

    Possible phisher countermeasures include scrubbing received images to remove the watermarks, or most likely just ensuring that another IP not traceable to them appears in the original site logs. For example, the images could be swiped from some innocent rube's session-in-progress (or cache on disk), setting up a convenient patsy. Or most likely, they'd just use an IP-hiding proxy to download the images in the first place. Which brings us to...

  3. DFP attempts to discover the real IP behind any site access, whether retrieving images or trying out harvested username/password combos. They probably do this by consulting the extra headers added by polite proxies, or via some sort of Java or JavaScript page insert.

    Indeed, visiting their test page discovered my private NAT address, apparently by use of a Java applet. Visiting the same page via the free Anonymizer gave a blank page warning that an Applet had been disabled. Visiting the original test page after disabling Java in my browser gave the same blank, inert page. (The Applet itself must trigger redirection to the page that reports the discovered IP.)

    Phishers are likely to just use effective anonymizing proxies or disable Java before interacting with target sites. Or they could cause Java itself to give misleading IP info, by using a private NAT address that looks like an open-internet address.

  4. DFP works to hide form parameters from locally-installed malware that is presumed able to spy even on SSL transactions. I suspect it does this through transforms on form parameter names on initial outbound display that are reversed on submission, or perhaps Javascript-based scrambling of form contents.

    DFP form-parameter scrambling could be considered just a mildly good hygienic measure against the most simpleminded traffic-scrapers. But largely, once you assume deeply intrusive malware is on the target's machine, it's game over. So what if you scramble the forms? The malware can read all keystrokes, package up and send off entire cache contents and form submissions for studious analysis by smart humans, and so on. A little parameter-scrambling isn't going to help with that.
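
A toy sketch of the Referer check from item (1), as plain WSGI -- the host names are hypothetical, and a real deployment would need a policy for browsers and proxies that strip the header entirely:

```python
# Serve hosted images only to requests referred by our own pages.
from urllib.parse import urlsplit

TRUSTED_HOSTS = {"www.examplebank.com", "examplebank.com"}  # hypothetical

def image_app(environ, start_response):
    referrer = environ.get("HTTP_REFERER", "")
    host = urlsplit(referrer).hostname or ""
    if host not in TRUSTED_HOSTS:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"image request blocked: bad or missing Referer\n"]
    with open("logo.png", "rb") as f:   # the real hosted asset
        body = f.read()
    start_response("200 OK", [("Content-Type", "image/png")])
    return [body]

# e.g. run with: wsgiref.simple_server.make_server("", 8000, image_app).serve_forever()
```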

All these steps are reasonable and likely to make marginal improvements in fraud-resistance, but all could be easily defeated by a savvy phisher aware of their existence. All send the signal: we're more vigilant than average, and so might encourage lazy phishers to concentrate their efforts elsewhere. ("Bear menaces two men. One puts on sneakers. 'You can't outrun a bear.' 'I only have to outrun you.'")

DFP techniques wouldn't be more secure via obscurity -- but they might entrap a larger proportion of naive phishers if generally not publicized. So SecureScience might enjoy more success showing their wares only in private to potential customers (banks), rather than on a public demo site and CodeCon. Their decision to present is thus curious.

The DFP public website has odd and paranoid disclaimers like "Confidential and Proprietary Information. FOIA Exempt. Patent Pending Daylight Technology." Should CodeCon require a patent-disclosure statement of presenters? Allow attendees to leave the room before they are "contaminated" by the presentation of a possible patent-troll?

Update (2:27pm Friday): I completely missed this session. Someone please let me know if I got anything wrong.


CodeCon 2006: With Extreme Prejudice

As an experiment in attention, I'll be writing my impressions of CodeCon sessions this year, but with a twist: I'll be publishing judgements about sessions before attending, using only the official description and online presences of the presenting projects.

No, it's not quite fair. It may in some cases be prejudicial and wrongheaded.

But it's rare for a demo's information content to exceed what's already published about a project. I can read faster than presenters can talk, and I can also usually fill in the blanks and anticipate the questions and answers before they come up.

For those rare (and treasured!) cases where the session actually adds some new insight (or corrects an earlier misimpression), I'll append an update to my prior writeups. And so here goes...
