Monthly Archive: May 2007
Tech & VC 30 May 2007 08:18 pm
The Curated Web
I have been reading the coverage of Mahalo, Jason Calacanis’s latest project, with much interest. One startup I really like is Top 10 Sources, and Mahalo has a lot of the elements that Top 10 Sources gets right (domain experts curating links), plus some cool user-generated tools to supplement the curators.
In general, I am optimistic about curated web experiences, especially if the curator is (at least in part) powered or audited by the crowd. Getting real value from a Google search results page takes the experience of many past searches. For example, a search for almost any coding question will return a page at the “experts-exchange” domain. Experienced Google searchers know not to click on the “experts-exchange” links because they’re all garbage that requires registration in order to see answers to your question. Yet, how could a novice know to avoid these links? “Experts-exchange” is SEO’d extremely well, so it’s a tough nut for Google to crack, but curation fixes this problem via a human filter.
Curation makes a lot of sense for broad, common subjects and searches. I’m curious how the Mahalo approach to curation will hold up further out on the long tail, and I look forward to playing with it in the coming weeks.
Tech & VC 29 May 2007 02:19 pm
reCAPTCHA on my Blog
reCAPTCHA is the latest project from Luis von Ahn, the inventor of the CAPTCHA and founder of the highly addictive ESP Game. The underlying principle behind all of Luis von Ahn’s work is that perhaps the smartest computer is one that is powered by the crowdsourced intelligence of humans. Typically, Luis leverages games in order to incentivize humans to contribute work to a greater cause, such as labeling images for improved search quality and seeing-impaired access.
reCAPTCHA is not a game. It’s a web service version of a CAPTCHA. reCAPTCHA is a leap forward in OCR technology. A normal CAPTCHA is just randomly chosen characters; by contrast, a reCAPTCHA is two words that modern-day OCR technology fails to recognize in book digitizing efforts. One of the words is known to the computer (based on previous reCAPTCHAs) and one of the words is unknown. The computer tests whether you can recognize the known word. If you get it right, it assumes you also read the unknown word correctly. The unknown word is considered unknown until a statistically significant number of people agree on the word. Once there is strong agreement, the word is known and can be used as a known word in future reCAPTCHAs. More importantly, once a previously unknown word is known, it can be used to improve the digitization of the book that was the source of the word. So, if there is a smudge in the scan of Moby Dick and OCR fails to recognize a word, that word can possibly be recovered by the human computing power of reCAPTCHA.
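The known-word/unknown-word logic described above can be sketched roughly in Python. This is only an illustration of the idea, not reCAPTCHA’s actual implementation: the class name `WordPool`, the image IDs, and the thresholds `MIN_VOTES` and `MIN_AGREEMENT` are all made up here, since the only stated requirement is that a “statistically significant” number of people agree.

```python
from collections import Counter

# Hypothetical thresholds; the real service's criteria are not public in this post.
MIN_VOTES = 5          # minimum number of guesses before a word can be promoted
MIN_AGREEMENT = 0.8    # fraction of guesses that must match the top answer


class WordPool:
    """Tracks confirmed ("known") word images and guesses for unknown ones."""

    def __init__(self, known):
        self.known = dict(known)   # image_id -> confirmed transcription
        self.votes = {}            # image_id -> Counter of guesses so far

    def submit(self, known_id, known_guess, unknown_id, unknown_guess):
        """Return True if the user passed the CAPTCHA (got the known word right)."""
        if known_guess.strip().lower() != self.known[known_id].lower():
            return False  # failed the control word, so ignore the other guess
        tally = self.votes.setdefault(unknown_id, Counter())
        tally[unknown_guess.strip().lower()] += 1
        self._maybe_promote(unknown_id)
        return True

    def _maybe_promote(self, unknown_id):
        """Promote an unknown word to known once enough people agree on it."""
        tally = self.votes[unknown_id]
        total = sum(tally.values())
        word, count = tally.most_common(1)[0]
        if total >= MIN_VOTES and count / total >= MIN_AGREEMENT:
            self.known[unknown_id] = word
            del self.votes[unknown_id]
```

For example, after five users who correctly type the known word all read the smudged image as the same word, that image moves into `known` and can serve as the control word in later challenges.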
It’s free to use (unless you suck up a lot of bandwidth). You need to request an API key and then implement their API (or use a plugin, like the WordPress plugin I’m using). As a nice side benefit, the usability of reCAPTCHA is significantly improved over original CAPTCHAs (for example, CAPTCHAs are not accessible to the blind, but reCAPTCHA has an audio option for the seeing-impaired).
Filling out a reCAPTCHA is now required in order to comment on my blog. I’m not a big fan of making it harder for people to comment here, but I’m glad this will reduce comment spam, and I’m really glad I will be adding to the book digitizing efforts of archive.org, which is the first beneficiary of reCAPTCHA’s OCR computing power. If the number of comments I receive dips significantly, then I’ll kill it, but otherwise, the reCAPTCHA is here to stay.
Also, I have a feature request for reCAPTCHA: I wish they would report to me how many people successfully and unsuccessfully filled out my reCAPTCHA. Just a simple counter of both numbers would be great.
I love the reCAPTCHA tagline… it’s a perfect description of their value proposition: “Stop Spam. Read Books.” Write a comment here to test it out. :) The WordPress plugin isn’t perfect, but I’m looking into customizing the CSS to make it a little more intuitive, especially when a user fails to fill out the reCAPTCHA correctly.
I first learned about reCAPTCHAs from the consistently excellent O’Reilly Radar blog. Also, check out this original announcement by Ben Maurer, a student of Luis von Ahn.
Tech & VC 28 May 2007 02:47 pm
The Social Media Filter
Mainstream computer users are getting better at using and creating social media. More and more people are figuring out Flickr (enough to kill off Y! Photos). Wikis are becoming more like Microsoft Word and less like VI in terms of usability.
This progress is beneficial for social media companies because it is a step closer to mainstream adoption. For social media to have a long future, tools like Twitter need to be as commonplace as IM is today.
However, there are unwanted side effects of this increased adoption of social media. It used to be that the ability to use social web services was a filter. If you knew how to use del.icio.us or Digg, that was a fairly reliable filter of information, such that the stories on Digg and del.icio.us were generally interesting and relevant to the other users of these social bookmarking services. Wikipedia relies on this barrier more than any other web service: the ability to use MediaWiki is the best way to filter the people that should (and shouldn’t) be editing Wikipedia.
But, as social media usability has been improved and as users have become more savvy, the filter of simply being able to use a piece of software has decreased in quality.
Richard MacManus wrote about this subject over a year ago on 2/16/06. Back then, Richard concluded that 2006 would thus be the “year of the filter”. I think Richard was right that 2006 needed to be the year of the filter (social media became increasingly noisy throughout 2006), but I think that the web 2.0 community failed in finding good filters in 2006. I’m still aggressively looking for solid, scalable social media filters halfway through 2007. Comment on this post or shoot me an email if you’re working on something exciting to solve this problem.
Tech & VC 27 May 2007 03:26 pm
Dot-Com Dev Cycles
On Friday afternoon I asked Gabe Rivera (Founder of Memeorandum) for a small feature. Sunday morning at 3:25 am, my desired feature was public on Techmeme. That’s an amazingly quick dev cycle for such a heavily-trafficked web service. I’m impressed.
I know Gabe is the only guy behind the wheel at Memeorandum, so it’s easier to do quick small builds when you don’t have to worry about peer developers’ code. Nonetheless, I think Gabe’s speed is a huge asset for his company.
I have been involved in two different types of dot-com dev cycles in the past. In one, I was a developer on a 3-person team working on an open source project built on LAMP. In the other, I was a producer in a 70-person company.
Each had its advantages. On the 3-person team, we could churn out bug fixes or small feature requests in a few days. If a user had a legitimate complaint, we could fix the problem within a day or two. There was no formal QA process and there was very little ownership of code (i.e., everyone worked on everyone else’s code). There was almost zero bureaucracy, so we could get stuff done incredibly quickly. But larger features took a long time to develop and test in this format, and code migration became an annoying chore if someone was working on a large feature for a few months and someone else was doing daily bug fixes. Also, nightly builds were messy: sometimes the code wouldn’t run (I would say “wouldn’t compile” instead of “wouldn’t run”, but PHP is an interpreted language, so it doesn’t really work like that…), but fixes to a broken nightly were always prompt.
By contrast, the dev cycle in the 70-person company was careful, deliberate, long, and cumbersome. A minor feature request from a customer would get dropped on a long list of feature requests, and there were monthly meetings to prioritize the feature request list. The build cycle was as long as 6 weeks or as short as 2 weeks (when pieces of the dev cycle efficiently worked in parallel on different builds). But a feature, no matter how small, always took a minimum of 3 weeks from initial feature request through build. The process felt frustratingly slow, but it was the most efficient way to manage code across a large development team. And, it was safe. Builds rarely backfired.
I’m not sure which dev cycle was better. There’s a lot to be said in favor of stability (where the 70-person company excelled), but working on a 3-person team was much more nimble. I think my own personal preference is for smaller teams, but scaling the team and growing pains make bureaucracy a necessary evil for stability in the long run.
Tech & VC 25 May 2007 01:38 pm
Jotspot Support Has Vanished
Jotspot has three exposed email resources for support in their help documentation:
- support@jot.com
- jotspotsupport@google.com
- jim@jot.com
I have emailed all three of these addresses at least once over the past few months. I’ve sent a total of 5 cries for help dating back to March 21st. I have not received a single response.
Jotspot support doesn’t suck: to suck you’d have to exist.
I remember paying a $228.00 bill for an annual subscription to Jotspot just before the Google acquisition. What did that buy us? A hosted service with no one at home? Where’s the love?
I would LOVE to migrate away from Jot, but 4 of the support emails I have sent have been asking them to fix their data export tools, which are totally broken for their spreadsheet applications. If anyone has any solutions, I’m all ears.
Personal 24 May 2007 03:55 pm
Stanford Imposter
This is really incredible…. like, a little TOO incredible to be believable (like the 2004 Red Sox ALCS comeback). It’s the story of a person squatting at Stanford who faked being a student for 8 months. She “took” classes. She studied for “exams” (which were not graded because she wasn’t registered). Creepiest of all, she conned her way into living in the dorms.
She was recently caught, and the Stanford Daily reported on the story.
Why would anyone do this? Her friends’ best guesses (from the article):
Friends aren’t sure of her motive for sneaking onto campus and living a lie, but many speculate that she felt pressure from overbearing parents to attend Stanford — regardless of whether she was admitted.
Wow. At least they’re blaming parents instead of the media ;)
PS: this is my first test with a SmartLink in a post. Try clicking on the blue icon next to the link (RSS readers will see nothing).
Tech & VC 24 May 2007 03:26 pm
Facebook – The Platform
This is a total 180… I always thought of Facebook as this intentionally closed off system designed to lock out third-party widgets in order to preserve the consistency of the user experience.
Not anymore. According to TechCrunch, Facebook is now ready to become the uber-platform with unprecedented access via a remarkably open developers’ API:
The API would allow, for example, a third party to recreate Facebook Photos, the most used photo application on the web. Users could then remove the default Facebook Photos and install the third party version instead.
I think this is a real game-changer. Facebook is definitely the most interesting company out there right now, and it’s only going to get more exciting as innovation and new features have now become crowd-sourced.
I wonder how Facebook will manage their data asset in this world of open access. Since developers can now build widgets that leverage all of a given user’s data in Facebook, what’s to stop the developers from sending that data back to a home server and storing it? I think it would be incredibly valuable to know the structure of the social network at major universities. Widget makers like Slide and RockYou, and even consumer brands with digital media campaigns, like Red Bull, can now figure out who the super-nodes are at various universities, and they could use this data to market directly to key influencers. Wild!
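To make the super-node idea concrete, here is a minimal Python sketch that ranks users by how many friends they have (simple degree counting). It assumes a developer has already collected friendships as plain pairs of user names; the function name `super_nodes` and the sample data are hypothetical, and nothing here uses or represents the actual Facebook API.

```python
from collections import Counter


def super_nodes(friendships, top_n=3):
    """Return the top_n users with the most friendship links.

    friendships: iterable of (user_a, user_b) pairs, each an undirected link.
    """
    degree = Counter()
    for a, b in friendships:
        degree[a] += 1
        degree[b] += 1
    return [user for user, _ in degree.most_common(top_n)]


# Made-up sample data: "ann" is connected to everyone else.
sample = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
print(super_nodes(sample, top_n=1))
```

Counting friends is the crudest possible measure of influence; a real analysis would likely weigh who those friends are, but even this shows how little stored data it takes to start spotting key influencers.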
Tech & VC 22 May 2007 06:42 am
Twitter Fiction
I shot off a quick tweet yesterday asking if anyone had seen people using Twitter as a platform for fiction. Greg Cohn responded:
@ andrewparker: http://twitter.com/zombieattack been meaning to blog it!
I’m beating Greg to blogging it ;)
I really dig this. It’s a piece of fiction that is being written one Twitter message at a time, and it is updated about once a day. The story follows two brothers leading a quiet life that is rudely jolted by a creeping illness that develops into a zombie attack. The 140-character limit enforces tight and terse sentence creation. Each tweet hits hard, and echoes like a solo gunshot. Here’s a quick taste for those not interested in clicking through the link:
We hear screaming and then gunshots a few cells down, people running in and out, people crying. Everything is out of control! 02:25 AM May 02, 2007
After a few minutes things start to quiet down. Two bodies are dragged away, and in comes 3 people in suits, they carry cleaning products. 03:56 AM May 02, 2007
Matt and I Wake up to two guards pulling us up. The move us to an office where a middle age man starts to question us. 08:02 PM May 02, 2007
Sure, it’s a bit overly dramatic, but it’s amazing how much imagery the authors (there are supposedly two authors, Matt and Greg) can convey in just 140 characters.
I love how flexible/malleable Twitter is. There is such a wide range of use cases, and I’m glad Zombie Attack is adding fiction to that long list.

