
Autodiscovery of hierarchies through tagging

16 May 2006

There’s been a growing amount of discussion recently about how to use the data from folksonomies to organize information in other ways. One blogger suggests data mining folksonomies to populate a hierarchical taxonomy. Via David Weinberger, I found that LibraryThing notes that tags standing alone don’t always provide as much context as a taxonomy.

The ordered structure of subject headings gives added meaning. History > Philosophy is very different from Philosophy > History—a distinction that isn’t necessarily apparent when searching history or philosophy separately as tags.

Expanding on this concept a bit, I’d note that with a detailed taxonomy it will be difficult to rely solely on a document’s tags to determine where within that taxonomy the document should go. Software > Design is a very different concept from Design > Software (the act of designing a piece of software as opposed to a piece of software used for design), and if your document simply includes the tags design and software, how would the classification engine decide where it goes? With a larger tag set it should be possible to infer additional meaning. The additional tags could provide clues as to whether you’re talking about a software product or the process of creating software.
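To make the idea of "clue" tags concrete, here is a minimal Python sketch. The clue sets and the place_document function are invented for illustration; this is not how Tagyu or any particular classification engine works, only one simple way the extra tags could break the tie.

```python
# Hypothetical clue tags for two taxonomy nodes that both match the tags
# "design" and "software". These sets are made up for illustration.
NODE_CLUES = {
    # The act of designing a piece of software.
    "Software > Design": {"architecture", "uml", "patterns", "refactoring", "api"},
    # A piece of software used for design.
    "Design > Software": {"photoshop", "illustrator", "cad", "graphics", "layout"},
}


def place_document(tags):
    """Return the node whose clue set overlaps most with the tags.

    Returns None when no clue tag is present or when the nodes tie,
    meaning the tags alone do not give enough context to decide.
    """
    tag_set = {tag.lower() for tag in tags}
    scores = {node: len(tag_set & clues) for node, clues in NODE_CLUES.items()}
    best = max(scores.values())
    winners = [node for node, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(place_document(["design", "software"]))                      # None
print(place_document(["design", "software", "uml", "patterns"]))   # Software > Design
print(place_document(["design", "software", "photoshop", "cad"]))  # Design > Software
```

With only design and software the sketch declines to choose, which is the ambiguity described above. The larger tag sets tip it one way or the other.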

It’s possible, for instance, to look at the tags mustang, ford, horsepower, specs and determine that the document is about the Ford Mustang automobile rather than a horse, and that mustang, running, animal are tags that don’t describe a car. But building a taxonomy based on these tags requires that a person with knowledge of automobile brands build a dictionary of tags that map to nodes in their taxonomy. Once you have a corpus of tags that do and don’t place a document into your taxonomy, you can start using machine learning (perhaps through statistical analysis) to learn what other tags would define a document as being about either a horse or a car.
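As one possible form of that statistical analysis, here is a small Python sketch of a naive Bayes classifier over tags with add-one smoothing. The training examples, the labels car and horse, and the function names are all assumptions made up for illustration. The technique is my choice for the example, not a description of any existing system.

```python
import math
from collections import Counter, defaultdict

# Hand-labelled examples of (tags, taxonomy node). Invented for illustration;
# in practice a person who knows the domain would supply these.
TRAINING = [
    (["mustang", "ford", "horsepower", "specs"], "car"),
    (["mustang", "ford", "engine", "convertible"], "car"),
    (["ford", "horsepower", "engine", "dealer"], "car"),
    (["mustang", "running", "animal"], "horse"),
    (["mustang", "wild", "herd", "animal"], "horse"),
    (["horse", "running", "wild", "pasture"], "horse"),
]


def train(examples):
    """Count documents per label and tag occurrences per label."""
    doc_counts = Counter()
    tag_counts = defaultdict(Counter)
    vocab = set()
    for tags, label in examples:
        doc_counts[label] += 1
        for tag in tags:
            tag_counts[label][tag] += 1
            vocab.add(tag)
    return doc_counts, tag_counts, vocab


def classify(tags, model):
    """Return the label with the highest naive Bayes log score."""
    doc_counts, tag_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_tags = sum(tag_counts[label].values())
        for tag in tags:
            if tag not in vocab:
                continue  # a tag never seen in training carries no evidence
            # Add-one smoothing so an unseen tag/label pair is not fatal.
            count = tag_counts[label][tag] + 1
            score += math.log(count / (total_tags + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify(["mustang", "engine", "dealer"], model))  # car
print(classify(["mustang", "herd", "pasture"], model))   # horse
```

Note that mustang on its own appears equally often under both labels here, so it is the surrounding tags that decide the outcome.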

Part of this problem can be solved by using sufficiently large and detailed training sets, but it’s going to be a lot of work. You also can’t assume that a learning algorithm trained on where tags belong in one taxonomy will map well to other, similar taxonomies.

Taxonomies are structured for a particular knowledge domain by someone with detailed understanding of that domain. An algorithm isn’t able to understand the nuances of how I want my content structured. It is unable to adjust for my personal preferences and biases. Tagyu tends to classify pages about podcasts as Entertainment, but you might feel that everything podcast-related is better suited for the Technology category. With this sort of problem in a generic flat taxonomy, imagine how much more complicated it would be to place items into a detailed hierarchy with lots of similar nodes.

With Tagyu, I’m working on some solutions to this, and I’m excited to see what others are coming up with.

ram
November 29, 2007 10:24 PM

hi pretty interesting. we are also trying to work on the same problem. would definitely be interested to see what can be achieved. check us out at www.findnearyou.com . we have these hierarchical DBs and freetagging by users and we need to use these to divine user intent and deliver what they are searching for
