Autodiscovery of hierarchies through tagging

Freshness Warning
This article is over 12 years old. It's possible that the information you read below isn't current.

There’s been a growing amount of thought recently on how to use the data from folksonomies to organize information in other ways. One blogger suggests data mining folksonomies to populate a hierarchical taxonomy. Via David Weinberger, I find that LibraryThing notes that tags standing alone don’t always provide as much context as a taxonomy:

The ordered structure of subject headings gives added meaning. History > Philosophy is very different from Philosophy > History—a distinction that isn’t necessarily apparent when searching history or philosophy separately as tags.

Expanding on this concept a bit, I’d note that with a detailed taxonomy it will be difficult to rely solely on a document’s tags to determine where within that taxonomy the document should go. Software > Design is a very different concept from Design > Software (the act of designing a piece of software as opposed to a piece of software used for design). If your document simply includes the tags design and software, how would the classification engine decide where it goes? With a larger tag set it should be possible to infer additional meaning. The additional tags could provide clues as to whether you’re talking about a software product or the process of creating software.
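To make the idea of "additional tags as clues" concrete, here is a minimal sketch in Python. It assumes each taxonomy node comes with a hand-picked set of clue tags; the node names come from the example above, but the clue lists and the function names `rank_nodes` and `classify` are invented for illustration and are not how any particular classification engine works.

```python
# Hypothetical clue tags for two taxonomy nodes (invented for illustration).
NODE_CLUES = {
    "Software > Design": {"software", "design", "architecture", "uml", "refactoring", "patterns"},
    "Design > Software": {"software", "design", "photoshop", "illustrator", "typography", "layout"},
}


def rank_nodes(tags):
    """Return (node, overlap count) pairs, best match first."""
    tag_set = {t.lower() for t in tags}
    scores = {node: len(tag_set & clues) for node, clues in NODE_CLUES.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def classify(tags):
    """Return the best-matching node, or None when the top two nodes tie."""
    ranked = rank_nodes(tags)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # the tags alone cannot separate the nodes
    return ranked[0][0]


print(classify(["design", "software"]))                     # ambiguous
print(classify(["design", "software", "uml", "patterns"]))  # process of creating software
print(classify(["design", "software", "photoshop"]))        # software used for design
```

With only design and software the two nodes tie and the sketch declines to choose. Each extra tag that matches a clue list breaks the tie in one direction.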

It’s possible, for instance, to look at the tags mustang, ford, horsepower, specs and determine that the document is about the Ford Mustang automobile rather than a horse, and that mustang, running, animal are tags that don’t describe a car. But building a taxonomy based on these tags requires that a person with knowledge of automobile brands build a dictionary of tags that map to nodes in their taxonomy. Once you have a corpus of tags that do and don’t place a document into your taxonomy, you can start using machine learning, perhaps through statistical analysis, to learn what other tags would define a document as either a horse or a car.
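One simple form that statistical analysis could take is a naive Bayes classifier trained on hand-labelled tag sets. The sketch below is only an illustration under that assumption: the training examples are made up, the labels car and horse come from the example above, and `train` and `predict` are hypothetical names rather than part of any existing tool.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count label and tag frequencies from (tags, label) pairs."""
    label_counts = Counter()
    tag_counts = defaultdict(Counter)
    vocabulary = set()
    for tags, label in examples:
        label_counts[label] += 1
        for tag in set(tags):
            tag_counts[label][tag] += 1
            vocabulary.add(tag)
    return label_counts, tag_counts, vocabulary


def predict(model, tags):
    """Pick the label with the highest naive Bayes score (add-one smoothing)."""
    label_counts, tag_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior for this label
        denom = sum(tag_counts[label].values()) + len(vocabulary)
        for tag in set(tags):
            if tag in vocabulary:  # tags never seen in training carry no signal
                score += math.log((tag_counts[label][tag] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training corpus, labelled by a person who knows the domain.
examples = [
    (["mustang", "ford", "horsepower", "specs"], "car"),
    (["mustang", "ford", "engine", "v8"], "car"),
    (["mustang", "running", "animal"], "horse"),
    (["mustang", "wild", "herd", "animal"], "horse"),
]
model = train(examples)
print(predict(model, ["mustang", "engine", "specs"]))
print(predict(model, ["mustang", "herd"]))
```

The tag mustang appears under both labels, so it contributes almost nothing on its own. The surrounding tags carry the decision, which is why the size and quality of the labelled corpus matter so much.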

Part of this problem can be solved by using sufficiently large and detailed training sets, but it’s going to be a lot of work. You can’t assume that a learning algorithm trained on where tags belong in one taxonomy will map well to other, similar taxonomies.

Taxonomies are structured for a particular knowledge domain by someone with detailed understanding of that domain. An algorithm isn’t able to understand nuances in how I want my content structured. It is unable to adjust for my personal preferences and biases. Tagyu tends to classify pages about podcasts as Entertainment, but you might feel that everything podcast-related is better suited for the Technology category. With this sort of problem in a generic flat taxonomy, imagine how much more complicated it would be to place items into a detailed hierarchy with lots of similar nodes.
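One partial workaround is to layer a person’s explicit preferences on top of a generic classifier. The sketch below assumes a hypothetical per-user table of tag overrides and a stand-in `generic_classifier` function. It is not how Tagyu works, and it only handles preferences someone has spelled out, not the nuances described above.

```python
def classify_with_preferences(tags, generic_classifier, overrides):
    """Return the user's preferred category for the first overridden tag,
    falling back to the generic classifier when no override applies."""
    for tag in tags:
        category = overrides.get(tag.lower())
        if category is not None:
            return category
    return generic_classifier(tags)


def generic_classifier(tags):
    """Stand-in for a generic engine: files anything tagged podcast under Entertainment."""
    lowered = {t.lower() for t in tags}
    return "Entertainment" if "podcast" in lowered else "Uncategorized"


# Hypothetical per-user preference table.
my_overrides = {"podcast": "Technology"}

print(classify_with_preferences(["podcast", "audio"], generic_classifier, {}))
print(classify_with_preferences(["podcast", "audio"], generic_classifier, my_overrides))
```

Every override has to be written by hand, so the approach gets harder to maintain as the hierarchy grows and similar nodes multiply.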

With Tagyu, I’m working on some solutions to this, and I’m excited to see what others are coming up with.

ram
November 29, 2007 10:24 PM

hi pretty interesting. we are also trying to work on the same problem. would definitely be interested to see what can be achieved. check us out at www.findnearyou.com . we have these hierarchical DBs and freetagging by users and we need to use these to divine user intent and deliver what they are searching for

©1999-2019 Adam Kalsey.