Autodiscovery of hierarchies through tagging

Freshness Warning
This blog post is over 16 years old. It's possible that the information you read below isn't current and the links no longer work.

There’s been a growing amount of thought recently on how to use the data from folksonomies to organize information in other ways. One blogger suggests data mining folksonomies to populate a hierarchical taxonomy. Through David Weinberger, I found that LibraryThing notes that tags standing alone don’t always provide as much context as a taxonomy does.

The ordered structure of subject headings gives added meaning. History > Philosophy is very different from Philosophy > History—a distinction that isn’t necessarily apparent when searching history or philosophy separately as tags.
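To make that distinction concrete, here’s a tiny Python sketch. The two paths are just the example from the quote, written as ordered tuples. An ordered path keeps the two headings apart, while flattening each path into a set of tags collapses them into the same thing:

```python
# The two subject-heading paths from the example, built from the same two words.
path_a = ("History", "Philosophy")   # History > Philosophy
path_b = ("Philosophy", "History")   # Philosophy > History

# A hierarchy path is ordered, so the two headings are distinct.
print(path_a == path_b)  # False

# Flattened into tags, the order is lost and both become the same set.
tags_a = {part.lower() for part in path_a}
tags_b = {part.lower() for part in path_b}
print(tags_a == tags_b)  # True
```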

Expanding on this concept a bit, I’d note that with a detailed taxonomy it will be difficult to rely solely on a document’s tags to determine where within that taxonomy the document should go. Software > Design is a very different concept from Design > Software (the act of designing a piece of software, as opposed to a piece of software used for design). If your document simply carries the tags design and software, how would the classification engine decide where it goes? With a larger tag set it should be possible to infer additional meaning: the additional tags could provide clues as to whether you’re talking about a software product or the process of creating software.
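One simple way to use those extra tags is a hand-maintained list of cue tags for each node, picking the node whose cues overlap most with the document’s tags. This is a minimal sketch under that assumption, not how Tagyu or any particular engine works; the `CUES` table and the `place` function are hypothetical, and the cue tags themselves are made up for illustration. When no cue tag is present, or two nodes tie, it declines to guess:

```python
# Hypothetical cue tags for two nodes built from the same two words.
CUES = {
    ("Software", "Design"): {"architecture", "uml", "patterns", "refactoring"},
    ("Design", "Software"): {"photoshop", "illustrator", "typography", "plugin"},
}


def place(tags):
    """Return the taxonomy path whose cue tags best match, or None if ambiguous."""
    tags = {t.lower() for t in tags}
    # Score each candidate node by how many of its cue tags the document carries.
    scores = {path: len(tags & cues) for path, cues in CUES.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_path, best_score = ranked[0]
    # No cue tags at all, or a tie between nodes: the tags alone can't decide.
    if best_score == 0 or (len(ranked) > 1 and ranked[1][1] == best_score):
        return None
    return best_path


print(place({"design", "software"}))         # None
print(place({"design", "software", "uml"}))  # ('Software', 'Design')
```

The obvious weakness is that every cue list has to be written by hand, for every node, by someone who knows the domain.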

It’s possible, for instance, to look at the tags mustang, ford, horsepower, specs and determine that the document is about the Ford Mustang automobile rather than a horse, and that mustang, running, animal are tags that don’t describe a car. But building a taxonomy from these tags will require someone with knowledge of automobile brands to build a dictionary of tags that map to nodes in the taxonomy. Once you have a corpus of tags that do and don’t place a document into your taxonomy, you can start using machine learning (perhaps through statistical analysis) to learn which other tags mark a document as being about a horse or a car.
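The dictionary-then-learn approach can be sketched with a naive Bayes classifier, which is one common statistical choice, not necessarily the best one. Everything here is hypothetical: `SEED` stands in for the expert-built tag dictionary, the documents are toy data, and the labels are just car and horse rather than real taxonomy nodes. The seed dictionary labels whatever documents it can, and the classifier then learns from co-occurrence which other tags point one way or the other:

```python
import math
from collections import Counter, defaultdict

# Hypothetical expert-built dictionary: seed tags that place a document.
SEED = {"ford": "car", "sedan": "car", "animal": "horse", "stallion": "horse"}


def seed_label(tags):
    """Label a document from the seed dictionary; None if no seed tag or a conflict."""
    labels = {SEED[t] for t in tags if t in SEED}
    return labels.pop() if len(labels) == 1 else None


def train(docs):
    """Count tag frequencies per label over the documents the dictionary could label."""
    tag_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tags in docs:
        label = seed_label(tags)
        if label is None:
            continue  # unlabeled documents are skipped in this simple sketch
        doc_counts[label] += 1
        tag_counts[label].update(tags)
    return tag_counts, doc_counts


def classify(tags, tag_counts, doc_counts):
    """Multinomial naive Bayes with add-one (Laplace) smoothing over tags."""
    vocab = set().union(*tag_counts.values())
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for label in doc_counts:
        n = sum(tag_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for t in tags:
            score += math.log((tag_counts[label][t] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


docs = [
    {"mustang", "ford", "horsepower", "specs"},
    {"ford", "v8", "horsepower"},
    {"mustang", "running", "animal"},
    {"stallion", "running", "wild"},
]
tag_counts, doc_counts = train(docs)
print(classify({"mustang", "horsepower"}, tag_counts, doc_counts))  # car
print(classify({"mustang", "running"}, tag_counts, doc_counts))     # horse
```

Neither mustang nor horsepower is in the seed dictionary, yet the first query comes out as car because horsepower only ever appeared alongside ford. That’s the payoff, and also the catch: the model only knows what the seed dictionary and the training documents taught it.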

Part of this problem can be solved by using sufficiently large and detailed training sets, but it’s going to be a lot of work. And you can’t assume that an algorithm trained to place tags in one taxonomy will carry over well to other, similar taxonomies.

Taxonomies are structured for a particular knowledge domain by someone with a detailed understanding of that domain. An algorithm isn’t able to understand the nuances of how I want my content structured, and it is unable to adjust for my personal preferences and biases. Tagyu tends to classify pages about podcasts as Entertainment, but you might feel that everything podcast-related is better suited to the Technology category. If this sort of problem shows up in a generic, flat taxonomy, imagine how much more complicated it would be to place items into a detailed hierarchy with lots of similar nodes.
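Here’s a rough sketch of one way to account for personal preference: let a user’s own tag-to-category choices take precedence over a generic classifier. This isn’t how Tagyu works; `GENERIC` is a toy stand-in for a real classifier’s output, and `overrides` is a hypothetical per-user preference table:

```python
from collections import Counter

# Toy stand-in for a generic classifier: each known tag votes for a category.
GENERIC = {"podcast": "Entertainment", "itunes": "Entertainment", "rss": "Technology"}


def generic_category(tags):
    """Majority vote over the generic tag->category guesses; None if no tag is known."""
    votes = Counter(GENERIC[t] for t in tags if t in GENERIC)
    return votes.most_common(1)[0][0] if votes else None


def user_category(tags, overrides):
    """Use the user's own tag->category preference if one applies, else the generic answer."""
    for t in sorted(tags):  # sorted so the result doesn't depend on set order
        if t in overrides:
            return overrides[t]
    return generic_category(tags)


tags = {"podcast", "itunes"}
print(generic_category(tags))                          # Entertainment
print(user_category(tags, {"podcast": "Technology"}))  # Technology
```

In a flat set of categories that’s a short table. In a deep hierarchy, each user would need preferences for many similar nodes, which is exactly where it gets complicated.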

With Tagyu, I’m working on some solutions to this, and I’m excited to see what others are coming up with.

ram
November 29, 2007 10:24 PM

Hi, pretty interesting. We are also trying to work on the same problem and would definitely be interested to see what can be achieved. Check us out at www.findnearyou.com. We have these hierarchical DBs and free tagging by users, and we need to use these to divine user intent and deliver what they are searching for.

© 1999-2023 Adam Kalsey.