
Automatic Keywords

26 Nov 2002

Nathan Jacobs wants Movable Type (MT) to automatically suggest categories for the current entry.

The question is, how would it know? What criteria would it use to determine the category? I played with the idea of creating a keyword generator for MT that would parse your entry text and create a keyword list. But how would it come up with the keywords? Word frequency is the most likely method, but generating keywords solely from word frequency isn’t likely to produce acceptable results.

For example, take a look at a recent entry here, Open Letter to Barnes and Noble:

The entry is about their customer service mistake in sending me a marketing email when I had clearly asked not to receive any. Here’s the list of keywords generated by a word frequency analysis:

email, sent, unsubscribe, received, first, it’s, since, easier, list, now, they, why, preferences, mistake, understand, email preferences, marketing email

Here’s my hand-generated list of keywords:

barnes and noble, email, customers, open letter, privacy

The automatic list could obviously be improved somewhat by ignoring more words (it already ignores words like "the" and "and"), but there will still be limits to the automatic method. The subject of the entry is Barnes & Noble, but the words "Barnes and Noble" don’t appear very often in the text, so going by word frequency alone won’t cut it.
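A minimal sketch of the word-frequency approach with an extended ignore list, in Python. The post doesn't show the MT generator's code, so the tokenizer, both word lists, and the `top_keywords` helper are assumptions for illustration. The sample text is invented to mimic the situation described: the subject is named only once, so filtering filler words still never pushes it into the top keywords.

```python
import re
from collections import Counter

# Words the post says are already ignored ("the", "and") plus other common function words.
STOPWORDS = {"the", "and", "a", "to", "of", "i", "me", "my", "it", "is", "was", "so", "but"}
# Hypothetical extended ignore list: filler that a raw word count would otherwise surface.
EXTRA_IGNORE = {"they", "sent", "after", "had", "again", "dear", "please"}

def top_keywords(text, ignore, top_n=3):
    """Count non-ignored words and return (all counts, the top_n most frequent words)."""
    # Lowercase and split into word tokens, keeping internal apostrophes (e.g. "it's").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in ignore)
    # Ties are ordered by first occurrence in the text.
    return counts, [w for w, _ in counts.most_common(top_n)]

# Invented toy text: the subject (the retailer) is named only once, at the end.
SAMPLE = ("They sent a marketing email after I changed my email preferences. "
          "The marketing email ignored my preferences, so I had to unsubscribe again. "
          "Dear Barnes and Noble: please honor my email preferences.")

counts, top = top_keywords(SAMPLE, STOPWORDS | EXTRA_IGNORE)
print(top)               # ['email', 'preferences', 'marketing']
print(counts["barnes"])  # 1
```

However long the ignore list gets, "barnes" is counted once and ties with incidental words like "changed" and "honor", which is exactly the limit described above.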

This is the problem the big search engines had a few years ago: keyword frequency isn’t always a good indicator of the subject of a page. Just because my page has the word "animation" in it repeatedly doesn’t mean it’s about cartoons. That’s why Google was such a big hit. Google determined relevance based on what other Web sites thought: if sites about cartoons link to me, then my site is probably about cartoons. So Google uses humans (Web site owners) to determine what my pages are about.
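The link-based idea can be sketched as a simple vote. This is not Google's actual algorithm (PageRank and the rest of its ranking are far more involved); `infer_topics` and the site data below are hypothetical, and the sketch assumes every linking site already has topics assigned by a person.

```python
from collections import Counter

def infer_topics(page, inbound_links, site_topics, top_n=2):
    """Guess a page's topics from the topics of the sites that link to it.

    inbound_links: maps a page to the list of sites linking to it (hypothetical data).
    site_topics:   maps a site to topics a human has already assigned it.
    """
    votes = Counter()
    for site in inbound_links.get(page, []):
        # Each linking site casts one vote for each of its own topics.
        votes.update(site_topics.get(site, []))
    return [topic for topic, _ in votes.most_common(top_n)]

# Hypothetical link graph: three cartoon-related sites link to "mypage".
links = {"mypage": ["toonblog", "animatorsguild", "cartoonnews"]}
topics = {
    "toonblog": ["cartoons"],
    "animatorsguild": ["cartoons", "animation"],
    "cartoonnews": ["cartoons", "news"],
}
print(infer_topics("mypage", links, topics))  # ['cartoons', 'animation']
```

Nothing in the page's own text is consulted; the answer comes entirely from how other site owners have classified themselves and chosen to link.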

Perhaps someone can suggest a better algorithm for generating keywords?

(Edit, Oct 2005) I released Tagyu to solve this exact problem. Tagyu analyzes the content, the context it’s in, and other factors, and generates a list of keywords. These keywords aren’t extracted from the content; instead, they are created by understanding how humans have classified similar text.
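Tagyu's internals aren't described here, so the following is a swapped-in technique, not its method: a k-nearest-neighbours sketch that borrows tags from the most similar human-tagged documents, using bag-of-words cosine similarity. `suggest_tags` and the tiny tagged corpus are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Bag-of-words term counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_tags(text, tagged_corpus, k=2, top_n=3):
    """Suggest tags by borrowing them from the k most similar human-tagged documents.

    tagged_corpus: list of (text, tags) pairs tagged by people (hypothetical data).
    The suggested tags need not appear anywhere in the new text.
    """
    query = vectorize(text)
    scored = sorted(((cosine(query, vectorize(doc)), tags) for doc, tags in tagged_corpus),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for score, tags in scored[:k]:
        for tag in tags:
            votes[tag] += score  # weight each neighbour's tags by its similarity
    return [tag for tag, _ in votes.most_common(top_n)]

# Hypothetical human-tagged examples.
corpus = [
    ("Bookstore keeps sending marketing email after unsubscribe", ["privacy", "customer service", "email"]),
    ("Retailer ignores email preferences and spams customers", ["privacy", "email"]),
    ("New cartoon animation techniques for short films", ["animation", "cartoons"]),
]
query_text = "I asked to unsubscribe but they sent another marketing email"
print(suggest_tags(query_text, corpus))  # ['privacy', 'email', 'customer service']
```

"privacy" is suggested even though the word never appears in the query text, something a pure frequency count cannot do.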
