ALTer

Freshness Warning
This article is over 15 years old. It's possible that the information you read below isn't current.

I’m working on a site and realized that I had left off the alt text from some of my images. Going through several hundred pages looking for images that didn’t contain the alt text didn’t sound like much fun, so I whipped up a quick Perl script to add the text for me.

The program walks through a directory and opens every file ending with .html or .htm. Then it looks for image tags without alt attributes and adds them in. The alt text that it adds comes from a pipe-delimited data file containing image file names followed by the alt text. If the program comes across an image that doesn’t have an alt attribute and it isn’t listed in the data file, an empty alt attribute is inserted. Image tags that already have alt attributes are ignored.

To use the program, create a text file that looks like this:

someimage.jpg|Some Image
logo.gif|Kalsey Consulting Group
yomama.jpg|Your Mother

Then run the Perl app like so…

ALTer.pl /path/to/data_file.txt /path/to/search/inside/
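
A minimal sketch of how a data file in this format could be read into a lookup table, one "file name|alt text" pair per line. The helper name parse_alt_lines is made up for illustration. Limiting the split to two fields is an assumption on my part, so that alt text may itself contain a pipe character.

```perl
use strict;
use warnings;

# Hypothetical helper: turn pipe-delimited lines into a file-name => alt-text hash.
sub parse_alt_lines {
    my @lines = @_;
    my %texts;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        # Split into at most 2 fields so the alt text can contain a pipe.
        my ($src, $alt) = split /\|/, $line, 2;
        $texts{$src} = defined $alt ? $alt : '';
    }
    return %texts;
}

my %texts = parse_alt_lines(
    "someimage.jpg|Some Image\n",
    "logo.gif|Kalsey Consulting Group\n",
    "yomama.jpg|Your Mother\n",
);
print "$_ => $texts{$_}\n" for sort keys %texts;
```

An image name that appears with no pipe at all ends up with an empty alt text, which matches what the program does for images missing from the data file.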

Please note: This is provided with no warranty or tech support whatsoever. Make backups. If you come across a problem, I probably won’t be able to help you. The code is licensed under the MIT License, copyright 2003 Kalsey Consulting Group.

Here are the contents of ALTer.pl:

#!/usr/bin/perl
use strict;
use File::Find;

my $directory = './';
my $data_file;
my @valid_extensions = ('\.html?');
my %texts;

$directory = $ARGV[1] if $ARGV[1];
$data_file = $ARGV[0] or die "Usage: ALTer.pl data_file [directory]\n";

# load the image name => alt text pairs
open (DATA, $data_file) or die "Can't open $data_file: $!";
while (<DATA>) {
    chomp;
    my ($src, $alt) = split /\|/;
    $texts{$src} = $alt;
}
close(DATA);

chomp($directory);
$directory =~ s/\\/\//g;
find(\&add_alt, $directory);

sub add_alt {
    # File::Find has already changed into the file's directory,
    # so open the file by its bare name in $_
    my $file = $_;
    my $blnValidType = 0;

    # skip directories
    return if (-d $file);

    # only insert alt text if this is an HTML file
    foreach my $ext (@valid_extensions) {
        if ($file =~ m/$ext$/) {
            $blnValidType = 1;
            last;
        }
    }
    return if ($blnValidType == 0);

    open(IN, $file) || die "Cannot open $file for reading: $!";
    my @html = <IN>;
    close(IN);

    # the lines still carry their newlines, so join on the empty string
    my $line = join("", @html);
    $line =~ s+<img([^>]*)?>+
        my $orgtxt = $1;
        if ($orgtxt !~ /alt\=\"/) {
            my $alttxt = '';
            if ($orgtxt =~ m/src="([^"]*)/) {
                local $_ = $1;
                s/^.*(\\|\/)//;
                $alttxt = $texts{$_} if defined $texts{$_};
            }
            "<img alt=\"$alttxt\" $orgtxt>";
        } else {
            "<img$orgtxt>";
        }
    +egs;

    open(OUT, ">$file") || die "Cannot open $file for writing: $!";
    print OUT $line;
    close OUT;
}

Adam Kalsey
September 24, 2003 1:13 PM

Some quick notes... I'm not doing very robust HTML parsing here, so if you don't write your HTML like me then things probably won't work. The code won't handle uppercase tag or attribute names, spaces around the = signs, or attributes that aren't double-quoted. It will work with attributes that are in odd orders, though. If your image tag ends with an XHTML slash, that slash will still be there after ALTer does its thing. If the tag doesn't end in a slash, ALTer won't add it.

Chris Vance
September 24, 2003 9:01 PM

That looks to be a handy script. You can make your regex handle uppercase tags by using the 'i' modifier (which makes the regex case insensitive). In the img regex, s+<img([^>]*)?>+ would become s+<img([^>]*)?>+i. (The period is a full stop and not part of the regex.) If you wanted to write some messy regex (is there any other kind :-) ), you could find unquoted attributes by looking for anything but spaces and single/double quotes after the equals sign, a la m/src=([^"/s']*)/ . I'm sure the regex I just gave fails in some instances. YMMV.

Adam Kalsey
September 24, 2003 11:25 PM

You'd want \s for a space character, not /s. I'm aware of the i modifier, and you'd actually have to use it three times: once after /alt\=\"/, once after m/src="([^"]*)/, and once after +egs. Since the script was written primarily for my use, I'm not too concerned with different HTML styles. But if people want to suggest changes that will make things a bit more robust, please do. It could be handy.
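
The suggestions so far (the i modifier in all three places, \s around the equals sign, and quoted or unquoted values) could be combined roughly as below. This is a sketch under my own assumptions, not a drop-in patch for ALTer.pl: the sub names fix_img_tag and add_alt_tolerant are made up, and it is still a regex rather than a real parser, so a "src =" or "alt =" that appears inside another attribute's value can fool it.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild one img tag, adding alt text if it has none.
sub fix_img_tag {
    my ($tag, $attrs, $texts) = @_;

    # Leave the tag alone if it already has an alt attribute (any case).
    return "$tag$attrs>" if $attrs =~ /(?:^|\s)alt\s*=/i;

    my $alt = '';
    # src may be double-quoted, single-quoted, or unquoted.
    if ($attrs =~ /(?:^|\s)src\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i) {
        my $src = defined $1 ? $1 : defined $2 ? $2 : $3;
        $src =~ s{^.*[\\/]}{};    # keep only the file name
        $alt = $texts->{$src} if defined $texts->{$src};
    }
    return "$tag alt=\"$alt\"$attrs>";
}

# Hypothetical helper: apply fix_img_tag to every img tag in a string.
sub add_alt_tolerant {
    my ($html, $texts) = @_;
    # Capture the tag name as written so <IMG stays uppercase.
    $html =~ s{(<img\b)([^>]*)>}{ fix_img_tag($1, $2, $texts) }egi;
    return $html;
}

my %texts = ('logo.gif' => 'Kalsey Consulting Group');
print add_alt_tolerant(qq{<IMG SRC = 'images/logo.gif'>\n}, \%texts);
```

Moving the tag logic into its own sub keeps the substitution to one line and avoids nesting a second regex inside the /e replacement.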

Clayton
September 25, 2003 8:21 AM

I thought I would post this for you since I got so much mileage from your Forms Errors Simplified article.

Just for info, here's how you could do it with HTML::TokeParser::Simple, which will be more accepting about the nature of the HTML that's put in.

-- replace lines 45 - 65 in your script

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new( $file )
    || die "Can't open $file for reading: $!";
my $html;
while (my $token = $parser->get_token) {
    # find image tags without alt
    if ( $token->is_start_tag('img') ) {
        my $tag  = $token->return_tag;
        my $attr = $token->return_attr;

        # If we have an alt entry already we'll leave the tag alone.
        # We could change the test below to look in %texts for new alt
        # contents for tags where alt exists but is empty:
        # unless ( defined $attr->{alt} and $attr->{alt} ne '' ) {
        unless ( defined $attr->{alt} ) {
            # Add an alt attribute (looked up by the full src value)
            my $alt = exists $texts{ $attr->{src} } ? $texts{ $attr->{src} } : '';
            $token->set_attr( alt => $alt );
        }
    }
    # every token, changed or not, goes back into the output
    $html .= $token->as_is;
}
open (OUT, ">$file");
print OUT $html;

-- above code is mostly untested unless otherwise stated

Also, here's a link to an example of a script I wrote to extract inline CSS to external style sheets.

Extract inline styles to an external style sheet
http://www.perlmonks.org/index.pl?node_id=292225

Clayton

PS the code would look nicer if the comment system would let me put code tags around it. :)

Adam Kalsey
September 25, 2003 9:01 AM

That's a good idea. I hadn't thought about using an actual HTML parser. I wonder how fast this method would be though. One advantage of the big ugly regex is that the script takes under a second to process 150 3kb HTML files. On the other hand, since this is a utility script that is rarely run, speed probably isn't much of an issue as long as it doesn't take more than a few minutes to process.

Clayton Scott
September 25, 2003 9:22 AM

Since it's a rarely used utility script, I'm more worried about maintainability and robustness than speed. I'd use a set of similar scripts for all kinds of HTML transformations when refactoring a new customer's old website.

Charles
October 7, 2003 11:32 PM

It would be nice to set up a find and replace script for this in BBEdit using grep. I would if I knew how.

Alex
November 9, 2004 10:51 AM

Looking for a regex to find and tags that have file:/// in them... any ideas?
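
One possible starting point, as a sketch only: the pattern below matches any opening tag whose attributes contain file:///, whatever the tag name. The helper name find_file_url_tags and the sample markup are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every opening tag whose attributes mention file:///.
sub find_file_url_tags {
    my ($html) = @_;
    my @tags;
    # [^>]* keeps the match inside a single tag; /i allows uppercase tag names.
    while ($html =~ m{(<\w+\b[^>]*file:///[^>]*>)}gi) {
        push @tags, $1;
    }
    return @tags;
}

my $page = q{<a href="file:///C:/docs/a.html">A</a> <img src="/ok.gif"> <IMG SRC="file:///C:/pics/b.gif">};
print "$_\n" for find_file_url_tags($page);
```

Like the regex in ALTer.pl, this assumes no attribute value contains a literal > character.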

chris
February 15, 2005 6:56 PM

Use while ( my $line = <IN> ) {} instead of @lines = <IN>; for those really big files. You will save a lot of memory this way. Thx for the script.
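
A sketch of what line-at-a-time processing might look like. The helper name process_by_line and the simplified img pattern are assumptions for illustration, and the demo uses in-memory filehandles so it runs without touching real files. The tradeoff is that an img tag split across two lines is never seen whole, so it will not be matched.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out one line at a time, passing each line through $fix.
sub process_by_line {
    my ($in, $out, $fix) = @_;
    while (my $line = <$in>) {
        print {$out} $fix->($line);
    }
}

# Demo with in-memory filehandles standing in for the input and output files.
my $source = qq{<p>\n<img src="logo.gif">\n</p>\n};
my $result = '';
open(my $in,  '<', \$source) or die "Cannot read: $!";
open(my $out, '>', \$result) or die "Cannot write: $!";

process_by_line($in, $out, sub {
    my $line = shift;
    # Simplified rule: give an empty alt to any img tag on this line that lacks one.
    $line =~ s/<img(?![^>]*\balt=)([^>]*)>/<img alt=""$1>/g;
    return $line;
});

close $in;
close $out;
print $result;
```

With real files the output would need to go to a temporary file that replaces the original afterwards, since the same file cannot be streamed and overwritten in place.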

Amie Stilo
January 10, 2008 10:17 PM

What a "perler" of an idea (sorry about that), but seriously, this is very clever. As a bit of a novice I use Dreamweaver to do all my design work and use find & replace all the time to try to save work. It would be nice to just whip up a script sometimes!

©1999-2019 Adam Kalsey.