Use of words: 2007

Monday 10 December 2007

Game of the day: Dwarf Fortress

Dwarf Fortress ... freeware game for the os that must not be named. Works admirably in Wine.

An obscure cross between Dungeon Crawl and The Settlers. Dig your own Moria. Then try to reclaim it after it inevitably falls - to war, famine, accidental magma flood, wandering dragon - or just adventure in the ruins.

Alpha version, probably permanently. Quite addictive.

Tuesday 13 November 2007

The very large and peculiar cats from Saturn

This is an actual radio capture. From Saturn. By Cassini.

What's Titan hiding below that orange smog?

*dreamy* I want to go out there...

On that note, fellow Osloians: Buzz Aldrin is here this coming Sunday, at Astrofestival 2007.

Thursday 25 October 2007

Driver, test pilot, vague guiding lights?

The Linux Driver Project is taking off a bit, with several hundred people signed up to help. And we have a few companies dipping their toes in. Getting involved in Linux kernel development has accreted some barriers to entry these last few years. It's difficult to find a worthwhile place to start contributing, and the review-and-resubmit process is drawn-out when no-one knows and trusts your judgement and good taste in coding yet. (Your good taste in most other aspects of life is naturally a given, despite occasional innuendos in heated discussions.)

OutOfTreeDrivers and DriversNeeded now collect things to do (and places to go?). Most places I've worked, I eventually end up adding out-of-tree, or even binary-only, drivers to support something someone needs done - which is a big maintenance pain. Collecting them here might inspire a brave soul to start the mainlining, lobby-for-opening, or reverse-engineering (where legal, like here) process.

Do help out. I'm doing my part.

Thursday 11 October 2007

Ink, read while it's fresh

Just out: Halting State by Charles Stross. Extrapolates from today to 2018, and will certainly be outdated before then. Technothriller. Quirky sarcastic eloquence.

Recommended reader qualifications: MMO gamer w/computer science degree.

Enjoy. I did.

Monday 8 October 2007

The Beast Remade: Niceties, wrapping up

Text::Reform took care of the rest of the pretty-printing I needed. The Perl builtin "format" command was more familiar, but the listings would not come out right. I have a sneaking suspicion that it does not like UTF-8 STDOUT much. Text::Reform was easier to get right:


$menu_form = "<< "
 ."<<<<<<<<<<<<<<<<<<<<<<<<< "
 ."<<<<<<<<<<<<<<<<<<<<<<<<< "
 ."<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<";

 print form $menu_form, $index, $from, $to, $subject;

Some manual command-line parsing was needed, in order to do labor-saving speedups like "rm 1-7,17". I expect everyone reinvents that sort of thing at need.
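A minimal sketch of that kind of argument parsing, for the curious. This is not the code from the original hack; the name expand_ranges and the choice to reject the whole argument on any malformed part are assumptions made for illustration.

```perl
use strict;
use warnings;

# Expand an argument such as "1-7,17" into a sorted list of unique
# numbers. Returns an empty list if any part is malformed.
# (expand_ranges is a hypothetical helper name.)
sub expand_ranges {
    my ($spec) = @_;
    my %seen;
    for my $part (split /,/, $spec) {
        if ($part =~ /^(\d+)-(\d+)$/ && $1 <= $2) {
            $seen{$_} = 1 for $1 .. $2;
        }
        elsif ($part =~ /^\d+$/) {
            $seen{$part} = 1;
        }
        else {
            return;    # reject the whole argument rather than guess
        }
    }
    return sort { $a <=> $b } keys %seen;
}

print join(" ", expand_ranges("1-7,17")), "\n";   # prints: 1 2 3 4 5 6 7 17
```

Refusing the whole argument on a typo seems the safer default for a command that deletes mail.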

Wrapping up

All this was wrapped up in a two-day hack that saves some time and much aggravation.

I would like to see someone do a similar command-line stunt for interacting with an AJAX web application. Consider this a dare, fellow hacker. I'll owe you a beverage of your choice next time we meet.

Wednesday 3 October 2007

The Beast Remade: Use your head

Now that "cat" and "rm" work, the quick hack is somewhat useful for its intended purpose. But a short practical test reveals issues. A "cat" dumps the whole mail to the console - while that works as intended, it also scrolls the top of the mail and headers far outside the scrollback buffer, if the mail is large. A "head" command soon looks desirable.

First try is the naive "print first n lines". That approach is quickly thrown out the window. Grumbling ensues about the evils of HTML email. Spam promptly materializes containing roughly fourteen long paragraphs about random body part enlargement, without a line break anywhere in sight.

I found a rather easy solution, after some thought - we run this on Linux, after all, so why not use the traditional Unix utilities:


 $spam = grabmsg($mailnum);
 open HEAD, "| fold | head -24";
 print HEAD $spam;
 # flush the pipe and wait for fold/head to finish
 close HEAD;

No sense in reimplementing everything in Perl if you can think of something else that does the job already. There probably are twenty-five ways of doing this in pure Perl that would be more elegant, if you know of and remember them. I had two days to build this, so I grabbed the first approach I could think of that worked.
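For comparison, one of the pure-Perl ways might look roughly like the sketch below. It is an illustration only, not what the hack used; the name fold_head and its defaults (80 columns, 24 lines) are assumptions. It counts characters rather than display columns, so it is only an approximation of fold.

```perl
use strict;
use warnings;

# Pure-Perl stand-in for "fold | head -24": break every line into chunks
# of at most $width characters, then keep only the first $max_lines chunks.
# (fold_head is a hypothetical helper name.)
sub fold_head {
    my ($text, $width, $max_lines) = @_;
    $width     ||= 80;
    $max_lines ||= 24;

    my @out;
    LINE: for my $line (split /\n/, $text) {
        # An empty line still counts as one line of output.
        my @chunks = length($line) ? ($line =~ /(.{1,$width})/g) : ('');
        for my $chunk (@chunks) {
            push @out, $chunk;
            last LINE if @out >= $max_lines;
        }
    }
    return join('', map { "$_\n" } @out);
}

# A single 200-character "paragraph" comes out as three lines.
print fold_head("x" x 200, 80, 24);
```

The pipe version is still shorter, which is rather the point of the paragraph above.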

Monday 24 September 2007

The Beast Remade: Do What I Mean

WWW::Mechanize is a subclass of LWP::UserAgent. LWP::UserAgent stops at the HTTP layer, and leaves the application to deal with HTML or whatever else the server sends us in its own way. WWW::Mechanize mixes in knowledge of HTML, and checks your HTTP conversations for error messages. (Now you know what the autocheck => 1 parameter in Enter WWW::Mechanize is.) So using WWW::Mechanize feels like scripted use of a browser, which fits my problem. I am trying, after all, to abbreviate user interaction from several mouse clicks per suspected spam to a short command-line interaction.

Intuitive methods like ->tick() for checkboxes, ->click() for buttons and ->select() for radio buttons and listboxes help me fill out the necessary forms quickly. In my "rm" function, I deal with the spam like this:


  foreach my $spamid (@spamid_arr)
  {
    $localmech->form_name("quarantine");
    $localmech->tick("mailid", $spamid);
  }
  $localmech->click("delete");

From inspection, the relevant checkboxes had "name=mailselected" and "value=" an id number that stayed constant even when the mail queue view changed. Which makes sense, from a robustness perspective.

Monday 17 September 2007

The Beast Remade: Deep magic with Storable::dclone

The cat/head/rm/ok commands posed some difficulties. My naive "ls" implementation refreshed its view of the quarantine section every time you ran it, as a normal "ls" would.

The underlying web interface happened to add any new spam that arrived in the meantime to the top of the numbered list. So a certain spam might be renumbered every time my hack looked at the index page. I certainly did not want the numbering to refresh between calls to "ls"; that could cause a quick "ls; cat 1; rm 1" sequence to delete the wrong mail!

Both the deletion form and the "view mail" links, thankfully, used mail queue id as identifier, not the 1-20 numbered list. Keeping a cached copy of the deletion form seemed wise, for later use in the "rm" command. But I also wanted to make further use of the WWW::Mechanize object, in order to get a look at the suspected spam, and avoid any issues with multiple logins kicking each other out or similar.
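The idea of freezing the numbering between calls to "ls" can be sketched without any web traffic at all. The names below (remember_listing, queue_id) and the queue ids are made up for illustration; the real hack kept a cached copy of the deletion form rather than a bare hash.

```perl
use strict;
use warnings;

# Snapshot taken by the last "ls": list position => mail queue id.
my %queue_id_for;

# Called by "ls" only. @ids holds the queue ids in the order the web
# page lists them (newest first); here they are passed in by hand.
sub remember_listing {
    my @ids = @_;
    %queue_id_for = ();
    @queue_id_for{ 1 .. @ids } = @ids;
    return scalar @ids;
}

# Called by "cat", "head" and "rm": translate the number the user typed
# into the stable queue id, using the snapshot and not the live page.
sub queue_id {
    my ($number) = @_;
    return $queue_id_for{$number};
}

remember_listing(qw(q1003 q1002 q1001));   # what "ls" showed
# New spam arrives now and takes position 1 on the server...
print queue_id(1), "\n";                   # ...but "rm 1" still means q1003
```

Only an explicit "ls" changes what a number means, so "ls; cat 1; rm 1" always acts on the mail that was just inspected.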

Storable::dclone is deep Perl magic, cloning an instantiated object and all its references, including internal state, functions and all. As such, it is not exactly lightweight or easy to wrap your head around. But it does exactly what I want in this case; it gives me a separate copy of the $mech WWW::Mechanize object to play around with that does not disturb the original:


unless (defined ($Storable::Deparse))
  { $Storable::Deparse = 1; }
unless (defined ($Storable::Eval   ))
  { $Storable::Eval    = 1; }
my $localmech = dclone($main::mech);

... and then $localmech is mine to do with as I like. Below is a complete example of use. It extracts the spam from its detail page, pretty-prints it to a string and returns that string.


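# modules this sub relies on
use Storable qw(dclone);
use Encode qw(decode);
use Term::ANSIColor qw(color);
use HTML::TableExtract;

# clone a $mech for viewing, keep
# the old around for the index page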
sub grabmsg
{
my $spamindex = $_[0];

unless (defined ($Storable::Deparse))
  { $Storable::Deparse = 1; }
unless (defined ($Storable::Eval   ))
  { $Storable::Eval    = 1; }
my $localmech = dclone($main::mech);

$localmech->get($spam_viewurl[$spamindex]);
my $decoded = decode("utf-8",$localmech->content);

my $extractor = new HTML::TableExtract(
                 depth => 1, count => 1);
$extractor->parse($decoded);

my $spam = "";
my $table = $extractor->first_table_found();
my @headerlist = qw(Received: Return-Path: Date:
                    From: Reply-To: To: CC:
                    Subject: Attachments);

for (my $count = 0; $count<@headerlist; $count++)
{
  if (defined (my $cell = $table->cell($count,1)))
  {
    $spam .= color ('BOLD') . $headerlist[$count]
           . color ('RESET') . "\n" ;
    $spam .= "$cell\n" ;
  }
}
$spam .= color ('BOLD') . "Mail body:"
       . color ('RESET') . "\n" ;
$spam .= $table->cell(10,0) . "\n" ;

return $spam;
}
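The deep-copy behaviour that grabmsg leans on can be seen in isolation with a plain data structure. The hash below is a hypothetical stand-in for a WWW::Mechanize object, chosen so the sketch needs nothing beyond core modules; a real $mech carries far more state.

```perl
use strict;
use warnings;
no warnings 'once';    # the two Storable switches are named only once here
use Storable qw(dclone);

# Let Storable serialise code references (via B::Deparse) and rebuild them.
$Storable::Deparse = 1;
$Storable::Eval    = 1;

# Hypothetical stand-in for an object with nested state and a callback.
my $original = {
    history => [ 'login.php', 'quarantine.php' ],
    handler => sub { return "handled" },
};

my $copy = dclone($original);
push @{ $copy->{history} }, 'view.php';    # only the copy moves on

print scalar(@{ $original->{history} }), "\n";   # original history untouched
print scalar(@{ $copy->{history} }), "\n";       # copy has the extra entry
print $copy->{handler}->(), "\n";                # rebuilt code ref still runs
```

One caveat worth keeping in mind: code references are rebuilt from deparsed source, so a closure in the copy does not share captured variables with the original.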

The Beast Remade: The lay of the land

HTML::TableExtract emerged as my poison of choice for doing the grunt work of HTML parsing. It was the easiest to improvise extraction criteria with, of the three or four Perl modules I tried in the course of a frustrating morning.

The easiest way to pick out the table of interest from the quarantine main menu, for example, turned out to be:


my $te = new HTML::TableExtract(
 attribs => { width => 525 }, keep_html => 1 );
$te->parse($mech->content);

...since the attribute "width=525" was its only distinguishing feature. Once parsed, however, it was the work of moments to make a "cd" command to enter the quarantine section of interest. An "ls" command followed shortly behind, with momentary difficulties only.

Thursday 13 September 2007

The Beast Remade: Term::Shell Basics

Grabbing and parsing the web pages are all very well, but without a fast way to interact with them this is all just a waste of time and effort. Term::Shell by Neil Watkiss has been at version 0.01 for half a decade (woo! recently updated to 0.02!), and is barely mentioned in a Google search. That does not deter a Perl-familiar sysadmin out to improve his daily routine. A non-interactive critical script would be another matter entirely.

Term::Shell is excellent scaffolding and support when you do not want to muck around with Term::ReadLine yourself. As long as you have one of the Term::ReadLine:: alternatives installed, you get command-line editing and history for free.

A demonstration:


knan@viconia:~$ perl ./listing3.pl
World domination. But first, take care of the spam.
spsh:/> ls
No spam. Yet.
spsh:/> ?
Unknown command '?'; type 'help' for a list.
spsh:/> help
Type 'help command' for more detailed help.
Commands:
 exit - exits the program
 help - prints this screen, or help on 'command'
 ls   - Is there any spam? - no help available
spsh:/> exit

So far so good. Add a help_ls subroutine if you feel like explaining the intricacies of spam listing to your fellow spam handlers.

Source of the shell above:


package main;

Spsh->new->cmdloop;

package Spsh;
use base qw(Term::Shell);

sub preloop
{
$state_path = "";
binmode(STDOUT, ":utf8");

print "World domination. But first, "
     ."take care of the spam.\n";
}

sub prompt_str
{
return "spsh:/$state_path> ";
}

# Define an empty command, to avoid
# "unknown command" on an empty
# command-line.
sub run_
{
return;
}

sub run_ls
{
print "No spam. Yet.\n";
return;
}

sub smry_ls
{
return "Is there any spam?"
}
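A help_ls handler of the kind mentioned earlier could look something like this. It is a sketch meant to sit next to run_ls and smry_ls in the Spsh package; the help wording is invented for illustration, and the fragment leaves out the "use base qw(Term::Shell)" line shown above. It assumes, as in the Term::Shell synopsis, that a help_ handler simply returns the text to show.

```perl
package Spsh;
use strict;
use warnings;

# Looked up by Term::Shell when the user types "help ls";
# the returned string is the help text.
sub help_ls
{
return <<'END';
ls - list the suspected spam in the current quarantine section.
END
}
```

With that in place, "help ls" has something to say instead of "no help available".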

Tuesday 11 September 2007

The Beast Remade: Enter WWW::Mechanize

I had recently seen something called WWW::Mechanize mentioned in Google Hacks, 2nd edition. So I determined to have a play around with it, to see if there was something to be gained. An initial look seemed promising. In fairly short order, I had the web interface yelling at me to use a civilized browser, not this scary WWW-Mechanize thingy.


use WWW::Mechanize;

my $url = "http://spamproxy:8888/login.php";

$mech = WWW::Mechanize->new(autocheck => 1);

$mech->get($url);
print $mech->content;

A quick adjustment to $mech->agent took care of that ill temper. Adding invocations to $mech->save_content() allowed me to get a look at what it expected of me - which was simple, at first. It just wanted a password in the single visible form element:


my $password = "foobarword";
$mech->submit_form(fields => { password => $password });

A $mech->save_content() later, I had won a small victory. Next target had to be the web application's menu structure. The story so far:


use WWW::Mechanize;

my $url = "http://spamproxy:8888/login.php";
my $password = "foobarword";

$mech = WWW::Mechanize->new(autocheck => 1);

# Yeah, sure, this is a tested and approved browser
$mech->agent ("Mozilla/5.0 (X11; U; Linux i686; "
."en-US; rv:1.7.13) Gecko/20060418 Firefox/1.0.8");

# Get login page
$mech->get($url);

# Try to log in
$mech->submit_form(fields => { password => $password });

# Save the response page for a look around
$mech->save_content("loggedin.html");

Starting out, I worried that extensive Javascript use would make this hard or impossible. As perldoc WWW::Mechanize::FAQ states, Javascript is not supported and not likely to be until someone looks seriously into it. It turned out that this particular web interface had followed web accessibility guidelines pretty well and was navigable without Javascript support. If your local friendly web interface is less accommodating, well, the FAQ has some tips.

The menu structure turned out to be a mixed bag. Link names were constructed by Javascript, so I could not rely on them. Link URLs, however, were static and thus available for use. Some calls to ->follow_link() later, I had the quarantine main menu dumped to an HTML file for my perusal. This all looked very parsable and scriptable, and a sneak peek at the quarantine listings seemed to confirm my gut feeling.

The Beast Remade: Part the first

As web interfaces to various software have matured and multiplied, they have in many cases moved toward resembling traditional GUIs. Checkboxes, labeled buttons, client-side Javascript error checking and "do you really want to do this" dialogs point the way and do the required hand-holding for novice users. In many cases, this is a good thing - say, remote firmware updates or RAID configuration restores.

System administrators, however, are usually not novice users. Especially not when the web interface in question is the only available interface for a routine daily task. The task in question? To inspect a spam quarantine for false positives - a tiring chore at best. A somewhat cumbersome web interface slowed us down, so grumbles, less than complimentary epithets, and searches for a better way were inevitable.

The Powers That Be heard our grumbles, and were sympathetic to our point of view. This was not optimal, and we could use some time to find a way to speed this up. Nevertheless, any tricks to speed this up were to make use of The Official Interface, and not delve into product internals in the layers below - due to support contract clauses. So we could not manipulate the mail queue directly, even though that was the most obvious path to speedups.

(scary cliffhanger - stay tuned for episode 2!)
