During the mid-'90s, Doug and I were deeply involved in what has proven to be one of the better projects of our careers in academic research computing. The decision was made to shut down our university's research MVS mainframe and move all projects, data, and users to a UNIX system. How to accomplish this without imposing downtime on researchers was left as an exercise for the implementors -- us.

One of our self-imposed criteria was that researchers should have the opportunity to learn their way around UNIX and adjust their processes for the new environment without disrupting ongoing work on the mainframe. To that end, we set up a migration process that would pull MVS data sets from the backup system (so as not to interfere with "live" data sets users may be using) and copy them to an archive on the UNIX side. Users could check out copies from this archive and work with them under UNIX to hone their processes. If they screwed up the data, they could just check it out again from the archive.

We were a large SAS shop then; most of the research projects used SAS' data management and statistical capabilities extensively. We chose to use SAS on the mainframe side to manage the migration itself, tracking which data sets had been transferred or were in progress, and determining which were good candidates to go next. SAS also contained a communications component that could transfer the data sets from within SAS. Our research UNIX computer at the time was named "gibbs" (as were many universities' research computers) after Josiah Willard Gibbs, who is considered the father of physical chemistry. Connecting from the mainframe's SAS to our UNIX box gibbs was extremely easy -- create a variable called "gibbs", assign gibbs' IP address to it as a string, and say "connect gibbs;". That worked like a charm, and we were transferring data.

The problem was that we were using the public interface, which at the time was a mere 10 Mbit connection that everybody had to share. Both machines were in the same machine room, and both had unused fiber interfaces. If those were configured and connected directly, we should be able to transfer data over fiber without taking network bandwidth from users. So the interfaces were configured, we changed the value of the "gibbs" variable to the fiber's private IP address, ran a test, and voilà, data sets migrated to their new home without a hitch. And so they did, day and night, for the next year. Whenever MVS users updated their data and it was backed up, our system would pick up the change and copy the new version of the data set into the UNIX archive, where users could check it out at their leisure. Life was good.

Like many universities, we have a sister university not far away, and they were looking at how to shut down their research mainframe and migrate their users and data to UNIX. Our project was winding down and seemed to have been a success, so when they asked, we agreed to port our system to their mainframe and UNIX box.

Because our mainframes were so similar, using the same backup system and the same version of SAS, and because the UNIX side of our code was as POSIX-compliant as we could make it, porting went very quickly. When everything seemed ready to go, we set the "gibbs" variable to the IP address of their UNIX host. But we just couldn't get their mainframe SAS to log in to their UNIX box to actually transfer data sets. It would connect, but it didn't seem to like the userid and password -- the same userid and password I had been using to do the port work on that same box. It kept saying "gibbs: invalid userid or password", even when we pointed the "gibbs" variable back at our own gibbs and supplied the userid and password for that machine.

Out of frustration, or for lack of any other ideas, Doug decided to change the name of the variable itself from "gibbs" to "gibbsx", just to try to get the error message to change. But there was no error message this time. This time it worked.

Here's the first part of the double WTF. SAS' connection component had evolved over the years, and one of the evolutionary warts it had retained was that, if the variable name itself resolved through DNS, the value of the variable was never examined. The "gibbs" that was giving us the login error at our sister campus was another local box there named gibbs, not the box we were trying to connect to. By changing the variable name to something that wouldn't resolve through DNS, we finally got SAS to use the IP address stored in that variable.
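The wart is easier to see as a lookup order. The following is a minimal Python sketch that models the behavior as we experienced it; it is not SAS's actual code. The function name `resolve_connect_target`, the injected `dns_lookup` callable (a real program might wrap `socket.gethostbyname` instead), and all of the IP addresses are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical DNS lookup: returns an IP string, or None if the name doesn't resolve.
DnsLookup = Callable[[str], Optional[str]]


def resolve_connect_target(var_name: str,
                           variables: Dict[str, str],
                           dns_lookup: DnsLookup) -> str:
    """Model of the lookup order described above (illustrative, not SAS's code)."""
    # Step 1: treat the variable *name* as a hostname. If DNS knows it,
    # return that address and never look at the variable's value (the wart).
    ip = dns_lookup(var_name)
    if ip is not None:
        return ip
    # Step 2: only when the name fails to resolve is the stored value used.
    if var_name in variables:
        return variables[var_name]
    raise LookupError(f"cannot resolve {var_name!r}")


if __name__ == "__main__":
    # Hypothetical DNS tables for each campus.
    home_dns = {"gibbs": "192.0.2.1"}       # our gibbs' public interface
    sister_dns = {"gibbs": "198.51.100.5"}  # their unrelated local gibbs

    fiber = {"gibbs": "10.1.1.2", "gibbsx": "10.1.1.2"}  # our private fiber address
    print(resolve_connect_target("gibbs", fiber, home_dns.get))    # 192.0.2.1
    print(resolve_connect_target("gibbsx", fiber, home_dns.get))   # 10.1.1.2

    theirs = {"gibbs": "203.0.113.7", "gibbsx": "203.0.113.7"}   # their UNIX host
    print(resolve_connect_target("gibbs", theirs, sister_dns.get))   # 198.51.100.5
    print(resolve_connect_target("gibbsx", theirs, sister_dns.get))  # 203.0.113.7
```

Renaming the variable to something DNS doesn't know is what pushes the lookup into step 2, which is exactly why "gibbsx" worked where "gibbs" did not.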

And the second part of the WTF? Because the name "gibbs" had been resolving to our own gibbs' public interface, we ran our system for more than a year with a perfectly good dedicated fiber lying under the floor between our MVS and UNIX boxes and never sent a single bit through it. With only days left before our mainframe was to be shut down for good, we renamed our "gibbs" variable and finally moved a few data sets across the fiber, but at that point it hardly mattered.