dacs.doc electric

 

The Server Won’t Boot
or
Why You Can’t Have Too Many Backups, Part 3

By Jim Scheef


Previously on Server Won't Boot…

Last time we talked about how we made the system stable, got past the BSD (Blue Screen of Death) and recovered the fault-tolerant disk drives. Rick, the Microsoft Support Engineer assigned to my case, led me thru these pothole-infested steps. One of the first steps had been to disable as many services as possible. This meant that there were fewer things to fail during the boot process and made it easier to get the machine running. Now we're ready to move on to the real server issues.

Make It Network

One morning when Rick was not available, I decided to go ahead and try to get the network going again on my own. To keep our definitions straight, "the network" consists of:

  • the Server service (allows other computers to see shared resources)
  • the Workstation service (connects to resources shared by other machines)
  • the device driver for the server network card
  • the Dynamic Host Configuration Protocol service (DHCP)
  • Windows Internet Name Service (WINS)
  • Domain Name Service (DNS)

All of these had been disabled during the initial effort to get the machine to boot. Now it was time to turn them back on. Naturally it would not be quite so easy.

First I tried to set the Intel Pro100 device to start on "System" in the Devices applet in Control Panel. On reboot this produced a BSD with the "bad image" message, so I changed the start option back to "Disabled" from RegEdt32 in Workstation. It turns out that the NT Service Pack installer does not update any devices or services that are set to Disabled, so the networking files were still corrupt. The trick was to change all of these devices and services to "Manual" so that they were enabled for the service pack update but would not start and crash the server. Got that?
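Behind the Devices and Services applets, NT keeps each driver's or service's start option as a number in the Start value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<name>. The Python sketch below models the "Disabled to Manual" trick; it only computes a plan and does not touch a real registry, and the service key names in the example are illustrative.

```python
# Start values stored under
# HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<name>\Start
START_TYPES = {0: "Boot", 1: "System", 2: "Automatic", 3: "Manual", 4: "Disabled"}
MANUAL = 3
DISABLED = 4


def plan_service_pack_prep(services: dict) -> dict:
    """Given {service_name: start_value}, return the entries to change to Manual.

    Disabled items are skipped by the service pack installer, so they are
    moved to Manual: their files get refreshed, but they are not started
    automatically at boot.
    """
    return {name: MANUAL for name, start in services.items() if start == DISABLED}


if __name__ == "__main__":
    # Illustrative service names only; real key names on a given server may differ.
    current = {"LanmanServer": DISABLED, "LanmanWorkstation": DISABLED, "Tcpip": 1}
    for name, start in plan_service_pack_prep(current).items():
        print(f"{name}: {START_TYPES[current[name]]} -> {START_TYPES[start]}")
```

After the service pack is applied, each item can be moved back to its normal start option one at a time, which keeps a bad driver from taking the whole boot down with it.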

After reapplying SP6a, some things worked but others did not, and I had to wait for Rick's assistance. So much for striking out on my own! In the meantime, I found that Internet Explorer would not run. IE is needed for many Microsoft functions, including installing many programs. My first attempt was to install Internet Explorer 5.01, the version I had been running. When the system told me that "IE501INS.EXE is not a Windows NT program", I knew that I was still in deep you-know-what. Then I had what turned out to be a stroke of genius: I installed Internet Explorer 5.5. The reason is that this installs the "Windows Installer", Microsoft's new program installation facility, which then allowed several other fixes down the road. The IE5.5 install was successful.

When Rick returned, we readdressed (no pun) the network. It turns out that NT will not function correctly without a "network adapter" of some sort. When NT is installed without a local area network or a modem, a 'loopback adapter' is installed to keep NT happy. If I had known this before, I certainly did not think of it at the time. All of the network problems boiled down to a corrupt driver for the Intel Pro100 network adapter built into the motherboard. Before removing and reinstalling that driver, which would probably have cured almost everything in the network, I uninstalled and reinstalled RAS and WINS (several times), with several reapplications of SP6a along the way. At some point I got desperate and installed an Intel driver downloaded from the Intel web site. This was the magic bullet for the network, as almost everything started working.

Reinstalling DHCP had created an empty database. DHCP (Dynamic Host Configuration Protocol) is a means to automatically assign an IP address to computers on a network. Some computers, like servers and routers, need an assigned or reserved address that does not change. Thus, I had to reenter the reservations for several machines on my network. DHCP reservations are a means for a network administrator to assign a specific IP address to a machine from a central facility. This greatly simplifies management of devices like print servers, routers, and any machine that needs to retain its IP address indefinitely. The downside is that you need to know each machine's MAC address, a unique number burned into every network card when it is manufactured. Since my network is small, this was not an onerous task, but I sure wouldn't want to do it for a larger server farm!
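A reservation table is essentially a map from MAC address to a fixed IP address. The Python sketch below shows the bookkeeping involved when re-entering reservations by hand: normalizing MAC addresses typed in different styles and catching duplicates or addresses outside the scope. The 10.0.42.0/24 scope is an assumption based on the addresses in the DNS file later in this article, and the MAC addresses are made up.

```python
import ipaddress
import string


def normalize_mac(mac: str) -> str:
    """Strip common separators and return 12 uppercase hex digits."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "").upper()
    if len(digits) != 12 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def build_reservations(scope: str, entries: list) -> dict:
    """Map normalized MAC -> (name, ip) from (name, mac, ip) entries.

    Rejects duplicate MACs, duplicate IPs and addresses outside the scope.
    """
    network = ipaddress.ip_network(scope)
    table = {}
    used_ips = set()
    for name, mac, ip in entries:
        key = normalize_mac(mac)
        if ipaddress.ip_address(ip) not in network:
            raise ValueError(f"{ip} is outside scope {scope}")
        if key in table:
            raise ValueError(f"duplicate MAC {mac}")
        if ip in used_ips:
            raise ValueError(f"duplicate IP {ip}")
        table[key] = (name, ip)
        used_ips.add(ip)
    return table


if __name__ == "__main__":
    # Hypothetical MAC addresses; names and IPs taken from the article's DNS file.
    table = build_reservations("10.0.42.0/24", [
        ("ptr01", "00-A0-C9-11-22-33", "10.0.42.26"),
        ("tlx", "00:a0:c9:44:55:66", "10.0.42.5"),
    ])
    for mac, (name, ip) in table.items():
        print(mac, name, ip)
```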

WINS (Windows Internet Name Service) is a simplified way to translate between IP addresses and the friendly names assigned to each computer on a Windows network. Normally the service finds all the computers on the network and builds the database automatically. (Did I mention the simplified part?) Now, has anything in this story been that easy? At first I celebrated that the network was almost complete, but I soon realized that WINS was not functioning at all. The clue was a recurring message in Event Viewer saying that WINS could not start due to a corrupt database. How could this be when the service had been removed and reinstalled? There is a procedure in the Microsoft Knowledge Base to create a new WINS database, but this only changed the problem. A new message in Event Viewer said, "WINS could not create the notification socket. Make sure TCP/IP driver is installed and working properly." Ultimately I had to reinstall TCP/IP!
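In practice the WINS database fills itself because each WINS client registers its own name with the server when it starts. A toy Python model of that idea follows; it is a simplification for illustration, not how the real service is implemented, and it leaves out name suffixes, renewal, release, name challenges and replication.

```python
from typing import Optional


class ToyWinsTable:
    """Toy model of dynamic NetBIOS name registration, WINS-style.

    Clients announce (name, ip) when they start; lookups map name -> ip.
    """

    def __init__(self) -> None:
        self._records = {}

    @staticmethod
    def _key(name: str) -> str:
        # Windows uppercases NetBIOS names, which are limited to 15 characters.
        return name.upper()[:15]

    def register(self, name: str, ip: str) -> bool:
        """Record name -> ip; refuse if another address already holds the name.

        A real WINS server would first challenge the current holder; this toy just refuses.
        """
        key = self._key(name)
        holder = self._records.get(key)
        if holder is not None and holder != ip:
            return False
        self._records[key] = ip
        return True

    def resolve(self, name: str) -> Optional[str]:
        return self._records.get(self._key(name))


if __name__ == "__main__":
    wins = ToyWinsTable()
    wins.register("home2", "10.0.42.1")
    wins.register("tlx", "10.0.42.5")
    print(wins.resolve("TLX"))
```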

Finally I could see and access other machines on the network and vice versa. Did you think we were done? You forgot about DNS.

DNS (Domain Name Service) is the Internet standard for resolving a host name like www.telemarksys.com into an IP address. DNS is not needed on the typical Windows network, but Microsoft has adopted DNS to replace WINS on Windows 2000 networks, and it's needed if you want to integrate Linux into an existing Windows network.

Microsoft's DNS on NT does not use a Jet database like DHCP and WINS do, but restoring it still took some work. Without a current backup I thought I had to recreate all of the domain records from scratch. This is an involved task, made more difficult by the fact that I no longer have the magazine article that led me thru the process. The savior here was DNS itself. DNS on UNIX runs from a set of text files that contain all of the 'records' needed to name the domain, define the naming authority, and list each machine in the domain that has a static (assigned) IP address. Naturally Microsoft does it differently and uses the system registry to store this information. NT's DNS will create a set of domain data files on request; it's up to the administrator to click the button whenever the domain information is updated to keep these data files current. The good news is that the files were recent enough to be useful, so it was back to the keyboard to re-enter all the DNS records needed to define the domain. Thanks to these text files, I had the information I needed.

Here is one of the DNS data files, showing the information needed to map each computer name to its IP address. Since my network is small, recreating this was not a big deal; of course, it would be for a larger enterprise.

; Database file telemarksys.com.dns for telemarksys.com zone.
; Zone version: 30
;

@   IN  SOA  home2.telemarksys.com.  info@telemarksys.com. (
             30      ; serial number
             3600    ; refresh
             600     ; retry
             86400   ; expire
             3600 )  ; minimum TTL

;
; Zone NS records
;

@ NS home2

;
; WINS lookup record
;

@ 0 WINS L1 C600 (
10.0.42.1 )

;
; Zone records
;

home2 A 10.0.42.1
ptr01 A 10.0.42.26
tlx A 10.0.42.5
ts2 A 10.0.42.6
wlap0 A 10.0.42.24
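Because the exported zone file is plain text, the records needed for re-entry can be pulled out mechanically. A minimal Python sketch that handles only the simple "name A address" lines used in the file above; it ignores class, TTL and multi-line records, so it is not a general zone-file parser.

```python
# The "Zone records" section of the telemarksys.com.dns file shown above.
ZONE_RECORDS = """\
home2 A 10.0.42.1
ptr01 A 10.0.42.26
tlx A 10.0.42.5
ts2 A 10.0.42.6
wlap0 A 10.0.42.24
"""


def parse_a_records(zone_text: str, origin: str) -> dict:
    """Return {fully_qualified_name: ip} for lines of the form 'name A address'."""
    hosts = {}
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) == 3 and fields[1].upper() == "A":
            name, _, ip = fields
            # Names not ending in "." are relative to the zone origin.
            fqdn = name if name.endswith(".") else f"{name}.{origin}"
            hosts[fqdn] = ip
    return hosts


if __name__ == "__main__":
    for host, ip in parse_a_records(ZONE_RECORDS, "telemarksys.com.").items():
        print(f"{host:28} {ip}")
```

The resulting list is exactly what has to be typed back into the DNS Manager, one host record per line.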

Whew! So what did we learn here? Networking is built of layers of software. This rebuild should have been treated as a new install. Had we taken that approach, we would have removed and replaced the components from the bottom up. I'm sure now that it would have cut the time in half.
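The "bottom up" idea can be made concrete as a dependency ordering: each layer is reinstalled only after everything beneath it. A Python sketch using the standard library's graphlib (Python 3.9+); the dependency map is a simplified assumption built from the components named in this article, not an official NT dependency list.

```python
from graphlib import TopologicalSorter

# Simplified, assumed layering: each component maps to what it sits on top of.
DEPENDS_ON = {
    "Intel Pro100 driver": set(),
    "TCP/IP": {"Intel Pro100 driver"},
    "Workstation service": {"TCP/IP"},
    "Server service": {"TCP/IP"},
    "DHCP": {"TCP/IP"},
    "WINS": {"TCP/IP"},
    "DNS": {"TCP/IP"},
}


def reinstall_order(depends_on: dict) -> list:
    """Bottom-up order: every component appears after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, component in enumerate(reinstall_order(DEPENDS_ON), start=1):
        print(step, component)
```

Removing the components would follow the reverse of this list, so nothing is pulled out from under a layer that still depends on it.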

Deliver the Mail

Meanwhile Rick had been researching our problem with the mysterious DLL that "was not a valid Windows NT image" and how we could get a newer version to replace the one on the system. It was not part of Windows NT but was a Microsoft DLL. A couple of years ago the Microsoft web site added a searchable database of every DLL produced by Microsoft, showing every version number and which install packages carried each version. It is a wonder to behold. A search of the DLL database revealed that our mysterious DLL was part of several server products.
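Picking "the most recent version" from a list like that means comparing dotted version numbers numerically, field by field: compared as plain text, 5.10 would sort before 5.9. A small Python sketch; the package names and version numbers in the example are invented for illustration.

```python
def parse_version(version: str) -> tuple:
    """'5.5.2174.0' -> (5, 5, 2174, 0) so versions compare field by field."""
    return tuple(int(part) for part in version.split("."))


def newest_source(catalog: dict) -> tuple:
    """catalog maps install package -> DLL version it carries.

    Returns the (package, version) pair with the highest version.
    """
    package = max(catalog, key=lambda p: parse_version(catalog[p]))
    return package, catalog[package]


if __name__ == "__main__":
    # Hypothetical catalog entries, not real Microsoft data.
    catalog = {"Package A": "5.9.1.0", "Package B": "5.10.0.2", "Package C": "5.10.0.1"}
    print(newest_source(catalog))
```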

One of the tech support people Rick consulted found the most recent version in Site Server Service Pack 4. The first attempt to install this service pack failed, so we uninstalled Site Server and reinstalled it just so we could apply the service pack. Is that convoluted or what? But you know what? It worked.

My server runs two server applications: Microsoft SQL Server and Microsoft Exchange Server. At the time the problem started, I had also installed Site Server, Microsoft's high-end web server product, although it was not actually running. Recovering these applications was the last task.

Exchange Server is Microsoft's enterprise email and collaboration product. Admittedly it is overkill for a one-man software shop, but I first set it up to learn enough to install it for a client. We intended to use it to store documents created by the system I developed. Of course once it was running, I wanted to use it, and once I started to use it, I was immediately addicted. Now it contains several thousand emails and other documents that can be searched at the click of a mouse. Losing that database (data store in Exchange-speak) was unthinkable.

If this were a normal situation where there was a backup tape from, oh say, last night, then we could just blow away the existing installation of Exchange, reinstall, restore the backup and be back in business. Well, as we have seen, it ain't so easy when there ain't no [sic] backup! To guide me thru this recovery, Rick handed me off to an Exchange Server specialist. We did blow away the existing installation, after carefully making backup copies of certain directories. After reinstalling and applying the latest service pack, we pointed Exchange to the original data stores. After a reboot to restart the services (Exchange Server is actually five or more individual processes that run in the background), I could access my mailbox from Outlook on a client machine. This event was the biggest relief of this entire agonizing process.

Dish Out Data

Next up was SQL Server, Microsoft's relational database manager. Given our success with Exchange and my greater familiarity with SQL Server, I was ready to tackle this one on my own. Just to be sure, Rick connected me to a SQL specialist, and in no time we had SQL Server working again. The steps were the same as with Exchange: remove, reinstall, apply the service pack and then connect to the data. SQL Server keeps track of user databases in a database of its own called Master, which holds records that point the way to each database file on disk. The SQL support specialist wrote a script to add the records needed to connect the database files still on disk from before all this started.
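The article doesn't show the specialist's script. On SQL Server 7.0 and 2000, one documented way to register existing database files with Master is the sp_attach_db system procedure. The Python sketch below only builds such a statement as text (it does not connect to a server); the database name and file paths are hypothetical, and this is not necessarily what the specialist ran.

```python
def attach_db_sql(dbname: str, files: list) -> str:
    """Build an EXEC sp_attach_db statement for existing .mdf/.ldf files."""
    if not 1 <= len(files) <= 16:
        raise ValueError("sp_attach_db accepts between 1 and 16 file names")

    def quote(text: str) -> str:
        # Unicode string literal with embedded single quotes doubled.
        return "N'" + text.replace("'", "''") + "'"

    args = [f"@dbname = {quote(dbname)}"]
    args += [f"@filename{i} = {quote(path)}" for i, path in enumerate(files, start=1)]
    return "EXEC sp_attach_db " + ", ".join(args)


if __name__ == "__main__":
    # Hypothetical database name and paths.
    print(attach_db_sql("Projects", [
        r"C:\MSSQL7\Data\Projects_Data.MDF",
        r"C:\MSSQL7\Data\Projects_Log.LDF",
    ]))
```

The generated statement would then be run in a query tool against the rebuilt server, once per database.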

Pages to the Web

We are now on the home stretch. The last step was to get Internet Information Server (IIS) going again. I had two web sites - sort of. One web site was my feeble attempt at a company web site. The other site was Outlook Web Access. OWA is a collection of server-based programs and web pages that imitate the look and feel of the Outlook email program over the web. This makes it possible to check email from any PC with Internet access.

The situation with IIS was muddied by the presence of Site Server, so I uninstalled Site Server. Yes, I know I just blew away all that work reinstalling it, but that effort was really just to get past the DLL error. Based on our successes with Exchange and SQL Server, the next task was to remove IIS. In the NT scheme of things this means removing something called the "Windows NT Option Pack". The Option Pack includes all the stuff needed to run a Microsoft-based web site, including IIS, Index Server and Microsoft Transaction Server (MTS), along with various control panels to configure these programs. The uninstall ran to completion, but not without errors. Some were probably caused by files removed with Site Server and the rest - who knows. Reinstalling the Option Pack ran fine until near the end, when it could not configure the FTP site. Since I do not plan to run the FTP server service, I could live with these results.

With IIS running and serving up the sample web pages, it was time to reinstall Outlook Web Access (OWA). This proved to be the final nightmare and required several conversations with Exchange tech support specialists. OWA walks a narrow path between an acceptable level of security and opening your server to a world full of bad guys. When we finished, OWA was working, but I have an idea that it is not as secure as it should be. Some people would say that IIS security is an oxymoron, so this could be a problem.

To increase security, I moved the port used by IIS to a non-standard number and opened only that non-standard port thru the firewall. When checking the firewall logs, I have yet to see any activity on that port. Of course, that doesn't mean it's safe, just that it's hidden for the moment. On the server I have gradually tightened file and directory permissions on OWA and the IIS directories, hoping that I'm moving in the right direction. As long as the added security permissions do not break IIS or OWA, then things must be better, right?
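The firewall arrangement described here amounts to a default-deny rule set with a single exception. A Python sketch of that rule; the port number is hypothetical (the article doesn't say which one was used), and a non-standard port only hides the service, it doesn't secure it.

```python
# Hypothetical non-standard port for IIS; the real number isn't given in the article.
HIDDEN_WEB_PORT = 8123

# Default-deny: only these (protocol, destination port) pairs may come in.
ALLOWED_INBOUND = {("tcp", HIDDEN_WEB_PORT)}


def permit_inbound(protocol: str, dst_port: int) -> bool:
    """Return True only for explicitly allowed traffic; everything else is dropped."""
    return (protocol.lower(), dst_port) in ALLOWED_INBOUND


if __name__ == "__main__":
    for proto, port in [("TCP", 80), ("TCP", HIDDEN_WEB_PORT), ("UDP", HIDDEN_WEB_PORT)]:
        print(proto, port, "allow" if permit_inbound(proto, port) else "drop")
```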

Lessons Learned

So what did I learn from this experience? In Part 1, I talked about the danger of lulling yourself into a false sense of security regarding hardware redundancy and backups. This is the biggest lesson here. Hardware redundancy is good for system reliability, but it is not a replacement for good and frequent backups.

Another point on backups is that if all your backup tapes are stored in one place, you are still vulnerable. For a small business, maintaining off-site backups can be as simple as taking a tape home from the office once a month.
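The once-a-month rule is easy to follow mechanically if tapes are labeled by date: take the newest backup from each month off-site. A Python sketch with made-up backup dates:

```python
from datetime import date


def monthly_offsite(backups: list) -> list:
    """Pick the most recent backup in each calendar month to carry off-site."""
    latest = {}
    for d in backups:
        key = (d.year, d.month)
        if key not in latest or d > latest[key]:
            latest[key] = d
    return sorted(latest.values())


if __name__ == "__main__":
    # Hypothetical backup dates.
    tapes = [date(2001, 1, 5), date(2001, 1, 26), date(2001, 2, 2), date(2001, 2, 23)]
    for d in monthly_offsite(tapes):
        print("take home:", d.isoformat())
```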

Last, solving a puzzle as big as my server disaster (at least it seemed like a disaster at the time) requires a plan, and that's what Rick brought to the table along with his knowledge of the NT product. His approach broke the recovery into the stages I've reported here. Most of the flailing around was the result of my desire to rush ahead. Of course there were several speed bumps along the way, and many times errors stopped an install or uninstall. Solving these problems in a logical manner kept us moving.

For all of the bad press Microsoft Support has received over the years, I was highly impressed with the level of knowledge, patience and professionalism I saw from Rick and his cohorts. If fee-based support yields this level of quality, then it's worth every penny. Now, I must also point out that the products involved here were all high-end server-based enterprise products, so you would expect any vendor to assign their best support engineers to these products.


Jim Scheef is the Mad Scientist at Telemark Systems Inc. where he develops custom software using Visual Basic and SQL Server and provides networking services using Windows NT/2000. He has been a DACS member since the day DOG became WC/MUG.
