Saturday, August 6, 2011

Doing the impossible, one server at a time

It's been a long time since my last post and a lot has changed. I won't type much now, but I wanted to let you know that soon I will tell you the tale of a company in California that wanted to upgrade the operating system on all their E1 servers - in three days.

I'll also share with you the closely guarded secret of how to change all server names in a matter of minutes. That plus tales of politics, frustration, backstabbing, and all the other fascinating things happening in the life of a CNC road warrior.

Stay tuned for a new blog post next week!

Tuesday, March 9, 2010

It's always something

I'm doing an upgrade from 8.10 to 9.0 right now and once again I've run up against some crazy issue. It doesn't matter if the configuration is exactly the same as the last client's, or if it's a slam-dunk fresh 9.0 install - something always goes wrong.

This time it was a combination of simple issues. I couldn't get the spec merge to run correctly - I kept getting the classic "Invalid Data Source Name" on the source data sources. I've fixed that more times than I can count, and it usually boils down to a typo in the data source name. Sure enough, I found the typo and re-ran the spec merge.

Unfortunately it failed again, this time on the Object Librarian - 810 data source. I went back and re-checked the data sources and the spelling was correct. I saw that there were appropriate entries in the OCMs. I went into the ODBC settings on all machines and made sure everything looked good there too. This is a 64-bit install, so I had to check both the 32-bit and the 64-bit ODBC settings. It was all correct.

It was the end of the day by this time and I headed back to the hotel. I took a quick nap and got back on the case. I decided to methodically go through every setting related to Object Librarian - 810 until I found the issue. I looked again at the OCMs and realized that the OCM status column was off the right side of the screen. I scrolled over on the off chance they were set to NA and sure enough, that was the problem. Once I set all the OL OCMs to AV, the spec merge ran through without errors.
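If you'd rather not scroll through the OCM screen, you can look for inactive mappings with a quick query. Here's a runnable sketch using Python's sqlite3 as a stand-in for the real database - the actual F986101 lives in your system data source, and the column names below (OMENHV, OMOBNM, OMDATP, OMSTSO) follow the standard OCM table layout, so verify them against your own release before relying on this.

```python
import sqlite3

# Toy stand-in for the OCM table (F986101). In the real system this sits in
# the system data source; here we fake a few rows to show the check.
# OMENHV = environment, OMOBNM = object, OMDATP = data source, OMSTSO = status.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE F986101 (OMENHV TEXT, OMOBNM TEXT, OMDATP TEXT, OMSTSO TEXT)"
)
conn.executemany(
    "INSERT INTO F986101 VALUES (?, ?, ?, ?)",
    [
        ("DV812", "DEFAULT", "Business Data - TEST", "AV"),
        ("DV812", "F9860", "Object Librarian - 810", "NA"),
        ("DV812", "F9861", "Object Librarian - 810", "NA"),
    ],
)

# Find mappings pointing at the merge source that are NOT active ('AV').
inactive = conn.execute(
    "SELECT OMOBNM FROM F986101 WHERE OMDATP = ? AND OMSTSO <> 'AV'",
    ("Object Librarian - 810",),
).fetchall()
for (obj,) in inactive:
    print("Inactive OCM mapping:", obj)
```

Any rows that come back are mappings the spec merge will trip over - exactly the NA entries hiding off the right side of my screen.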

The lesson learned from this incident is that when an issue is clearly the result of some simple problem, a methodical analysis of all the settings is the best bet. Leave nothing unchecked, no matter how minor. All CNCs know that even one wrong setting can bring everything to a screeching halt.

Wednesday, March 3, 2010

Getting the client ready for your arrival

One of the challenges I face as a traveling consultant is making sure everything is ready for me to start working when I get to a client. I often do not find out about a new installation or upgrade until a few days or a week before I have to fly out. Usually this news comes while I'm working full-time for another client, so there normally isn't any prep work I can do for them. Even if I had the time, companies are trying to squeeze every penny of savings out of the contract and the timeline is already compressed. So I have to get the client to do some of the prep work for me before I get on-site.

Fortunately there is a lot the client can do in advance of my arrival that doesn't take any special knowledge of EnterpriseOne at all. One time-consuming thing a client can do is download all the necessary files for the upgrade/install. Oracle doesn't send out physical media anymore unless you ask, and even then the software might not be as up-to-date as you would find online.

To help out with the downloads, I maintain a current list of all the software required for an upgrade/installation along with instructions on how to log in to eDelivery and download the files. With that, the client can download virtually all the files necessary to get started.

It's a bit of a stretch to ask any client to get Change Assistant configured and have them download the Planner ESU, tools release files, and fix-current ESUs. Few companies - even those already using E1 - know how to use CA. I usually don't send instructions for that unless I'm certain they can handle it.

The most important thing the client can do is simply have all the systems ready. Our company sends someone on-site a few weeks before the engagement begins to tell them what kinds of servers to buy, what operating system to install, and what additional software and hardware requirements there are. By the time I find out that I'm the one that's going to do the install, the client should be just about finished with all that setup.

The week before I show up on site, I have a conference call with the IT contact at the client and go over their setup with a fine-tooth comb. You have to ask them about each server, every bit of software - everything - and don't take "we're working on it" as an answer. If they can't be confident that the systems will be ready when you get there on Monday, then there's no point in traveling out there just so you can sit on your hands all day.

Of course there is no fool-proof way to determine if a client is totally ready before you show up on-site. With some clients you just have to be there to tell them what it is they need to finish. As a result you can spend a lot of time waiting for things to get ready and burning hours off the clock.

While it all pays the same in the end, the prep work saves the client money by offloading the busy work to them rather than having the expensive consultant do it. It also saves me from losing time I need to finish the installation. Often the next phase of the install begins on a certain date, with other consultants scheduled on-site and client activities scheduled for the implementation. The install has to be finished by that date regardless, so that lost time gets made up in overtime - time I could be spending with my family.

Just a few checklists prepared and sent to the client in advance can make the installation go smoother. A conference call before going on-site to discuss their preparation can eliminate a lot of frustration in the first week of work. Getting all that out of the way early can also let you enjoy that family BBQ three weeks from now.

Tuesday, March 2, 2010

OE Linux install part 2

Well, so much for updating the blog on a daily basis for the install. Things have been moving fast for me in the past couple weeks, but fortunately there really isn't much to the Linux/Oracle DB install.

I got into the database end of the install and found that it took a lot of effort to get 10g patched and capable of running on Linux. After a while I decided to install 11g instead, since that's what I would actually be installing out in the field.

Installing 11g on Linux is about as easy as it gets... well, for Linux, that is. Of course before you install 11g you'll want to patch Linux using up2date. If you don't have an up2date account, well then I can't help you. Next, you extract and run the Oracle installer.

The 11g installer makes things so easy you'll be wondering what steps you skipped when you're done. The very few things you need to do are explained in detail. You'll have to run a couple of scripts of course, and 11g will attempt to download and install any RPMs that it needs. There's also a short list of RPMs you'll have to install manually.

I was installing on a 64-bit server, and interestingly enough it required three 32-bit RPMs. To get RPMs for a different architecture, just use the --arch=i386 argument when you run up2date.

After you're finished with all that, you just have to check patches and what-not, and here Oracle 11g shines again. In the new Oracle 11g EM, there is a link for patches and updates. That will go out and find anything you might need to get up to date. Chances are all that was taken care of during the installation, but it's not a bad idea to go ahead and look one more time.

That's all there is to Linux/Oracle DB - seriously. Of course the hard part would be installing E1 on that system but unfortunately I didn't get to that. If I get the chance to do the actual Linux/Oracle install at the client, you can be sure that I'll talk about it a lot!

Monday, February 22, 2010

Oracle Database on Oracle Enterprise Linux

This week I'll be installing Oracle 10g on OE Linux in preparation for an upcoming client in Florida. I'm really looking forward to getting this under my belt for many reasons. First, it opens up a whole new set of clients I can consult for. That set may be extremely small, but there it is anyway. Second, the experience will make me very marketable, if you know what I mean.

Today all I have had time for is the installation of OE Linux and starting the download of Oracle 10g. The installation of Linux was very straightforward - just pop in the CD and let 'er rip! It's quite easy to step through all the options and since the installer is graphical pretty much anyone can get it done easily.

I installed Linux on a Dell 64-bit Intel server, so I had the installer format the hard drive. I accepted the defaults until I got to the Network Devices screens where I had to manually type in my TCP/IP information. I then selected my location and typed in a root password before selecting the "Software Development" customization option.

In Package Group Selection, I went through and picked the backward-compatibility packages along with the development tools. The installer then went through the actual installation of the OS and asked for a reboot when it was done.

After the restart and accepting the license agreement, I elected to disable the firewall and SELinux. I wasn't going to need those for what I was doing. I also skipped over Kdump, then set the time and created a user. The sound card test came up next and finally a request to add more installation CDs if I wanted to. I declined both of those and allowed the computer to reboot once again.

Once the system was back up, I enabled VNC on the machine so I didn't have to stand in the cold server room, then began downloading Oracle 10g from edelivery.oracle.com.

Now I just have to wait a few hours for it to finish and I can get started on the database! Tune in tomorrow for the exciting continuation of the install!

Wednesday, February 17, 2010

Windows Data Execution Prevention can stop you dead

I just heard about this and thought I'd pass it along - there is a recent problem with batch jobs executing in E1 and showing up as done in Work with Server Jobs (WSJ), but the processes are not completing on the server. It's also been reported as preventing Server Manager from stopping and could affect other E1 components as well.

This is most likely caused by Windows Data Execution Prevention (DEP). I'll leave the full details for you to look up, but in a nutshell, DEP prevents code from executing out of memory regions marked as data. By default this setting is enabled for all programs on server editions of Windows, and that is going to cause problems with E1. The correct setting is "Turn on DEP for essential Windows programs and services only".

Check your settings by going to Control Panel > System > Advanced > Performance Settings > Data Execution Prevention, and make sure the "essential Windows programs and services only" option is selected.

Go ahead and reboot the servers just to be safe and you should be up and running again!

Tuesday, February 16, 2010

Scheduler problems and server map

I don't know if you remember my previous post about the scheduler server not starting, but I am happy to report that I found the solution to that problem. For those of you too lazy to click the above link, here is the short version: After moving the scheduler to the production server, it would not start.

I gathered logs and dozens of screen shots to share with my coworkers and Oracle. Nobody could figure out what the problem was. Everything looked as if it were set up correctly, so everyone was scratching their heads.

The call with Oracle was a travesty... I opened it almost two weeks ago and the tech in charge of the case would not respond. I would go days without any contact. Finally he started asking me for more screenshots, which I provided last Thursday. This morning, after hearing nothing, I called Oracle and escalated the issue. I then talked to a senior engineer who did a web conference with me to check out the system.

During this web conference, I saw an SQL statement in the scheduler kernel debug log that was looking for an entry for the enterprise server in SVM900.F98611. Whoa! I never thought to look in there! Why? I don't know. At any rate, I ran that SQL directly on the database and it returned zero records. The server map data source for the production enterprise server was missing.

I recreated the entry in the table manually and restarted the scheduler - it worked! I tested a few jobs and it all looked good. I thanked the Oracle guys for their help (not much) and let the client know.
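The check and the fix boil down to two small SQL statements against the server map. Here's a runnable sketch using sqlite3 as a stand-in for SVM900.F98611 - only two columns are shown, their names (OMDATP, OMSRVR) come from the standard data source master layout, and "entsvr01" is a hypothetical enterprise server name, so adjust all of it to your own system.

```python
import sqlite3

# Toy stand-in for the server map (SVM900.F98611). The real table has many
# more columns; OMDATP = data source name, OMSRVR = server name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F98611 (OMDATP TEXT, OMSRVR TEXT)")
conn.execute("INSERT INTO F98611 VALUES ('Business Data - PROD', 'entsvr01')")

server = "entsvr01"  # hypothetical production enterprise server name

# The scheduler kernel's query: look for a data source entry named after
# the enterprise server. Zero records means the entry is missing.
count = conn.execute(
    "SELECT COUNT(*) FROM F98611 WHERE OMDATP = ?", (server,)
).fetchone()[0]
print("server map entries for", server, ":", count)

if count == 0:
    # Recreate the missing server map entry, as was done manually on-site.
    conn.execute("INSERT INTO F98611 VALUES (?, ?)", (server, server))

count_after = conn.execute(
    "SELECT COUNT(*) FROM F98611 WHERE OMDATP = ?", (server,)
).fetchone()[0]
```

In the real fix you'd copy all the columns from a known-good server map entry rather than inserting a two-column row, but the shape of the diagnosis is the same: query, find zero records, recreate, restart the scheduler.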

My question is: Why was that entry missing? How did we get all the way to a month after go live before noticing we were missing that data source? When did it go missing? Who knows. I care, but only a little bit now that it's working.

This issue once again shows me that I need to look more closely at the logs and rely especially on the debug logs. They're packed with information, and while it's sometimes hard to find what you need, they almost always have the answers.

Bottom line is: I didn't troubleshoot this well enough. Once I saw the debug log for the scheduler kernel I figured it out right away. I hadn't run the debug logs yet, and nobody else that looked at the issue had asked for them.

I can take some solace in the fact that I was not the only one who didn't think about the debug log, but I was the person who was supposed to figure it out. I'm also the one who installed their system, so it's doubly on me. I just wish I knew how it happened.

I think that on these difficult issues I should go through and verify the basics every time - path codes, environments, data sources, OCM. It seems like a lot of problems revolve around a mistake in one of those, especially with new installations and upgrades.
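That "verify the basics" pass could even be scripted. Here's a minimal sketch, again using sqlite3 as a stand-in for the real system tables - the table names (F98611 for data sources, F986101 for OCM) follow the standard layout, but the check list itself is just an illustration of the idea, not a complete verification.

```python
import sqlite3

# Fake a couple of system tables so the checklist is runnable. In this toy
# data the data source exists, but its OCM mapping was left at 'NA'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F98611 (OMDATP TEXT)")
conn.execute("CREATE TABLE F986101 (OMDATP TEXT, OMSTSO TEXT)")
conn.execute("INSERT INTO F98611 VALUES ('Object Librarian - 810')")
conn.execute("INSERT INTO F986101 VALUES ('Object Librarian - 810', 'NA')")

# Each check: a name, a counting query, and a predicate on the count.
checks = [
    ("data source exists",
     "SELECT COUNT(*) FROM F98611 WHERE OMDATP = 'Object Librarian - 810'",
     lambda n: n > 0),
    ("OCM mappings active",
     "SELECT COUNT(*) FROM F986101 "
     "WHERE OMDATP = 'Object Librarian - 810' AND OMSTSO <> 'AV'",
     lambda n: n == 0),
]

results = {}
for name, sql, ok in checks:
    results[name] = ok(conn.execute(sql).fetchone()[0])
    print(name, "-", "PASS" if results[name] else "FAIL")
```

Run against the toy data, the first check passes and the second fails - which is exactly the kind of one-wrong-setting problem that a methodical pass catches and a quick eyeball doesn't.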