Tuesday, March 9, 2010

It's always something

I'm doing an upgrade from 8.10 to 9.0 right now and once again I run up against some crazy issue. It doesn't matter if the configuration is exactly the same as the last client, or if it's a slam-dunk fresh 9.0 install - something always goes wrong.

This time it was a combination of simple issues. I couldn't get the spec merge to run correctly - I kept getting the classic "Invalid Data Source Name" on the source data sources. I've fixed that more times than I can count, and it usually boils down to a typo in the data source name. Sure enough, I found the typo and re-ran the spec merge.

Unfortunately it failed again, this time on the Object Librarian - 810 data source. I went back and re-checked the data sources and the spelling was correct. I saw that there were appropriate entries in the OCMs. I went in to the ODBCs on all machines and made sure everything looked good there too. This is a 64-bit install, so I had to check the 32-bit and the 64-bit ODBC settings. It was all correct.

It was the end of the day by this time and I headed back to the hotel. I took a quick nap and got back on the case. I decided to methodically go through every setting related to Object Librarian - 810 until I found the issue. I looked again at the OCMs and realized that the OCM status column was off the right side of the screen. I scrolled over in the off-chance they were set to NA and sure enough, that was the problem. Once I set all the OL OCMs to AV, the spec merge ran through without errors.
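
If the status column is hiding off the edge of the screen, one alternative is to query the OCM table directly. What follows is only a rough sketch of that idea in Python, not the procedure I used on-site. It runs against an in-memory SQLite stand-in for F986101, the column names (OMENHV, OMOBNM, OMDATP, OMSTSO) are assumptions from memory that you should verify on your release, and the sample rows are made up.

```python
import sqlite3

# Assumed column names for the OCM table F986101; verify them on your release.
NOT_ACTIVE_SQL = """
    SELECT OMENHV, OMOBNM, OMSTSO
    FROM F986101
    WHERE OMDATP = ? AND OMSTSO <> 'AV'
    ORDER BY OMENHV, OMOBNM
"""


def inactive_mappings(conn, data_source):
    """Return (environment, object, status) for mappings that are not AV."""
    return conn.execute(NOT_ACTIVE_SQL, (data_source,)).fetchall()


def build_demo_db():
    """Create an in-memory stand-in for F986101 with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE F986101 "
        "(OMENHV TEXT, OMOBNM TEXT, OMDATP TEXT, OMSTSO TEXT)"
    )
    conn.executemany(
        "INSERT INTO F986101 VALUES (?, ?, ?, ?)",
        [
            ("JDEPLAN", "F9860", "Object Librarian - 810", "NA"),
            ("JDEPLAN", "F9861", "Object Librarian - 810", "AV"),
            ("JDEPLAN", "F9862", "Object Librarian - 810", "NA"),
            ("JDEPLAN", "F0101", "Business Data - TEST", "NA"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in inactive_mappings(build_demo_db(), "Object Librarian - 810"):
        print(row)
```

Against a real system you would point the same SELECT at the system database instead of the demo connection. Any row it returns is a mapping worth a second look.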

The lesson learned from this incident is that when there's an issue that clearly is a result of some simple problem, a methodical analysis of all the settings is the best bet. Leave nothing unchecked, no matter how minor. All CNCs know that even one wrong setting can bring everything to a screeching halt.

Wednesday, March 3, 2010

Getting the client ready for your arrival

One of the challenges I face in my job as a traveling consultant is making sure I have everything ready for me to start working when I get to a client. At my job, I often do not find out about a new installation or upgrade until a few days or a week before I have to fly out. Usually this information comes to me while I'm working full-time for another client, so there normally isn't any prep work I can do for them. Even if I had the time, companies are trying to squeeze every penny in savings out of the contract and the timeline is already compressed. I have to get the client to do some of the prep work for me before I get on-site.

Fortunately there is a lot the client can do in advance of my arrival that doesn't take any special knowledge of EnterpriseOne at all. One time-consuming thing a client can do is download all the necessary files for the upgrade/install. Oracle doesn't send out physical media anymore unless you ask, and even then the software might not be as up-to-date as you would find online.

To help out with the downloads, I maintain a current list of all the software required for an upgrade/installation along with instructions on how to log in to eDelivery and download the files. With that, the client can download virtually all the files necessary to get started.

It's a bit of a stretch to ask any client to get Change Assistant configured and start downloading the Planner ESU, tools release files, and fix current ESUs. There are few companies - even those already using E1 - that know how to use CA. I usually don't send instructions for that unless I'm certain they can do it.

The most important thing the client can do is simply have all the systems ready. Our company sends someone on-site a few weeks before the engagement begins to tell them what kinds of servers to buy, what operating system to install, and what additional software and hardware requirements there are. By the time I find out that I'm the one that's going to do the install, the client should be just about finished with all that setup.

The week before I show up on site, I have a conference call with the IT contact at the client and go over their setup with a fine-tooth comb. You have to ask them about each server, every bit of software - everything - and don't take "we're working on it" as an answer. If they can't be confident that the systems will be ready when you get there on Monday, then there's no point in traveling out there just so you can sit on your hands all day.

Of course there is no fool-proof way to determine if a client is totally ready before you show up on-site. With some clients you just have to be there to tell them what it is they need to finish. As a result you can spend a lot of time waiting for things to get ready and burning hours off the clock.

While it all pays the same in the end, the prep work saves the client money by offloading the busy work to them rather than having the expensive consultant do it. It also saves me from losing time I need to finish the installation. Often the next phase of the install begins on a certain date, with other consultants scheduled on-site and client activities scheduled for the implementation. The install has to be finished by that date regardless, so that lost time gets made up during overtime I could be spending with my family.

Just a few checklists prepared and sent to the client in advance can make the installation go smoother. A conference call before going on-site to discuss their preparation can eliminate a lot of frustration in the first week of work. Getting all that out of the way early can also let you enjoy that family BBQ three weeks from now.

Tuesday, March 2, 2010

OE Linux install part 2

Well, so much for updating the blog on a daily basis for the install. Things have been moving fast for me in the past couple weeks, but fortunately there really isn't much to the Linux/Oracle DB install.

I started getting in to the database end of the install and started finding that it took a lot of effort to get 10g patched and capable of running on Linux. After a while I decided that I would install 11g instead since that is what I would actually be installing out in the field.

Installing 11g on Linux is about as easy as it gets... well, for Linux, that is. Of course before you install 11g you'll want to patch Linux using up2date. If you don't have an up2date account, well then I can't help you. Next you will extract and run the Oracle installer.

The 11g installer makes things so easy you'll be wondering what steps you skipped when you're done. The very few things you need to do are explained in detail. You'll have to run a couple of scripts of course, and 11g will attempt to download and install any rpms that it needs. There is also a list of rpms that you'll have to install manually.

I was installing on a 64-bit server, and interestingly enough it required three 32-bit RPMs. To get RPMs for a different architecture, just use the --arch=i386 argument when you run up2date.

After you're finished with all that, you just have to check patches and what-not, and here Oracle 11g shines again. In the new Oracle 11g EM, there is a link for patches and updates. That will go out and find anything you might need to get up to date. Chances are all that was taken care of during the installation, but it's not a bad idea to go ahead and look one more time.

That's all there is to Linux/Oracle DB - seriously. Of course the hard part would be installing E1 on that system but unfortunately I didn't get to that. If I get the chance to do the actual Linux/Oracle install at the client, you can be sure that I'll talk about it a lot!

Monday, February 22, 2010

Oracle Database on Oracle Enterprise Linux

This week I'll be installing Oracle 10g on OE Linux in preparation for an upcoming client in Florida. I'm really looking forward to getting this under my belt for many reasons. First, it opens up a whole new set of clients I can consult for. That set may be extremely small, but there it is anyway. Second, the experience will make me very marketable, if you know what I mean.

Today all I have had time for is the installation of OE Linux and starting the download of Oracle 10g. The installation of Linux was very straightforward - just pop in the CD and let 'er rip! It's quite easy to step through all the options and since the installer is graphical pretty much anyone can get it done easily.

I installed Linux on a Dell 64-bit Intel server, so I had the installer format the hard drive. I accepted the defaults until I got to the Network Devices screens where I had to manually type in my TCP/IP information. I then selected my location and typed in a root password before selecting the "Software Development" customization option.

In Package Group Selection, I went through and picked some of the items to make it backward compatible and also the tools. It went through the actual installation of the OS then asked for a reboot when it was done.

After the restart and accepting the license agreement, I elected to disable the firewall and SELinux. I wasn't going to need those for what I was doing. I also skipped over Kdump, then set the time and created a user. The sound card test came up next and finally a request to add more installation CDs if I wanted to. I declined both of those and allowed the computer to reboot once again.

Once the system was back up, I enabled VNC on the machine so I didn't have to stand in the cold server room, then began downloading Oracle 10g from edelivery.oracle.com.

Now I just have to wait a few hours for it to finish and I can get started on the database! Tune in tomorrow for the exciting continuation of the install!

Wednesday, February 17, 2010

Windows Data Execution Prevention can stop you dead

I just heard about this and thought I'd pass it along - there is a recent problem with batch jobs executing in E1 and showing up as done in Work with Server Jobs (WSJ), but the processes are not completing on the server. It's also been reported as preventing Server Manager from stopping and could affect other E1 components as well.

This is most likely caused by Windows Data Execution Prevention (DEP). I'll leave it to you to read up on exactly what it is.

In a nutshell, it prevents code from executing in memory areas that are reserved for data. By default on Windows Server this setting is enabled for all programs, and that is going to cause problems with E1. The correct setting is "Turn on DEP for essential Windows programs and services only".

Check your settings by going to Control Panel > System > Advanced > Performance > Settings and opening the Data Execution Prevention tab. Select "Turn on DEP for essential Windows programs and services only".

Go ahead and reboot the servers just to be safe and you should be up and running again!
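
If you have a lot of servers to check, clicking through Control Panel on each one gets old. As a hedged sketch, the policy can also be read from the command line with "wmic OS Get DataExecutionPrevention_SupportPolicy", which returns a number from 0 to 3. The Python below only interprets that output. The function names are made up for illustration, and it assumes you have captured the wmic output as text yourself.

```python
# Values of Win32_OperatingSystem.DataExecutionPrevention_SupportPolicy.
DEP_POLICIES = {
    0: "AlwaysOff: DEP disabled for all processes",
    1: "AlwaysOn: DEP enabled for all processes",
    2: "OptIn: essential Windows programs and services only",
    3: "OptOut: all programs except those explicitly excluded",
}

# The setting recommended in this post corresponds to OptIn.
RECOMMENDED_POLICY = 2


def parse_policy(wmic_output):
    """Pull the numeric policy out of wmic output (header line, then value)."""
    lines = [line.strip() for line in wmic_output.splitlines() if line.strip()]
    return int(lines[-1])


def check_dep(wmic_output):
    """Return (matches_recommendation, description) for captured wmic output."""
    policy = parse_policy(wmic_output)
    description = DEP_POLICIES.get(policy, "unknown policy value")
    return policy == RECOMMENDED_POLICY, description


if __name__ == "__main__":
    sample = "DataExecutionPrevention_SupportPolicy\r\n3\r\n"
    print(check_dep(sample))
```

A result of 3 is the "all programs" setting that causes the trouble described above, and 2 is the one you want.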

Tuesday, February 16, 2010

Scheduler problems and server map

I don't know if you remember my previous post about the scheduler server not starting, but I am happy to report that I found the solution to that problem. For those of you too lazy to click the above link, here is the short version: After moving the scheduler to the production server, it would not start.

I gathered logs and dozens of screen shots to share with my coworkers and Oracle. Nobody could figure out what the problem was. Everything looked as if it were set up correctly, so everyone was scratching their heads.

The call with Oracle was a travesty... I opened it almost two weeks ago and the tech in charge of the case would not respond. I would go days without any contact. Finally he started asking me for more screenshots, which I provided last Thursday. This morning, after hearing nothing, I called Oracle and escalated the issue. I then talked to a senior engineer who did a web conference with me to check out the system.

During this web conference, I saw an SQL statement in the scheduler kernel debug log that was looking for an entry for the enterprise server in SVM900.F98611. Whoa! I never thought to look in there! Why? I don't know. At any rate, I ran that SQL directly on the database and it returned zero records. The server map data source for the production enterprise server was missing.
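
For anyone who wants to make this check part of their routine, here is a small Python sketch of the idea. It is an illustration only. The exact statement from the kernel log isn't reproduced here, so the query is my reconstruction from the kernel's "Failed to fetch the Scheduler Server host name from the F98611" message. The column names OMDATP and OMSRVR are assumptions to verify on your release, the E1DEV row is made up, and it runs against an in-memory SQLite stand-in rather than a real server map.

```python
import sqlite3


def scheduler_hosts(conn, logical_data_source):
    """Return the host names F98611 holds for a logical data source."""
    rows = conn.execute(
        "SELECT OMSRVR FROM SVM900.F98611 WHERE OMDATP = ?",
        (logical_data_source,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """In-memory stand-in for the server map with only a dev entry."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so SVM900.F98611 resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS SVM900")
    conn.execute("CREATE TABLE SVM900.F98611 (OMDATP TEXT, OMSRVR TEXT)")
    conn.execute(
        "INSERT INTO SVM900.F98611 VALUES (?, ?)",
        ("E1DEV", "E1DEV"),  # hypothetical dev server entry
    )
    return conn


if __name__ == "__main__":
    hosts = scheduler_hosts(build_demo_db(), "E1PROD")
    print(hosts if hosts else "No server map entry found for E1PROD")
```

An empty result is the zero-records case I ran into on the production server.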

I recreated the entry in the table manually and restarted the scheduler - it worked! I tested a few jobs and it looked good. I thanked the Oracle guys for their help (not much) and let the client know.

My question is: Why was that entry missing? How did we get all the way to a month after go live before noticing we were missing that data source? When did it go missing? Who knows. I care, but only a little bit now that it's working.

This issue once again shows me that I need to look better at the logs and rely especially on my debug log. Those logs are packed with information and it's sometimes hard to find what you need, but they almost always have the answers.

Bottom line is: I didn't troubleshoot this well enough. Once I saw the debug log for the scheduler kernel I figured it out right away. I hadn't run the debug logs yet, and nobody else that looked at the issue had asked for them.

I can take some solace in the fact that I was not the only one who didn't think about the debug log, but I was the person who was supposed to figure it out. I'm also the one who installed their system, so it's doubly on me. I just wish I knew how it happened.

I think that on these difficult issues I should go through and verify the basics every time - path codes, environments, data sources, OCM. It seems like a lot of problems revolve around a mistake with one of those, especially with new installations and upgrades.

Monday, February 15, 2010

The most important part of E1

I'm just going to come right out and say it: The most important part of EnterpriseOne is the database. All the other stuff - enterprise server, deployment, web servers - that can all be reinstalled. A good CNC can recreate an E1 installation from scratch. You're gonna have to get some developers in there for custom stuff of course, but it's not that hard to reinstall E1.

What you can't recreate is the data. A company can go out of business if they can't recreate their data. Protecting the data is the most important job of any IT department. Without it, you can't pay your employees, customers, or vendors. You can't file taxes. You can't keep track of manufacturing. You can't do ANYTHING.

I am absolutely floored by the number of companies that don't take their data seriously. They don't have an intelligent backup plan at all. No thoughts on disaster recovery ever passed through their thick skull. Nothing has been tested either - it just won't happen to them, they think. Most clients I go to have a recovery failure at some point during an implementation, and it's because they have practically ignored the need to protect their data.

When I talk to clients about disaster recovery, I ask them some direct questions about the worst possible scenarios:

If this building and all its contents are destroyed (or the server room catches fire, fills with water, whatever), how much data loss will you experience?

This helps answer a number of other questions, including:
  • Have they even considered this possibility?
  • Are they doing any kind of off-site replication?
  • If not, how often are backups moved off-site (if at all)?
  • Do they store their backups in the server room? (happens more often than you think)
  • How long does it take to retrieve a backup?
  • How often do they back up their data?

How long will it take you to purchase new servers and have the system back up and running?

This follow-up question gets them thinking in more detail about recovery.
  • Do they know how many new servers, workstations, etc., it will take to get the system back up and running?
  • Do they know where to purchase it and how long it will take to arrive?
  • Who will reinstall all the software?
  • Where will the company's operations continue after a disaster?
  • Who will coordinate this effort?

Once we've covered the doom-and-gloom stuff, I move to more realistic possibilities.

If you have a hardware failure or database corruption, how much data will you lose?

In other words, how much of your company's money will you waste because of your current backup strategy? Now we're getting to the area that they may have thought about. Unfortunately, virtually every company I talk to only does full backups every night with too few (or no) transaction log backups. After this you'll know:
  • How often they do a full backup.
  • If they do transaction/redo log backups.
  • Point-in-time restoration capability.
  • Their basic backup strategy.
  • How much of the company's work they can afford to lose.
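
To put a number on "how much data will you lose", here is a small Python sketch. It makes a simplifying assumption that isn't always true: every backup in the schedule restores cleanly, and everything since the last backup before the failure is gone. The schedules and times are made up for illustration.

```python
from datetime import datetime, timedelta


def backup_schedule(first, interval, count):
    """Build a list of backup completion times at a fixed interval."""
    return [first + interval * i for i in range(count)]


def work_lost(failure_time, backup_times):
    """Time between the last backup before the failure and the failure."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup completed before the failure")
    return failure_time - max(earlier)


if __name__ == "__main__":
    start = datetime(2010, 2, 15, 1, 0)      # 1:00 AM backup
    failure = datetime(2010, 2, 15, 16, 40)  # mid-afternoon failure

    nightly_only = backup_schedule(start, timedelta(days=1), 2)
    with_logs = backup_schedule(start, timedelta(minutes=15), 96)

    print("Nightly full only:", work_lost(failure, nightly_only))
    print("With 15-minute log backups:", work_lost(failure, with_logs))
```

With only a nightly full backup, an afternoon failure costs most of a working day. Frequent log backups shrink that to minutes.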

Have you tested a complete recovery?

I can almost guarantee the answer to that is "no".

Have you written your disaster recovery procedures down?

Again, this is almost certainly too much to expect from most companies.

Who will perform the necessary recovery if you (the IT/database guy) are not available?

Maybe the IT guy is on vacation in Bora-Bora and he can't be contacted. Maybe he died in the server room explosion that also took out your systems. Whatever happened, it's up to someone else to do the work. In a perfect world, that person would take the company's written disaster recovery instructions and go to it. Unfortunately the previous question established they didn't test it or write it down, so they're already screwed. The thought of the IT guy not being there when disaster strikes will chill their hearts.

Okay, sorry for the long-winded post, but I gotta tell you - NOBODY pays enough attention to disaster recovery. Until, that is, an actual disaster happens. But it doesn't have to be a disaster to bring your company to its knees; simply dropping the wrong table can cause operations to come to a screeching halt. I know - that's happened to me. Don't let it happen to you. Back up your data and write down how to restore it.

Wednesday, February 10, 2010

Internet Explorer "Processing" message

Some clients these days are reporting that when they go into some applications, they find that the "Processing" message that you usually see only for an instant is now sticking around, putting its feet up on the couch and drinking all their beer. The application never comes up.

This is an issue with Internet Explorer, so you can happily point your finger at something besides E1 for a change. Somehow when IE is upgraded it leaves a little errant bit of registry set the wrong way. You won't see this problem on fresh installs of the latest version of IE.

If you see this issue, here's the fix:
  • Open Regedit
  • Navigate to [HKEY_LOCAL_MACHINE\SOFTWARE\Classes\TypeLib\{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}\1.1\0\win32\]
  • If you use Internet Explorer 6 (click Help/about while using IE), the value should be c:\windows\system32\shdocvw.dll
  • If you use IE7 (more likely), the value should be c:\windows\system32\ieframe.dll

Once you've got that bad boy fixed up, it'll be smooth sailing from there on... until the next error.
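
If you would rather check the value with a script than eyeball it in Regedit, here is a rough Python sketch. It is read-only and only covers the two IE versions mentioned above. The function names are made up for illustration, and read_current_value() works only on Windows because it uses the standard winreg module.

```python
TYPELIB_KEY = (
    r"SOFTWARE\Classes\TypeLib"
    r"\{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}\1.1\0\win32"
)

# Expected default value of the key, by IE major version (from the steps above).
EXPECTED_DLL = {
    6: r"c:\windows\system32\shdocvw.dll",
    7: r"c:\windows\system32\ieframe.dll",
}


def needs_fix(current_value, ie_major_version):
    """True when the registry value does not match the expected DLL."""
    expected = EXPECTED_DLL[ie_major_version]
    return current_value.strip().lower() != expected.lower()


def read_current_value():
    """Read the key's default value from the registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TYPELIB_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "")
    return value


if __name__ == "__main__":
    # Example with a hard-coded value; swap in read_current_value() on Windows.
    print(needs_fix(r"C:\WINDOWS\system32\shdocvw.dll", 7))
```

The comparison ignores case because Windows paths are case-insensitive.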

Friday, February 5, 2010

SQL Native Client

Sorry for the information overload at the beginning of this blog. I've got 12 years of CNC stories (and some development stories) to tell just to catch up!

Anyway, let's get down to business and talk about the SQL Native Client. Not a whole lot to say here, but it's pretty important. The bottom line with that is you have to make sure the native client matches across all the machines you're using - including the database server.

That's right - the client has to be installed on the database server. It is a separate product and has to be on all the servers and workstations. I bet if you go look, you'll find a difference in the client versions between your machines. One client I had did not make their native client version match and they were getting 12 zombies a day. Once they fixed that, the zombies went away completely.
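
A quick way to spot a mismatch is to write down the native client version from each machine and group the machines by version. Here is a minimal Python sketch of that bookkeeping. The machine names and version strings are made up for illustration, and collecting the versions from each machine is left to you.

```python
from collections import defaultdict


def group_by_version(inventory):
    """Map each native client version to the sorted machines running it."""
    groups = defaultdict(list)
    for machine, version in inventory.items():
        groups[version].append(machine)
    return {version: sorted(machines) for version, machines in groups.items()}


def versions_match(inventory):
    """True when every machine reports the same native client version."""
    return len(set(inventory.values())) <= 1


if __name__ == "__main__":
    # Hypothetical inventory: machine name -> installed native client version.
    inventory = {
        "DBSRV": "2005.90.4035.00",
        "ENTSRV": "2005.90.4035.00",
        "DEPSRV": "2005.90.3042.00",
        "WEBSRV": "2005.90.4035.00",
    }
    if not versions_match(inventory):
        for version, machines in sorted(group_by_version(inventory).items()):
            print(version, machines)
```

Anything that prints more than one version line means you have some installing to do.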

Another thing about SQL: E1 normally is compatible with all service packs and cumulative updates. What, you don't know about cumulative updates? Oh yeah, they're out there and you have to search for them. Not only that, most of those updates are not available for download. You have to request the update from Microsoft and they'll email you a link to download. I think SQL is up to update 10 now or some such thing, so go check it out.

In summary, just make sure SQL is up-to-date and all your native client versions match. Just fixing that one little thing will make your system run smoother.

Thursday, February 4, 2010

Change Table Owners in SQL

If you're doing a path code refresh, you know it's faster just to back up and restore the databases through SQL rather than use R98403. What I've found is a lot of CNCs still use R98403 because they're not sure how to straighten things out after a restore.

You see, when you back up JDE_PRODUCTION (for example) and restore it as JDE_CRP, it doesn't bother to change the owners. The R98403 will do that for you, but it will take three weeks to finish it.

I found this awesome script that will change all the table owners quick and easy. I got it from Scott Forsyth over at http://weblogs.asp.net/owscott/

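-- Set the old and new owner names below before running.
DECLARE @old sysname, @new sysname, @sql varchar(1000)

SELECT
@old = 'OldOwner_CHANGE_THIS'
, @new = 'NewOwner_CHANGE_THIS'
-- Statement to run per table; sp_MSforeachtable replaces ? with the table name.
, @sql = '
IF EXISTS (SELECT NULL FROM INFORMATION_SCHEMA.TABLES
WHERE
QUOTENAME(TABLE_SCHEMA)+''.''+QUOTENAME(TABLE_NAME) = ''?''
AND TABLE_SCHEMA = ''' + @old + '''
)
EXECUTE sp_changeobjectowner ''?'', ''' + @new + ''''

-- Run the statement once for every user table in the current database.
EXECUTE sp_MSforeachtable @sql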

This script will work in SQL 2005 and SQL 2008. It's the greatest!

Wednesday, February 3, 2010

MSDE and SSE

Today I learned of a client that was having some problem with their MSDE database and Oracle told them to upgrade it to SSE. I don't know what was wrong or how Oracle thought it would fix this unknown problem, but using an SSE database on your deployment server is just a good idea anyway.

If you're installing 9.0 or something, you're going to install SSE or the Oracle database so you won't have to worry about upgrading. If you're on some ancient release like 8.12 or before then it's possible you're still running on MSDE.

It's really easy to upgrade to SSE; the problem is finding the freaking documentation. It's linked in the special instructions for the 8.12 planner ESU. Take a look on support.oracle.com and find Doc ID 756629.1, scroll about halfway down to the link for Special Instructions for upgrading to SQL Server 2005 Express Edition, click that, find the link on the new page that says the same thing and click that, then click the Here link to go to the actual instructions. Can they make that any more complicated? Ugh.

I guess I should have said this before telling you about upgrading to SSE - make sure you back up your MSDE databases. There is no way to downgrade to MSDE from SSE. You have to uninstall SSE, reinstall MSDE and reattach your original databases. If you forget to back that stuff up, then you are screwed. You have to go with SSE no matter what.

Anyway, let's say you followed the instructions and you jacked it up. You can always reinstall SSE if you want. This'll be your only option if you didn't back up the MSDE databases. To reinstall SSE just follow the directions in Doc ID 661133.1. That'll fix everything right up.

A side note about the SSE database: If you install Visual Studio .NET and don't deselect everything except C++, it will install its own SSE database and place certain restrictions on it that will prevent the E1 SSE database from working correctly. If you install SQL server tools on the deployment server, it will also install an SSE database that will interfere with E1. On the first option, you're almost screwed. You have to completely uninstall Visual Studio and reinstall it the right way. On the second option you can simply stop the other SSE service and the E1 SSE will work fine.

I digress. Now you have SSE installed and what-not, but for some reason the planner DBs aren't populating or maybe you can't build ESUs or packages. One thing I've seen is that the JDE user isn't created or isn't assigned any privileges on the planner databases. This is easy to fix: Connect to the JDESSELocal database via SQL Server Management Studio and add/fix the JDE user. Simple.

If it all goes to hell and doesn't come back, you can always revert to the old MSDE. Uninstall SSE, reinstall MSDE (from the E1 installation), and reattach your old planner databases. You did back those up, right?

Tuesday, February 2, 2010

Scheduler Server not starting

The scheduler server is not starting at one of my clients today. It was set to run on the DV/PY enterprise server, but now that they're live it has to run on the PD server. So, I changed the server name but now it's not starting.

When I click the Start Scheduler button, I get this pop-up window:

Well, I'm the system administrator, so I know about it already. I plugged this info into Oracle Support and big surprise, nothing came up.

The jde.log on the client has this strange message: SCHEDULER - Scheduler Server Set an Error

The debug log has its usual five jillion lines of stuff in it, but the only thing interesting is that the log has a DBPerformRequest for the production server over and over until about the last 10% of the log when it changes to the dev server for a DBResetRequest.

Finally, the scheduler kernel log:

3676/3904 MAIN_THREAD Tue Feb 02 11:27:08.765000 sc_dsptc.c846
Failed to fetch the Scheduler Server host name from the F98611. The Scheduler Server's Logical Data Source is E1PROD. Could not validate the Scheduler Name.

3676/3904 MAIN_THREAD Tue Feb 02 11:27:08.766000 sc_dsptc.c865
Scheduler does not run on this machine. The real Scheduler Server is .

3676/3904 MAIN_THREAD Tue Feb 02 11:27:08.767000 sc_dsptc.c969
This server will not start the Scheduler.

The real scheduler server is blank? Don't you worry about blank, let me worry about blank. Blank, blank, what is blank? The F91300 has the correct server name in it and the F98611 has the production server set up there. What's the dealio?

I opened a call with Oracle. We'll see what they say.