Monday, February 22, 2010
This week I'll be installing Oracle 10g on OE Linux in preparation for an upcoming client in Florida. I'm really looking forward to getting this under my belt for many reasons. First, it opens up a whole new set of clients I can consult for. That set may be extremely small, but there it is anyway. Second, the experience will make me very marketable, if you know what I mean.
Today all I have had time for is the installation of OE Linux and starting the download of Oracle 10g. The Linux installation was very straightforward - just pop in the CD and let 'er rip! Since the installer is graphical, it's easy to step through all the options, and pretty much anyone can get it done.
I installed Linux on a Dell 64-bit Intel server, so I had the installer format the hard drive. I accepted the defaults until I got to the Network Devices screens where I had to manually type in my TCP/IP information. I then selected my location and typed in a root password before selecting the "Software Development" customization option.
In Package Group Selection, I went through and picked the backward-compatibility packages along with the development tools. The installer then ran through the actual installation of the OS and asked for a reboot when it was done.
After the restart and accepting the license agreement, I elected to disable the firewall and SELinux. I wasn't going to need those for what I was doing. I also skipped over Kdump, then set the time and created a user. The sound card test came up next and finally a request to add more installation CDs if I wanted to. I declined both of those and allowed the computer to reboot once again.
Once the system was back up, I enabled VNC on the machine so I didn't have to stand in the cold server room, then began downloading Oracle 10g from edelivery.oracle.com.
Now I just have to wait a few hours for it to finish and I can get started on the database! Tune in tomorrow for the exciting continuation of the install!
Wednesday, February 17, 2010
Windows Data Execution Prevention can stop you dead
I just heard about this and thought I'd pass it along - there is a recent problem where batch jobs submitted in E1 show up as Done in Work with Server Jobs (WSJ), but the processes never actually complete on the server. It's also been reported as preventing Server Manager from stopping, and it could affect other E1 components as well.
This is most likely caused by Windows Data Execution Prevention (DEP). I'll leave it to you to learn exactly what it is; Microsoft's documentation on DEP covers the details.
In a nutshell, it prevents programs from running in a reserved memory area. By default this setting is enabled for all programs, and that is going to cause problems with E1. The correct setting is "Turn on DEP for essential Windows programs and services only".
Check your settings by going to Control Panel > System > Advanced > Performance Settings > Data Execution Prevention, and make sure the "essential Windows programs and services only" option is selected.
Tuesday, February 16, 2010
Scheduler problems and server map
I don't know if you remember my previous post about the scheduler server not starting, but I am happy to report that I found the solution to that problem. For those of you too lazy to go back and read it, here is the short version: after moving the scheduler to the production server, it would not start.
I gathered logs and dozens of screen shots to share with my coworkers and Oracle. Nobody could figure out what the problem was. Everything looked as if it were set up correctly, so everyone was scratching their heads.
The call with Oracle was a travesty... I opened it almost two weeks ago and the tech in charge of the case would not respond. I would go days without any contact. Finally he started asking me for more screenshots, which I provided last Thursday. This morning, after hearing nothing, I called Oracle and escalated the issue. I then talked to a senior engineer who did a web conference with me to check out the system.
During this web conference, I saw an SQL statement in the scheduler kernel debug log that was looking for an entry for the enterprise server in SVM900.F98611. Whoa! I never thought to look in there! Why? I don't know. At any rate, I ran that SQL directly on the database and it returned zero records. The server map data source for the production enterprise server was missing.
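For reference, once you know to look, the check is a one-liner. Something like this against the server map schema will tell you if the enterprise server's entry is there (I'm going from memory on the column names, so verify them against your own F98611, and swap in your actual server name):

SELECT OMDATP AS data_source,
       OMSRVR AS server_name,
       OMDATB AS database_name
FROM SVM900.F98611
WHERE OMSRVR = 'YOUR_PD_SERVER_NAME'
-- Zero rows back means the server map is missing the entry for that
-- enterprise server, which is exactly what bit me here.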
I recreated the entry in the table manually and restarted the scheduler - it worked! I tested a few jobs and it looked good. I thanked the Oracle guys for their help (not much) and let the client know.
My question is: Why was that entry missing? How did we get all the way to a month after go live before noticing we were missing that data source? When did it go missing? Who knows. I care, but only a little bit now that it's working.
This issue once again shows me that I need to look more closely at the logs, and lean on the debug logs in particular. Those logs are packed with information and it's sometimes hard to find what you need, but they almost always have the answers.
Bottom line is: I didn't troubleshoot this well enough. Once I saw the debug log for the scheduler kernel I figured it out right away. I hadn't run the debug logs yet, and nobody else that looked at the issue had asked for them.
I can take some solace in the fact that I was not the only one who didn't think about the debug log, but I was the person who was supposed to figure it out. I'm also the one who installed their system, so it's doubly on me. I just wish I knew how it happened.
I think that on these difficult issues I should go through and verify the basics every time - path codes, environments, data sources, OCM. It seems like a lot of problems revolve around a mistake with one of those, especially with new installations and upgrades.
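For the OCM piece, a quick sanity query is usually enough to spot a missing or inactive default mapping. This is just a sketch - the F986101 column and schema names here are from memory, so double-check them against your own system tables before trusting the results:

SELECT OMENHV AS environment,
       OMOBNM AS object_name,
       OMDATP AS data_source,
       OMSTSO AS status
FROM SY900.F986101
WHERE OMOBNM = 'DEFAULT'
  AND OMSTSO = 'AV'   -- active mappings only
ORDER BY OMENHV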
Monday, February 15, 2010
The most important part of E1
I'm just going to come right out and say it: The most important part of EnterpriseOne is the database. All the other stuff - enterprise server, deployment server, web servers - can all be reinstalled. A good CNC can recreate an E1 installation from scratch. You're gonna have to get some developers in there for the custom stuff, of course, but it's not that hard to reinstall E1.
What you can't recreate is the data. A company can go out of business if they can't recreate their data. Protecting the data is the most important job of any IT department. Without it, you can't pay your employees, customers, or vendors. You can't file taxes. You can't keep track of manufacturing. You can't do ANYTHING.
I am absolutely floored by the number of companies that don't take their data seriously. They don't have an intelligent backup plan at all. No thought about disaster recovery has ever passed through their thick skulls. Nothing has been tested either - it just won't happen to them, they think. Most clients I go to have a recovery failure at some point during an implementation, and it's because they have practically ignored the need to protect their data.
When I talk to clients about disaster recovery, I ask them some direct questions about the worst possible scenarios:
If this building and all its contents are destroyed (or the server room catches fire, fills with water, whatever), how much data loss will you experience?
This helps answer a number of other questions, including:
- Have they even considered this possibility?
- Are they doing any kind of off-site replication?
- If not, how often are backups moved off-site (if at all)?
- Do they store their backups in the server room? (happens more often than you think)
- How long does it take to retrieve a backup?
- How often do they back up their data?
How long will it take you to purchase new servers and have the system back up and running?
This follow-up question gets them thinking in more detail about recovery.
- Do they know how many new servers, workstations, etc., it will take to get the system back up and running?
- Do they know where to purchase it and how long it will take to arrive?
- Who will reinstall all the software?
- Where will the company's operations continue after a disaster?
- Who will coordinate this effort?
Once we've covered the doom-and-gloom stuff, I move to more realistic possibilities.
If you have a hardware failure or database corruption, how much data will you lose?
In other words, how much of your company's money will you waste because of your current backup strategy? Now we're getting into the area they may actually have thought about. Unfortunately, virtually every company I talk to only does full backups every night, with too few (or no) transaction log backups (I'll sketch out what a decent routine looks like right after this list). After this you'll know:
- How often they do a full backup.
- If they do transaction/redo log backups.
- Point-in-time restoration capability.
- Their basic backup strategy.
- How much of the company's work they can afford to lose.
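To put some meat on that, here is a bare-bones sketch of a nightly-full-plus-hourly-log routine in SQL Server. The database name comes from examples elsewhere in this blog, the paths are made up, and in real life this gets scheduled through SQL Agent or your backup tool and written to media that leaves the building:

-- Nightly full backup. The database has to be in the FULL recovery model
-- for log backups to work at all.
BACKUP DATABASE JDE_PRODUCTION
TO DISK = 'E:\Backups\JDE_PRODUCTION_full.bak'
WITH INIT

-- Hourly (or more frequent) transaction log backups are what give you
-- point-in-time restores, so you lose minutes of work instead of a whole day.
BACKUP LOG JDE_PRODUCTION
TO DISK = 'E:\Backups\JDE_PRODUCTION_log.trn'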
Have you tested a complete recovery?
I can almost guarantee the answer to that is "no".
Have you written your disaster recovery procedures down?
Again, this is almost certainly too much to expect from most companies.
Who will perform the necessary recovery if you (the IT/database guy) are not available?
Maybe the IT guy is on vacation in Bora-Bora and he can't be contacted. Maybe he died in the server room explosion that also took out your systems. Whatever happened, it's up to someone else to do the work. In a perfect world, that person would take the company's written disaster recovery instructions and go to it. Unfortunately the previous question established they didn't test it or write it down, so they're already screwed. The thought of the IT guy not being there when disaster strikes will chill their hearts.
Okay, sorry for the long-winded post, but I gotta tell you - NOBODY pays enough attention to disaster recovery. Until, that is, an actual disaster happens. But it doesn't have to be a disaster to bring your company to its knees; simply dropping the wrong table can cause operations to come to a screeching halt. I know - it's happened to me. Don't let it happen to you. Back up your data and write down how to restore it.
Wednesday, February 10, 2010
Internet Explorer "Processing" message
Some clients these days are reporting that when they go into some applications, they find that the "Processing" message that you usually see only for an instant is now sticking around, putting its feet up on the couch and drinking all their beer. The application never comes up.
This is an issue with Internet Explorer, so you can happily point your finger at something besides E1 for a change. Somehow when IE is upgraded it leaves a little errant bit of registry set the wrong way. You won't see this problem on fresh installs of the latest version of IE.
If you see this issue, here's the fix:
- Open Regedit
- Navigate to [HKEY_LOCAL_MACHINE\SOFTWARE\Classes\TypeLib\{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}\1.1\0\win32\]
- If you use Internet Explorer 6 (check Help > About while using IE), the (Default) value should be c:\windows\system32\shdocvw.dll
- If you use IE7 (more likely), the (Default) value should be c:\windows\system32\ieframe.dll
- If the value doesn't match, change it to the correct DLL path
Once you've got that bad boy fixed up, it'll be smooth sailing from there on... until the next error.
Friday, February 5, 2010
SQL Native Client
Sorry for the information overload at the beginning of this blog. I've got 12 years of CNC stories (and some development stories) to tell just to catch up!
Anyway, let's get down to business and talk about the SQL Native Client. Not a whole lot to say here, but it's pretty important. The bottom line is that you have to make sure the native client version matches across all the machines you're using - including the database server.
That's right - the client has to be installed on the database server. It is a separate product and has to be on all the servers and workstations. I bet if you go look, you'll find a difference in the client versions between your machines. One client I had did not make their native client version match and they were getting 12 zombies a day. Once they fixed that, the zombies went away completely.
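If you want to see exactly what is connecting with what, SQL Server will tell you. A query along these lines (the DMV is there in SQL 2005 and up; double-check the column list if I've misremembered anything) shows the driver and version each session came in with:

SELECT host_name,
       program_name,
       client_interface_name,   -- ODBC, OLE DB, .Net SqlClient, etc.
       client_version
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
ORDER BY host_name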
Another thing about SQL Server - E1 is normally compatible with all service packs and cumulative updates. What, you don't know about cumulative updates? Oh yeah, they're out there and you have to go looking for them. Not only that, most of those updates are not available for direct download; you have to request the update from Microsoft and they'll email you a link. I think SQL is up to cumulative update 10 now or some such thing, so go check it out.
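And to see which service pack and cumulative update a given SQL Server is actually running, this one-liner should do it - just compare the build number it returns against Microsoft's published build list:

SELECT SERVERPROPERTY('ProductVersion') AS build_number,
       SERVERPROPERTY('ProductLevel')   AS service_pack,
       SERVERPROPERTY('Edition')        AS edition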
In summary, just make sure SQL Server is up to date and all your native client versions match. Fixing that one little thing will make your system run smoother.
Thursday, February 4, 2010
Change Table Owners in SQL
If you're doing a path code refresh, you know it's faster just to back up and restore the databases through SQL rather than use R98403. What I've found is that a lot of CNCs still use R98403 because they're not sure how to straighten things out after a restore.
You see, when you back up JDE_PRODUCTION (for example) and restore it as JDE_CRP, the restore doesn't bother to change the table owners. R98403 will do that for you, but it will take three weeks to finish.
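For reference, the backup-and-restore half of the refresh is just plain T-SQL. Something like this - the file paths and logical file names below are made up, so run RESTORE FILELISTONLY against your backup first to get the real ones:

-- Back up the source database
BACKUP DATABASE JDE_PRODUCTION
TO DISK = 'E:\Backups\JDE_PRODUCTION_refresh.bak'
WITH INIT

-- Restore it under the target name, moving the files so they don't
-- collide with the live production data files
RESTORE DATABASE JDE_CRP
FROM DISK = 'E:\Backups\JDE_PRODUCTION_refresh.bak'
WITH MOVE 'JDE_PRODUCTION_Data' TO 'E:\Data\JDE_CRP.mdf',
     MOVE 'JDE_PRODUCTION_Log'  TO 'F:\Logs\JDE_CRP.ldf',
     REPLACE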
I found this awesome script that will change all the table owners quick and easy. I got it from Scott Forsythe over at http://weblogs.asp.net/owscott/
This script will work in SQL 2005 and SQL 2008. It's the greatest!

DECLARE @old sysname, @new sysname, @sql varchar(1000)

SELECT @old = 'OldOwner_CHANGE_THIS',
       @new = 'NewOwner_CHANGE_THIS',
       @sql = 'IF EXISTS (SELECT NULL FROM INFORMATION_SCHEMA.TABLES
                          WHERE QUOTENAME(TABLE_SCHEMA) + ''.'' + QUOTENAME(TABLE_NAME) = ''?''
                          AND TABLE_SCHEMA = ''' + @old + ''')
               EXECUTE sp_changeobjectowner ''?'', ''' + @new + ''''

EXECUTE sp_MSforeachtable @sql
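When it finishes, a quick count by schema tells you whether anything is still sitting under the old owner (this check isn't part of Scott's script, just my own sanity step):

SELECT TABLE_SCHEMA, COUNT(*) AS table_count
FROM INFORMATION_SCHEMA.TABLES
GROUP BY TABLE_SCHEMA
ORDER BY TABLE_SCHEMA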
Wednesday, February 3, 2010
MSDE and SSE
Today I learned of a client that was having some problems with their MSDE database, and Oracle told them to upgrade it to SSE. I don't know what was wrong or how Oracle thought it would fix this unknown problem, but using an SSE database on your deployment server is just a good idea anyway.
If you're installing 9.0 or something, you're going to install SSE or the Oracle database so you won't have to worry about upgrading. If you're on some ancient release like 8.12 or before then it's possible you're still running on MSDE.
It's really easy to upgrade to SSE; the problem is finding the freaking documentation. It's linked in the special instructions for the 8.12 planner ESU. Take a look on support.oracle.com and find Doc ID 756629.1, scroll about halfway down to the link for Special Instructions for upgrading to SQL Server 2005 Express Edition, click that, find the link on the new page that says the same thing and click that, then click the Here link to go to the actual instructions. Can they make that any more complicated? Ugh.
I guess I should have said this before telling you about upgrading to SSE - make sure you back up your MSDE databases. There is no way to downgrade to MSDE from SSE. You have to uninstall SSE, reinstall MSDE and reattach your original databases. If you forget to back that stuff up, then you are screwed. You have to go with SSE no matter what.
Anyway, let's say you followed the instructions and you jacked it up. You can always reinstall SSE if you want. This'll be your only option if you didn't back up the MSDE databases. To reinstall SSE, just follow the directions in Doc ID 661133.1; that'll fix everything right up.
A side note about the SSE database: If you install Visual Studio .NET and don't deselect everything except C++, it will install its own SSE database and place certain restrictions on it that will prevent the E1 SSE database from working correctly. If you install SQL server tools on the deployment server, it will also install an SSE database that will interfere with E1. On the first option, you're almost screwed. You have to completely uninstall Visual Studio and reinstall it the right way. On the second option you can simply stop the other SSE service and the E1 SSE will work fine.
But I digress. Now you have SSE installed and whatnot, but for some reason the planner DBs aren't populating, or maybe you can't build ESUs or packages. One thing I've seen is that the JDE user isn't created or isn't assigned any privileges on the planner databases. This is easy to fix: connect to the JDESSELocal instance via SQL Server Management Studio and add/fix the JDE user. Simple.
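If you'd rather script it than click through Management Studio, it's something along these lines. This is a sketch - the login may already exist, your planner database names will differ (I'm using JDE_PLAN812 as a stand-in), and giving the user db_owner is the blunt-instrument version of "assign privileges":

-- Create the login at the instance level if it's missing
IF NOT EXISTS (SELECT NULL FROM sys.server_principals WHERE name = 'JDE')
    CREATE LOGIN JDE WITH PASSWORD = 'YourJdePasswordHere'
GO

-- Map it into the planner database and grant rights
USE JDE_PLAN812
GO
IF NOT EXISTS (SELECT NULL FROM sys.database_principals WHERE name = 'JDE')
    CREATE USER JDE FOR LOGIN JDE
GO
EXEC sp_addrolemember 'db_owner', 'JDE'
GO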
If it all goes to hell and doesn't come back, you can always revert to the old MSDE. Uninstall SSE, reinstall MSDE (from the E1 installation), and reattach your old planner databases. You did back those up, right?
Tuesday, February 2, 2010
Scheduler Server not starting
The scheduler server is not starting at one of my clients today. It was set to run on the DV/PY enterprise server, but now that they're live it has to run on the PD server. So, I changed the server name but now it's not starting.
Well, I'm the system administrator, so I know about it already. I plugged this info into Oracle Support and big surprise, nothing came up.
When I click the Start Scheduler button, all I get is a pop-up error window.
The jde.log on the client has this strange message: SCHEDULER - Scheduler Server Set an Error
The debug log has its usual five jillion lines of stuff in it, but the only thing interesting is that the log has a DBPerformRequest for the production server over and over until about the last 10% of the log when it changes to the dev server for a DBResetRequest.
Finally, the scheduler kernel log:
3676/3904 MAIN_THREAD  Tue Feb 02 11:27:08.765000  sc_dsptc.c846
    Failed to fetch the Scheduler Server host name from the F98611. The Scheduler Server's Logical Data Source is E1PROD. Could not validate the Scheduler Name.
3676/3904 MAIN_THREAD  Tue Feb 02 11:27:08.766000  sc_dsptc.c865
    Scheduler does not run on this machine. The real Scheduler Server is .
3676/3904 MAIN_THREAD  Tue Feb 02 11:27:08.767000  sc_dsptc.c969
    This server will not start the Scheduler.
The real scheduler server is blank? Don't you worry about blank, let me worry about blank. Blank, blank, what is blank? The F91300 has the correct server name in it and the F98611 has the production server set up there. What's the dealio?
I opened a call with Oracle. We'll see what they say.