Friday, July 31, 2015

ORACLE 11g IS DEAD! LONG LIVE 12c!

At the end of this year, the Oracle 11.2.0.4 database goes on Extended Support. 11.2.0.3 hits extended support on August 27th - less than one month away! 11.2.0.1 and 2 are completely out of support.

For 11.2.0.3 and .4, clients cannot get support or bug fixes after Extended Support ends unless they purchase additional coverage. There is no such option for lower releases.

That's not all. Oracle 12c is only supported as of the 9.1.4 E1 tools release. If a client is on 11g and wishes to go to 12c, they also have to be at that TR or higher.

I know many companies are running 11g and don't know that the database is going off of support. For the ones that do know of their impending doom, some are making plans to upgrade without considering the tools release they're currently on.

Are you going to have problems if you don't upgrade all the things? I've got good money on "no". My theory on upgrades and patches is that you should only do it if you have a need. A need for new functionality or a need to patch something. Otherwise if everything is running fine why go through the trouble?

Still, most companies don't like the idea of running out-of-date or unsupported software. With the various deadlines approaching fast, now is a good time to sell some upgrades. Whether you're selling to a client or just to your own boss, you better get on it before the end of support. Better to forewarn people than trying to explain why you didn't say anything prior to the deadline.

Wednesday, July 29, 2015

I was just doing an environment refresh and the database crashed!


Business as usual
At the end of the day, you get asked to refresh PY business data from production. No problem, you think. You'll kick it off, let it run, then the next morning it'll be good to go. After all, nobody uses the non-prod environments after hours so it's not going to matter if they get a bit wonky while the refresh is in progress.

When things hit fans
Around 8pm you get a call: JDE is down! You get off the couch and log in to the system. Sure enough, you try to run something inside E1 but you get an error like this:



If you were to look in the server jde.logs you'd see this:

3956/6844 MAIN_THREAD                            Fri Jun 05 12:17:20.351000            Jdb_ctl.c4091
            Starting OneWorld

3956/6844 MAIN_THREAD                            Fri Jun 05 12:18:04.439000            dbinitcn.c899
            OCI0000065 - Unable to create user session to database server

3956/6844 MAIN_THREAD                            Fri Jun 05 12:18:04.439001            dbinitcn.c904
            OCI0000141 - Error - ORA-00257: archiver error. Connect internal only, until freed.
3956/6844 MAIN_THREAD                            Fri Jun 05 12:18:04.439002            dbinitcn.c520
            OCI0000367 - Unable to connect to Oracle ORA-00257: archiver error. Connect internal only, until freed.
3956/6844 MAIN_THREAD                            Fri Jun 05 12:18:04.439003            Jdb_drvm.c798
            JDB9900164 - Failed to connect to ovsorcl


Maybe you're an experienced or brave CNC and you know how to log in to the database via SQL*Plus. You'd be greeted by this message:


The first screenshot from E1 is the most descriptive message you will get, but it's not very understandable unless you are an experienced DBA. What can add to the confusion is when the server has plenty of free space available.

Everyone is yelling at me to fix it
First, just stop everything you and everyone else are doing. You're probably hearing people throwing out ideas that range from good to crazy - but they're all wrong. Most people's first reaction is to restart E1, the database, or reboot all the servers. When that doesn't work you'll start deleting files on the database server and you might even figure out where the archive logs are and delete those. Unfortunately, none of those actions are going to fix the problem. 

I'm so confused
To understand what's going on and how to fix it, you have to know a little bit about how the Oracle database works. Whenever you do something inside E1 via an application or a UBE it sends a SQL statement to the database. It could be requesting data, updating rows, or inserting new records. Every single action E1 performs on the database is recorded into the redo log files. 

Those redo logs collect every SQL statement executed and are used to recover the database. When a redo log gets full, it writes its contents to another file called an archive log. There are only a handful of redo logs and they get overwritten frequently. Archive logs are continuously created and can be backed up, then later used to restore your database to any point in the past.

So what's the problem?
Obviously if you run out of disk space you're going to have issues with any piece of software. It's the same with a database. If the database can't write any new archive logs (for example), then it can't recover from a failure. When archiving stops, the database stops.

But I have free disk space!
This is entirely possible. Space for archive logs is allocated from within Oracle. You can have 200 GB of space available, but if you've only set aside 5 GB for archive logging inside Oracle you're never going to max out your disk.

Okay, enough talk! How do I fix this?
The easiest, fastest way to get your database going is to expand the space Oracle uses for the archive logs - IF you have extra disk space. First, you need to find out where your archive logs are going. An example default destination is /u01/app/oracle/fast_recovery_area/ORCL/archivelog. Look in there (or similar) and if you see a bunch of directories named in the style of YYYY_MM_DD then you've struck gold. If you can't find them you're going to have to look for them.

If SQL*Plus isn't letting you connect due to the archiver error, you can connect like so:

sqlplus /nolog 
SQL> conn sys/ as sysdba 

Once connected you can run the following SQL statement to find out where they're going:

SQL> show parameter db_recovery_file_dest

Wherever the log files are going, find out how much room is left on the disk. Let's say that on your /u01 mount you have 50GB free and you want to expand the archive logs into that. The next thing you need to do is see how much space Oracle thinks it has for the logs.

SQL> show parameter db_recovery_file_dest_size

We'll pretend this returns a value of 20GB. That means we can increase the allocated space to 70GB which will use up all available space on the disk. Now all you have to do is tell Oracle to use it:

SQL> alter system set db_recovery_file_dest_size = 70GB scope=BOTH

As soon as you execute this statement your database will start working again. No need for restarts or reboots, the DB just goes on about its business.

My archive log destination is as big as the disk
Well now you're in a real pickle. Still, there is a simple solution - do a full backup including archive logs. I'm not going to go in to every possible way to back up a database, but here is a simple command to clear out archive logs using RMAN:

RMAN> BACKUP ARCHIVELOG ALL DELETE ALL INPUT;

When it's finished you'll have a backup file and all the archive logs will be gone. You'll have some space freed up, but it'll be a good idea to move that backup file off of the server to open it up even more.

Backup takes too long/goes to same disk as archive logs/I have nowhere to put it.
Back in the pickle again I see. Your next best option is pretty simple but also the most dangerous: Delete the archive logs. This frees up space but you negatively affect (or completely eliminate) your ability to restore your database to any point after the last backup. However, on a non-production system or even during a production emergency this might be the best option.

First thing to do is find your archive log destination. Use the same instructions from above (show parameter db_recovery_file_dest) then delete the directories and files in the archive log destination. To finish up you have to tell RMAN the files are gone:

rman target /
crosscheck archivelog all;
delete noprompt expired archivelog all;
exit

That connects you to RMAN, compares the list of archive log files Oracle knows about to what actually exists, then deletes the entries for any that don't match up. As soon as you finish the RMAN statements your database will resume on its own. There is nothing more you should have to do for operations to continue. Now you have free space on your disk and within Oracle. 

The horrible, horrible catch when deleting archive logs
You've broken your backups. If the database were to fail due to data corruption, power failure, tornado, or a giant monster attack then you're going to have some very unhappy executives. You need to immediately take a full backup of your database to un-break it.

How do we stop this from happening again?
  1. Make sure your database has enough space. Not just for archive logs, but also for your data files and everything else. Figure out how much everything grows on a day-to-day basis, then bust out your calculator and find out where it's going over the next 6-12 months.
  2. Be prepared for large data load operations. If you're space constrained you probably need to take precautions before doing a refresh. Your users need to let you know before they do any large data inserts.
  3. Set up database and server monitors to warn you well in advance. If you can be alerted to the space issues in enough time you can fix everything before anybody knows there is a problem.

Reality check
I kind of glossed over some details and skipped some advanced techniques so I could keep things relatively simple while also providing solutions almost anyone could use. Having a database halt is a very stressful situation and it can be difficult to troubleshoot especially if you're not a DBA. The actions I talked about in this post should only be followed by a non-DBA in an extreme emergency. There is too much that could go wrong and you don't want to make a bad situation worse. 

Proceed with caution.

Friday, July 24, 2015

The shocking truth: You can't create a read-only user on your E1 database!

Did you know that by default all users created on Oracle and MSSQL databases have full permission on all JDE tables? It's true. Not only that, this permission takes precedence over any restrictions you might place on a particular DB user such as making them read-only. This is hard-coded in to EnterpriseOne and Oracle has no plans to change this behavior.

Why is this? JDE is designed to insulate users and administrators from the hardware and database. This is part of the spirit of CNC - Configurable Network Computing. Yes, CNC actually means something other than "Person Who Gets Blamed For Everything". The CNC concept means that with only JDE tools a user can perform select, insert, update, delete, create, and drop operations on any table.

Because you don't have to know what database or operating system is being used, you don't need to have any specific database skills. Without ever leaving EnterpriseOne you can do full environment refreshes, table copies, add new indexes, and much more. The best part is that it is all controlled by JDE security. You never have to touch the database to do your day-to-day job.

As flexible as that is, there are often good reasons to directly access the database. One example is to do faster refreshes. In the middle ground are SQL fixes due to data issues. On the other end are bad reasons which (in my opinion) include providing read-only access and large exports of raw data. I'll talk about why those are bad in another post.

No matter what the justification is for providing direct access, the shocking truth is that every database user can view and manipulate all JDE tables and data on the system - and there is nothing you can to do change that.

I take that back. There actually is something you can do to change it. It's called Oracle Public Shutdown (OPS), a somewhat complicated procedure that removes these global permissions and allows administrators to have better security control at the database level. Once you've run OPS you can now create read-only users as well as any other levels of access you can think of.

Unfortunately for your administrative workload, OPS is not permanent. Every time a table is created or regenerated in OMW, every user gets full permissions reassigned to that object. Every ESU, ASU, and Update that contains tables will reset those permissions. An upgrade will reset many permissions. What this all means is that you have to come back to your database every so often and re-run OPS.

The next time someone asks for a database user, make sure there is a really good justification for their access. You'd be surprised how often they can do what they want inside E1. Plus, you don't want to give what is essentially DBA authority to just any user. Related tip: Make sure you have good backups.

If your company is concerned about the security of their database - especially if they have SOX or FDA requirements - they may need Oracle Public Shutdown. Until then: Be careful. Leverage the tools included within EnterpriseOne wherever possible.

Reference: Oracle Public Shutdown