Making Data Useful: Once again, we must talk about indexes

I've given a couple of talks about JDE and indexes, as well as had numerous discussions with developers, DBAs, and others. What I've found is that almost nobody really knows how indexes work, what the relationship between JDE and the database is regarding indexes, and that adding an index might not help anything at all.

There are many reasons why someone would think they need an index. A common reason is when performance is slow with a particular app or UBE. The first thing people usually recommend is to add an index. The problem I normally find is that nobody truly understands the problem.

Slow performance is not a problem, it's a symptom of something. Do you really need a new index, or do you have bad table and index statistics? Could the application need to be re-written slightly? Maybe there's a NOT IN condition that forces a full table scan? You really have to figure out exactly what's going on before choosing a course of action. I see a lot of people put an index on a database only to scratch their heads when there isn't any improvement. What usually ends up happening is the database is blamed, so the DBA crawls up in the guts of the application only to find that the real issue is data or development related.

Sometimes there's not yet any problem with performance at all. A developer could add an index just because it makes their work easier or they think their app will need the index. Inside JDE when you're using Table Design Aid inside OMW to create an object that has table IO, you have to choose an index from a drop-down list that contains all indexes inside the JDE specs for that table. A developer will pick the index they think is appropriate for their work and move on. If there isn't one in there that works for them, they can create a new index that fits their needs.

Now if a developer creates a new index within JDE for whatever reason, it will show up on the list for the developer to choose. If the table or indexes are regenerated using OMW the new index will be created on the database. However, at runtime JDE does not send any information to the database on what index to use – it only sends a SQL statement. That's because the information in that drop-down box is only there for the benefit of the developer – it has no bearing on what index the database will use to serve the data.

What this means is that an the existence of an index is not a guarantee that it will ever be accessed. I analyze a lot of databases and many indexes shipped from JDE are never touched. Some are only used a few times in their lives. Others could be used thousands of times an hour. Regardless of how much they’re used, every one of those indexes creates an additional write and/or delete for every single transaction. As a result, extra unnecessary indexes can actually cause slower performance.

How does the database decide what index to use? In a nutshell, the optimizer tries to determine the “cost” of the transaction based on the existing table and index statistics, then chooses a plan that can run it with the lowest cost. It can (and does) get much more complicated than that, but for the purposes of this blog post it works.

The question of what indexes could it use to better service a particular statement has some guidelines as well. For example, adding a new index on columns A and B when there is already an index on A, B, C, and D is a waste of time and decreases performance. The database can already get the best results out of an AB statement by using an ABCD index.

"Go big or go home" isn't a phrase you should utter either. Indexes over five to eight columns don’t improve performance much if at all. The database can also use multiple indexes to fulfill a request, so adding a new index that encompasses what exists in two others might not add a huge benefit.

Indexes are a complicated subject to discuss. Sometimes they can make things go faster, sometimes they don’t seem to do anything, and occasionally they make things worse. From the JDE side there is no control over how the database chooses to execute a SQL statement, and that is largely due to its DB-agnostic design.

Ideally, developers and users should work together with DBAs to find the best solution to a performance problem. If an index is required, then that's what we'll do - but let's do it because it's solving a problem, not reacting to a symptom. If you really need an index, ask your DBA add it on the database side temporarily to see how it goes. If it works, then put it in the table specs. If not, attack it from another angle to see what's really going on.

Making Data Useful

Thursday, July 21, 2016

Once again, we must talk about indexes

No comments:

Post a Comment

Followers