I am working on developing a webapp (Visual JSF, EJB3, Hibernate on GlassFish/MySQL) that works with an existing legacy system.
I have an 'employee' table (with fields such as name (String), location (String), etc.) that is separate from the employee table on the legacy DB2 side. All of the webapp processing uses my employee table. However, every week I need to schedule a task that goes through all the employees in my table and compares them against the employees in the legacy DB2 table. If an employee's location has changed in the legacy table, I need to update my employee table to reflect the new location.
What would you suggest as the best way to go about doing this?
Currently I read all the employees into an ArrayList, loop through each employee entity in the list, fetch the corresponding legacy employee record, compare locations, and update my employee entity if a location change is detected.
Since I have close to 50,000 records in my employee table, the initial build of the ArrayList alone takes around 5 minutes, and the number of employees will only keep growing.
Is there a reason it should be synced only once a week? If not, you might want to spread the operation over the week: do 1/7th of the work each day. You could also add a table on your side to keep track of which record was synced when.
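The daily-partition idea can be sketched as below. The scheduling itself (e.g. an EJB timer) is omitted, and bucketing ids by id modulo 7 is just one hypothetical way to split the table into seven daily slices:

```java
import java.time.DayOfWeek;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class WeeklyPartition {
    // Pick the slice of employee ids to sync today: id % 7 == day-of-week bucket.
    // Over seven daily runs, every employee gets compared exactly once.
    static List<Long> idsForDay(List<Long> allIds, DayOfWeek day) {
        int bucket = day.getValue() % 7; // MONDAY=1 .. SATURDAY=6, SUNDAY -> 0
        return allIds.stream()
                     .filter(id -> id % 7 == bucket)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 20 fake employee ids; Monday's run handles ids where id % 7 == 1.
        List<Long> ids = LongStream.rangeClosed(1, 20).boxed()
                                   .collect(Collectors.toList());
        System.out.println(idsForDay(ids, DayOfWeek.MONDAY)); // [1, 8, 15]
    }
}
```

In practice the filter would live in the WHERE clause of the query (e.g. `WHERE MOD(id, 7) = :bucket`) rather than in memory, so only that day's slice is ever loaded.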
I would create a dblink (dblinks do exist on DB2, right?) and do something like:
select a.id, a.location from empl a, empl@link b where a.id = b.id and a.location <> b.location
Then iterate the result set, which will contain all the employees whose location has changed.
If you have the ability to alter the legacy table in any way, you could add a needs_sync column to it. Then, using a trigger, or by modifying the code that updates the location, set needs_sync = 1 whenever the location changes. (Add an index on that column, too.)
Then, to find the records to update:
select id, location from legacy.employee where needs_sync = 1
When you've successfully completed the sync:
update employee set needs_sync = 0 where needs_sync = 1
Do it all in one transaction to avoid a race condition: otherwise a row whose location changes between your SELECT and the flag reset would have its needs_sync flag cleared without ever being synced.
This solution has the advantage of only examining records which have been changed, so it will be efficient at runtime. It does require a change in the legacy schema, which might be painful or impossible to do.
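If a trigger is an option, the flagging could look roughly like this. This is only a sketch in DB2 LUW syntax; the table, column, and trigger names are assumptions, so check everything against the real legacy schema:

```sql
-- Assumed schema: legacy.employee(id, location, ...). Names are hypothetical.
ALTER TABLE legacy.employee ADD COLUMN needs_sync SMALLINT NOT NULL DEFAULT 0;
CREATE INDEX idx_employee_needs_sync ON legacy.employee (needs_sync);

-- Flag the row whenever its location actually changes.
CREATE TRIGGER employee_location_changed
  NO CASCADE BEFORE UPDATE OF location ON legacy.employee
  REFERENCING OLD AS o NEW AS n
  FOR EACH ROW MODE DB2SQL
  WHEN (o.location <> n.location)
  SET n.needs_sync = 1;
```

A BEFORE trigger that sets the new transition variable avoids having the trigger issue a second UPDATE against the same table.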
I'm thinking of using the JPA query methods setMaxResults() and setFirstResult() to retrieve employee data in chunks. These methods are normally used for paginating display data in the UI, so I don't see any reason why I can't use them here. That way I can process one chunk at a time. I could probably also throw in a queue and an MDB to process the chunks in parallel, since I can't create threads within the EJB container.
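A minimal sketch of that chunking loop, with a hypothetical fetchPage function standing in for a JPA query run with setFirstResult()/setMaxResults() (the stub below pages over an in-memory list so the loop itself can be demonstrated without a database):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class ChunkedSync {
    // fetchPage(first, max) stands in for a JPA query executed as
    // query.setFirstResult(first).setMaxResults(max).getResultList().
    static <T> int processInChunks(BiFunction<Integer, Integer, List<T>> fetchPage,
                                   int chunkSize,
                                   Consumer<T> process) {
        int first = 0, handled = 0;
        while (true) {
            List<T> page = fetchPage.apply(first, chunkSize);
            if (page.isEmpty()) break;        // no more rows: done
            for (T row : page) { process.accept(row); handled++; }
            first += chunkSize;               // next page starts where this one ended
        }
        return handled;
    }

    public static void main(String[] args) {
        // Simulate a 10-row employee table paged in chunks of 3.
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 10; i++) table.add(i);
        int n = processInChunks(
            (first, max) -> table.subList(Math.min(first, table.size()),
                                          Math.min(first + max, table.size())),
            3,
            row -> { /* compare against legacy row, update if changed */ });
        System.out.println(n); // 10 (four pages: 3 + 3 + 3 + 1)
    }
}
```

In the real version it would also be worth flushing and clearing the persistence context (em.flush(); em.clear();) between chunks, so the EntityManager doesn't end up holding all 50,000 managed entities at once.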
I am also thinking of using JMS messages, queues, and MDBs to solve this problem. I would send each employee record as a separate message to a queue, and the corresponding MDB would then do all the processing and updating for that record. I expect to get more parallel processing done that way.