meaning of life stuff

find truth everywhere

how to be a lazy programmer

You want to do the least work possible. That's why the company hired you.

Work is time-consuming. Work is, by definition, the consumption of resources.

Everyone benefits if you put forward the least effort to achieve the greatest effect.

So how do you pull this off? Some people find the answer counter-intuitive.

Checklists, counts, and reports.

It requires a relatively small amount of effort to produce a checklist, use it to guide the development of a job, then produce and look at basic reports that confirm what you did during the course of that job.
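For what it's worth, the "counts" part can be tiny. Here is a minimal sketch in Python, assuming a job where every input record either makes it to the output or lands in a reject pile. The function name `count_report` and the sample records are made up for illustration.

```python
def count_report(records_in, records_out, rejects):
    """Build a basic count report and check that nothing went missing."""
    report = {
        "input": len(records_in),
        "output": len(records_out),
        "rejected": len(rejects),
    }
    # Every input record should be accounted for in exactly one pile.
    report["balanced"] = report["input"] == report["output"] + report["rejected"]
    return report


# Hypothetical sample data: three records in, two out, one rejected.
records_in = [{"id": 1}, {"id": 2}, {"id": 3}]
records_out = [{"id": 1}, {"id": 3}]
rejects = [{"id": 2}]

report = count_report(records_in, records_out, rejects)
print(report)
```

If `balanced` ever comes back False, you found out now instead of when the client calls.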

A significant effort is required to revisit the job later because unanswered questions come up, or because the client becomes aware of a problem. You have to stop what you're doing, pull up the job you thought you had completed, and study that job long after it is no longer fresh in your mind.

It's the same principle as fastening a seatbelt a thousand times being preferable to your face slamming into a windshield just once.

Besides, when people come to trust you because your ducks are always in a row, they bother you less and focus more on annoying the programmers who keep screwing up.

quicker matching with break groups

When performing matches on large data sets, break your data into smaller groups, leaving out definite non-matches that don't need to be compared.

For instance, why compare two addresses to see if they're the same, when they are in different zip codes or states?

If you break on zip code, records will be grouped into matching zips and only compared with each other.

Smart use of break groups can save serious processing time.
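Here's a rough sketch of the idea in Python. The record layout, the `normalize` helper, and the exact-match comparison are all assumptions for the example; a real matching job would use whatever comparison logic the job calls for.

```python
from collections import defaultdict
from itertools import combinations


def normalize(address):
    """Crude stand-in for real address matching: uppercase, collapse spaces."""
    return " ".join(address.upper().split())


def find_matches(records):
    """Compare addresses only within the same zip code (the break group)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["zip"]].append(rec)

    matches = []
    for group in groups.values():
        # Pairs are formed inside one group only, never across groups.
        for a, b in combinations(group, 2):
            if normalize(a["address"]) == normalize(b["address"]):
                matches.append((a["id"], b["id"]))
    return matches


# Hypothetical sample records.
records = [
    {"id": 1, "zip": "39201", "address": "12 Main St"},
    {"id": 2, "zip": "39201", "address": "12  main st"},
    {"id": 3, "zip": "30301", "address": "12 Main St"},
    {"id": 4, "zip": "39201", "address": "99 Oak Ave"},
]

matches = find_matches(records)
print(matches)
```

With four records in one pile you would make six comparisons. Broken on zip, this sample makes three, and record 3 is never compared with anything because nothing else shares its zip. The gap gets much wider as the data grows.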

breaking it down

No matter whether you develop for apps or web, or you're an ETL programmer like me, much of your work is more about making good choices than about writing code.

In a typical day, I may have to handle a wide range of data of varying quality from arbitrary sources and in strange formats. And I may be asked to do unpredictable things with it.

And of course everything has to be done RUSH RUSH RUSH.

In a custom shop like mine, programmers have to come up with solutions on the fly, and those solutions have to be right.

Building everything from scratch constantly is mentally taxing and leaves lots of room for human error, which nobody likes.

So what's the answer?

Personally, I favor a little structure. Compile a small but rock solid set of solutions that just work and can be combined to form elegant solutions to a wide range of problems.

Certain bits of math and basic logic in a handful of patterns can be replicated and tweaked to transform data in surprisingly sophisticated ways.

Avoid building from scratch as much as possible. About the third time you get burned because you typed an A instead of a D when choosing whether to sort Ascending or Descending, or = instead of !=, or made any of a seemingly endless list of other easy slips, you may come to appreciate easily reusable code and data flows.

If reasonably possible, have data flows do only one thing each, so you can snap them together like LEGO blocks. And if you properly index your data (seriously, a unique sequence number works for most things) before you do anything else, you can run any process you like and simply join the results to your data.
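A small Python sketch of the sequence-number idea, under the assumption that your rows are plain dictionaries. The names `add_sequence`, `flag_missing_email`, and `join_on_seq` are hypothetical, not from any particular tool.

```python
def add_sequence(rows):
    """Index the data first: give every row a unique sequence number."""
    return [{"seq": i, **row} for i, row in enumerate(rows, start=1)]


def flag_missing_email(rows):
    """A single-purpose step: returns only {seq: flag}, touches nothing else."""
    return {row["seq"]: not row.get("email") for row in rows}


def join_on_seq(rows, results, field):
    """Join any step's results back to the data by sequence number."""
    return [{**row, field: results[row["seq"]]} for row in rows]


# Hypothetical sample rows.
rows = add_sequence([
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": ""},
])

flags = flag_missing_email(rows)
joined = join_on_seq(rows, flags, "missing_email")
print(joined)
```

Because each step only needs the sequence number to find its way home, you can add, remove, or reorder steps without rewriting the others.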

Don't make things hard on yourself. Keep it simple, modular, reusable, and predictable. The hard way is for suckers.

numbers are easier

When your data contains numbers, it's generally easier to work with them as numeric types than as character representations of those numbers.

Sort the set containing 1, 10, and 2 as integer values, and the order will be predictable: 1, 2, 10

Sort the same set as varchar(2), and you can get curious results: '1', '10', '2'

And let's say you only want to join a set of employee names with the corresponding set of departments in which they work, using employeeID as a key.

If the key is numeric, 1 matches 1, and 10 matches 10.

What if the employee name list has employeeID as '01', '10', '02', but in the department listing, the employeeID values are '001', '010', '002'? None will match, because zero-padded character strings are compared character by character, not by numeric value.
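Both problems are easy to reproduce. This is a quick Python sketch; the employee and department values are invented to mirror the example above.

```python
values = ["1", "10", "2"]

sorted_as_text = sorted(values)              # compared character by character
sorted_as_numbers = sorted(values, key=int)  # compared by numeric value

# Hypothetical lookup tables keyed on zero-padded text IDs.
employees = {"01": "Ann", "10": "Bob", "02": "Cy"}
departments = {"001": "Sales", "010": "IT", "002": "Ops"}

# Joining on the raw text keys finds nothing in common.
raw_matches = [key for key in employees if key in departments]

# Converting both sides to integers makes 1 match 1 and 10 match 10.
emp_by_int = {int(key): name for key, name in employees.items()}
dept_by_int = {int(key): dept for key, dept in departments.items()}
joined = {
    key: (emp_by_int[key], dept_by_int[key])
    for key in emp_by_int
    if key in dept_by_int
}

print(sorted_as_text)
print(sorted_as_numbers)
print(raw_matches)
print(joined)
```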

the 3 stages of Quality Assurance

QA involves a whole mental toolbox of healthy habits & ways of thinking.

Aristotle would have pointed out that a job has three main parts: a beginning, a middle, and an end.

In the beginning phase, look at the job requirements. Are instructions clear, complete, and meaningful? Do you see any ambiguities or have immediate questions? Are all the needed resources accounted for, available for your use, and intact? Are you able to produce a provisional plan for providing the requested solution?

In the middle phase, does your plan allow flexibility for adapting to changes and new information? Do you have concrete spots where you can verify that your course of action is producing the predictable and/or desired result?

In the end phase, does the result of your work appear consistent with the request? Do you see any anomalous data, anything open to question? Have you asked the questions that others may not have thought to ask when they provided the request?

If you don't perform QA, your chances of the job returning after the "end" part go up sharply, because you may miss something critical (or simply screw something up, which is sub-optimal).

catching exceptions

Your code may be brilliant, your logic flawless, but your code will be executed under circumstances over which you have little to no control once it leaves your dev environment. You're going to want to handle exceptions gracefully.

You may want to simply write an incident to a log and keep the program running, or send a notification. What you don't want to do is let unpredicted garbage make it to your output.

Let's say your instruction is to assign statecode 'A' to Alabama records and 'B' to Georgia records. You could apply simple logic like:

ifthenelse( state = 'AL', 'A', 'B' )

But during NCOA processing, some records wound up in Tennessee. The above code merrily assigns them all statecode B.

By contrast, here's the same idea with a few extra keystrokes to toss a flag on that nonsense:

decode( state = 'AL', 'A', state = 'GA', 'B', 'ERR' )

Never assume your data is pristine. Ever.
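If you live in Python rather than SAP, the same habit might look something like the sketch below. The `STATE_CODES` table, the `assign_statecode` name, and the choice to log a warning are assumptions for the example, not a prescribed approach.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("statecode")

# Only the states the instructions actually mention.
STATE_CODES = {"AL": "A", "GA": "B"}


def assign_statecode(state):
    """Return the statecode, or flag anything the instructions did not cover."""
    code = STATE_CODES.get(state)
    if code is None:
        # Write the incident to a log and keep the program running.
        log.warning("unexpected state %r", state)
        return "ERR"
    return code


codes = [assign_statecode(s) for s in ["AL", "GA", "TN"]]
print(codes)
```

The Tennessee record comes out as ERR and leaves a line in the log, instead of quietly pretending to be Georgia.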

...but how do you know?

You've built a data flow in SAP, or implemented a method or function in Python. You see no obvious flaws in the logic, and nothing flashes red when you execute it. One or more tables or files come out at the end.

You run counts on the output and pass them along to whoever needs them. And you move on to the next project in the queue.

At some point ranging from minutes to several months later, you are asked about what you did on this project.

So you look closely at what you did and run tests to see if you did what you think you did.

But is this the first time you looked? If so, your job was never complete to begin with.

New processes and changes to existing processes need to be tested before going into production, not tested by changing things on the production line and waiting to find out which clients flip out.

Tests don't have to be elaborate or burdensome most of the time. Pump some data through your logic, and see if it comes out the way it should come out if your algorithms were right.
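To show how small such a test can be, here is a sketch in Python. The `normalize_id` function is a hypothetical piece of logic under test, and the cases are ones I made up; the point is the shape: known inputs, expected outputs, and a list of anything that disagrees.

```python
def normalize_id(raw):
    """Hypothetical logic under test: text ID to integer, None if unusable."""
    text = raw.strip()
    return int(text) if text.isdecimal() else None


# Known inputs paired with the output you expect if the logic is right.
CASES = [
    ("01", 1),
    ("010", 10),
    (" 7 ", 7),
    ("", None),
    ("12A", None),
]


def run_cases():
    """Pump the cases through the logic and collect any disagreements."""
    failures = []
    for raw, expected in CASES:
        actual = normalize_id(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


failures = run_cases()
print("all cases passed" if not failures else failures)
```

Include a couple of ugly inputs on purpose. The clean ones rarely tell you anything.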

accepting criticism

My first programming tip is about getting the most value from the criticism people offer you. Criticism is a valuable and precious resource, even if it doesn't seem like it at the time.

First, criticism is generally not a damnation of you as an individual. Take a deep breath.

Second, even if the person offering the criticism really does want you to rot in hell, that's their problem. Take another deep breath and find something of value in the actual criticism they have to offer.

Third, the criticism offered may shed light on an opportunity for you to improve someone else's situation. If you can find this and act on it, you may make the world a better place in some small way, if only briefly.

Okay, now you've received your criticism. If you don't see the value in it just yet, that's okay. Make note of it and reconsider it tomorrow. After your brain has had a good night's sleep to process the events of the day, you may see some sense in it and a way to improve your work and life.

These tips will be a grab bag of technical, practical, and philosophical thoughts about being a better member of an ETL production team. I've come by some opinions over the years.