Code Smell and FME Golf
I was recently asked what guidelines there are for authoring workspaces in FME.
As you might know, we have the idea of Best Practices in FME: a concept of what makes a workspace a good workspace.
Chapter three of our training course covers Best Practices. It mentions annotations, bookmarks, debugging, project organization, and a lot more. We also have a sample project that we use for certification candidates. It’s an example of what a project ought to look like – both in the workspace and for its documentation.
So, I can tell you what makes a workspace good – but it’s perhaps a little harder to say what makes a workspace bad. I mean – if a workspace runs to completion, and produces the output you want, it can’t be bad… can it?
Code Smell
When I looked up methods for identifying bad code, I found there isn’t always a firm answer. Instead, you have to look for indicators that Wikipedia calls code smells. It states:
Code smells are usually not bugs—they are not technically incorrect and do not currently prevent the program from functioning. Instead, they indicate weaknesses in design that may be slowing down development or increasing the risk of bugs or failures in the future.
In FME terms: your workspace runs and produces the output you need, but… further editing may be harder than it should be, and even though it runs now, it might fail in the future.
So, can I find any indicators that suggest an FME workspace is… a bit whiffy?
Duplication
Duplication is the biggest red flag of all.
Objects repeating themselves in a workspace are like the blue cheese of FME (a noticeably sharp aroma that taints everything it comes in contact with).
In particular, these are bad duplications:
- The same transformer or – worse – group of transformers, occurring multiple times with very little difference
- Multiple feature types all connected to the same transformer (ever heard of a Merge Filter?)
- Multiple Readers all reading the same format of data (one Reader can read multiple datasets)
These are not always errors – sometimes you really do need repetition – but they indicate that your design could be weak.
Complexity
I found another quote, from Tony Hoare, which I think is great because it captures a dilemma we face when evaluating workspaces for certification:
one way [to code] is to make it so simple that there are obviously no deficiencies; the other way is to make it so complicated that there are no obvious deficiencies
A workspace does not need to be complex to be a good project. It can carry out a complex process in a simple way. But sometimes I see a workspace so complex that it takes me hours to determine whether there really are no deficiencies, or whether they are just really well hidden.
These workspaces are the bad wines of FME: the label says it’s a big, bold, oaked nose; with a complex backdrop of juicy raspberries, velvety vanilla, opulent cigars, and spicy figs. But after lengthy sampling, you figure out it reminds you more of used gym socks.
These are clues that there might be a problem:
- Low-level complexity: Using FME functions and factories inside a workspace, or an excess of Python scripting
- Multi-workspace complexity: This workspace calls that one, which calls this one, which runs Python to call that one… etc.
- Multiple connections: When your connections are so dense they form a moiré pattern, it’s time to re-evaluate your workspace!
- Excess debugging: When you have Loggers, Inspectors, and breakpoints attached to just about every transformer
Again, not always errors; sometimes (I admit) there are FME limitations that force this complexity. But none of the above issues are a good sign.
Bulletproofing
By bulletproofing I mean that you design a workspace with the assumption that problems will arise – therefore you build in methods to handle failure. Error trapping is – I think – the term used in development.
Bulletproofing can simply be adding a test or filter transformer to weed out bad features – usually before they get to the point at which an error would occur.
However, though bulletproofing is easy to describe, it’s not so easy to notice when it’s missing. Workspaces without it are the wet dogs of FME! You have to put your nose up close and sniff hard to catch the odour (but, boy, when you do!). I would double-check my workspace in these scenarios:
- Transformers whose parameters accept attribute values (what’s going to happen to a null or missing value?)
- Transformers and formats that accept limited geometry types (what’s going to happen to unsupported geometry?)
- Source formats that support aggregates (not all transformations allow aggregate features)
- If the workspace will be deployed on a different operating system (will file paths be an issue?)
- If the workspace will be deployed on FME Server or FME Cloud (will all custom resources be available?)
To an extent, this is less of a problem now that we’ve started adding <rejected> ports to transformers, but I think it’s still worth investigating. Bulletproofing is of particular interest when the source data is not your own creation and/or is liable to change without notice; for example, when you are processing data uploads on FME Server.
If you can’t be sure what data is coming, you had better be prepared for the worst!
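To make the "weed out bad features before they cause an error" idea concrete, here is a rough JavaScript analogy of what a test or filter transformer does. It is not FME itself. Features with a null, missing, or blank attribute, or with an unsupported geometry type, are diverted to a rejected list, much like a <rejected> port, instead of continuing on to a transformer that might fail. The feature shape ({ attributes, geometryType }) and the function name bulletproof are hypothetical and were invented for this illustration.

```javascript
// Hypothetical feature shape, for illustration only:
// { attributes: { ... }, geometryType: "point" | "line" | "aggregate" | ... }
// Splits features into "passed" and "rejected", recording why each was rejected.
function bulletproof(features, requiredAttr, allowedGeometry) {
  const passed = [];
  const rejected = [];
  for (const feature of features) {
    // Optional chaining copes with features that have no attributes object at all.
    const value = feature.attributes?.[requiredAttr];
    if (value === null || value === undefined || String(value).trim() === "") {
      rejected.push({ feature, reason: `missing ${requiredAttr}` });
    } else if (!allowedGeometry.has(feature.geometryType)) {
      rejected.push({ feature, reason: `unsupported geometry ${feature.geometryType}` });
    } else {
      passed.push(feature);
    }
  }
  return { passed, rejected };
}

const features = [
  { attributes: { Name: "Sample A" }, geometryType: "point" },
  { attributes: { Name: "" }, geometryType: "point" },
  { attributes: {}, geometryType: "point" },
  { attributes: { Name: "Sample B" }, geometryType: "aggregate" },
];
const { passed, rejected } = bulletproof(features, "Name", new Set(["point"]));
console.log(passed.length, rejected.map((r) => r.reason));
```

The design choice here is to keep rejected features and their reasons rather than silently dropping them. That makes it easy to inspect what went wrong when someone else's data arrives in an unexpected shape.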
Deodorizing a Workspace
In the same way you might think your seafood dinner has a delicate aroma of the ocean, it can be hard to perceive your own workspaces as being anything other than delightfully fragrant! The solution to both problems is much the same: you need a good friend to point out the problem.
At Safe, our developers don’t commit their work to the FME product until it has been code-reviewed by a colleague. So that’s what I’m suggesting you try. In particular:
- Don’t deliver a project until an FME-using colleague has examined your workspaces
- Make a checklist of issues to look for (like those I’ve mentioned above)
- Make a formal process (e.g. put a check-box in your records) so you don’t forget or dodge it
At Safe it’s actually quite informal – we don’t always wear doctor’s coats – and it can be simple for you too. It can be as little as a quick look over your shoulder, or can be carried out by email.
You would only need to gather a group of people for a code review meeting when the project is really large and complex, or when there is a disproportionately high cost associated with failure.
It really is worth doing, and both reviewer and reviewee may well learn techniques they weren’t aware of.
FME Golf
The other interesting thing I found while researching code quality was the concept of “code golf”.
Like traditional golf, where the object is to play a round in the fewest strokes, code golf challenges a developer to solve a problem with the shortest code possible. For example, this snippet:
!(y%100<1&&y%400||y%4)
…is (so they say) JavaScript that determines whether a given year is a leap year. You couldn’t get much smaller than that. In fact, you couldn’t get further away from the concept of well-designed code either!
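For contrast, here is a minimal sketch of the same Gregorian leap-year rule written for readability rather than brevity. A loop checks that it agrees with the golfed expression. The names isLeapGolf and isLeapYear are my own labels for this comparison and are not from any library.

```javascript
// The golfed expression from the text, wrapped in a function.
// Because && binds tighter than ||, it reads: !((divisible by 100 && y % 400) || y % 4)
const isLeapGolf = (y) => !(y % 100 < 1 && y % 400 || y % 4);

// The same rule, spelled out step by step.
function isLeapYear(year) {
  const divisibleBy = (n) => year % n === 0;
  if (!divisibleBy(4)) return false;  // most years are not leap years
  if (!divisibleBy(100)) return true; // divisible by 4 but not by 100: leap year
  return divisibleBy(400);            // century years are leap years only if divisible by 400
}

// Confirm both versions agree for every year from 1 to 3000.
for (let y = 1; y <= 3000; y++) {
  if (isLeapGolf(y) !== isLeapYear(y)) {
    throw new Error(`Mismatch at year ${y}`);
  }
}
console.log([1900, 2000, 2015, 2016].map(isLeapYear)); // [ false, true, false, true ]
```

Both versions give identical answers. Only one of them, however, tells the next person who reads it what the rule actually is.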
Still, I’m going to throw out a fun challenge. I’ll set a task and provide a dataset, and you have to create the smallest workspace possible to carry it out.
The challenge is this: download the source CSV dataset of public artwork (below) and create a workspace to do the following:
- Read the public artwork source data
- Make a point feature for each record
- Drop features whose Name field is empty/missing/null
- Find out which neighborhood each artwork is located in
- Sort the artwork features in alphabetical order on the Neighborhood Name and Title fields
- Write the data to a Shape dataset. There should be fields for the neighborhood name, artwork title, and artwork (location) name
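To pin down what "the same result" means, here is the task's logic sketched in plain JavaScript. This is not a valid entry, because the rules ban scripting, and it skips reading the CSV and writing the Shape dataset. The record fields x and y, the neighborhood objects, and all sample values are hypothetical, and the real CSV's coordinate columns may be named differently. The neighborhood lookup uses a simple even-odd (ray casting) point-in-polygon test as a stand-in for whichever FME transformer you choose.

```javascript
// Even-odd (ray casting) test: is point (x, y) inside a polygon ring of [x, y] pairs?
function pointInRing(x, y, ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Toggle for each edge crossed by a horizontal ray from (x, y) towards +x.
    if ((yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

const isBlank = (v) => v === null || v === undefined || String(v).trim() === "";

function processArtwork(records, neighborhoods) {
  return records
    // Drop features whose Name is empty, missing, or null.
    .filter((r) => !isBlank(r.Name))
    // Treat each record as a point and find the neighborhood containing it ("" if none).
    .map((r) => {
      const hood = neighborhoods.find((n) => pointInRing(r.x, r.y, n.ring));
      return { neighborhood: hood ? hood.name : "", title: r.Title ?? "", name: r.Name };
    })
    // Sort alphabetically on neighborhood name, then on title.
    .sort((a, b) => a.neighborhood.localeCompare(b.neighborhood) || a.title.localeCompare(b.title));
}

// Hypothetical sample data: two square neighborhoods side by side.
const neighborhoods = [
  { name: "Downtown", ring: [[0, 0], [10, 0], [10, 10], [0, 10]] },
  { name: "Riverside", ring: [[10, 0], [20, 0], [20, 10], [10, 10]] },
];
const records = [
  { Title: "Wave", Name: "Plaza", x: 15, y: 5 },
  { Title: "Arch", Name: "Library", x: 2, y: 3 },
  { Title: "Bench", Name: "", x: 5, y: 5 },
  { Title: "Anchor", Name: "Pier", x: 12, y: 8 },
];
for (const f of processArtwork(records, neighborhoods)) {
  console.log(`${f.neighborhood} | ${f.title} | ${f.name}`);
}
// Downtown | Arch | Library
// Riverside | Anchor | Pier
// Riverside | Wave | Plaza
```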
Here are the files:
- The source CSV
- A dataset of neighborhood boundaries
- My sample workspace
Of course, there are some rules:
- The aim is the smallest workspace file (in bytes) possible. It’s not just about using the fewest transformers.
- You can use Workbench only. No manual editing of the fmw file contents is allowed (so don’t open the file in a text editor and strip out spaces)
- No Python or Tcl scripting is permitted – it must be pure FME. However, you may use FME functions if you wish.
- You can’t edit or manipulate the contents of the source data before it is read into FME
- You must use FME 2015.0 or newer
My workspace carries out the task in the way you would usually expect. It checks in at 68kb (actually 69,411 bytes). Can you reduce that number and produce the same result?! I’m looking forward to seeing some experiments with different transformers, or even different Readers – and to see how you can bend the rules (read them very carefully) without breaking them!
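Because entries are scored on file size in bytes rather than transformer count, comparing them is easy to automate. Here is a minimal Node.js sketch that assumes all the entries sit as .fmw files in a single folder. The function name rankBySize and the folder layout are hypothetical choices made for this example.

```javascript
const fs = require("fs");
const path = require("path");

// List every .fmw file in a folder with its exact size in bytes, smallest first.
function rankBySize(dir) {
  return fs
    .readdirSync(dir)
    .filter((file) => file.toLowerCase().endsWith(".fmw"))
    .map((file) => ({ file, bytes: fs.statSync(path.join(dir, file)).size }))
    .sort((a, b) => a.bytes - b.bytes);
}

// Example usage (hypothetical folder name):
// for (const { file, bytes } of rankBySize("./golf-entries")) {
//   console.log(`${bytes} bytes  ${file}`);
// }
```

Comparing raw byte counts avoids ties caused by rounding. For example, 69,411 bytes is about 67.8 KiB, which is where the rounded 68kb figure comes from.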
You can email me your creations at the address below (I’m on vacation right now, so there’s no rush). I can’t promise a prize, but I’ll see if I can raid the marketing team’s cache of swag when they aren’t looking.
Conclusions
The FME golf idea is just a bit of fun, but producing well-formed workspaces is no joke. A bad workspace can be slow and inefficient. It will be harder to maintain and not necessarily proof against future problems. The instructions for code reviews at Safe suggest:
It costs more to fix defects that are found later in the development process. Fix early and fix often
And it’s hard to argue with that.