Aloha Akamai

Aloha –
Welcome, greetings, farewell, goodbye, love, mercy, … (Hawaiian)

Akamai –
Smart, clever, expert, skill, wit, … (Hawaiian)

I started a new job with Akamai Technologies. They have a small San Diego office and, from what I can tell, it’s a very bright and motivated team with technical aptitude to match their creativity. A positive group with complex challenges that I’m excited to dive into!

Buddhism has a concept of “right thought”, of freeing your mind of lust, ill-will, and cruelty. I’ve been thinking a lot about this lately. We surround ourselves with people that often reflect our own dispositions of thought. The environment we find ourselves in, for better or worse, is of our own doing. In many ways, this phase of my career represents a welcome “right thought” and I realize that “right work” is a necessary path to that state of mind.

And to my friends and family who have never heard of Akamai: you use their services all of the time without even knowing it. They are primarily known for their content delivery network, a grid of compute and data resources that spans multiple Internet backbones and is optimized to deliver content to you, the end user. In other words, when you’re watching some cool video on myspace or previewing film trailers on apple.com, the content you’re watching is being managed by Akamai.

http://www.akamai.com
http://en.wikipedia.org/wiki/Akamai_Technologies
http://money.cnn.com/magazines/fortune/fortunefastestgrowing/2007/snapshots/48.html

Organizational Intelligence

Organizational intelligence is the measure of an organization’s ability to comprehend and conclude knowledge relevant to its purpose. In simple terms, it is the performance of a group or organization measured as a whole. For example, basketball and hockey require a high degree of organizational intelligence in order for a team to win. The success of the team depends greatly on the team’s ability to work together rather than on individual performance alone.

For software development, and most business organizations, the goal of an organization is to function at the highest levels of competency across all individuals. Unfortunately, a low organizational intelligence means that an organization is only as smart as the combined incompetence of all individuals. The net effect is that simple problems are harder for the organization to solve than they would be for a single individual.

Consider the following overly simplified skill summary:

In this situation, there are three developers and a single project manager. Of the three developers, one is a database developer, one an application developer, and the other a web/UI developer. This represents a fairly standard cross-functional team, though greatly simplified.

In order to achieve the goal, this group must perform at their combined strengths. This creates a positive emergent behavior where the group dynamic is able to achieve something that no one individual on the team could achieve alone. This is a case of positive emergence and assumes a well-functioning team with a high degree of organizational intelligence:

On the other hand, the team could be functioning in an overly competitive or chaotic manner leading to a negative emergent behavior where the group dynamic is only as competent as the combined incompetence of all individuals:

This type of scenario is fairly common in software development and engineering. Problems such as Analysis Paralysis, Design by Committee, and Feature Creep usually result in a low organizational intelligence with a negative emergent behavior.

Emergent Behavior is the result of simple entities forming more complex behaviors as a collective. The complex behavior of the group is not a property of any single entity. Group/Organization behavior is often emergent, not the fault or credit of any one individual but the dynamic formed by the group itself.

Typically, smaller teams like the one in the charts perform more on the positive side than the negative. However, as teams scale, the challenges quickly become less about specific skill sets and more about understanding and exploiting the emergent behavior of the organization in order to achieve the desired business goals.

Consider the overly simplified version:

The individuals possess the correct skills necessary to achieve the goals of the organization and ideally will surpass those goals through a positive emergent behavior of the team. On the other hand, the areas of incompetence are drastic enough that, given a negative emergent behavior, the team will surely fail despite individual efforts to the contrary.

Illusion of Stability

Sometimes we plant our feet firmly in hope of stability. Kids, mortgage, bills; the excuses are endless. Yet for all our static behavior, life has a tendency to knock us right on our ass. If you’re alive and breathing, then there is no such thing as stability. Not only can your life change at any moment, it’s changing at every moment.

But what about all of the compromises? As stability becomes a value in life, where do we rank it among our other values? Everyone has a ranking of values. Some people will maintain honesty as a value even over friendship. Other people value friendship more and will gladly lie to preserve their friends’ favor. So what values are we willing to compromise for stability?

Maybe you’re a few months from vesting or just a couple of years from retiring? When do you stop rocking the boat and hope things settle down?

I hope the answer is never. The boat is always rocking whether you want it to or not. The water is guaranteed not to stay calm. Stability often turns out to be an imaginary value. The problem is not stability itself, but the values that you compromised for something imaginary.

In my own life and in the lives of people around me, I’ve seen compromises that sacrifice integrity, competence, and honesty, all for that illusive stability. I’ve watched good people do bad things, and observed my own bad behavior in making compromises to maintain a deceitful status quo. After a lifetime of compromising your values, it’s no surprise when otherwise good people with well-intentioned values become bad people, given the ease with which they act against their own principles.

I can’t speak of a perfect solution. It’s difficult if not impossible to just simply erase stability from our core values. The irony is, we want stability to protect ourselves and the people around us, but as we compromise in favor of stability we end up hurting ourselves and the people around us. My idea is to instead focus your life on the process and not the outcome.

We tend to focus on what we want, and our minds fill with imaginary outcomes. Some people want kids and a house, and once they get them they want to preserve them; they want stability. I’ve seen people sacrifice happiness (which in my opinion should be the highest-order value of them all) in search of stability. Try instead to focus on the process: on living in your home and raising your children rather than on outcomes like kids, house, and bills. Or focus on living a good life rather than imagining a future retirement. Perhaps then we won’t compromise our values and our actions will reflect our good intentions.


Like waking from a slumber
I open my eyes and see clearly
The world around me
My place in it
The present
Where have I been?

Goodbye Qualcomm

It’s been just under two years, and after a whirlwind of experience and excitement at Qualcomm, I have decided to move on.

This was my first foray into the private sector and it was a blast! The academics had it completely wrong. I was warned that I’d be a cog in a machine. Not even close. The academic machine is a rusty axe compared to the shiny chainsaw of the corporate world. Each has its cogs. You know who you are.

There’s an old saying that originated in the Soviet Union: Initiative is Punishable (“Initsiativa Nakazuima”, thanks Katya!). I’ve learned in my career that this is true. Initiative is punishable but, as a friend of mine pointed out, and as I can see looking at my own career, that same initiative has also been rewarding.

Our lives, whether at a job or elsewhere, are filled with challenges. Sometimes we win, sometimes we don’t, but we should never lose sight of our ideals and our integrity. It is our ideals that inspire innovation. And it is integrity that keeps us on the right path. Oftentimes we find ourselves so lost in details that we forget this larger context.

For anyone reading this far and finding any sense to this post: remember that we are defined not by our words but by our actions, and all too often the dissonance between our mouth and hands is the real source of our stress.

The Good, The Bad and the Ugly of Scrum

We all hear about it and we all love it: the Rugby-inspired software development methodology known as Scrum. It’s fast becoming an industry buzzword and causing many project managers to question their Gantt charts. For all the hype, what is the reality of Scrum?

Scrum is an agile-based software development methodology for project management. It is characterized by a prioritized product backlog that lists new features. Work is completed and delivered in time-boxed iterations known as sprints (e.g., two-week iterations). Scrum teams are cross-functional and typically number 3-7 people each. Iterations begin with an iteration planning meeting and end with a retrospective to review what worked and what didn’t.

During a sprint, each scrum team gathers for a daily stand-up, which is a short meeting where each person describes what they did since the previous meeting, what they’re planning to do now, and any impediments. The team is self-organizing, leveraging our instinctive behavior to work in small groups. The Scrum process is facilitated by a Scrum Master. That title is a bit of a misnomer, since the Scrum Master carries no authority and is instead responsible for blocking any distracting influences that could disrupt the team’s progress.

The principles of Scrum are well defined in the Wikipedia article as well as in the book Agile Project Management with Scrum by Ken Schwaber. You can also shell out ten grand for an in-person experience with Ken. There is nothing like an expert talking about the work that you should be doing. As some of my friends and co-workers like to hear me say: Get back to work!

What’s so Good about Scrum?

Delivering working software. Working software is where Scrum really shines. It’s proving to be an excellent implementation of Agile Software Development with core values such as customer satisfaction and individual interaction.

There are four core values to the Agile Manifesto:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

These are all valid principles that are easy to ignore and have proven to be hard-learned lessons despite how obvious they may seem. Projects typically fail when they ignore these principles. Good documentation has never compensated for crap software. Try telling upset customers that you gave them exactly what they contractually signed for in the Service Level Agreement. The principles of the Agile Manifesto should be held as software engineering law.

Scrum provides a very effective methodology that ensures these principles through an empirical approach to software development that embraces and encourages change.

If it’s so great, what’s the Bad news?

No one likes to admit this, especially not Scrum advocates like myself, but Scrum fundamentally conflicts with traditional PMO. It’s an interesting round of cognitive dissonance to watch a PMI-certified project manager attempt to rationalize Scrum. There are of course several ways to deal effectively with this dissonance but understand: these are fundamental differences which are not philosophically compatible.

Scrum will go far in delivering working software, but what about managing roadmaps? Better yet, what about resource allocation and, God forbid, budget forecasts that are needed before a project starts? We need money and people to start a project, and we’d like to know roughly how much something will cost before we agree to invest. In a perfect world we would know these answers and we wouldn’t need Scrum. But Scrum exists out of necessity, born from the failures of so many software development projects, and it reminds us that the entire enterprise of software engineering is oftentimes more like scientific discovery than building construction (which is where PMI originated, and rightfully so).

The Ugly

Scrum delivers working software in chaotic environments. At the same time, Scrum is a symptom of a larger problem in software engineering: many software projects cannot be managed like construction projects without facing increasing technical debt, unhappy customers, declining quality, budget overruns, and missed deadlines.

Migrating to a Scrum methodology typically has the effect of providing early visibility into problems. The wisdom is that it’s better to know that you’re failing one or two months into a project rather than years into it. While this makes sense, and this transparency is a very valuable aspect of Scrum, the reality is very ugly.

The critical problems that this transparency exposes oftentimes stem from the fundamental conflict between traditional PMO and Scrum. These are problems that are unfortunately outside of the realm of Scrum or software engineering. In this situation it is easier to attack the symptom (Scrum) than it is to address the underlying issue, which is unifying project management (roadmaps, budgets, resource allocation) with the software development.

Visibility? Be careful what you ask for! If a project is going to fail, maybe it’s better to let it fail naturally than to induce ulcers in the Scrum Masters and the development staff. Remember: a good Scrum Master will be a burned out Scrum Master in most environments.

This isn’t an easy problem to solve, but done wrong Scrum can create an emergent failure. Take the following anecdotal quote from Brad Wilson:
Scrummerfall. n. The practice of combining Scrum and Waterfall so as to ensure failure at a much faster rate than you had with Waterfall alone.

On the other hand: Scrum, done right, has the potential for an emergent success given iterative and continuous improvement. A potential method for solving the real problem is to exploit the emergent behavior of a system.

Emergent behaviors are difficult to track, but analyze the existing software, processes, and development and determine whether or not they are evolving appropriately. A successful development process should continually improve the same way the code itself should continually improve. The process itself should be agile, responding to change to better produce working software.

Better yet, the individuals and interactions should be agile. It is the people that must respond to change.

Evolving Database Schemas

Master your Domain

It amazes me that while database schema evolution is one of the most critical factors in software development, it’s also one of the most ignored and least understood. Pretty much every interesting software development project requires an evolving database schema. From renaming tables to modifying relationships, the schema is as critical as any piece of source code but is often the least cared-for part of any project. Your schema is what maintains order in your data, and arguably organizing your data is the most important part of any enterprise!

There are mountains of excellent (and practical) resources on how to manage software development projects. I can get certified in Scrum or nearly any Agile-based development methodology, but I can barely find any useful websites on managing my database in an Agile environment. When your development process encourages change, this applies to your data modeling and schema design and not just application development.

There are some excellent articles from Fowler and Ambler that provide a good starting point. However, I found a definite lack of details and practical advice in those articles. It goes without saying that I want to run regression tests and version my work, but a database is distinctly different in that I have to deal with all that existing data and can’t exactly redeploy a schema while preserving the old data (not easily anyhow).

Existing persistence strategies often fail to accommodate Agile-based development leading to poor design choices in data modeling. In this article I’d like to explore practical solutions to Agile database development. Like anything Agile, there’s no perfect solution, but this is a pervasive problem across all interesting software development projects and I hope the discussion alone will yield better solutions.

To start, there is one important observation that I’ve found to be true across various software development projects: If the code smells it’s likely that the database smells worse! Let’s examine some common database smells:

  • Inconsistent relationship strategies; when your ER diagram starts looking like spaghetti and every piece of business logic introduces a different convention, you’ve got a problem. You have object tables and three possible types of relationships (1-1, 1-N, N-M). Your data model is only as complicated as you make it; pick a strategy for each of the three types of relationships and stick to it. I liken inconsistency here to using GOTO statements in software: it’s unacceptable.
  • Inconsistent object model strategies; this is often the impetus to change your relationship strategies, that is, when I have inconsistent strategies for object tables it leads to very confusing relationships. You’ll see one table with a varchar(20) NAME and another with a varchar(16) NAME, is this the name of the object or does the object contain a “name”? Use a consistent strategy for tables as well as concepts like status, timestamp, IDs and alternate keys (such as name).
  • Inconsistent naming conventions; STAT_DATE, STATUS_DT, or STATDATE? Pick a convention and stick with it!
  • Inconsistent usage of the same column; a common example is a varchar column named TYPE that means different things to different applications. I’ve noticed that even good data modelers make this mistake.
  • Overloading fields; things like a comma-separated list of values where only the application knows what each value means. Don’t use a relational database if this is how you model – text files may work better!
  • F normal form; Johnny just took a class on database design and learned about normalizing a database and now you have 175 tables in what he claims is 6NF! There are appropriate times to denormalize just as often as there are to normalize.
  • Know when to OLAP; why are there summary tables attached to each of my transaction tables?
  • The Cauldron of Data; this is the crux of the problem. Everyone is so scared of the data that they lose control of the schema and treat it like a bubbling cauldron, too paranoid to make any significant changes out of fear of breaking a legacy application. This is the end result for any application whose team didn’t manage its evolving database schema.

Let’s talk about some solutions!

First of all, this is a developer problem! Don’t expect your DBA or Hibernate to fix this for you – if you’re a developer this is your problem. This leads to the central theme of how I propose database evolution to be solved: Your development methodology must cover application and database development.

If your application depends on a database, then your development methodology had better cover both application and database development! I know, you like building the code and leaving the responsibility of the database to someone else. But that brings you back to the Cauldron of Data scenario where you can’t make any significant changes to a schema because it got out of your control. And if you can’t control the schema you can hardly control the application that depends on that schema!

We tend to ignore database development as a way to simplify our application development – I suggest you make the application development suffer by adhering to a development methodology that works with database development! Think of it like this: you’re going to be the database’s bitch if you don’t take this responsibility.

I know, this seems like more work from the application side but like anything done right it’s hard to imagine doing it differently once you get your development methodology to cover applications and databases. That said, how does one integrate their database development into a unified development methodology?

Let’s look at the differences between application development and database development (and what needs to change in the traditional Agile-based development methodologies):

  • Databases contain data that cannot be lost; this means you have to migrate production rather than reinstall
  • Data is easy to migrate when your data is not controlling you (see the Cauldron above)
  • Rebuilding your database is like compiling and deploying your code (this sounds like a maven target)
  • Databases should have unit tests, and not just for the stored procedures (more on this later)
  • Map your database development to your project lifecycle goals exactly like you would with application development (say, in Maven 2) but introduce the migrate step in the deploy target.

If I compile and build my application, why not build the database schemas at the same time, just like I would with any other dependent artifact? So, let’s get practical and talk about things you can actually do to accomplish Agile-based evolutionary database design:

Create a Database Schema Change Policy

Keep it simple and make sure you answer how you plan to address schema migrations, both planned and unplanned. Your process should lend itself to an emergent property of better schema design. This by itself requires you to not only support planned and unplanned schema changes, but to encourage them. Either do a big design up front (not agile) or encourage change in all aspects of your development (including your data model). I recommend you clearly define a process for planned migrations (migrating from one version of the schema to another) and unplanned patches (critical fixes, the kind you get in the middle of the night).

Bring DBAs in Early

You’ll need their help, and you know it, so it’s best to get friendly with them early on – give them a chance to know what you’re trying to do on their database. I argue that you’ll find more resistance to agile development from software developers than you will from DBAs. Most DBAs have been on the front line fixing smelly database code and are likely your strongest allies. Not only can they help with the development process, they can (and should) assist with design.

Use Stored Procedures

Read up (separately) on “End-to-End Architecture”. If your schema is going to change, then you had better clearly define your endpoints and provide an API-like package to abstract the schema completely. What’s great about stored procedures is that they can (and should) be treated like application code. I recommend two types of packages; consider using suffixes of _PKG and _API. All of your object tables will likely have GET, PUT, and DELETE procedures. These should be autogenerated; if they are not, write yourself a script or invest in some software to autogenerate CRUDL stored procedures. Each schema should have a _PKG with CRUDL procedures for all object tables. You also have business logic, from complicated transactions to simple procedures like authenticate(user, pass). Procedures that encapsulate business logic should be in packages with an _API suffix and follow the same rigorous design that would be employed for any application API.

The naming convention of an _API and _PKG suffix is unimportant (any convention here would be fine), but what is important is distinguishing between your CRUDL procedures and your APIs that encapsulate your business logic. Once you have a convention that cleanly separates these concepts you now have a mechanism which can completely abstract your schema from your application and best of all, you’ve likely imposed some constraints and standardization on your object tables that lend themselves to easy autogeneration of the CRUDL procedures.
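To make the autogeneration idea concrete, here is a minimal sketch of a generator for a _PKG package spec. Everything in it is an assumption for illustration: the class name, the BILLING schema, the ACCOUNT and INVOICE tables, the extra LIST procedure (standing in for the "L" in CRUDL), and the parameterless procedure stubs. A real generator would read column metadata and emit proper parameters.

```java
import java.util.List;

/** Hypothetical sketch: emits a PL/SQL-style package spec with CRUDL stubs per object table. */
public class CrudlSpecGenerator {

    // GET, PUT, and DELETE come from the text; LIST is an assumed name for the "L" in CRUDL.
    private static final String[] OPERATIONS = {"GET", "PUT", "DELETE", "LIST"};

    /** Builds the spec text for a <SCHEMA>_PKG package covering the given object tables. */
    public static String generateSpec(String schema, List<String> tables) {
        StringBuilder sb = new StringBuilder();
        sb.append("CREATE OR REPLACE PACKAGE ").append(schema).append("_PKG AS\n");
        for (String table : tables) {
            for (String op : OPERATIONS) {
                // Parameter lists are omitted; they would be derived from the table's columns.
                sb.append("  PROCEDURE ").append(op).append('_').append(table).append(";\n");
            }
        }
        sb.append("END ").append(schema).append("_PKG;\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative schema and table names only.
        System.out.print(generateSpec("BILLING", List.of("ACCOUNT", "INVOICE")));
    }
}
```

Because the output is plain text, it can be checked into source control next to the hand-written _API packages and regenerated whenever an object table changes.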

Version your Schema just like you would an Application

Versioning is a given for application code, so why should database code be any different? Schemas, default data, packages, grants: everything should be versioned along with ALL other application code. Applications depend on a versioned database, just like any other versioned artifact – I would expect my build to fail if the dependent database for my application doesn’t exist.

Each of your schemas is like an application, and all of the DDLs should be checked into source control and managed as applications! Check in your test data (SQL inserts) and you’ll easily be able to define a database-specific unit test environment!

Use the Right Tools

You’ll need more than a modeling tool. Modeling tools are great at helping you to visualize your schema, but don’t get carried away. You need to track schema AND default data! Use tools that fit your process, not the other way around – write your own scripts if you need to; they’re not that hard once you have a working process. Between Maven and some sqlplus scripts we’ve gotten plenty of mileage at my current job with the following scripts:

  • drop_objects.sql; loops through all of the schemas and drops everything, there’s also a delete user approach but with the drop script you don’t have to redefine your tablespace; this script is never run in production
  • create_objects.sql; loops through all of the schemas and creates all of the tables and default data; this script is never run in production
  • create_pkg_spec.sql, create_pkg_body.sql; loops through all schemas and compiles the package specs and separately the package bodies
  • run_tests.sql; loops through all schemas and runs database unit tests, it uses stored functions with setUp and tearDown procedures similar to JUnit; this script is never run in production
  • migrate_objects.sql; loops through all objects and runs a per-schema migrate script which is created based on the delta between two different versions of the same schema
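
As a rough illustration of the migrate step, the sketch below plans which per-schema migrate scripts to run to move from one schema version to another. The integer version numbers, the file naming scheme (schema/migrate_N_to_M.sql), and the class itself are assumptions made for the example; they are not the scripts described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: orders per-schema migrate scripts between two schema versions. */
public class MigrationPlanner {

    /**
     * Returns the script paths to run, in order, to take a schema from version
     * {@code from} to version {@code to}. One script per single-version step is assumed.
     */
    public static List<String> plan(String schema, int from, int to) {
        if (to < from) {
            throw new IllegalArgumentException("downgrades are not covered by this sketch");
        }
        List<String> scripts = new ArrayList<>();
        for (int v = from; v < to; v++) {
            // Assumed naming convention: <schema>/migrate_<v>_to_<v+1>.sql
            scripts.add(schema.toLowerCase() + "/migrate_" + v + "_to_" + (v + 1) + ".sql");
        }
        return scripts;
    }

    public static void main(String[] args) {
        // Illustrative schema name; prints one script path per line.
        for (String script : plan("BILLING", 1, 3)) {
            System.out.println(script);
        }
    }
}
```

A build target could feed this list to sqlplus one file at a time, so production is migrated in place while the drop and create scripts stay confined to development.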

Localhost Development

Why else do we have fancy development workstations? Stop assuming Eclipse is allowed to eat up all of your resources – let Oracle do it! The only way to empower your developers to be agile is to give them an environment where they can easily change the database schema!

We’ve gone so far at my current job as to support localhost Oracle instances where we checked Oracle into our software version control (along with Tomcat, Java, etc). We tried using the Express Edition but it didn’t support all of the PL/SQL code we were developing, so we’ve got the full bloated 10g running on all of the developer workstations (takes about 30 minutes to install on a new workstation). So don’t tell me you can’t run MySQL locally!!

REFERENCES

http://www.martinfowler.com/articles/evodb.html
http://www.agiledata.org/essays/databaseRefactoring.html
http://www.agiledata.org/essays/databaseRefactoringSmells.html
http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf

On Turning 30

While I shouldn’t care that I’m turning 30, I still feel depressed. Why would birthdays make me depressed? Every year at this time I end up in an existential crisis.

I used to think that I didn’t like birthdays because I don’t like being the center of attention, but that’s stupid, we all love and crave attention. If anything, I find myself getting annoyed that I have to entertain, laugh at everyone’s jokes, thank them for the thoughtless cards, and make them feel good at my expense. Yes, yes, I’m old, that’s hilarious.

The irony is, as I’m depressed I still smile and joke with everyone. I can’t help but smile when smiled at, nod and laugh at the appropriate moments. Dance Monkey, DANCE! I guess it makes me feel better making someone laugh rather than cry.

The more I think about it, I guess birthdays are about everyone else. We’re all equally selfish; the poor bastard having the birthday usually does more work than the people supposedly celebrating the birthday. If anything, having a birthday gives you more responsibility to entertain friends and family.

But why all the depression?

I will admit, my reaction to birthdays has never been typical. I’ve noticed two distinct reactions in other people.

The first reaction is the imaginary todo list, where you reflect on how many items you’ve checked off this list. Some people have “live in a foreign country” while others have “marriage and kids”. The principle is the same in all cases. It’s completely arbitrary, and no doubt will cause depression since turning 30 becomes a deadline where it’s not clear what you’re supposed to do afterwards.

The second reaction is “I don’t think it’s important”. I know a lot of people who have this reaction, and I suspect most of them are full of shit. You’re getting older; it’s perfectly natural to reflect on your life and think about past accomplishments and mistakes. This is how we grow and learn. I would argue that your life isn’t worth living if you’re not taking the time to reflect and examine. Take it up with Socrates if you don’t agree.

All that said, my reaction to birthdays seems bizarre by comparison. I’m starting to fear I’m alone on this one. My reaction to every birthday is an unprovoked existential crisis. Normally I love reflecting on existence, pondering those seemingly unanswerable questions. But on a birthday it just hits me, as if some higher power decided that on your birthday you’re going to think about why you exist and what you’re supposed to be doing with your life.

So, every year, it’s time for the status report. Only I lost my orders, and have been playing video games instead. “Sorry boss, I have absolutely no idea what I’m supposed to be doing”.

Quite the perplexing state: we desire to know why we exist, knowing only that we do exist and with a feeling of self-importance. We can’t help ourselves but to be ignorant of why we exist while simultaneously overstating our importance to exist.

We assume our lives are important even though we don’t know why.

In all likelihood my life isn’t that important, but believing that is contrary to the human condition. It’s like Nihilism: even with a good argument, it’s so contrary to the human condition that the only people who believe in Nihilism are either being cynical or they do so with a Pandora’s Box approach (where Nihilism is the path that breaks down the interpretations of the world that prevent us from understanding our right course). Nihilism tends to contradict itself because of this instinctive idea that our existence is important; that there is a right path for humanity. It seems that our existence is reason enough to believe in this importance.

Most of the year it is. Most of the year my existence is reason enough to assume that my life is important and that through intuition I can understand what I ought to be doing.

It’s different around my birthday. “Hey, you’re 30”. I laugh, I make jokes. I get depressed. I think about why I’m depressed. On my birthday, the fact I exist doesn’t provide me enough reason to believe that my life is important. Depressing, but confusing too, since the rest of the year is different.

Normally I trust my intuition, faith in myself if you will, that the only way to overcome not knowing why you exist is to trust that you exist for a reason, that all of those crazy emotions are key. We are guided, without reason, by an intuitive sense of purpose, a need to belong and do what is ultimately right. Slaves to our own intuition, fed by the instinctive desire to feel important, to feel needed.

See what I mean?! Every year is like this, some worse than others. It’s the weirdest thing… you’d think I’d just write a list like everyone else.

Making Java less Difficult

I’m sure that you’ve heard this before, but I’ll say it anyway: Why do Java developers make simple problems difficult? There are plenty of people who have written on that topic, and I have no desire to expound on the point further. I think we all get it. What I do want to talk about is how to make things less difficult, in particular how to make web application development as easy in Java as it is in Perl, PHP, Python, or Ruby.

I contend that the answer is not to build Yet Another Crappy Web Framework ™. I, for one, am terribly sick of complicated frameworks that are more work to configure than a web application is to create manually.

What I’ve noticed is that developing in Perl is often faster than Java, mostly due to the built-in regular expression handling Perl offers; and this is despite the fact that Java offers a comparable regular expression engine.

Two things that slow me down as a developer:

1) Staring at code that involves a StringTokenizer, while loops, and several nested substring statements where I’m counting characters on my fingers. Why do Java programmers continue to write code like this? I suspect the reason is the next issue.

2) Wanting to use a regular expression, you first compile your pattern into a Pattern object, then create a Matcher object from it, and then finally match the pattern using the Matcher object. Your matched text is then available through the group method, which Matcher implements from the MatchResult interface. I’m going to repeat myself, as this bears repeating: why do Java programmers make simple problems difficult?

In my experience, these two issues account for about 90% of why Java development is slower than Perl. Solving these two problems should then speed up my development in Java to be on par with my development in Perl!

Let’s look at some sample Perl code:

if (/a(.*)b/) {
  print "$1 is between a and b";
}
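
For comparison, here is a minimal sketch of the same check in plain Java with java.util.regex. The class name BetweenDemo and the method name between are made up for illustration; Pattern.compile, Matcher.find, and Matcher.group are the standard library calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class BetweenDemo {

    // Returns the text captured by a(.*)b, or null when the input does not match.
    static String between(String input) {
        Pattern pattern = Pattern.compile("a(.*)b"); // compile the expression
        Matcher matcher = pattern.matcher(input);    // bind it to the input
        if (matcher.find()) {                        // search anywhere, like Perl's match
            return matcher.group(1);                 // the first capture group, Perl's $1
        }
        return null;
    }
}
```

Calling BetweenDemo.between(line) and printing the result when it is not null does the work of the three Perl lines, with the Pattern and Matcher ceremony on top.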

We could do this fairly easily in Java, but let’s be honest, the real strength of Perl is when I start doing things like:

 @bar = split(/:/, $_);
 @foo = grep(/^#/, @bar);
 print join(':', @foo)

While that example is simple to any Perl programmer, it is, admittedly, less intuitive and less clean than a Java version, although it is only three lines rather than the twenty or more it would take in Java. But what about something like this:

XYZ bar = XYZ.split(":", arg);
XYZ foo = bar.match("^#");
System.out.print( foo.join(":") );

This seems easy, and perhaps clean enough not to bewilder the Java developer the way Perl code tends to. Perhaps we can create a utility Java class that supports the needed data structure and methods to write code like that. What other methods would we need to empower Java developers to stop writing the usual tokenizer/substring mess? Consider just the basics:

  join - join the values of an array into one string
  split - split a string into parts
  match - keep the parts that match a pattern
  matchAll - match repeated patterns
  trim - trim the ends of a string

This much is already provided in Java in one class or another. The String class already lets me match and split, but it’s not enough. I need more than just a String class, I want an array of Strings. No, I want a dynamically sized array of Strings that I can easily get from and put to. While I’m at it, I want it to be an associative array, where I can treat it like a normal array and it will retain index order, or I can add my own keys (similar to PHP arrays). Given a data structure like that, add in the above methods to that data structure, and I think we’d be in business.

The scripting languages provide a default context that includes regular expressions, hashtables, dynamically sized arrays and various string manipulation features. I can do all of that in Java, but it’s not convenient creating a different object for each data structure and method, especially since they’re almost always part of the same context.

Let’s create one class that contains the data structure and the methods. This should solve both of my issues, making it easy to do what once was difficult and speeding up development for almost all applications.

Even better, we can leverage what Java is good at to get this done. Let’s extend Java’s Hashtable, which solves most of the data structure part.

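import java.util.Hashtable;

public class Preg extends Hashtable<Object, Object> {

 // Look up a value by its integer index; returns null if nothing is stored there.
 public Object get( int inInt ) {
  return this.get( Integer.valueOf(inInt) );
 }

 // Store a value without a key, keyed by the current size so it acts as an append.
 public void put( Object value ) {
  this.put( Integer.valueOf( this.size() ), value );
 }
}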

I’ve added the ability to put values without providing a key, which will allow us to use this object as both a Hashtable and a Vector (i.e. a resizable array). For those of you who are upset that this is inefficient, you’re absolutely right, although I would argue that it’s a negligible difference. And since we’re speeding up development, this will more than compensate, since I’ll actually have time to optimize my code and do some refactoring before the deadline (a luxury I rarely see Java developers having time for).

The next step is to create the methods. We should also provide static versions of the methods to gain maximum convenience. For example, I may only implement one “join” method, but I’ll provide several overloads:

 public String join() {
  return Preg.join("", this);
 }
 public String join( String glue ) {
  return Preg.join(glue, this);
 }
 public static String join( Preg src ) {
  return Preg.join("", src);
 }
 public static String join( String glue, Preg src ) {
  ... the actual implementation
 }
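
One way that last static join could be filled in is sketched below. This is only a sketch, not the code from my project: it assumes the values to join live under the integer keys 0, 1, 2, ... that put(value) assigns, and it ignores any entries stored under other keys. The class name PregJoinSketch is a stand-in so the example compiles on its own.

```java
import java.util.Hashtable;

// Hypothetical stand-in for Preg, reduced to what join needs.
class PregJoinSketch extends Hashtable<Object, Object> {

    // Store a value under the current size, as in the Preg class above.
    public void put(Object value) {
        put(Integer.valueOf(size()), value);
    }

    public String join() {
        return join("", this);
    }

    public String join(String glue) {
        return join(glue, this);
    }

    public static String join(PregJoinSketch src) {
        return join("", src);
    }

    // Assumption: only values under keys 0, 1, 2, ... are joined, in index
    // order, stopping at the first missing index.
    public static String join(String glue, PregJoinSketch src) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; src.containsKey(Integer.valueOf(i)); i++) {
            if (i > 0) {
                out.append(glue);
            }
            out.append(src.get(Integer.valueOf(i)));
        }
        return out.toString();
    }
}
```

Walking the integer keys rather than the Hashtable's own enumeration matters here, because a Hashtable does not hand its entries back in insertion order.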

After I do this for all of the methods, I’m left with a very powerful object that allows me to accomplish the previous Perl example:

Preg bar = Preg.split(":", arg);
Preg foo = bar.match("^#");
System.out.print( foo.join(":") );

Three lines of Java that, for once, are equal to three lines of Perl!

All the fancy trickery that we do in Perl or PHP can be done in Java with just the right mashup of Hashtable, Pattern, and Matcher. You can split a string into parts, filter, trim, and arrange to your heart’s desire without ever having to count a character offset for a substring!!
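
To make that mashup a bit more concrete, here is a rough sketch of how split and match could sit on top of Hashtable and Pattern. It is an illustration under assumptions, not the source from my project: the class name PregMatchSketch is invented, match is assumed to behave like Perl's grep (keep the values in which the pattern is found), and only integer-keyed values are considered.

```java
import java.util.Hashtable;
import java.util.regex.Pattern;

// Hypothetical stand-in for Preg, reduced to split and match.
class PregMatchSketch extends Hashtable<Object, Object> {

    // Look up a value by integer index; null if absent.
    public Object get(int index) {
        return get(Integer.valueOf(index));
    }

    // Store a value under the current size, so it acts as an append.
    public void put(Object value) {
        put(Integer.valueOf(size()), value);
    }

    // Split the input on a regular expression and store the pieces in order.
    public static PregMatchSketch split(String regex, String input) {
        PregMatchSketch parts = new PregMatchSketch();
        for (String piece : Pattern.compile(regex).split(input)) {
            parts.put(piece);
        }
        return parts;
    }

    // Assumption: like Perl's grep, keep each indexed value in which the pattern is found.
    public PregMatchSketch match(String regex) {
        Pattern pattern = Pattern.compile(regex);
        PregMatchSketch hits = new PregMatchSketch();
        for (int i = 0; containsKey(Integer.valueOf(i)); i++) {
            String value = String.valueOf(get(i));
            if (pattern.matcher(value).find()) {
                hits.put(value);
            }
        }
        return hits;
    }
}
```

With this in place, PregMatchSketch.split(":", arg).match("^#") returns just the parts that begin with #, and no substring offsets were counted along the way.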


Note: I have all of this working for one of my projects, which will hopefully move as fast as it would in Perl or PHP (it’s a web application, so I suspect this will in fact compensate for that 90% slowdown I mentioned earlier). If there’s interest, I can provide the source and/or Javadoc.

Even with my extremely verbose Javadoc it’s surprisingly not that much code.