Tuesday, 22 July 2014

Deprecate at Your Peril!

You're building something great. You're investing a lot of time and effort into writing the code. Question:

What kind of platform would you rather develop for? One which has a bit of a learning curve at the start, but then you'll be able to keep using it, depending on it, and adding new features to your application, for years to come? Or would you choose a platform where features your application depends on might be taken out or changed in the next version, requiring you to spend time re-writing your code in a few years just so your application won't stop working when the new version of the platform comes out?

Enter the concept of stability

'Stability' is a misunderstood term that gets thrown around a lot in IT. 'Is this OS stable?' People use it to mean something reliable and well-built. Something that won't crash easily.

In academic software engineering, stability has a different, but related, definition. Stable means you can build on it, because it isn't subject to change underneath your application between versions. That's important, because if you have a mature code-base that depends on a particular API, and the API's fundamental interface changes between versions, it creates a moving target problem. It means you have to periodically modify your application's function calls just to keep up. Not only does that rob you of time you could otherwise have spent adding new features, it can also require major surgery on your application, at the risk of introducing regressions or breaking its original architectural design. The new version of the API might require you to adopt a new, 'improved' usage paradigm that your code wasn't originally designed around, making your code less elegant for future maintainers. In the real world, where enterprise applications have incredibly complicated and mature code-bases that are often not even well understood by the people who are paid to maintain them, this is a real problem.

And if it's bad for large enterprises, it's worse for independent developers, who will often abandon their work when the platform no longer runs it rather than continue to maintain it indefinitely. Film and literature are non-interactive forms of entertainment whose classic works may endure for centuries. By contrast, think of the cultural loss represented by the countless computer games that have been forgotten simply because it is no longer possible to run them.
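The moving target problem described above can be illustrated with a small sketch. It is written in Python purely for the sake of a short, runnable example, and every name in it (area, area_v2, legacy_application) is hypothetical rather than taken from any real platform. It assumes a platform whose second version changes a function's interface, and shows how keeping the old entry point as a thin wrapper lets old application code carry on working.

```python
# Hypothetical "version 2" of a platform API: the interface now takes a
# single dictionary describing the shape instead of two separate numbers.
def area_v2(shape):
    return shape["width"] * shape["height"]


# A platform that values stability can keep the original entry point as a
# thin wrapper around the new one, so existing callers are not broken.
def area(width, height):
    return area_v2({"width": width, "height": height})


# Application code written years ago against the original interface.
def legacy_application():
    return area(3, 4)


if __name__ == "__main__":
    # Still works, because the old entry point was preserved.
    print(legacy_application())

    # Had the platform simply replaced area() with the new signature,
    # the old call style would fail like this:
    try:
        area_v2(3, 4)
    except TypeError as error:
        print("old call style no longer works:", error)
```

The wrapper costs the platform a few lines of code. Removing it would cost every application that calls it a rewrite.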

Examples of stable platforms

Programming languages like C and C++ have been standardised by official standards bodies. Although new features get added in each revision of the language, the standard remains largely backward compatible. While these languages mightn't be perfect, standardisation means that you can depend on them to be a fixed point in your project.

Recently, in response to concerns by governments and large organisations that documents archived as Microsoft Word documents might be rendered inaccessible in a decade's time, and to the existence of the already-standardised OpenDocument formats, Microsoft went to a lot of trouble to get their XML-based Office formats officially standardised. Microsoft's DOCX format might leave a little to be desired in terms of compatibility, but at least they made the effort.

The X Window System, version 11, is one of the most stable pieces of software out there. It's the graphical front-end used by almost every Linux distribution, almost every BSD distribution, and is even provided as a compatibility layer in Apple's OS X. And it's been at version 11 since the 1980s. The API is horrible to program for (people rarely work with the library directly any more), and it provides several features that are now redundant because people have come up with better ways of doing things. But that doesn't matter. What matters is that it's always there behind the scenes, and it's reached a level of stability that makes it dependable and means it will continue being used for years to come.

Why we're giving up on OpenID

We had high hopes for OpenID. The vision was that you would sign up for an OpenID account through any OpenID provider, and you would be able to use that account to log into any website that also followed the OpenID standard. Rather than having to create a separate account for every website, you'd only need one or two accounts for all websites. Individual website owners wouldn't need to worry about securing your credentials as these would be held and authenticated by OpenID providers instead.

Companies like Google, Yahoo, Microsoft, even AOL, adopted the OpenID standard. We set up an OpenID login system on our website. We wouldn't need to deal with account security at our end; we could simply allow people to use their existing OpenID account (such as a Google account) to log in without even having to sign up to our website separately. The system was simple to implement and seemed to work well. There were potential security vulnerabilities, but no really fatal flaws that couldn't be fixed.

Then something changed. The OpenID Foundation announced that they didn't believe in OpenID anymore, and released a new, improved and very different system called OpenID Connect instead. A website called MyOpenID, which provided OpenID accounts for people who didn't want to sign up with larger companies like Google, announced that they were shutting down for good. Websites like Flickr and Facebook announced that they were moving away from OpenID and would no longer be accepting third-party login credentials.

Fortunately for us, our OpenID login facility was never more than experimental. Had we been serving customers through it, those customers could have found themselves locked out of accounts that no longer authenticated, unable to access their purchases. All because the OpenID Foundation decided that pursuing a new 'easier to use' system was more important than preserving the functionality that existing websites were already depending on.

Why the PHP Group is making a mistake

PHP is a programming language that's commonly used to generate dynamic web pages on the server. MySQL is a database system that is often used hand-in-hand with PHP to hold the data that the website stores and retrieves. (Shameless plug: for people who want a simple web content management system without the mystery of a MySQL database, there's always FolderCMS.)

A few years ago, the PHP Group announced that the MySQL functions in PHP were being deprecated. This means 'we're going to get rid of them, so stop using them.' In their place, there would be two new, but somewhat different, MySQL function sets to choose from. This was, and is, a controversial move. A lot of very popular websites rely on PHP's established MySQL functionality, and PHP owes a lot of its popularity to its ability to interface easily with MySQL. Why were they doing this? Their own website's FAQ isn't very clear:

Why is the MySQL extension (ext/mysql) that I've been using for over 10 years discouraged from use? Is it deprecated? ...

The old API should not be used, and one day it will be deprecated and eventually removed from PHP. It is a popular extension so this will be a slow process, but you are strongly encouraged to write all new code with either mysqli or PDO_MySQL.

That isn't really a justification; it's just the question re-worded into the format of an answer. There are several threads on StackOverflow, where the question has been asked repeatedly, which provide some more substantial answers. One is that the old functions are potentially dangerous for beginners who don't know that they are supposed to validate and sanitise user input before sending it into an SQL query. Another is a belief that developers should be moving away from text-based SQL queries and towards pre-compiled (prepared) queries, which can provide a performance boost. On the other hand, this represents a significant move away from the usage paradigm that made SQL popular in the first place. SQL is a database language that has become universal because, like the HTML that powers the web, data is transmitted in a well-established human-readable language which is not coupled to system-dependent function bindings or compiled code. You send a text-based query to the database engine and receive a reply. It doesn't need to be complicated.
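For readers who haven't met the two styles side by side, here is a minimal sketch of the difference. It uses Python and the sqlite3 module from its standard library rather than PHP and MySQL, purely so that the example is self-contained and runnable. The table, the data and the function names are all hypothetical. The first function builds the query as plain text, in the manner of the old functions. The second passes the user's input separately from the query text, which is the style the newer function sets encourage.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_text_query(conn, name):
    # The whole query is assembled as text, so whatever is in 'name'
    # becomes part of the SQL itself unless it has been sanitised first.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_parameterised(conn, name):
    # The query text is fixed; 'name' is handed over separately and is
    # only ever treated as a value to compare against.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    hostile_input = "nobody' OR '1'='1"
    # The text-built query matches every row in the table.
    print(find_user_text_query(conn, hostile_input))
    # The parameterised query matches nothing.
    print(find_user_parameterised(conn, hostile_input))
```

Both functions give the same answer for ordinary input such as 'alice'. They only differ when the input contains characters that mean something in SQL, which is the beginner's trap the StackOverflow answers describe.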