Infrastructure as Code (IaC) and Continuous Integration/Continuous Delivery (CI/CD) have become part of the standard pattern for delivering application code into production environments. Unfortunately, this methodology is rarely applied when deploying relational database models; teams often favor more classic, manual methods that are thought to be safer. That is a missed opportunity, because IaC and continuous delivery make database development easier for the developer and less risky for the business.
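As a sketch of what treating a database model as code can look like, here is a minimal versioned-migration runner. The table names, migration contents, and use of SQLite are purely illustrative; in a real IaC setup each migration would live as a file in version control and be applied by the CI/CD pipeline.

```python
import sqlite3

# Hypothetical ordered list of versioned migrations; in practice these would
# be files in version control, applied automatically on each release.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any pending migrations; return the resulting schema version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, sql in MIGRATIONS:
        if version > current:
            with conn:  # each migration commits atomically
                conn.execute(sql)
                conn.execute(
                    "INSERT INTO schema_version (version) VALUES (?)", (version,)
                )
            current = version
    return current

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # → 2 (both migrations applied to a fresh database)
```

Because the runner records which versions have been applied, it is idempotent: running it again is a no-op, which is what makes frequent, automated releases safe to repeat.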
Implementing IaC combined with a CI/CD pipeline usually follows a progression from manual, to semi-automated with infrequent releases, to automated with frequent releases. When CI/CD is applied to database models, teams commonly make the transition from manual to semi-automated, but the process often gets stuck at that stage and never reaches frequent releases. The cause of the stall is slightly different at each company, but it is generally fear of losing or corrupting data. Every database developer I know has a war story about a production database mistake they made that caused data loss or corruption.
Usually, these mistakes have a semi-happy ending: there was some downtime, and a full restore from backup brought the database back. But that is not always the case, and sometimes data is gone forever.
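One way to blunt that fear is to rehearse a change before releasing it. Below is a hedged sketch of a pre-release "dry run" that applies a candidate change inside a transaction against a throwaway copy of the database, then rolls it back; SQLite is used only because its DDL is transactional, and all names are illustrative.

```python
import sqlite3

def dry_run(conn: sqlite3.Connection, migration_sql: str) -> bool:
    """Return True if the migration executes cleanly; never commits anything."""
    conn.isolation_level = None  # take manual control of transactions
    conn.execute("BEGIN")
    try:
        conn.execute(migration_sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.execute("ROLLBACK")  # always undo the rehearsal

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
print(dry_run(conn, "ALTER TABLE users ADD COLUMN email TEXT"))  # True
print(dry_run(conn, "ALTER TABLE missing ADD COLUMN x TEXT"))    # False
```

A step like this in the pipeline catches broken changes before they reach production, which makes small, frequent releases feel less dangerous than one big one.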
In the postmortems of database incidents, the common suggestion is to go slower in order to better understand changes. On the surface, this is good advice, but what "slowing down" means to most companies is releasing less frequently. Unfortunately, releasing less often is not going slower; it just feels like it is. If the same number of developers are making roughly the same number of changes, and those changes are released less often but all at once, that is better characterized as doing nothing, broken up by short periods of going very fast. This can create a negative feedback loop: large batches of changes are applied to a database model all at once, causing errors, which leads to releasing less often, which leads to larger batches and more errors.
The steps above may look like they introduce a lot of overhead, but in practice they are just a reversal of most teams' current habits: rather than doing a lot of work less often, do a little work more often. Mistakes will still happen; that applies to databases just as much as to any other software. The best approach you can take is the path that limits the impact of a mistake and increases the chance of a safe fix, and the current standard for deploying databases does not do this.
Addendum: As stated above, mistakes will always happen; the pattern above does not eliminate them, because nothing can. It is important that a disaster recovery strategy exists for when a large mistake does happen. If you do not have one for your databases, creating one should be your highest priority, above all else.
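At minimum, a disaster recovery strategy means backups that are taken automatically and verified. The sketch below uses SQLite's online backup API purely for illustration; a production system would use its database's native tooling (pg_dump, mysqldump, managed snapshots, and so on), and the function name is hypothetical.

```python
import sqlite3

def backup_and_verify(source: sqlite3.Connection, path: str) -> bool:
    """Copy the live database to `path`, then confirm the copy is intact."""
    target = sqlite3.connect(path)
    source.backup(target)  # online copy; does not block writers for long
    target.close()
    # A backup you have not checked is only a hope, not a strategy.
    check = sqlite3.connect(path)
    ok = check.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    check.close()
    return ok
```

Whatever tooling you use, rehearse the restore path regularly as well: a backup that has never been restored is not yet a disaster recovery strategy.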