Coding Horror.
programming and human factors.
Get Your Database Under Version Control.
When I ask development teams whether their database is under version control, I usually get blank stares.
The database is a critical part of your application. If you deploy version 2.0 of your application against version 1.0 of your database, what do you get? A broken application, that's what. That's why your database should always be under source control, right next to your application code. You deploy the app, and you deploy the database. Like peanut butter and chocolate, they're two great tastes that taste great together.
When it comes to version control, the database is often a second- or even third-class citizen. From what I've seen, teams that would never in a million years write code without version control - and rightly so - are somehow completely oblivious to the need for version control around the critical databases their applications depend on. I don't know how you can call yourself a software engineer and keep a straight face when your database isn't held to the exact same rigorous level of source control as the rest of your code. Don't let this happen to you. Get your database under version control.
I was thinking about this again because my friend and co-author K. Scott Allen wrote a brilliant five-part series on the philosophy and practice of database version control:
K is one of the smartest software developers I know. Read it all; even if you currently have your database under version control (and bully for you if you do), there is plenty of rich, dark-chocolatey food for thought here. It doesn't matter which tools you use - as the agile manifesto notes, individuals and interactions are more important than processes and tools. Just get your database under version control already.
Database version control strategy
SQL version control methodology.
There are several questions on SO about version control for SQL and plenty of resources on the web, but I can't find something that quite covers what I'm trying to do.
First, I'm talking about a methodology here. I'm familiar with the various source control applications out there and with tools like Red Gate's SQL Compare, etc., and I know how to write an application that checks things in and out of my source control system automatically. If there is a tool that would be particularly helpful in enabling a new methodology, or that has unusually useful functionality, then great, but for the tasks mentioned above I'm already set.
The requirements I'm trying to meet are:
- The database schema and lookup table data are versioned.
- DML scripts for data fixes to larger tables are versioned.
- A server can be promoted from version N to version N+X, where X may not always be 1.
- Code is not duplicated within the version control system - for example, if I add a column to a table, I don't want to have to make sure the change is in both a create script and an alter script.
- The system needs to support multiple clients who are on various versions of the application (we are trying to get them all to within 1 or 2 releases, but we're not there yet).
Some organizations keep incremental change scripts in their version control, and to get from version N to N+3 you would have to run the scripts for N->N+1, then N+1->N+2, then N+2->N+3. Some of these scripts can be repetitive (for example, a column is added, but is later altered to change its data type). We are trying to avoid that repetition, since some of the client DBs can be very large and these changes could take longer than necessary.
Some organizations simply keep a full database build script at each version level and use a tool like SQL Compare to bring a database up to one of those versions. The problem here is that intermixing DML scripts can be an issue. Imagine a scenario where I add a column, use a DML script to populate said column, and in a later version the column name is changed.
Maybe there is some hybrid solution? Or maybe I'm just asking for too much? Any ideas or suggestions would be greatly appreciated.
If the moderators think this would be more appropriate as a community wiki, please let me know.
I struggled with this for several years before settling on a strategy that seems to work pretty well. The key points I live by:
The database does not need to be versioned independently of the application. All database update scripts should be idempotent.
As a result, I no longer create any kind of version table. I simply add changes to a numbered sequence of .sql files that can be applied at any time without corrupting the database. If it makes things easier, I will write a simple installer screen for the application that lets administrators run these scripts whenever they want.
Of course, this method imposes a few requirements on the database design:
- All schema changes are made through scripts - no GUI work.
- Special care must be taken to ensure that all keys, constraints, etc. are named, so they can be referenced by a later update script if necessary.
- All update scripts must check for existing conditions.
Examples from a recent project:
003.sql, 004.sql, etc.
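As an illustration, a minimal sketch of what one of those idempotent scripts might look like, assuming SQL Server (the table and column names are made up for the example):

    -- 004.sql: add a Status column to Customers; safe to run any number of times
    IF NOT EXISTS (
        SELECT 1
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = 'Customers' AND COLUMN_NAME = 'Status'
    )
    BEGIN
        ALTER TABLE dbo.Customers
            ADD Status INT NOT NULL
                CONSTRAINT DF_Customers_Status DEFAULT (0);
    END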
At any point I can run the entire series of scripts against a database in any state and know that things will immediately be brought up to date with the current version of the application. Because everything is scripted, it is much easier to build a simple installer to do this, and adding the schema changes to source control is no problem.
You have a rigorous set of requirements, and I'm not sure you'll find something that ticks all the boxes, especially the multiple concurrent schemas and the intelligent versioning.
The most promising tool I've read about that sort of fits is Liquibase.
Here are some additional links:
Yes, you are asking for a lot, but they are all really pertinent points! Here at Red Gate we are moving towards a complete database development solution with our SQL SSMS extension, and we are facing similar challenges.
For the next release we are supporting schema changes, and supporting static data indirectly through our SQL Data Compare tool. All changes are saved as creation scripts, although when updating or deploying to a database the tool will make sure the changes are applied appropriately as an ALTER or CREATE.
The most challenging requirement that doesn't yet have a simple solution is version management and deployment, which you describe very clearly. If you make complex changes to the schema and data, it may be unavoidable that a hand-crafted migration script is built to get between two adjacent versions, since not all of the 'intent' is always saved alongside a newer version. Column renames are a prime example. The solution may be to design a system that saves the intent or, if that is too complex, allows the user to supply a custom script to perform the complex change. Some sort of version management framework would manage these and 'magically' construct deployment scripts from two arbitrary versions.
For this kind of problem, use Visual Studio Team System 2008 to version control your SQL database.
In TFS by itself there is nothing of that kind available.
We are using SQL Examiner to keep the database schema under version control. I tried VS2010 as well, but in my opinion the VS approach is too complex for small and medium projects. With SQL Examiner I mostly work in SSMS and use SQL Examiner to check updates in to SVN (TFS and SourceSafe are also supported, but I have never tried them).
It is open source, and specifically designed to script an entire database - tables, views, procs - to disk, and then re-create that database against a deployment target.
You can script all of the data, or just specify which tables to script data for.
In addition, you can zip up the results for distribution.
We use it for source control of databases, and to test upgrade patches for new releases.
Under the hood it is built around SMO, and therefore supports SQL 2000, 2005 and 2008.
DBDiff is integrated, to allow schema comparisons.
Software Versioning Strategies.
A perfectionist stuck in the real world.
Software versioning can be one of those areas where you never feel like you've got it quite right. There is no definitive guidance and no single solution that satisfies everyone. Most software teams are either confused about the subject or choose to ignore it. This guide aims to fill that gap and offer a practical look at several popular strategies and their trade-offs.
Some of the techniques will be geared towards the Microsoft stack (Windows), since that is what I have the most experience with, but the principles apply in general. Linux, Node.js, Python & Ruby are also touched on lightly.
Versions Everywhere.
We are quite used to the term "version" these days. Most commonly used in the software world, it has leaked into media and other industries. Movie sequels are versioned - "Fast & Furious 7" (7!?), shoes are versioned - "Air Jordan XX8" - and books are versioned too - "One Minute Manager, 1984 edition". In fact, looking at books, things have been versioned for quite some time - the "Encyclopaedia Britannica", since 1768!
The premise is simple - as products live on and keep being improved, newer releases must be distinguishable from earlier ones. The product name doesn't change, because the market is already familiar with it, so something is appended at the end to indicate that it is newer (or different).
While versioning existed long before the digital era, software really pushed the problem forward. Modifying and releasing a new copy of software is a very fast process, often much faster than retooling an industrial production line to make a new piece of clothing or printing a new book edition. Software iteration cycles are therefore much shorter, and the potential for many simultaneous editions is much greater.
Simply using years (or even months), as with book editions, is not enough. New versions of software can be produced within minutes. Moreover, software has a huge parallel aspect - software streams - where several major versions can exist and all of them can be continuously updated at the same time. That rarely happens with your shoes. (I wish it did; sometimes I just don't want to upgrade to this year's catalog model, I want an improvement to my old pair!)
Why Version?
Before diving into how to implement versioning, let's stop and consider why we want to do it in the first place! After all, if we know the exact reasons why it is useful, we can better evaluate whether the proposed solutions are adequate.
We alluded to it in the last section when referring to the so-called public version. This is the version that is outwardly visible and that mostly carries weight in the market (i.e. it is more likely to be defined by the marketing/sales department). "Windows 7", "iPhone 5S", "Office 2013" - these are examples of a public version.
The public version is meant to be simple and memorable, signalling to customers that this is new and shiny (assuming people generally want "new and shiny"). People don't grasp "10.6.6527.14789" - but they do get "2013" or "5". It has become increasingly popular to use the release year as the public version number, as it simply and powerfully conveys up-to-date status. Car manufacturers have been doing this for a long time.
The private version is what we are used to in the software world. An internal stamp that (hopefully) uniquely identifies a particular piece of software. Software, like a car, can be made of many parts. Following the car analogy, the car's "private version" is the VIN chassis number. Manufacturers release and maintain massive parts catalogs that map parts to car "version numbers". A mechanic can then order the exact part that matches your vehicle.
Without a "private part number" you would not be able to service your software in the wild, because you would not know the exact "shape" a replacement module must have to fit into the overall system. Imagine being forced to replace your entire car when a tail light broke.
So the private version number is used as a catalog identifier. It is meant to be used when troubleshooting or servicing your software. (I like Jeff Atwood's "dogtag" analogy!) It should map to a description of what that piece of software is - what its form and function are. And what better "description" than the original source code itself!
Essentially, its use boils down to:
- Identifying the original source code for a piece of software, to enable incremental patching and to confirm faulty operation.
- Identifying whether one part is "compatible" with another, or whether it can replace it.
All of this is accomplished with a private version number. The public version is simply a marketing moniker, and it maps to one or more internal software parts, each with its own private version. (Just like the 2011 Toyota Corolla contains a ZRE142 frame and a 32000-12420 torque converter.)
Version Usage.
On Windows, the concept of a version number is supported at the operating system level. Version numbers are embedded in all binary executable files and can be seen by hovering over an EXE/DLL in Windows Explorer or by viewing Properties. In fact, any file that can have "resources" can have a version, since it is stored in the VERSIONINFO resource.
It uses the common format we are all used to: major.minor.build.revision (e.g. "1.2.360.0"). It is important to note that each number is limited to 16 bits and therefore cannot exceed 65535. This has certain implications for what we can represent with these numbers.
Note that the labels for these numbers are not strictly defined - they are simply 4 short integers. The first two are unanimously referred to as major and minor. The last two are where we see some variation, depending on the versioning scheme.
This version is used prominently during the Windows update process, which uses Windows Installer (MSI) technology to update various parts of the system. Essentially, Windows Installer follows certain rules to determine whether the update being installed is newer than what is already installed. If the version is higher, it is OK to upgrade.
Naturally, this concept flows into the .NET Framework, which was built around many existing Windows concepts. We have the Version class, which follows the 4-integer paradigm. We can also define AssemblyVersionAttribute and AssemblyFileVersionAttribute, which specify an assembly version and a Windows version resource, respectively.
In .NET, the assembly version exists separately from the Windows VERSIONINFO-based version, which is what you see in Windows Explorer (or file properties). It forms part of the assembly's strong name and is used exclusively by the .NET Framework when resolving assemblies. The two - the assembly version and the Windows file version - can be different, but more often than not they are kept the same to avoid confusion.
.NET uses the version for dependency tracking, i.e. it records the versions of the assemblies being referenced, thereby making it obvious when an update breaks compatibility for applications that depend on a particular library. This is a step up from the native Windows file version, which was used only during the update process, not when referencing a library, leading to the infamous "DLL Hell".
It is worth noting that the Version class allows four 32-bit integers, while AssemblyFileVersionAttribute is limited to 16 bits each, since it maps directly to the VERSIONINFO resource. So if we want AssemblyVersionAttribute and AssemblyFileVersionAttribute to be the same, that also puts a limit on the assembly version components.
Linux, in general, uses a different method of handling versioning. Binary files do not carry an embedded version stamp like most Windows binaries do. Instead, a shared library's file name indicates its version, e.g. /usr/local/lib/mylib.so.1.5.
Several symbolic links are created, e.g. mylib.so -> mylib.so.1 and mylib.so.1 -> mylib.so.1.5. An application can reference a library via a symbolic link, such as mylib.so.1, and get the latest compatible 1.x version installed.
This works quite well, as long as everyone follows the convention. Each library can then, in turn, load the libraries it depends on in the same way.
Linux users will also be familiar with the popular "Advanced Package Tool", apt-get, used ubiquitously on Debian-derived systems such as Ubuntu. Being a true package manager, it supports side-by-side installation of versions and tracks dependencies between packages. We look at the advantages of package managers in the following sections.
Version Number Schemes.
There are several popular numbering schemes for software, but they are all variations on the same theme and share common characteristics. Having major and minor version components is the same across the board. What they represent is fairly consistent:
- Major number increase: represents large, breaking changes to the software system, often not backwards compatible, or the addition of a large amount of new functionality.
- Minor number increase: represents substantial evolutionary changes, mostly updates or improvements to existing functionality, or the addition of a new set of features.
The above is only a guideline - there are no established rules about what major and minor versions should represent. Only that they should increase as more features are added to the software over time.
Windows and .NET binaries specify a 4-part version scheme: major.minor.build.revision. The last two components are fairly free-form; there are many variations in what they represent - some use incrementing build counters, some use the compilation date/time, and some derive them from internal source control revision numbers.
Many ignore the revision number and focus only on the build. Windows Installer, for example, only has 3 components. If you want your version to cover both the binaries and the package containing them, then it is best to limit yourself to just three numbers: major.minor.build.
In any case, the general pattern holds: the higher the version number, the newer the software.
A versioning scheme that has become popular in recent years (especially among open source projects) has been dubbed Semantic Versioning (also known as SemVer) and is documented at semver.org. It introduces a couple of additional components and turns the version into an alphanumeric string rather than a pure number - opening up some interesting possibilities.
The first three components are the same ones we have already discussed, with patch being optional. Patch is roughly equivalent to the build component, but the semantics can differ. Semantic Versioning actually prescribes when each component should be incremented (based on "public API" changes).
The prerelease component, if specified, is an alphanumeric string used to mark a version as one that precedes the final release. For example, 1.3.567-rc1 precedes 1.3.567. This is useful for adding more meaning to the version label than numbers alone can convey.
Metadata is another optional component, which allows further tagging of the version label (usually with a build timestamp), but it does not take part in version ordering, i.e. versions that differ only in their metadata are considered the same.
Prerelease is useful with package managers such as NuGet, which treat prerelease versions differently - they are considered unstable and are not visible to the general public unless explicitly requested. This allows alpha/beta versions to be released without affecting those who depend on stable releases.
Prerelease tags can also be useful in the internal release flow when dealing with parallel hotfixes and private builds, as discussed later in this article.
Versioning Non-Binary Files.
So, we know how to stamp a version on binary files. But what about the other files that make up a software system - configuration files, images, documents, fonts, etc.? How do you stamp a version on those?
And what about web frameworks like ASP.NET (or Ruby, Node.js, Python, etc.), where source files and pages can be modified in place and picked up automatically? How can we patch a web system, say, by updating a few target files, and still keep it versioned?
The answer is - don't update individual files! There is no way to maintain a meaningful version number for your software application if individual non-binary files can be updated ad hoc as hotfixes.
Update using a package instead.
The Importance of the Build and the Package.
When you hear the term "build", compilation usually comes to mind - most compiled languages, such as C#, C++ or Java, need to be compiled into a binary before they can be executed. So the build is commonly associated with the compilation process.
But that is not the whole picture. Some languages or frameworks, like Python or ASP.NET, don't strictly require compilation. They can be interpreted, in the case of Python, or compiled on the fly, in the case of ASP.NET. What should a build do for those systems? How do you "build" a Python application?
That is why it is more useful to think of the build as an assembly process, or simply as packaging. Just as a line of consumer goods, e.g. shoes, is packaged before shipping to the stores, so is a software system before it is released.
The concept of a package is essential to versioning, because a package is a single collection of the pieces that make up a software system, or part of it, and can therefore be identified and stamped with a version. With the right package management system (which we look at in the next section), it can be deployed and updated, and it can specify dependencies on other packages.
Software today is never a single binary executable - it is a collection of various binaries, libraries, documents, configuration files, images and other resources. A package is what helps us group them together, version them and release them to the outside world.
A package does not need to be sophisticated, although that helps in some situations (e.g. databases). It can even be a plain ZIP file, carrying the version in its file name or embedded as a text file. In fact, many open source projects do exactly that: a release is a ZIP or a .tar.gz archive.
The important thing is that a package is a single unit, released and updated at the same time, which leads to consistency. It is common to have several packages, for example representing "client" and "server" components, or any other logical grouping applicable to a software system. Each package can be updated on its own.
Let's take a look at some of the common packaging methods, their approach to versioning, and which applications they are best suited for.
Windows Installer.
Best suited: full Windows GUI applications, Windows Services or drivers.
The oldest and, for a long time, the only recommended way to install applications on the Windows platform. It has built-in version support and a sophisticated (some will say "complicated") set of rules for determining when to update components. While a Windows Installer package (.msi) is a single file, in essence it is a collection of small logical components (down to individual files) that can be updated independently.
Windows Installer will check each individual file being installed, whether it carries a version, and whether that version is higher than a file with the same name that is already installed. This means it is important to version not only the installer package, but every file contained in it. It also means that it is incredibly difficult to do downgrades (i.e. rollbacks) with Windows Installer.
It is best suited for traditional Windows applications (GUI, services, drivers) that are released to the public. However, it is not the best choice for internally developed & deployed applications, any kind of web application, or database systems.
It was also used to deploy redistributable libraries (native DLLs) and COM objects, but these days it is not the right mechanism for distributing libraries.
Web Deploy.
Best suited: web applications (IIS, ASP.NET).
The Web Deploy technology was designed specifically for deploying and synchronizing applications on Microsoft IIS servers. IIS Web Farm replication uses Web Deploy commands and packages behind the scenes to synchronize sites across a set of servers. IIS Manager has an extension (enabled by installing Web Deploy) for "Import Application", which can install or update a web application from a Web Deploy zip package.
The biggest drawback is that it can only be used for web applications on the Microsoft IIS platform, and the mechanism for customizing the installation is limited. While it may be adequate for simple web applications, it can quickly become frustrating for anything more sophisticated, i.e. variables, conditional logic, databases, etc.
It also has no intrinsic support for versioning.
Package Managers.
Best suited: shared libraries, dependencies, command-line utilities.
Package managers are great for releasing and versioning shared components and for tracking the dependencies between them. For example, if you have a shared library that you want others to use, a package manager lets you publish multiple versions side by side and lets consumers of the library reference the version they depend on. Package managers can resolve all dependencies between packages and retrieve only the expected versions. In effect, package managers solve the "DLL Hell" problem.
They are best used during development, to resolve library dependencies. However, some package managers, such as Chocolatey for Windows or apt-get for Ubuntu, are geared towards installing complete software.
Most importantly, package managers are designed around the concept of a version. That makes them a perfect mechanism for distributing versioned software libraries.
For .NET we have NuGet. Many open source libraries have been published to its online repository, and it is now the de facto standard for distributing third-party components. Each team is encouraged to create its own NuGet repository to share and publish internally developed libraries in a versioned way.
NuGet can even be used to release complete software systems - see the next section.
Other development environments have their own - npm for Node.js, pip for Python, gems for Ruby, apt-get on Linux. Package managers have proven extremely useful and have exploded in popularity.
Octopus Deploy.
Best suited: internally developed & deployed software.
Octopus uses NuGet as its packaging and versioning shell. It is similar to an installer, only driven by PowerShell, which means endless flexibility in how the software is deployed. PowerShell already has excellent support for configuring Windows Services, IIS web applications, scheduled tasks, SQL Server and much more.
For software that is developed and deployed internally (i.e. for a company with software solutions developed in house), this is a perfect release management vehicle. Packages are versioned and pushed to a shared NuGet feed (e.g. a network share), from where Octopus Deploy can release and deploy each package to the appropriate environment.
NuGet here plays the role of the application package/container, with a version stamped on it. The package can be built once and then deployed as many times as needed, to any environment.
Versioning & Packaging Databases.
Database versioning is one of the biggest challenges in software projects. Almost every team I have encountered either ignored it completely or had something inadequate in place. It certainly presents a challenge - database systems mix schema definition with the actual live data, and there is no single "file" that can be effectively versioned.
We should recognize the database as an integral part of the software system. One that runs on a proprietary third-party platform (SQL Server, Oracle, PostgreSQL, etc.), but whose source is part of the software's definition. It can be compared to script-based systems, such as Node.js or Python, only the scripts are written in a SQL dialect.
There are essentially three popular approaches to database versioning that support automated deployments (I am not considering manual approaches, because they are error prone and have nothing to do with real versioning!).
DB - Migrations.
"Migrations" is a concept where developers maintain a set of sequentially numbered SQL script files, where each script applies modifications to the target database to bring it to the expected state. Whenever a change is needed to the application's database, a developer creates a new migration script that applies the delta changes.
All scripts are kept as part of source control and are packaged with the application (embedded in the executable binary or installed alongside it). A migrations library then checks the target database for a dedicated table containing the last applied "migration script number", and runs all scripts with a higher number, in order, effectively applying all the changes in turn.
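A minimal sketch of the bookkeeping this implies, assuming SQL Server and a hypothetical tracking table (the real table and column names depend on the migrations library in use):

    -- Hypothetical tracking table maintained by the migrations library
    CREATE TABLE dbo.schema_version (
        script_number INT NOT NULL PRIMARY KEY,
        applied_on    DATETIME NOT NULL DEFAULT (GETDATE())
    );

    -- Before a deployment, the library effectively asks:
    SELECT MAX(script_number) FROM dbo.schema_version;

    -- ...then runs every script with a higher number, in order,
    -- recording each one as it goes:
    INSERT INTO dbo.schema_version (script_number) VALUES (42);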
While this approach is simple to implement, and is favored by several popular frameworks (Ruby on Rails, Entity Framework), it has a number of significant flaws. First, there is no single source view of all the database objects (e.g. tables, stored procedures, etc.); they are scattered across the various migration scripts. It is not clear which script contains which modification. You have to "replay" them all to generate a database, and then look directly in the database (rather than in source code).
Second, the number of migration scripts becomes the "version" of the database, which is different from the package version number of the rest of the application. That is somewhat confusing. Moreover, this "version" does not identify the state of the database, since a database can be changed outside of the application without the "version" being updated. That can potentially break future installs, because migration scripts expect the database to be in a certain state in order to work.
Third, developers need to be disciplined enough to follow the structure and apply ALL changes through migration scripts. In addition, when developing and debugging locally, one often has to go through several iterations before getting that table or stored procedure change just right. Yet only the final changes should go into the migration script, which means they have to be remembered and written up by hand. Otherwise the migration scripts would contain every intermediate change made by every developer on the project. It is easy to see how that can grow out of proportion quickly.
Finally, there is an argument that migration scripts are a "history of changes", and it is a bit of a redundancy to store them in source control, which already is a "history" of code changes. We would be storing a history of a history. There's something philosophical about that.
Pros:
- Supported by some frameworks and libraries (Rails, DbUp, RoundHousE, EF Code First)
- Can work with any database
- Potentially high degree of control over SQL scripts
Cons:
- Have to manually maintain all migration scripts
- Tracking changes through source control is difficult
- Not robust against out-of-band changes to the target database
DB - SQL Compare.
Most often this is used as a manual approach, comparing a database between two environments (e.g. development vs test) to copy over the changes. Here we are considering an automated approach, suitable for the packaging and versioning strategies being discussed.
In source control, the database is represented by a series of creation scripts (e.g. to create tables, stored procedures, triggers, etc.), such that a new database with the right schema can be created from scratch. Usually each script file logically represents a corresponding object in the database, e.g. Table1.sql would be the create script for the Table1 table. All of the scripts are included in the released package (sometimes even combined into one large create script, by concatenating them).
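For example, a hypothetical Table1.sql kept in source control would simply be that object's create script (the columns here are illustrative):

    -- Table1.sql: the canonical definition of Table1 held in source control
    CREATE TABLE dbo.Table1 (
        Id   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Table1 PRIMARY KEY,
        Name NVARCHAR(100) NOT NULL
    );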
The idea is that during automated package deployment a temporary, fresh copy of the database is created by running all of the creation scripts, and then a SQL Compare tool is executed to compare the pristine copy with the target database and generate a migration delta script on the fly.
The advantage of this approach is that it is robust against out-of-band changes to the target database, since the delta script is generated during deployment, rather than during development. SQL Compare tools (such as Red Gate's SQL Compare or XSQL Compare) are sophisticated and mature enough that we can have some confidence in the generated SQL code. Each can be controlled by a multitude of options to fine-tune behavior with respect to renames, reordering columns, avoiding drops, etc.
In this case, the target database is treated as a runtime environment, and we avoid the issue of versioning it. Instead we version the package that contains all of the creation scripts, which is much easier, and use it to synchronize the target database with what is expected in each version.
The big disadvantage of this approach is the difficulty of getting it right - there is no off-the-shelf framework that would support it, and it has to be developed. For SQL Server, read the next section for a better approach. For others, some day I may put together the set of scripts and logic necessary to achieve this, based on some of my prior work (unless someone else beats me to it).
Pros:
- Automatically detect and migrate changes, regardless of the target DB state
- Only DDL (i.e. create) scripts are maintained in source control, which makes change tracking easy
Cons:
- More difficult to set up, especially to be automated
- A temporary database has to be created during each deployment (requires "create database" permission)
DB - DACPAC (SQL Server)
For SQL Server there is now a new recommended approach - DACPAC, and it can be produced by Visual Studio 2012 and above, if using the SQL Server database project. Really, this is a slick variation of the "SQL Compare" method above, just that Microsoft has done all the heavy lifting for you!
Essentially, a DACPAC is a zip package which contains an XML schema model of what the target database should look like. It is compiled by Visual Studio based on the creation scripts in your project. In effect, it represents that temporary pristine database that we would otherwise have had to create manually. Only it is done automatically, and the schema is represented in an XML format. The real bonus is that a DACPAC can be versioned, i.e. its metadata supports storing a version number.
SQL Server Data Tools can be used to deploy a DACPAC package, which really performs a SQL Compare operation between the in-memory database model loaded from the DACPAC and the target database. It does the same thing as SQL Compare, but avoids having to create the extra temporary database copy to do the comparison.
For applications having SQL Server as a back-end, a DACPAC can be included as one of the deployable packages, stamped with the appropriate version generated during the build. Starting with SQL Server 2008 R2, a database can be registered as a Data-Tier Application, and the latest DAC version is tracked in a system view that can be queried.
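For example, the registered version can be looked up with a query along these lines (a sketch; I believe the view is msdb.dbo.sysdac_instances, but verify the view and column names on your SQL Server version):

    -- Assumption: registered Data-Tier Applications are listed in msdb.dbo.sysdac_instances
    SELECT instance_name, database_name, type_version
    FROM msdb.dbo.sysdac_instances
    WHERE database_name = 'MyAppDb';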
Pros:
- Can package the whole DB definition into a single package (or several packages)
- Can apply the same version to the package as to the rest of the software system
- Same advantages as the SQL Compare method
Cons:
- SQL Server only
- Need to treat lookup data in a special way (a post-deploy MERGE script, sketched below)
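A hedged sketch of such a post-deployment script, for a hypothetical lookup table (the table name and rows are illustrative):

    -- Post-deployment script: keep the OrderStatus lookup table in sync with the expected rows
    MERGE INTO dbo.OrderStatus AS target
    USING (VALUES
        (1, 'New'),
        (2, 'Shipped'),
        (3, 'Cancelled')
    ) AS source (Id, Name)
        ON target.Id = source.Id
    WHEN MATCHED AND target.Name <> source.Name THEN
        UPDATE SET Name = source.Name
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Name) VALUES (source.Id, source.Name)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;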
Build Auto-versioning.
Given the importance of consistent versioning discussed above, it makes sense to implement a strategy for automatically generating and stamping a version number during the software automated build process. We want the version number to be applied to the produced packages, and also applied to all the binaries generated through compilation.
There are several well-known and not so well-known ways of achieving this. We look at pros and cons of each.
Applying Build Number.
There are some who prefer to update the version number manually just before a release. I will argue that this is a bad practice. Firstly, it is easy to forget to do it, if you don't have an automated system for incrementing the version build number. And, if it is easy to forget, it will be forgotten at some point.
Secondly, without automatically updating build number, there will be multiple packages produced from the source code that have the same version number, but different functionality (as more commits are made to the source control). This will be confusing to say the least.
It is better to have a process, like ones described below, where version number build component is automatically updated whenever a non-local build is made.
Multiple Versions for Multiple Components.
If there are multiple software components, where each needs to have its own version number, then it is best to split them each into its own separate build. Don't mix multiple version numbers in the same build, as it unnecessarily increases the complexity, and raises a question about which of the build numbers should be used to label the build itself (in addition to having to tag each source sub-tree separately).
Developer vs Continuous vs Release Builds.
The Release build is the one that will potentially be released to the public or to a particular environment - test, staging, production, etc. That is the build that needs to be consistently versioned, to keep track of the changes it includes and to link back to the source code at the time of compilation.
Note that the Release build can be scheduled - it is popular to have a Daily or Nightly build. In most situations it should be the Release build, i.e. it should be versioned and packaged, ready to be released.
Continuous Integration builds run whenever someone commits to the repository and are used to validate that the code compiles, and passes unit tests. There is no need to version this build, as it is not intended to be released.
Developers must also be able to do a Developer build, whether it is to test/fix the build process itself, or to generate shared software components to be used in development. Such builds are intended to be run locally only and should never be publicly released.
You can default the build part of the version number to "0". This will identify Developer builds, i.e. ones that are not supposed to be released. For Release builds, pass the build number to your build scripts as a property. Have MSBuild stamp a version number on all generated assemblies and packages.
Tagging Source Control.
Since one of the primary reasons for having a version number is to be able to link back to source code used to build the software (see beginning of the article), it is important to create tags/labels in source control that identify the state of source code at the time that version was built.
Various systems call it differently - TFS has "Labels", Git has "tags". Tag should include the full version (including the build number) of the build, so that it can later be found, if needed.
Build Number - Version File Auto Increment.
A common technique is to record the version number together with the source code, usually in a separate file (e.g. "version.txt"). The build process then finds the file, reads the version, increments the build number portion, and commits the file back to the repository.
If the commit message also includes the version number, e.g. "Auto-increment: 1.3.156.0", then it comes in handy when viewing the commit history. You can clearly see the changes that occurred between versions by looking at the commits between the two "Auto-increment: ..." messages.
This works fairly well, but has a few drawbacks, mainly due to the fact that the "version" becomes part of the source code. When merging changes between, say, a release branch and main, you have to resort to "cherry-picking" (i.e. selecting just the code changesets) to avoid merging the modified version number. That requires constant care, because you can accidentally change the versioning sequence of another branch just by merging the "version file" into it.
Pros:
- Control over the build number sequence (i.e. sequential)
- Can make it easy to see the changes between versions in source control history
Cons:
- Difficult to control merging between code branches in source control
Build Number - External.
Overcoming the drawbacks of the auto increment approach, it is possible to track the build number outside of the source tree. Build server software such as CruiseControl or TFS Builds can do that - they track a build number internally for each "project" and are able to pass it as a parameter to MSBuild.
Version file is still used, but it records major and minor versions only, and doesn't have to change between each build. This makes it easier to merge changes from release branches back to main and others, since they will contain only code changes, without being intermingled with version increments. Major/minor version changes would occur early in the development cycle, when starting work on the next update, and are already set by the time release branch is created.
Pros:
- Not modifying the source tree on every build makes merging between branches easier
- Versioned builds are forced to be built by a dedicated build server
Cons:
- Relies on a build system that can supply a build number (e.g. CruiseControl, TFS Builds)
- Changing the build number sequence can be difficult (e.g. TFS Builds)
Build Number - Derived from Date/Time.
A popular alternative is to derive the build number from the date/time of the build. The advantage is that it carries more meaning (useful in diagnosis), and each build inherently gets a different build number (with later builds getting a higher number).
The trick, of course, is fitting all this into a 16-bit number, if using the standard 4-part Windows version number. While some solve it by using both the build and revision components, I cannot recommend that, because the revision cannot always be applied to external packages (like Windows Installer, or NuGet), which use only a 3-part version number.
This allows only 4 unique builds per day, which is not a lot, unless all you want is a daily build.
Pros:
- No need to keep track of the last build number
- The build number can be given more meaning, if it derives from a date
Cons:
- The build number is not sequential (but it increases nevertheless)
- Limited to 16 bits (maximum 65535), so some overflow into the revision (4th) number
Build Number - Derived from Source Control.
A variation of the previous technique is to derive build number from a unique property in source control. With a centralized SCM like Subversion or TFS, a revision or changeset number is an ever increasing number that is tied directly to the source code. The big problem with it is that it can quickly overflow the 16-bit limit, meaning you may have to accept build numbers looping back to zero.
An alternative in distributed SCM, like Git, is to use the size of the commit history log as the build number. This will increase monotonically for any single branch, as new commits are made. It too can overflow the 16-bit limit, but it goes a lot further than the global revision number.
Example: git rev-list HEAD --count.
Pros:
- No need to keep track of the last build number
- No possibility of "forgetting" to update a version file, or of accidentally merging it to/from another branch
Cons:
- The commit history size will grow beyond 65,535 at some point, overflowing the 16-bit build number
Parallel Branches.
It's no secret that developing for multiple versions requires multiple branches in source control, each representing a "version" stream for the software. They can be roughly divided into:
- Development branches - where the unstable code for the next version lives, and where developers commit their daily work
- Feature branches - branching off from development branches, incorporating larger feature development that would otherwise disrupt other team members
- Release branches - representing versions of released software, or a release undergoing stabilization.
Each release branch needs to have an identifying version, and is usually named after it, e.g. "1.7". The decision of whether to create a new release branch depends on how long it is expected to be in stabilization mode before releasing, and on whether concurrent live versions are permitted (i.e. for packaged software). If you need to be able to maintain & hotfix the current released version, while a new version is being tested & stabilized, then create a new branch.
Development and feature branches need to have a version number that is above any of the existing release branches, to avoid confusion. For example, if a 1.7 release branch is created for the upcoming 1.7 release, then immediately update the development branch version sequence to 1.8.
Versioning feature branches is more difficult, since you don't want to start a new versioning sequence for every feature. Nothing should be "released" from feature branches, so this version is for internal purposes only. If using Semantic Versioning, attach a prerelease tag to clearly indicate this is a version for a feature branch, e.g. 1.8.781-dev-feature-x.
In any case, you wouldn't deploy anything built from a feature branch to the shared testing or production environment, or release a package from it. So it is acceptable to have version sequence overlap with that of development branch.
Finally, in the next section we look at how to version patches & hotfixes that are applied to release branches.
Handling Patches / Hotfixes.
Devising a system to handle patches depends heavily on the rest of the software development cycle, which is what many teams forget when searching for the "one, true way" of handling concurrent patching of the released/production software in parallel with working on the new version.
For example, having a short QA/test cycle, where most of the tests are automated, results in a more simplified and robust system, which does not have to deal with multiple parallel hotfixes "in test".
Overlapping hotfixes.
One difficulty that comes with managing parallel development is devising a consistent versioning and deployment strategy that overcomes the inherent conflicts. Consider the following scenario: you have recently released software package 1.5.167. Two urgent show-stopping issues have slipped past your QA process and now require a quick fix. You assign two developers to work on each one in parallel. How would they commit their fixes to minimize conflicts? How do you test each fix? How do you release one independently of the other?
This is a good example of the complexity of software release processes that can be encountered in real-world teams. It applies both to internal software and packaged software, but distribution of the hotfix might be slightly different for each one.
First, let's consider what happens if we remove concurrency. In the case where the two issues are worked one after the other, the solution becomes simple. The first fix gets committed into the maintenance/hotfix branch for the 1.5 release stream, and a new build is generated, with an incremented build number. The build goes through a quick QA cycle to make sure there is no regression, and then it is ready to be deployed. The same process repeats for the second fix.
The problem with the concurrent approach is the time when development is in parallel, creating the entangled case where there is no build/package that contains only one of the fixes, i.e. independent of the other. This problem is magnified by a slow QA cycle, which usually means there are no automated tests. While one fix is in test, if a commit for a second fix is made to the same branch and a problem is discovered with the first one, it becomes very difficult to separate the two.
The culprit here is, of course, the concept of a partial fix - the state where the fix is not complete. It has been committed, but has a problem with it, requiring further commits. This can easily create the case of a hotfix branch where the two fixes are "entangled" (quantum physics at the code level!).
The solution is to remove the possibility of a partial hotfix.
This means that each hotfix has to be coded and tested in a separate code stream, independent of the other. Once tested, and ready for release, it is merged into the main hotfix release branch, where the automated build can create a new package and apply versioning (i. e. increment build number, for example, to 1.5.168).
The second hotfix, once tested, also has to be merged into the main hotfix release branch. But, because the first hotfix got released during the work on this second hotfix, we first merge the first hotfix into the second hotfix's branch! This ensures that we can test how the second hotfix behaves when applied on top of the first hotfix, and resolve any code conflicts, if any.
In the end, you want a system with both hotfixes applied - that is the "next" version. So it makes sense that whatever hotfix is "second", it is applied on top of the "first" one. And creating a packaged release from the single hotfix release branch ensures that the version number is consistently incremented for the whole system.
Of course, above means that we must create a separate branch for each hotfix. Some version control systems, namely Git, make this very easy and part of the expected developer workflow. If you are using a version control system like TFS, then creating new branches for each hotfix is a bit more painful. In TFS, I suggest using named Shelvesets feature to emulate Git's process, and perform initial QA tests for a hotfix from a Shelveset-branch build. Then commit Shelveset into the hotfix branch to build the official hotfix package (and perform necessary merging).
What about the versioning of the interim hotfix builds? The main hotfix release branch would have a standard versioning scheme applied (as discussed above), either incrementing a build number or using a timestamp. Each new hotfix, applied on top of all previous hotfixes, gets an increased build number, and the software version keeps moving forward.
However, when building from the developer hotfix branch (or Shelveset in TFS), we also need to apply a version to distinguish it from other builds, and to be able to deploy it into the QA/test environment. We want to be able to test each hotfix in isolation, applied on top of an existing released version of the software system. This becomes problematic if you have a single test environment.
You do not want to apply both hotfixes to one test environment, because there is no guarantee that they won't conflict or affect each other. If you are able to quickly spin up a test environment for a hotfix development branch, then you can truly parallelize team efforts. For a shared test environment, they have to be applied one at a time:
1. Force-install the latest release version (e.g. 1.5.168) to bring the environment to a known state.
2. Install the hotfix version to be tested.
3. Perform the tests (preferably automated). For shared test environments this is the bottleneck, since no other hotfixes can be tested at the same time (automation can help minimize the time spent in this step).
4. Repeat 1-3 until the tests are satisfactory.
What this means is that each hotfix has to have a build version number greater than the latest released version, the one it is being applied on top of. There are several ways to achieve that. If using a derived build number, this should just work out of the box. If incrementing or using external build numbers, then the easiest option is to simply force the build for the hotfix development branch (or Shelveset) to use a number greater than the latest released version (i.e. greater than .168).
With Semantic Versioning, we can set up hotfix builds to use a "prerelease" tag that clearly marks them as hotfix-test builds. For example - 1.5.169-check14761, where the trailing number could be a reference to the issue tracking system. This works especially well when using NuGet as the packaging mechanism.
Once tested, the changes can be merged into hotfix release branch, and an official build generated, with incremented build version number.
NOTE: Above process to resolve concurrent hotfixes is undoubtedly complicated. It is intended to solve a particular real-world scenario, but one that does not happen too often. If there are no concurrent fixes expected, you can simplify your life by applying fixes directly to the hotfix release branch.
Patching a large system.
If applying hotfixes to a large system, we don't want to upgrade the whole thing, which may involve a lot of different components - services, GUI applications, scheduled jobs, databases, etc. Instead, we want to apply the fix only to affected parts.
This is where splitting the system into multiple packages helps. Each corresponds to a logically contained piece of the system - for example, each service, application, database, etc. is its own package. That means they can be patched independently, by applying just that package.
Care must be taken about dependencies, if hotfix affects multiple packages at once. Although, in that case, ask yourself is it really a hotfix or a new minor version?
Patching for specific installation.
Some software shops may have developed the practice of patching the software for individual customers (for packaged software), in other words creating a "custom" version for just that installation, without including this fix in the rest of the released software streams. This is one of the worst situations to be in with regard to versioning, since it creates a large number of variations that have to be maintained separately.
Instead, release a general update, moving the overall software version forward for that release stream. Adopt a "feature" system, where parts of the software can be turned on and off based on configuration. If a specific fix is needed for a particular installation, then that code can be encapsulated behind a configuration switch which turns that section of the code on or off. That particular customer can turn it on, while the rest can have it off!
This is also a popular technique in web applications, of which only one installation exists (on the server), where various "features" can be enabled based on configuration for each user or set of users.
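To make the configuration-switch idea concrete, here is a minimal T-SQL sketch of one way it could be wired up at the database level; the table, procedure, column and feature names are illustrative assumptions, not something prescribed by the article.

```sql
-- Hypothetical feature-flag table; one row per switchable behaviour.
CREATE TABLE dbo.FeatureFlag
(
    FeatureName nvarchar(100) NOT NULL PRIMARY KEY,
    IsEnabled   bit           NOT NULL DEFAULT 0
);
GO

-- The customer-specific fix ships to everyone, but only runs where the flag is on.
CREATE PROCEDURE dbo.GetInvoiceTotal
    @InvoiceId int
AS
BEGIN
    IF EXISTS (SELECT 1 FROM dbo.FeatureFlag
               WHERE FeatureName = N'RoundingFixForCustomerX' AND IsEnabled = 1)
        -- behaviour required by one installation, enabled only there
        SELECT ROUND(SUM(Amount), 2) AS Total
        FROM   dbo.InvoiceLine
        WHERE  InvoiceId = @InvoiceId;
    ELSE
        -- default behaviour for every other installation
        SELECT SUM(Amount) AS Total
        FROM   dbo.InvoiceLine
        WHERE  InvoiceId = @InvoiceId;
END;
```

The same switch can then be flipped per installation simply by updating the flag row during deployment, rather than by maintaining a custom code branch.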
Patching the changes only.
There is often the temptation to simply patch in the changes to the live/production system by editing/replacing one file, or updating one table or stored procedure. The change is small, and it seems like the fastest way to solve the imminent issue, without changing anything else in the system.
While it seems like a smaller risk to make only the necessary updates directly, it makes it a whole lot harder to know the state of the system in the future. As more and more such "small" patches get applied, there is no longer any reliable way to link the running system back to the original source code, making further maintenance exponentially more complicated (and, ironically, increasing the risk).
Updating individual non-binary files (e.g. config files) or altering database objects does not update any version number. That means it is difficult to tell which changes have been made to the system, leading to "maintenance hell" (a variation of the infamous "DLL Hell").
Rule of thumb: Any change to the system should change the version number.
NOTE: Windows Installer allows a so-called "small update", where the product version number does not have to change, intended for small hotfix patches. I believe this creates too much confusion, and so I do not recommend it. Windows Installer does track each patch, through the package code, so you always know which patches have been applied. But it now means having to track and remove patches on subsequent product updates, which complicates the process. It may work for Microsoft Windows and Microsoft Office, but I wouldn't recommend using it for any other system.
Final words.
This turned out to be a much longer article than I originally anticipated when I sat down to write about versioning. I am hoping it proves useful for software engineers out there looking for some guidance on how to apply these concepts in their own projects.
Still, this seems like only a partial treatment of the topic.
Everything I wrote above has been learned through the painful process of trial & error over the years. If just a few readers have an "aha!" moment while reading this, then I have achieved my goal!
Database Upgrade Scripts.
- A sequence of database upgrade scripts that contain the DDL necessary to move the schema from version N to N+1. (These go in your version control system.)
- A _version_history_ table, along the lines of the sketch below.
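A minimal sketch of what such a _version_history_ table might look like, with column names that are illustrative assumptions:

```sql
-- One row per upgrade script that has been applied to this database.
CREATE TABLE dbo.version_history
(
    version_id     int           IDENTITY(1,1) NOT NULL PRIMARY KEY,
    schema_version int           NOT NULL,                           -- the version the script upgrades TO
    script_name    nvarchar(255) NOT NULL,                           -- e.g. 'upgrade_041_to_042.sql'
    applied_at     datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    applied_by     sysname       NOT NULL DEFAULT SUSER_SNAME()
);
```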
This table gets a new entry every time an upgrade script runs, corresponding to the new version.
This ensures that it's easy to see which version of the database schema exists and that database upgrade scripts are run only once. Again, these are not database dumps. Rather, each script represents the changes necessary to move from one version to the next. They're the scripts that you apply to your production database to "upgrade" it.
Developer Sandbox Synchronization.
- A script to back up, sanitize, and shrink a production database. Run this after each upgrade to the production DB.
- A script to restore (and tweak, if necessary) the backup on a developer's workstation. Each developer runs this script after each upgrade to the production DB.
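As a rough illustration of the two scripts being described, assuming SQL Server, a production database called ProdDb, and purely illustrative file paths, logical file names and table names:

```sql
-- Script 1 (production side): take a copy-only backup for developer sandboxes.
BACKUP DATABASE ProdDb
    TO DISK = N'D:\backups\ProdDb_sandbox.bak'
    WITH COPY_ONLY, COMPRESSION, INIT;

-- Script 2 (developer workstation): restore, sanitize and shrink the copy.
RESTORE DATABASE DevSandbox
    FROM DISK = N'D:\backups\ProdDb_sandbox.bak'
    WITH MOVE N'ProdDb'     TO N'C:\SQLData\DevSandbox.mdf',    -- logical names are assumptions
         MOVE N'ProdDb_log' TO N'C:\SQLData\DevSandbox_log.ldf',
         REPLACE;

-- Sanitize anything sensitive before developers start using the copy.
UPDATE DevSandbox.dbo.Customers
SET    Email = CONCAT('user', CustomerId, '@example.invalid');

-- Keep the sandbox copy small.
DBCC SHRINKDATABASE (DevSandbox);
```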
A caveat: My automated tests run against a schema-correct but empty database, so this advice will not perfectly suit your needs.
Red Gate's SQL Compare product not only allows you to do object-level comparisons, and generate change scripts from that, but it also allows you to export your database objects into a folder hierarchy organized by object type, with one [objectname].sql creation script per object in these directories. The object-type hierarchy is like this:
If you dump your scripts to the same root directory after you make changes, you can use this to update your SVN repo, and keep a running history of each object individually.
This is one of the "hard problems" surrounding development. As far as I know there are no perfect solutions.
If you only need to store the database structure and not the data, you can export the database as SQL queries. (In Enterprise Manager: right-click on the database -> Generate SQL script. I recommend setting the "create one file per object" option on the Options tab.) You can then commit these text files to SVN and make use of SVN's diff and logging functions.
I have this tied together with a Batch script that takes a couple parameters and sets up the database. I also added some additional queries that enter default data like user types and the admin user. (If you want more info on this, post something and I can put the script somewhere accessible)
If you need to keep all of the data as well, I recommend keeping a back up of the database and using Redgate (red-gate/) products to do the comparisons. They don't come cheap, but they are worth every penny.
First, you must choose the version control system that is right for you:
Centralized Version Control system - a standard system where users check out/check in before/after they work on files, and the files are kept on a single central server.
Distributed Version Control system - a system where the repository is cloned, and each clone is actually a full backup of the repository, so if any server crashes, any cloned repository can be used to restore it.
After choosing the right system for your needs, you'll need to set up the repository, which is the core of every version control system. All this is explained in the following article: solutioncenter. apexsql/sql-server-source-control-part-i-understanding-source-control-basics/
After setting up a repository, and in the case of a centralized version control system a working folder, you can read this article. It shows how to set up source control in a development environment using:
SQL Server Management Studio via the MSSCCI provider,
Visual Studio and SQL Server Data Tools.
Here at Red Gate we offer a tool, SQL Source Control, which uses SQL Compare technology to link your database with a TFS or SVN repository. This tool integrates into SSMS and lets you work as you would normally, except it now lets you commit the objects.
For a migrations-based approach (more suited for automated deployments), we offer ReadyRoll, which creates and manages a set of incremental scripts as a Visual Studio project.
In SQL Source Control it is possible to specify static data tables. These are stored in source control as INSERT statements.
If you're talking about test data, we'd recommend that you either generate test data with a tool or via a post-deployment script you define, or you simply restore a production backup to the dev environment.
You might want to look at Liquibase (liquibase/). Even if you don't use the tool itself it handles the concepts of database change management or refactoring pretty well.
+1 for everyone who's recommended the RedGate tools, with an additional recommendation and a caveat.
SqlCompare also has a decently documented API: so you can, for instance, write a console app which syncs your source controlled scripts folder with a CI integration testing database on checkin, so that when someone checks in a change to the schema from their scripts folder it's automatically deployed along with the matching application code change. This helps close the gap with developers who are forgetful about propagating changes in their local db up to a shared development DB (about half of us, I think :) ).
A caveat is that with a scripted solution or otherwise, the RedGate tools are sufficiently smooth that it's easy to forget about SQL realities underlying the abstraction. If you rename all the columns in a table, SqlCompare has no way to map the old columns to the new columns and will drop all the data in the table. It will generate warnings but I've seen people click past that. There's a general point here worth making, I think, that you can only automate DB versioning and upgrade so far - the abstractions are very leaky.
We use DBGhost to manage our SQL database. You put the scripts to build a new database into your version control, and it will either build a new database or upgrade any existing database to the schema in version control. That way you don't have to worry about creating change scripts (although you can still do that, if for example you want to change the data type of a column and need to convert data).
With VS 2010, use the Database project.
1. Script out your database.
2. Make changes to the scripts, or directly on your DB server.
3. Sync up using Data > Schema Compare.
Makes a perfect DB versioning solution, and makes syncing DB's a breeze.
It is a good approach to save database scripts into version control with change scripts so that you can upgrade any one database you have. Also you might want to save schemas for different versions so that you can create a full database without having to apply all the change scripts. Handling the scripts should be automated so that you don't have to do manual work.
I think it's important to have a separate database for every developer and not use a shared database. That way the developers can create test cases and development phases independently from other developers.
The automation tool should have a means of handling database metadata, which tells which databases are in what state of development and which tables contain version-controllable data, and so on.
You didn't mention any specifics about your target environment or constraints, so this may not be entirely applicable, but if you're looking for a way to effectively track an evolving DB schema and aren't averse to the idea of using Ruby, ActiveRecord's migrations are right up your alley.
Migrations programmatically define database transformations using a Ruby DSL; each transformation can be applied or (usually) rolled back, allowing you to jump to a different version of your DB schema at any given point in time. The file defining these transformations can be checked into version control like any other piece of source code.
Because migrations are a part of ActiveRecord, they typically find use in full-stack Rails apps; however, you can use ActiveRecord independent of Rails with minimal effort. See here for a more detailed treatment of using AR's migrations outside of Rails.
You could also look at a migrations solution. These allow you to specify your database schema in C# code, and roll your database version up and down using MSBuild.
I'm currently using DbUp, and it's been working well.
Every database should be under source-code control. What is lacking is a tool to automatically script all database objects - and "configuration data" - to file, which then can be added to any source control system. If you are using SQL Server, then my solution is here: dbsourcetools. codeplex/. Have fun. - Nathan.
When the base project is ready, you create a full database script. This script is committed to SVN. It is the first version.
After that, all developers create change scripts (ALTERs, new tables, sprocs, etc.).
When you need the current version, you execute all the new change scripts.
When the app is released to production, you go back to step 1 (but then it will be a successive version, of course).
NAnt will help you execute those change scripts. :)
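For illustration, a change script in this scheme might look like the following sketch; the file naming convention, table and change-log table are assumptions, not something given in the answer:

```sql
-- 042_add_customer_email.sql : one incremental change script committed to SVN.
-- Guarded so it is safe to re-run against a database that already has the change.
IF COL_LENGTH('dbo.Customer', 'Email') IS NULL
BEGIN
    ALTER TABLE dbo.Customer ADD Email nvarchar(256) NULL;
END;
GO

-- Record that this change script has been applied (ChangeLog is hypothetical).
INSERT INTO dbo.ChangeLog (ScriptName, AppliedAt)
VALUES (N'042_add_customer_email.sql', SYSUTCDATETIME());
```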
And remember: everything works fine when there is discipline. Every time a database change is committed, the corresponding code changes are committed too.
If you have a small database and you want to version the entire thing, this batch script might help. It detaches, compresses, and checks an MSSQL database MDF file into Subversion.
If you mostly want to version your schema and just have a small amount of reference data, you can possibly use SubSonic Migrations to handle that. The benefit there is that you can easily migrate up or down to any specific version.
To make the dump to a source code control system that little bit faster, you can see which objects have changed since last time by using the version information in sysobjects.
Setup: Create a table in each database you want to check incrementally to hold the version information from the last time you checked it (empty on the first run). Clear this table if you want to re-scan your whole data structure.
Normal running mode: You can take the results from this SQL, generate SQL scripts for just the objects you're interested in, and put them into the source control system of your choice.
Note: If you use a non-standard collation in any of your databases, you will need to replace /* COLLATE */ with your database collation, i.e. COLLATE Latin1_General_CI_AI.
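As a rough sketch of this idea, using sys.objects.modify_date rather than the legacy sysobjects version columns the answer refers to (table and column names are illustrative):

```sql
-- Setup (once per database): remember what was seen on the previous scan.
CREATE TABLE dbo.SchemaScanState
(
    object_id     int           NOT NULL PRIMARY KEY,
    object_name   nvarchar(512) NOT NULL,
    last_modified datetime      NOT NULL
);
GO

-- Normal run: list user objects that are new or changed since the last scan,
-- so only those need to be re-scripted into source control.
SELECT  o.object_id,
        QUOTENAME(SCHEMA_NAME(o.schema_id)) + N'.' + QUOTENAME(o.name) AS object_name,
        o.modify_date
FROM    sys.objects AS o
LEFT JOIN dbo.SchemaScanState AS s
       ON s.object_id = o.object_id
WHERE   o.is_ms_shipped = 0
  AND   (s.object_id IS NULL OR o.modify_date > s.last_modified);
```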
Because our app has to work across multiple RDBMSs, we store our schema definition in version control using the database-neutral Torque format (XML). We also version-control the reference data for our database in XML format as follows (where "Relationship" is one of the reference tables):
We then use home-grown tools to generate the schema upgrade and reference data upgrade scripts that are required to go from version X of the database to version X + 1.
We don't store the database schema, we store the changes to the database. What we do is store the schema changes so that we can build a change script for any version of the database and apply it to our customers' databases. I wrote a database utility app that gets distributed with our main application that can read that script and know which updates need to be applied. It also has enough smarts to refresh views and stored procedures as needed.
We had the need to version our SQL database after we migrated to an x64 platform and our old version broke with the migration. We wrote a C# application which used SQLDMO to map out all of the SQL objects to a folder:
The application would then compare the newly written version to the version stored in SVN and if there were differences it would update SVN. We determined that running the process once a night was sufficient since we do not make that many changes to SQL. It allows us to track changes to all the objects we care about plus it allows us to rebuild our full schema in the event of a serious problem.
I wrote this app a while ago, sqlschemasourcectrl. codeplex/ which will scan your MSFT SQL db's as often as you want and automatically dump your objects (tables, views, procs, functions, sql settings) into SVN. Works like a charm. I use it with Unfuddle (which allows me to get alerts on checkins)
The typical solution is to dump the database as necessary and backup those files.
Depending on your development platform, there may be opensource plugins available. Rolling your own code to do it is usually fairly trivial.
Note: You may want to backup the database dump instead of putting it into version control. The files can get huge fast in version control, and cause your entire source control system to become slow (I'm recalling a CVS horror story at the moment).
We just started using Team Foundation Server. If your database is medium sized, then Visual Studio has some nice project integrations with built-in compare, data compare, database refactoring tools, a database testing framework, and even data generation tools.
But, that model doesn't fit very large or third party databases (that encrypt objects) very well. So, what we've done is to store only our customized objects. Visual Studio / Team foundation server works very well for that.
I agree with ESV's answer, and for that exact reason I started a little project a while back to help maintain database updates in a very simple file which could then be maintained alongside our source code. It allows easy updates for developers as well as for UAT and Production. The tool works on both SQL Server and MySQL.
Some project features:
- Allows schema changes
- Allows value tree population
- Allows separate test data inserts, e.g. for UAT
- Allows an option for rollback (not automated)
- Maintains support for SQL Server and MySQL
- Has the ability to import your existing database into version control with one simple command (SQL Server only; still working on MySQL)
The code is hosted on Google Code; please check it out for some more information.
A while ago I found a VB bas module that used DMO and VSS objects to get an entire db scripted off and into VSS. I turned it into a VB Script and posted it here. You could easily take out the VSS calls and use the DMO stuff to generate all the scripts, and then call SVN from the same batch file that calls the VBScript to check them in?
I'm also using a version number stored in the database via the database extended properties family of procedures. My application has scripts for each version step (i.e. move from 1.1 to 1.2). When deployed, it looks at the current version and then runs the scripts one by one until it reaches the last app version. There is no script that has the straight 'final' version; even a deploy onto a clean DB does the deployment via a series of upgrade steps.
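A minimal sketch of storing and reading such a version marker with the extended-property procedures; the property name and version values are assumptions:

```sql
-- First deployment: record the schema version as a database-level extended property.
EXEC sys.sp_addextendedproperty @name = N'SchemaVersion', @value = N'1.1';

-- At deploy time: read the current version to decide which step scripts still need to run.
SELECT value AS SchemaVersion
FROM   sys.fn_listextendedproperty(N'SchemaVersion', NULL, NULL, NULL, NULL, NULL, NULL);

-- At the end of each step script (e.g. the 1.1 -> 1.2 script): bump the property.
EXEC sys.sp_updateextendedproperty @name = N'SchemaVersion', @value = N'1.2';
```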
Now what I'd like to add is that two days ago I saw a presentation on the MS campus about the new and upcoming VS DB edition. The presentation was focused specifically on this topic and I was blown out of the water. You should definitely check it out: the new facilities are focused on keeping the schema definition in T-SQL scripts (CREATEs), a runtime delta engine to compare the deployment schema with the defined schema and apply the delta ALTERs, and source code control integration, up to and including MSBUILD continuous integration for automated build drops. The drop will contain a new file type, the .dbschema files, that can be taken to the deployment site, where a command line tool can do the actual 'deltas' and run the deployment. I have a blog entry on this topic with links to the VSDE downloads; you should check them out: rusanu/2009/05/15/version-control-and-your-database/
It's a very old question, yet many are trying to solve this even now. All they have to do is research Visual Studio Database Projects. Without this, any database development looks very feeble. From code organization to deployment to versioning, it simplifies everything.
In my experience the solution is twofold:
You need to handle changes to the development database that are done by multiple developers during development.
You need to handle database upgrades in customers sites.
In order to handle #1 you'll need a strong database diff/merge tool. The best tool should be able to perform automatic merge as much as possible while allowing you to resolve unhandled conflicts manually.
The perfect tool should handle merge operations by using a 3-way merge algorithm that brings into account the changes that were made in the THEIRS database and the MINE database, relative to the BASE database.
I wrote a commercial tool that provides manual merge support for SQLite databases and I'm currently adding support for 3-way merge algorithm for SQLite. Check it out at sqlitecompare.
In order to handle #2 you will need an upgrade framework in place.
The basic idea is to develop an automatic upgrade framework that knows how to upgrade from an existing SQL schema to the newer SQL schema and can build an upgrade path for every existing DB installation.
Check out my article on the subject in codeproject/KB/database/sqlite_upgrade. aspx to get a general idea of what I'm talking about.
Check out DBGhost innovartis. co. uk/. I have used it in an automated fashion for 2 years now and it works great. It allows our DB builds to happen much like a Java or C build happens, except for the database. You know what I mean.
I would suggest using comparison tools to improvise a version control system for your database. Good alternatives are xSQL Schema Compare and xSQL Data Compare.
Now, if your goal is to have only the database's schema under version control, you can simply use xSQL Schema Compare to generate xSQL Snapshots of the schema and add these files to your version control. Then, to revert or update to a specific version, just compare the current version of the database with the snapshot for the destination version.
If you want to have the data under version control as well, you can use xSQL Data Compare to generate change scripts for your database and add the .sql files to your version control. You could then execute these scripts to revert / update to any version you want. Keep in mind that for the 'revert' functionality you need to generate change scripts that, when executed, will make Version 3 the same as Version 2, and for the 'update' functionality you need to generate change scripts that do the opposite.
Lastly, with some basic batch programming skills you can automate the whole process by using the command line versions of xSQL Schema Compare and xSQL Data Compare.
Database Version Control.
By placing under source control everything we need to describe any version of a database, we make it much easier to achieve consistent database builds and releases, to find out who made which changes and why, and to access all database support materials. Matthew Skelton explains how to make sure your version control system fully supports all phases of the database lifecycle, from governance, development, delivery and through to operations.
DevOps, Continuous Delivery & Database Lifecycle Management.
To achieve reliable, repeatable database builds and migrations, as part of Database Lifecycle Management (DLM), we need to store the Data Definition Language (DDL) code for a database in a version control system (VCS). It should be possible to reconstruct any version of a database from the scripts in the VCS and every database build we perform, and every database change, however trivial, should start from version control.
Ideally, you will also have in the VCS a complete history of changes to individual schemas and objects; who did them and why. Any build and version-control system must allow developers, as well as people outside the development team, to be able to see what is changing and why, without having to rely on time-wasting meetings and formal processes.
Without good version control practices, large parts of DLM will remain difficult for your organization to achieve, so getting it right is vital. In this article, we explore the database versioning capabilities provided by source control systems, considerations when choosing one, and good practices for version control in the context of DLM.
What goes in version control?
Every application or database that we build should originate from a version in the source control system. With most developments, there are many points in the process where a consistent working build should be available. You need to store in version control everything that is needed in order to build, test or deploy a new database, at a given version, or promote an existing database from one version to another.
The most obvious candidates for versioning are:
- individual DDL scripts for each table
- individual DDL scripts for all other schema-scoped objects such as stored procedures, functions, views, aggregates, synonyms, queues
- ‘static’ data
There are other candidates that we will mention shortly. If a VCS saves database scripts at the ‘object’ level then each file corresponds to a table or routine. In this case, the task of creating a new database or upgrading an existing one, from what’s in source control, is primarily an exercise in creating an effective database script from the component ‘object’ scripts. These will have to be executed in the correct dependency order. Subsequently, any static data must be loaded in such a way as to avoid referential constraints being triggered.
An alternative is a migration-based approach, which uses a series of individual change scripts to migrate a database progressively from one version to the next. The Database Migrations article discusses these approaches in more detail.
However they are produced, we need to also version the complete build script for the database structure and routines, as well as the data migration scripts required to ensure preservation of existing data during table refactoring, plus associated rollback scripts. Normally, the complete build scripts would be generated automatically from the nightly integration build, after it has passed its integration tests (see the Database Continuous Integration article). These integration tests will also verify that any hand-crafted migration scripts work exactly as intended, and preserve data correctly.
A common mistake that development teams make is to assume that database source code consists merely of a number of ‘objects’. In fact, the dependencies within a working database system are numerous and sometimes complex. For example, a change to one item in the database configuration, such as the database collation setting, might be enough to stop the database working as expected. Therefore, beyond the tables, routines and static/reference data, we also need consider the broader database environment, and place into version control elements such as:
- database configuration properties
- server configuration properties
- network configuration scripts
- DDL scripts to define database users and roles, and their permissions
- database creation script
- database interface definition (stored with the application it serves)
- requirements document
- technical documentation
- ETL scripts, SSIS packages, batch files and so on
- SQL agent jobs
Benefits of version control.
Unless we have in the VCS the correct versions of all the scripts necessary to create our database objects, load lookup data, add security accounts, and take any other necessary actions, we have no hope of achieving reliable and repeatable database build, release and deployment processes, nor of coordinating database upgrades with changes to the associated application. If we perform ad hoc database patching, outside of a controlled VCS process, it will inevitably cause data inconsistencies and even data loss.
Version control provides traceability.
The VCS provides a complete history of changes made to the database source. The team can see which developer is working on which particular module, which changes have been applied and when they were applied, which modules are available for release, the current state of the production system and the current state of any test systems. It also means that the team can, at any time:
- Roll the database forward or back between any versions
- Recreate older database versions to find when a subtle bug was introduced
- Perform a code review, checking coding standards and conventions
This traceability is crucial when diagnosing incidents in Production or when responding to an internal or external audit, and is particularly powerful when using hash-based VCS tools such as Git, which trace the file contents. With the correct permissions scheme in place, a VCS provides a trail of changes to all text files used in the software system.
Version control provides predictability and repeatability.
Keeping all text-based assets in version control means that processes are more repeatable and reliable, because they are being driven from a controlled set of traceable files rather than, for example, arbitrary files on a network share. The modern VCS is highly reliable and data loss is extremely rare, making it ideal to depend on for automation.
Version control protects production systems from uncontrolled change.
The VCS acts as a guard against ‘uncontrolled’ database changes, i.e. against direct changes to the code, structure, or configuration of a production database. The VCS must be treated as a single source of truth for the database source and configuration, including database schema, reference data, and server-level configuration settings.
The VCS is a mechanism to ensure that the database source that has been stored is identical to what was released. Before deploying a new database version, the team can compare the target database with the source scripts for that database version, in the VCS. If they do not describe identical databases, then the target database has drifted. In other words, there have been unplanned and untested fixes or functionality updates made directly on the live database, and the team must find out what changed and why.
Version control aids communication between teams.
Current version control environments offer rich, browser-based features for collaboration, communication, and sharing between teams, helping to foster interaction and engagement. Features such as pull requests, tickets or issue tracking, and commenting promote good practices such as code review and coding in the open.
A VCS platform that makes it easy for DBAs or Ops teams to review proposed database changes, while automatically storing all the traceability information, will encourage tight feedback loops between Ops and Dev and other stakeholders.
Choosing a version control system for DLM.
A good source control system should make it simple for people across the organization to track changes to files. Usability should be high on the wish-list for any VCS, particularly if it must be easily-accessible to other teams besides developers, such as testers and database administrators, as well as governance and operations people.
Version control tools.
We need to distinguish between low-level version control tools, usually a combination of client tool and server engine, such as Git or Subversion, and version control platforms that provide deep integration with other tools and a rich, browser-based user experience, such as Github or Bitbucket Server (previously known as Stash). We cover tools in this section, and platforms in the section Version control platforms.
Git.
Git’s superior workflow model together with lightweight branching, merging, and history operations make Git the best choice for most teams. Unless you have a good reason to do otherwise, you should use Git as the version control repository part of your VCS. All modern VCS tools support Git, including Microsoft’s Team Foundation Server.
Mercurial.
Mercurial is similar to Git; they both use similar abstractions. Some advocate that Mercurial is more elegant and easier to use, while others claim Git is more versatile, so it comes down to a matter of personal choice. Certainly, Git is the more widely-adopted tool due to the rise of platforms like GitHub.
Subversion.
Subversion is a sound choice for smaller repositories or where there is a fast network connection to the central server. Subversion can also act as a low-cost artifact repository for storing binary files.
Team Foundation Server.
Team Foundation Server (TFS) 2015 and later support Git as the versioning tool. If you use an older version of TFS, you should either switch to an alternative VCS or upgrade to TFS 2015 or later in order to take advantage of Git support. The older versions of TFS have features and quirks that significantly hamper Continuous Delivery and DLM; in particular, older versions of TFS require an instance of SQL Server per TFS repository, which acts as a significant driver to use only a single repository across several teams, or even an entire organization, rather than having many small repositories that are cheap and quick to create.
Centralized vs. distributed?
Some people distinguish between ‘centralized’ and ‘distributed’ version control systems, but these distinctions can become quite academic, because most teams today use a definitive central repository, even when using a distributed VCS.
Distributed version control systems (DVCS) don’t require users to be connected to the central repository. Instead, they can make changes in their local copy of the repository (commit, merge, create new branches, and so on) and synchronize later. Pushing changes from the local repository and pulling other people’s changes from the central repository are manual operations, dictated by the user.
In centralized systems, by contrast, the interaction is between a local working folder and the remote repository, on a central server, which manages the repository, controls versioning, and orchestrates the update and commit processes. There is no versioning on each client; the users can work offline, but there is no history saved for any of these changes and no other user can access those changes until the user connects to the central server, and commits them to the repository.
The different approaches have important implications in terms of speed of operations. In general, command execution in a DVCS is considerably faster. The local repository contains all the history, up to the last synchronization point, thus searching for changes or listing an artifact’s history takes little time. In a centralized system, any user with access to the central server can see the full repository history, including the last changes made by teammates or other people with access to the repository. However, it requires at least one round-trip network connection for each command.
In my experience, more teams are now opting for the speed and support for asynchronous work that a DVCS offers, but it’s worth remembering that with them often comes a more complex workflow model, due to their asynchronous nature. I believe that either flavor of VCS, centralized or distributed, can work well as long as it fits with the preferred workflow of the team, and provided that the right culture and accountability exists in the team.
With either system, if the team allow changes to live for too long in individual machines, it runs counter to the idea of continuous integration and will cause problems. In a DVCS, users feel safer knowing their changes are versioned locally, but won’t affect others until they push them to the central repository. However, we still need to encourage good CI practices such as frequent, small commits.
Version control platforms.
Version control platforms provide the version control functionality but add a lot more besides to improve the user experience for devotees of both the command line and GUI, to help different teams interact, and to provide integration with other DLM tools. For example, many version control platforms offer features such as:
- first-class command-line access for experts
- helpful GUI tools for less experienced users
- browser-based code review: diffs, commenting, tracking, tickets
- an HTTP API for custom integrations and chat-bots
- powerful search via the browser
- integration with issue trackers and other tools
We also need to consider how easy the system is to operate, particularly if we will run the system ourselves. An increasing number of organizations are choosing to use cloud-hosted or SaaS providers for their VCS, due to the reduction in operational overhead and the increased richness of integration offered by SaaS tools. There is also an argument that SaaS VCS tools are more secure than self-hosted tools, since the security management of most self-hosted VCS tools is average at best.
Some popular SaaS-based VCS platforms include Github, Bitbucket, CodebaseHQ and Visual Studio Online. These tools all offer Git as the version control technology plus at least one other (Subversion, Mercurial, and so on). Other options such as Beanstalk (beanstalkapp/) may work if you have a homogeneous code environment, because they are more focused on the Linux/Mac platforms.
Self-hosted VCS solutions are generally less-integrated with third-party tools, which may limit how easily they can be ‘wired together’. Examples of self-hosted VCS platforms include Gitolite, Gitlab, Bitbucket Server and Team Foundation Server.
Essential version control practices for DLM.
Version control is central to the development, testing and release of databases, because it represents a “single source of truth” for each database. As discussed earlier, the VCS should contain everything that is needed in order to build a new database, at a given version, or update an existing database from one version to another. This may be necessary for a new deployment, for testing, or for troubleshooting (e.g. reproducing a reported bug).
In addition, several other DLM practices depend on version control. Several activities carried out by governance rely on being able to inspect the state of the database at any released version. It is sometimes necessary to determine when a change was made, by whom and why. Also, when release-gates need sign-off, the participants can see what changes are in the release and what is affected by the change. Any audit is made much easier if the auditor can trace changes that are deployed in production all the way back to the original change in the VCS.
It’s essential that the process of building and updating databases from version control is as quick and simple as possible, in order to encourage the practices of continuous integration and frequent release cycles. This section will discuss some of the version control practices that will help.
Integrate version control with issue tracking.
Version control pays dividends when it is integrated with issue tracking. This allows the developer to reference the source of defects, quickly and uniquely, and thereby save a lot of debugging time. It also allows the management of the development effort to check on progress in fixing issues and identifying where in the code issues are most frequent. Developers will also appreciate being able to automate the process of reporting how bugs were fixed and when. It also allows the team to share out the issue-fixing effort more equally and monitor progress.
Adopt a simple standard for laying out the files in version control.
It is sometimes useful to store database script files in version control in a format that matches or resembles a layout that will be familiar from the use of a GUI tool, such as SSMS or TOAD.
Generally, this simply means storing each object type in a subdirectory under a base ‘Database’ directory. For example, Database | Tables, Database | Stored Procedures, Database | Security, and so on. It is normal to store child objects that have no independent existence beyond their parent alongside the parent object. Constraints and indexes, for example, are best stored with the table creation scripts.
Making frequent, non-disruptive commits.
In order to maintain a high level of service of a database, we need to integrate and test all changes regularly (see the Database Continuous Integration article).
To achieve this, we need to adopt practices that encourage regular commits of small units of work, in a way that is as non-disruptive as possible to the rest of the team. Any commit of a set of changes should be working and releasable. Feature toggles allow deactivating particular features but those features still need to work and pass tests before deployment. Consider a commit to be like the end of a paragraph of text: the paragraph does not simply trail off, but makes sense as a set of phrases, even if the text is not finished yet.
Adopt consistent whitespace and text layout standards.
Agree and establish a workable standard for whitespace and text layout conventions across all teams that need to work with a given set of code or configuration files. Modern text editors make it simple to import whitespace settings such as the number of spaces to indent when the TAB key is pressed, or how code blocks are formatted.
This is especially important in mixed Windows/Mac/*NIX environments, where newline characters can sometimes cause problems with detecting differences. Agree on a convention and stick with it, but make it very simple for people to import the standardized settings; do not make them click through 27 different checkboxes based on information in a Wiki.
When detecting changes, a source control system will default to simply comparing text. It will therefore see any changes of text as a change. This will include dates in headers that merely record the time of scripting, and are irrelevant. It will also include text in comments, even whitespace, which in SQL has no meaning, in contrast to an indentation-based language such as Python. If your source control system just compares text, you need to be careful to exclude anything that can falsely indicate to a VCS that a change has been made.
Tools such as a database comparison tool will parse the text of the SQL to create a parse tree and compare these rather than the text. This not only prevents this sort of false positive but allows the user to specify exactly what a change is and what isn’t. If this sort of tool is used to update source control, then the only changes will be the ones that the user wants. However, this needs a certain amount of care since formatting of SQL Code can be lost unless it is flagged as a change.
Keep whitespace reformatting separate from meaningful changes.
Even if you have a database comparison tool that will help avoid detecting ‘false changes’, it still helps to commit whitespace and comment reformatting separately from changes to the actual code, i.e. changes that will affect the behavior of a script.
Figure 1 shows the colored console output comparing the current version to the previous one, after a commit that contained both whitespace reformatting and a behavior change, making the latter (an increase in the width of the LastName column from 50 characters to 90) much harder to spot.
Figure 1: A commit that made whitespace and semantic changes.
If we separate out the whitespace and semantic changes, the colored console output highlights the meaningful change very clearly, in this example someone has reduced the column width from 90 to 70.
Figure 2: isolating a single semantic change.
Just as with any changes to files in a VCS, separating reformatting changes from semantic changes is crucial for readability and tracing problems. With a DVCS like git, you can make multiple local commits (say two commits for whitespace and layout reformatting, followed by some semantic changes) before pushing to the central repository. This helps to encourage good practice in isolating different kinds of changes.
Plan how to coordinate application and database changes.
A database can have a variety of relationships with applications, from almost total integration within the data layer of the application, to being a remote server providing data services via a protocol such as the Open Data Protocol (OData). Commonly in enterprises, a single database will provide services to a number of applications, and provide integration and reporting services for them, via abstraction layers provided for each application within the database. It means that there will be a number of perfectly valid approaches to managing the change process. At one extreme, the database sits in an entirely separate development regime, sometimes not even subject to the same version control, test and build processes as the application. At the other, the database code is treated on equal terms in source control to the rest of the code and settings of the application.
Where database and application are close-coupled, we can adopt a unified approach to the development and deployment of both application and database, and therefore a unified strategy for versioning a project, with all project resources organized side by side in the same repository. Of course, often this means little more than creating a directory for the database alongside the application, in the VCS, evolving the structure of the database rapidly alongside the application, with the necessary branching and merging (covered later), as their understanding of the problem domain evolves. For a relational database of any size, with complex object interdependencies, this can prove challenging, especially given that when the team need to upgrade an existing database with what’s in the VCS, then every database refactoring needs to be described in a way that will carefully preserve all business data.
For databases that support multiple applications, if the delivery of all the changes that span the applications and database haven’t been planned properly, then the subsequent ‘big bang’ integration between application and database changes can be painful and time-consuming. This, along with deploying the new versions of database and application to QA for testing, then on to the operations team for deployment to Staging and then, finally to Production, forms part of the infamous “last mile” that can delay releases interminably.
Databases, and their interactions with other databases, applications or services, are not immune from the general rules that apply to other interconnected systems. There are several database design and versioning strategies that can help to allow the component parts to change without causing grief to the other components that access them. By applying the rules that govern the interfaces between any systems, it is possible to make database changes without disrupting applications, avoid complex branching strategies, and ensure that our database code is free from dependencies on database configuration and server-level settings.
Every application needs a published application-database interface.
Regardless of the exact branching strategy, a team that creates an application that makes direct access to base tables in a database will have to put a lot of energy into keeping database and application in sync. When using "feature and release" branching, this issue only worsens the more branches we maintain and the more versions of the database we have in the VCS.
An application version should be coupled to an application-database interface version, rather than to a database version. Each application should have a stable, well-defined interface with the database; usually one interface per application, if more than one application uses a single database. A good interface implementation will never expose the inner workings of the database to the application, and will therefore provide the necessary level of abstraction.
Usually, the application development team owns the interface definition, which should be stored in the VCS. It forms a contract that determines, for example, what parameters need to be supplied, what data is expected back and in what form.
The database developers implement the interface, also stored in the VCS. Database developers and application developers must carefully track and negotiate any changes to the interface. If, for example, the database developer role wishes to refactor schema, he or she must do so without changing the public interface, at least until the application is ready to move to a new version, at which point the team can negotiate an interface update. Changes to the interface have to be subject to change-control procedures, as they will require a chain of tests.
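As a sketch of what such an interface implementation might look like in SQL Server (the schema, object and role names here are illustrative assumptions, not part of the article):

```sql
-- The application is only granted rights on the 'api' schema;
-- base tables remain private to the database team.
CREATE SCHEMA api AUTHORIZATION dbo;
GO

CREATE VIEW api.CustomerSummary
AS
SELECT CustomerId, DisplayName, Email
FROM   dbo.Customer;          -- base table can be refactored behind this view
GO

CREATE PROCEDURE api.AddCustomer
    @DisplayName nvarchar(200),
    @Email       nvarchar(256)
AS
BEGIN
    INSERT INTO dbo.Customer (DisplayName, Email)
    VALUES (@DisplayName, @Email);
END;
GO

GRANT SELECT, EXECUTE ON SCHEMA::api TO AppRole;   -- AppRole is a hypothetical database role
```

As long as the shape of api.CustomerSummary and api.AddCustomer is preserved, the database team can reorganize dbo.Customer without the application noticing.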
The database developers will, of course, maintain their own source control for the database “internals,” and will be likely to maintain versions for all major releases. However, this will not need to be shared with the associated applications.
Keep database code in a separate repository from application code.
Unless your database will only ever be used by one version of one application, which is unlikely with DLM, you should keep the database code in its own version control repository. This separation of application and database, and use of a published interface as described above, helps us to deploy and evolve the database independently of the application, providing a crucial degree of flexibility that helps to keep our release process nimble and responsive.
Adopt a “minimal” branching strategy.
In many cases, we will store the files related to the main development effort in a common root subfolder of the VCS, often named trunk, but sometimes referred to by other names, such as main or mainline.
A VCS allows the team to copy a particular set of files, such as those in the trunk, and use and amend them for a different purpose, without affecting the original files. This is referred to as creating a branch or fork. Traditionally, branching has been seen as a mechanism to ‘maximize concurrency’ in the team’s development efforts, since it allows team members to work together and in isolation on specific features. A typical example is the creation of a "release" branch to freeze code for a release while allowing development to continue in the development branch. If changes are made to the release branch, normally as a result of bug fixes, then these can be merged into the development branch.
Branches can be a valuable asset to teams working with DLM, but should be used sparingly and with the idea that any given branch will only have a transient existence; just a few days. When working in a branch, there is a strong risk of isolating changes for too long, and then causing disruption when the merge ambush eventually arrives. It also discourages other good behavior. For example, if a developer is fixing a bug in a branch and spots an opportunity to do some general refactoring to improve the efficiency of the code, the thought of the additional merge pain may dissuade them from acting.
Instead we advocate that the team avoid using branches in version control wherever possible. This may sound like strange advice, given how much focus is placed on branching and merging in VCS, but by minimizing the number of branches, we can avoid many of the associated merge problems.
Each repository should, ideally, have just one main branch plus the occasional short-lived release branch. Instead of creating a physical branch for feature development, we use ‘logical’ branching and feature toggles. When combined with small, regular releases, many small repositories, and the use of package management for dependencies (covered later), the lack of branches helps to ensure that code is continuously integrated and tested.
The following sections outline this strategy in more detail.
Minimize “work in progress”
Teams often use branches in version control in order to make progress on different changes simultaneously. However, teams often end up losing much of the time they gained from parallel development streams in time-consuming merge operations, when the different work streams need to be brought together.
In my experience, teams work more effectively when they focus on completing a small number of changes, rather than tackling many things in parallel; develop one feature at a time, make it releasable. This reduces the volume of work-in-progress, minimizes context switching, avoids the need for branching, and helps keep changesets small and frequent.
Beware of organizational choices that force the use of branching.
The need for branching in version control often arises from the choices, explicit or otherwise, made by people in the commercial, business, or program divisions of an organization. Specifically, the commercial team often tries to undertake many different activities simultaneously, sometimes with a kind of competition between different product or project managers. This upstream parallelization has the effect of encouraging downstream teams, and development teams in particular, to undertake several different streams of work at the same time; this usually leads to branching in version control.
Another organizational choice that tends to increase the use of branching is support for multiple simultaneous customized versions of a core software product, where each client or customer receives a special set of features, or a special version of the software. This deal-driven approach to software development typically requires the use of branching in version control, leading rapidly to unmaintainable code, failed deployments, and a drastic reduction in development speed.
While there are strategies we can adopt to minimize the potential for endless feature branches or “per-customer” branches, which we’ll cover shortly, it’s also better to instead address and fix the problems upstream with the choices made by the commercial or product teams.
Encourage the commercial teams to do regular joint-exercises of prioritizing all the features they want to see implemented, thus reducing the number of downstream simultaneous activities.
Avoid feature branches in favor of trunk-based development.
A popular approach within development is to create feature branches in addition to release branches. We isolate a large-scale or complex change to a component, or a fix to a difficult bug, in its own branch, so that it doesn't disrupt the established build cycle. Other developers can then continue to work undisturbed on "mainline". One of the major drawbacks of this approach is that it runs counter to the principles of Continuous Integration, which require frequent, small changes that we integrate constantly with the work of others. By contrast, trunk-based development encourages good discipline and practices such as committing small, well-defined units of work to a single common trunk branch.
In SQL Server, use of schemas, and the permission system, is the obvious way of enabling trunk-based development. Schemas group together database objects that support each logical area of the application’s functionality. Ideally, the VCS structure for the database will reflect the schema structure, in the same way that C# code can be saved in namespaces.
Features within a given schema of a database will be visible only to those users who have the necessary permissions on that schema. This will make it easier to break down the development work per logical area of the database, and minimize interference when all developers are committing to trunk. It also means that the ‘hiding’ of features is extremely easy, merely by means of changing permissions.
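A small sketch of how schema-scoped permissions can 'hide' a feature that has been committed to trunk; all names here are illustrative:

```sql
-- A new logical area of the application lives in its own schema.
CREATE SCHEMA Loyalty AUTHORIZATION dbo;
GO

CREATE TABLE Loyalty.RewardPoints
(
    CustomerId int NOT NULL PRIMARY KEY,
    Points     int NOT NULL DEFAULT 0
);
GO

-- The feature is in trunk and deployed, but invisible to the application role...
DENY SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Loyalty TO AppRole;

-- ...and 'launching' it later is just a permissions change, not a merge:
-- GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Loyalty TO AppRole;
```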
Use feature toggles to help avoid branching.
In application development, if we wish to introduce a new feature that will take longer to develop than our established deployment cycle, then rather than push it out to a branch we hide it behind a feature toggle within the application code. We maintain a configuration file that determines whether a feature is on or off. We then write conditional statements around the new feature that prevent it from running until enabled by the 'switch' in the configuration file. It means the team can deploy these unfinished features alongside the completed ones, and so avoid having to delay deployment until the new feature is complete.
In database development, it is relatively straightforward to adopt a similar strategy. As long as we have a published database interface, as described earlier, we can decouple database and application deployments, to some extent. The views, stored procedures and functions that typically comprise such an interface allow us to abstract the base tables. As long as the interface “contract” doesn’t change, then we can make substantial schema changes without affecting the application. Instead of isolating the development of a new version of a database feature in a branch, the new and the old version exist side by side in trunk, behind the abstraction layer.
It means that we can test and "dark launch" database changes, such as adding a new table or column, ahead of time, and then adapt the application to benefit from the change at a later point. The article Database Branching and Merging Strategies suggests one example of how this might be done using a "proxy stored procedure".
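One possible shape for that proxy idea, sketched in T-SQL; the toggle table and object names are assumptions rather than anything prescribed by the article:

```sql
-- Old and new implementations live side by side in trunk, behind the abstraction layer.
CREATE PROCEDURE dbo.GetCustomerOrders_v1 @CustomerId int
AS
    SELECT OrderId, OrderDate
    FROM   dbo.CustomerOrder
    WHERE  CustomerId = @CustomerId;
GO

CREATE PROCEDURE dbo.GetCustomerOrders_v2 @CustomerId int
AS
    -- reads from the refactored (dark-launched) schema
    SELECT OrderId, OrderDate
    FROM   dbo.OrderHeader
    WHERE  CustomerId = @CustomerId;
GO

-- The application only ever calls the proxy; a toggle decides which version runs.
CREATE PROCEDURE dbo.GetCustomerOrders @CustomerId int
AS
BEGIN
    IF EXISTS (SELECT 1 FROM dbo.FeatureToggle          -- hypothetical toggle table
               WHERE ToggleName = N'UseNewOrderSchema' AND IsEnabled = 1)
        EXEC dbo.GetCustomerOrders_v2 @CustomerId = @CustomerId;
    ELSE
        EXEC dbo.GetCustomerOrders_v1 @CustomerId = @CustomerId;
END;
```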
By a similar strategy, we can avoid creating custom branches per customer requirements. Instead, we can use ‘feature toggles’ or plugin modules to produce the effect of customization without the need for branching and per-client customization of the source code. Of course, systems with long lived customizations (for example a payroll system that must support different countries/regions regulatory frameworks) will probably also require architectural decisions promoting those requirements.
Make any non-trunk branches short-lived.
Sometimes it is useful to create additional branches, particularly for fixing problems in the live environment, or for trying out new ideas. Branches that are short-lived can be a valuable asset to teams working with DLM. If a branch lasts only a few days, the person who created the branch knows exactly why it exists and knows what needs to be merged into the main branch; the drift from the main branch is small.
Branches that last for many days, weeks, or months represent a ‘merge nightmare’, where each branch is in effect a separate piece of software because the changes are not merged for a long time, possibly not until after the author of the changes has moved to a different team or even a different organization.
Account for differences in database configuration across environments.
As discussed earlier, database source code does not consist merely of a number of tables and programmable objects. A database system is dependent on a range of different database and server-level configuration settings and properties. Furthermore, differences in some of these settings between environments, such as differences in collation settings, can cause differences in behavior. As a result, we need to place into version control the scripts and files that define these properties for each environment.
However, we still want to use the same set of database scripts for a given database version to deploy that database version to any environment. In other words, database schema code and stored procedures/functions should be identical across all environments (Dev, Test, Staging, and Production). We do not want to use different versions of database code for different environments, as this leads to untested changes and a lack of traceability. This means that we must not include in the scripts any configuration properties, such as data and log file locations, nor any permission assignments, because otherwise we would need one set of database scripts per version, per environment.
Use database configuration files and version them.
A common source of problems with software deployments, in general, is that the configuration of the software is different between environments. Where software configuration is driven from text files, we can make significant gains in the success of deployment and the stability of the server environment by putting configuration files into version control. Where security is an issue, this should be within a configuration-management archive separate from the development archive. Whichever way it is done, it is best to think of it as being logically separate from database source because it deals with settings that are dependent on the server environment, such as mapping tables to drives, or mapping database roles to users and server logins.
Examples of configuration settings for databases include:
- Database properties, such as data file and log file layouts
- SQL Server instance-level configuration, such as fill factor and max server memory
- SQL Server database-level configuration, such as Auto Shrink, Auto Update Statistics and forced parameterization
- Server properties, such as the collation setting
- Security accounts: users and logins
- Roles and permissions: database roles and their membership
Scripts or property files that define and control these configuration settings should be stored in version control in a separate configuration management source control archive. A general practice with SQL Server databases is to use SQLCMD, which allows us to use variables in our database scripts for properties like database file locations, and then read the correct value for a given environment from a separate file. SSDT also exports a SQLCMD file to allow the same script to be used in several environments.
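For example (the variable names, paths, and file layout below are purely illustrative), a single deployment script can refer to SQLCMD variables, and each environment supplies its own values on the command line or from its own variable file:

```sql
-- deploy_database.sql : the same script is run in every environment.
-- The :setvar lines are defaults; each environment overrides them, e.g.
--   sqlcmd -i deploy_database.sql -v DatabaseName="Sales" DataPath="E:\SQLData" LogPath="F:\SQLLogs"
:setvar DatabaseName "Sales"
:setvar DataPath "C:\SQLData"
:setvar LogPath  "C:\SQLLogs"

CREATE DATABASE [$(DatabaseName)]
ON PRIMARY ( NAME = N'$(DatabaseName)_data',
             FILENAME = N'$(DataPath)\$(DatabaseName).mdf' )
LOG ON     ( NAME = N'$(DatabaseName)_log',
             FILENAME = N'$(LogPath)\$(DatabaseName)_log.ldf' );
GO
```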
This approach is particularly useful for DLM because it opens up a dialogue between software developers and IT Operations people, including DBAs; both groups need to collaborate on the location of the configuration files and the way in which the files are updated, and both groups have a strong interest in the environment configuration being correct: developers so that their code works first time, and operations because they will have fewer problems to diagnose.
We recommend one of two simple approaches to working with configuration files in version control, as part of Configuration Management. The first approach uses one repository with branch-level security (or multiple repositories if branch-level security is not available).
Figure 3: Configuration Management – single repository, branch-level security.
In this model, each environment has its own branch in version control, and settings are merged from one branch to another. This makes security boundaries simple to enforce, but changes are more difficult to track compared to the second model.
The second approach uses a single branch with multiple side-by-side versions of a particular configuration file, one per environment. In this model, per-environment security is tricky to set up, but merging and tracking changes is easier than in the first approach.
Figure 4: Configuration Management – single repository, single branch.
Either of these approaches will most likely lead to increased trust in environment configuration and more rapid diagnosis of environment-related problems.
Do not use per-environment repositories.
The configuration information for all server environments should live in one version control CM archive. Do not use a repository per environment, because this prevents tracing of changes between environments. The only exception to this will be for compliance reasons, where the regulatory framework requires that configuration settings for the Production environment are kept in a repository separate from all other configuration values.
Use packaging to deal with external dependencies.
Many database applications have dependencies on third-party modules. Rather than import into our VCS every module required by any of our applications, we can integrate these libraries and dependencies through packaging, for example using NuGet packages for libraries, or Chocolatey for Windows runtime packages. We can specify which versions of which libraries each application repository depends on, and store that configuration in version control. At build time the packages containing the library versions we need are fetched, thus decoupling our dependencies on external (to our repository) libraries from the organization of our repositories.
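As a simple illustration (the package names and versions are examples, not a recommendation), a NuGet packages.config committed to the repository pins exactly which library versions the build should fetch:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- packages.config (illustrative): the build restores these versions at build time -->
<packages>
  <package id="Newtonsoft.Json" version="13.0.3" targetFramework="net472" />
  <package id="Dapper" version="2.0.123" targetFramework="net472" />
</packages>
```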
We can also use package management to bring together configuration and application or database packages by having the configuration package depend on the generic application or database package, which in turn might depend on other packages.
By putting database code into source control, we provide a way of making it easier to coordinate the work of the different teams who share responsibility for the database. At different points in the life of a database, it is the focus of very different types of activities, ranging from creating an initial data model to decommissioning.
Version control acts as a communication channel between teams, because the changes captured in the version control system are treated as the single definitive ‘source of truth’ for people to collaborate on, bringing together delivery, governance and operations around a common DLM approach.
A working database will have a variety of materials that are required by the delivery team, governance or operations. With source-controlled archives, it is always obvious where the source of any material is, who made which alteration and update, when and why; any materials, whether code, instructions, training materials, support information, configuration items, step-by-step disaster recovery procedures, signoffs, release documents, entity-relationship diagrams or whatever, can be accessed easily.
Version control isn’t the core of DLM, but it makes it achievable by providing a source of information that is accessible, traceable, reliable, repeatable, and auditable.
DevOps, Continuous Delivery & Database Lifecycle Management.
Go to the Simple Talk library to find more articles, or visit red-gate/solutions for more information on the benefits of extending DevOps practices to SQL Server databases.
Database Lifecycle Management.
Achieving DevOps for the Database.
Matthew Skelton.
Matthew Skelton has been building, deploying, and operating commercial software systems since 1998, and for several years he led a team that built and operated large database-driven websites for clients across many industry sectors. Co-founder and Principal Consultant at Skelton Thatcher Consulting Ltd, he specialises in helping organisations to adopt and sustain good practices for building and operating software systems: Continuous Delivery, DevOps, aspects of ITIL, and software operability. Matthew founded and leads the 700-member London Continuous Delivery meetup group, and instigated the first conference in Europe dedicated to Continuous Delivery, PIPELINE Conference. He also co-facilitates the popular Experience DevOps workshop series, is a Chartered Engineer (CEng), and can be found on twitter as matthewpskelton.
Mooney's Blog.
Stuff about software development.
Think Globally, Act Locally (or Why your database version control strategy sucks and what to do about it, Part II)
This post is part 2 of a 382-part series on how to manage database changes, primarily for SQL Server, starting here.
I figured this would be a good week to discuss ways that you can make your world a better place by making small changes to things you do in your everyday work. No, this post is not about inconvenient truths or sustainability or hybrids or manbearpig. This post is about the importance of local development databases.
The situation you see all too often is that a development team has a single database server in their development environment that they share, and everyone is developing application code locally while simultaneously making changes to a shared database instance. Bad, bad, bad.
Captain Obvious.
These days, most everyone develops their code locally. That’s just what you do. Many developers have learned the hard way that this is important, and won’t tolerate any divergence. And for the less experienced developers who are doing it just because they are told to, eventually they will make the same mistakes and learn from them too. This is such an easy lesson to learn that you don’t see too many people violate it intentionally.
Even if you HAVE to develop on a server environment, you’ll probably at least find a way to isolate yourself. For example, SharePoint developers don’t tend to install the SharePoint platform on their local machines, mostly because it requires a server OS, but also because SharePoint is pure, unadulterated evil that will steal the very soul of any machine it comes into contact with. Nonetheless, in those cases where a local machine is not practical, the developer will install the SharePoint software onto a virtual machine so that they can still work in isolation.
This local development approach is critically important to any form of version control or change management. For all practical purposes, developers must have a workable environment that they can fully control and work in peace. From there, developers check their code into source control, and hopefully it gets built from source control before being deployed to another server. This gives each developer a degree of control over how much the other developers can screw them up, and more importantly it ensures that every change is traceable back to a date and time and person responsible.
This approach is so ingrained in so many developers that often we take it for granted. Just try to remind yourself regularly how awful it was that time that everyone was working directly on the same development server, and nobody could keep track of who changed what when. Or better yet, how fired up everyone got the last time somebody sneaked directly into the production server and started mucking around.
The database is like the village bicycle.
I am consistently surprised how often developers go to so much trouble to isolate their local application development environment, and then point their local application code to a shared development database server that the whole team is working on.
If you never need to make database changes, and nobody on your team needs to make database changes, this can certainly work. In that case, the database behaves like a third party service, rather than an actively developed part of the system.
However, if you are ever making database changes, you need to isolate your database for the exact same reasons that you need to isolate your application code.
Imagine that you are working on a project that involves several layers of DLLs communicating with each other. Because you are in active development, you and your team are constantly making changes that affect the interfaces between those DLLs. The result is that you continually need to check in your changes in whole batches; you can’t just check in a few files here and there because you will be breaking the interfaces for anyone else working in that code.
The same rules must apply to the databases as well, for all of the same reasons. At any given point in time, anyone should be able to pull the code that is in source control, build it, and run it. However, if I’m making a series of changes to my local code and the shared development database, my crazy C# changes are isolated on my local machine, but coworkers are getting my database changes as they happen, so their systems will stop working all of a sudden, and they won’t even know why, or worse yet they will know exactly why and I’ll be the guy “who busted everything up.”
Better yet, after a few days of wasting time on a bad design, I give up on it, and with one or two clicks I can undo all of my code changes and roll back to the main development code stream. However, there is no one-click rollback to the database schema, and so now those changes need to be manually backed out. Hopefully I kept a good list of the changes so I can do this without missing anything, but we all know that a few things will get missed, and now the development database becomes even more of a mutant branch of the true database schema, full of changes that nobody remembers or owns, and it is all going to blow up and make us all look like fools when we are rushing to deploy it into QA next month.
DVCS isn’t all bad.
Distributed Version Control Systems like Git and Mercurial are the fun new fad in version control, and everyone seems to think that they are so much cooler than more traditional and linear systems like Vault. To me, it seems to grossly overcomplicate an already difficult issue by exacerbating the most problematic concepts, namely branching and merging. But I’m a crusty old conservative who scoffs at anything new, so maybe (and even hopefully) I’m wrong. I was quick to dismiss it as a new toy of bored astronauts, but some people a lot smarter than me have done the same and seem to be coming around to it, if not embracing it completely, so I will continue to believe that I am right for now, even though I know I’m probably wrong and will change my mind eventually.
But there is one idea in DVCS systems that I can get on board with, and that’s the idea that everyone is working in their own branch. As we’ve discussed, you simply cannot be working in the same sandbox as everyone else, or you will have intractable chaos. You should stay plugged into what everyone else is doing on a regular basis, usually through version control, but you must also isolate yourself, and you must do so thoroughly.
And here’s the thing (and this may very well be the idea that eventually opens my path to DVCS enlightenment): your local machine is a branch. Granted, it is not a very robust branch, because it only has two states (your current state and the latest thing in source control), but you are still essentially branched until you check in, at which point you will have to merge. It might be a really small merge, because the changes were small or backwards compatible, or because you were not branched off locally for that long, or you are the only one working on a feature, or because you’ve been communicating with the rest of your team, or because you are the only person who actually does any work, but you are merging nonetheless.
What does this have to do with databases? Branching is all about isolation. You must isolate your development environment, and you must do so thoroughly. If you think of your machine as simply a branch of the source code, it crystallizes the idea that everything you are doing locally is a full stream of code, and it must contain everything needed to run that code, and must represent all of the changes in that code, including the database. In a broader view, if you were to branch your code to represent a release or a patch or a feature, you obviously should be branching your database code at the same time (assuming of course that your database is under version control). If that is the case, and if the code on your local machine is nothing more than a primitive branch of what is in source control, then your local machine should also have its own copy of the database.
Database scripting is hard, let’s go shopping!
I know. This makes pushing your code to the development server more difficult, because you have to script everything out, and honestly the last thing I really want to do is complicate the lives of developers. In fact, I think the primary purpose of most development processes should be to reduce friction, and to make a developer’s life as worry-free as possible, so that they can focus on the real complicated business problems they are paid to solve, not the silly process crap that people invent to make themselves appear smart and organized.
That being the case, it may make your life a little harder to develop locally, and then write up all of the scripts necessary to push those changes to the dev server, but it is definitely worth it. This is not a theoretical improvement that will hopefully save you time in the distant future, when design patterns rule and everybody’s tasks are on index cards and you’ve achieved 100% code coverage in your unit tests. No, this is a real, tangible, and immediate benefit, because you will save yourself effort when you deploy it to the next stage, namely QA or production. At that point, you’ll already have everything organized and listed out, and you did so when you were still working on the code and everything was still fresh in your mind. In my humble opinion, this is a much more maintainable process than everyone just trashing around in a wild-west development database, and then after it all spending days trying to figure out which schema differences need to be included to release which features, because the chances of getting that right consistently are almost non-existent.
And if this is really too much work for you to do well, maybe we can find you a ball to bounce with instead. Or maybe some UML diagrams and crayons to draw with. Either way, get the hell out of my code.
Beating a dead horse.
Hopefully I’ve convinced you that you should have your own copy of the database for your local development. I could go on forever giving reasons for this. Or, I could give specific examples, like a recent client that did not follow this pattern, and our team was constantly breaking each other because the code and database were out of sync, even though we were just a small team of 3 experienced and practical developers sitting in a single office right next to each other, working together well and communicating all day, but the lack of database isolation made the issues unavoidable.
So yeah, I could go on and on. But I won’t, because it’s getting boring. If you’re agreeing with me by now, feel free to go read something else.
I can’t run SQL Server locally, my machine sucks!
I’ve heard a lot of people say that they can’t run SQL Server locally, and sometimes they are right, but I think a lot of the time it is an excuse.
Maybe you don’t have a license for SQL Server. That’s usually fine, SQL Server Express Edition is free. Sure, it has some limitations, like the lack of a SQL profiler, but there are great free tools out there like the one from AnjLab. And if you still need a full-featured copy, the developer edition costs less than $50. Can you or your company not spare $50 for something like that? Really?
Or maybe your machine doesn’t have enough memory. It’s true, SQL will eat up memory like nobody’s business and if you have Visual Studio 2008 and Outlook 2007 running, it can be pretty heavy. But I’ve found that as long as you have 3 or 4 GB of RAM, it works pretty well, and doesn’t everyone have that these days? Sure, a lot of you are stuck with crappy old machines that your employer gave you because he considers you to be a high-priced janitor, and he can’t justify in his mind spending a few hundred extra to help you be more productive, but in that case you have bigger problems than anything we’re going to solve here. I would say, if possible, you should even shell out a few hundred and get some more memory for your machine, even if it’s a work machine and they won’t reimburse you for it. I know plenty of people who would be opposed to this just out of principle, but those people and their principles can go have their own little pity party and see who comes; in the meantime I’d rather solve the problem and move on.
Too Beaucoup?
However there is certainly one potential problem that can be difficult to overcome. What if your existing database is just too damn big to run locally?
One recent client had a production database which was used for a million unrelated purposes, and it was 30 GB. Another recent client had the vast majority of their business data spread across two databases that were each about 300 GB. Sometimes, the database is just too big to copy down to your local machine. There are a few ways to deal with the problem.
Sometimes the best option is to separate the schema and the data. Strip down the data, get rid of the 300 GB, and get the minimum amount of sample data necessary to run your applications. Maybe clear it out entirely, and have some scripts or automated tests that generate a batch of sample data. Often times this will require a lot of analysis to determine what is necessary, and what all of the data is being used for, but that’s not an entirely bad thing. If you get a good and portable development database out of it, while also getting a better understanding of how the data is being used, then that has a lot of benefits. Granted, this is not easy by any stretch, but it may be doable. It all depends on your situation.
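A trivial sketch of such a script follows (the table and column names are invented): generate a small, deterministic set of sample data rather than copying production data to every workstation.

```sql
-- Hypothetical sample-data script: seed 1,000 predictable rows so the
-- application can run locally without a copy of the 300 GB production data.
SET NOCOUNT ON;

WITH Numbers AS
(
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
INSERT INTO dbo.Customer (CustomerId, Name, Region)
SELECT n,
       CONCAT(N'Test Customer ', n),
       CASE n % 3 WHEN 0 THEN N'North' WHEN 1 THEN N'South' ELSE N'East' END
FROM Numbers;
```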
Another option is to set up a single high-powered development database server, and give each developer their own instance of the database on the server. This approach can have its own problems as well, such as people getting confused about which database instance belongs to who, and having enough disk space to store the inevitable terabytes of data.
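If you go that route, a simple naming convention helps with the “whose database is this” problem (the server and database names below are just an example):

```sql
-- One copy of the database per developer on the shared server, named so
-- that ownership is obvious at a glance.
CREATE DATABASE [SalesApp_Dev_jmooney];
CREATE DATABASE [SalesApp_Dev_asmith];

-- Each developer then points their own connection string at their copy, e.g.
--   Server=DEVSQL01;Database=SalesApp_Dev_jmooney;Integrated Security=SSPI;
```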
So if you have any of these problems, my sympathies to you, and I hope that you can find a workable solution.
So hopefully you are now inspired to change your process a little bit, or were just entertained for a little while.
Or, if you’ve been trying to do the same thing but aren’t getting the buy-in you need, you have a little more reference material for saying “I read an article about this on the interwebs, and some guy said…”. It’s one reason I make these things so long, as a favor to you, because the person you are arguing with isn’t actually going to read all of this crap, so maybe they will just give up and let you do whatever you want. The person that wins the argument usually isn’t the person who is right, it’s usually the person who is willing to waste the most time arguing about it.
3 thoughts on “Think Globally, Act Locally (or Why your database version control strategy sucks and what to do about it, Part II)”
I like working with databases but working with developers and managers that don’t understand the issues with db change management is frustrating.
Most of the work I’ve done in the past few years has been data warehouse systems for commercial insurance companies. Although these tend to have modest data volumes by data warehouse standards (fact tables between 1 and 100 million rows), you do need to be able to do financial reconciliations, so you need a box capable of loading production data sets reasonably quickly.
I’m definitely a fan of the DB-per-developer principle. On one occasion I had 7 live development environments on a sole-charge job and I’ve seen a team of 3 with 14 separate environments.
My solution to this problem is to use two-socket workstation systems that can take half a dozen disks. To cut the cost I got the bits off ebay, where secondhand workstations can be bought fairly cheaply, and RAID controllers and disks through the open market.
Appropriately tuned, the I/O on a box like this is pretty quick for ETL jobs and the entire environment is self-contained. The I/O on one of these is quicker than you might think – I’ve yet to see a SAN-based production system that isn’t quite a lot slower than my 2007 vintage HP workstation.
Typically you can build a machine of this spec off ebay for about £2,000, which is probably less than the cost of your development tooling. Even if you have to purchase it from a vendor like HP you will probably see change out of £5,000. Note that £5,000 would pay for about two weeks of an average contractor’s time in London.
These days, fairly ordinary PCs can take up to 16GB of RAM and 2-3 modern SSDs will get you even faster I/O than the SCSI disks I used.
If you took that route you could have a system capable of doing this sort of work in a package that should be palatable even to fairly conservative IT shops.
As a bonus, because it’s a desktop and not a server, you don’t have to deal with the incumbent production bureaucracy when you need to manage environments.
Hi, and thanks for the great articles.
For the past few years, I’ve been working in an environment without even a modest DB change management system, and it was a disaster. My situation is one of those you mentioned: tens of GB of data on a high-end server, and multiple developers sharing a development database!
We want to isolate the developers’ instances on the server, and think we can do it by having a shared data instance of the DB and separate schemas for each developer. Do you think that is feasible?
Also, we’ve planned to use source control for DB scripts, but have a little problem with it. We are unsure how to version database changes: whether to store every single object in a separate file, named after the object, with everyone contributing to that object having to change the same file, or to have each change script numbered incrementally, so that every change for a given version can be run automatically.
In the second option, we are not sure how to prevent two developers from changing a single object simultaneously and overwriting each other’s code.