Versioning a project

Versioning a project brings many comforts during development: managing concurrent work, keeping a history, easing the release process, isolating features, tagging versions, giving every developer the same view of the code, and so on.

Many systems exist: Subversion, Git, Bazaar, Mercurial... They all share a common philosophy, and everything in this article applies to any of them.

Only keep your brain-generated code

The first important rule is to keep only the code you wrote. There is no place for generated code. The API documentation is a good example: it is generated from your code.

The most obvious reason is to keep the source repository as small as possible. The other reason is consistency: if you modify your source code without regenerating the API documentation, the source code and its documentation will be out of sync.

Keeping generated code out of the repository also prevents people from modifying it. For example, in symfony 1.4, Doctrine generates base entities. This was very confusing, because they were placed alongside the source code, and many developers versioned them. Then someone else started modifying them. In the end, you cannot regenerate the entities without breaking the application.

All generated files should be ignored with svn:ignore, .gitignore, or the equivalent in your versioning system.
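As a complement to the ignore rules, here is a minimal sketch of a check that flags generated files among the paths already tracked by the repository. The patterns and the function name are hypothetical, and fnmatch matching is only an approximation of the real svn:ignore or .gitignore semantics.

```python
from fnmatch import fnmatch

# Hypothetical patterns for generated artefacts; adapt them to your project.
GENERATED_PATTERNS = ["doc/api/*", "cache/*", "*.pyc"]


def generated_files(tracked_paths, patterns=GENERATED_PATTERNS):
    """Return the tracked paths that match one of the generated-file patterns."""
    return [
        path
        for path in tracked_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]
```

Feeding it the list of versioned files (for example the output of git ls-files, one path per line) gives the files that should probably be removed from the repository and ignored.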

This rule also holds for external software. For example, you shouldn't version third-party libraries, because they are not your code. This is discussed later in this article.

To conclude, just remember: only version what you created, with your brain.

Environment and security

It's important to identify what is specific to your local environment and what belongs to the project. For example, a folder path is specific to your local environment. The database configuration also depends on your environment. Maybe you are using project:project@localhost and say "OK, that's the way it should be", but I will tell you: "I don't use a dedicated MySQL account for each project, because it's tedious to create one every time".

So, each time you add code or configuration, ask yourself: "Is this value specific to my instance, or will every instance of the project have this value?" There will be different instances: development, staging, testing, production, etc. The source code shouldn't be aware of the specifics of the physical environment. And never, ever version something like this:

serverA:
  host: myhostA
  user: userA
  pass: passA
serverB:
  host: myHostB
  user: userB
  pass: passB

This is also a security problem: it means that every developer on the project has the production passwords. It also means that server A holds the access information for server B.

The "Security-In-Depth" (future article?) concept applies here: you have no valid reason to keep information about server B on server A. It's a security flaw.

Still on security, no secret should be stored in the versioning system: passwords, CSRF secrets, third-party service keys, etc. All of this information should be configured per instance and secured somewhere else.

The main idea is: if an attacker gets access to your source code repository, they shouldn't be able to connect to your services. They should only be able to do a code review.
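One possible way to configure secrets per instance, among others, is to read them from environment variables set on each server. The sketch below assumes that approach; the variable names (APP_DB_HOST, APP_DB_USER, APP_DB_PASS) and the function name are hypothetical.

```python
import os


class MissingSetting(Exception):
    """Raised when a required per-instance setting is not defined."""


def load_database_settings(environ=os.environ):
    """Read the database settings from the environment of the current instance.

    The variable names are hypothetical; nothing here is stored in the repository.
    """
    names = ("APP_DB_HOST", "APP_DB_USER", "APP_DB_PASS")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        # Fail loudly instead of falling back to a versioned default password.
        raise MissingSetting("missing settings: " + ", ".join(missing))
    return {
        "host": environ["APP_DB_HOST"],
        "user": environ["APP_DB_USER"],
        "pass": environ["APP_DB_PASS"],
    }
```

With this kind of loader, server A only ever knows its own credentials, and the repository contains the names of the settings but none of their values.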

Good practices

Atomic commits

Every commit should be as small as possible (as long as it still makes sense). For example, if you are developing the feature "User account", a good sequence of commits could be:

  • User model
  • Fixtures for user
  • Register page
  • Login page
  • Logout page
  • User profile page
  • Tests for the user account

Working this way, you will be able to review your code quickly and find a given change when you need it.

Review before committing

With Git, it's very easy: just type git diff --staged. With Subversion: svn diff <files you will commit>. A global review often reveals small mistakes, and will prevent you from committing passwords, useless code and typos.
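Part of this review can be assisted by a script. Here is a minimal sketch that scans a unified diff and reports the added lines that look like they contain a secret. The keyword list is a hypothetical starting point and will miss many cases, so it does not replace reading the diff yourself.

```python
import re

# Hypothetical keywords; extend the list to match your own configuration keys.
SUSPICIOUS = re.compile(r"(pass(word)?|secret|api[_-]?key)\s*[:=]", re.IGNORECASE)


def suspicious_added_lines(diff_text):
    """Return the added lines of a unified diff that look like they hold a secret."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            if SUSPICIOUS.search(line):
                hits.append(line[1:].strip())
    return hits
```

The text to scan would typically be the output of git diff --staged or svn diff, captured before the commit is created.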

Tagging the releases

Before deploying to your production server, you should always tag the release, to keep a trace of the release history. It will allow you to revert to a previous known state if something goes wrong.
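If the deployment is scripted, the tag can be created by the same script. The sketch below only builds the Git command for an annotated tag; the "v" prefix, the three-number version format and the function name are assumptions, not a convention imposed by Git.

```python
import re

# Assumed version format: three dot-separated numbers, e.g. 1.4.2.
VERSION = re.compile(r"\d+\.\d+\.\d+")


def release_tag_command(version):
    """Build the git command that creates an annotated tag for a release."""
    if not VERSION.fullmatch(version):
        raise ValueError("expected a version like 1.4.2, got " + repr(version))
    tag = "v" + version
    return ["git", "tag", "-a", tag, "-m", "Release " + version]
```

The returned list can be passed to subprocess.run(command, check=True) just before the deployment step.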

Branching the unstable developments

Branching is very useful when you are developing new features. For example, if you are developing the comments feature, you will have to make many commits before stabilizing the feature. But the master/trunk should always be as stable as possible.

So, to keep the application stable, create a branch to develop the feature and merge it into the master/trunk when it is finished.

External dependencies

When dealing with external dependencies, you shouldn't put code coming from other sources in your source code repository. Versioning systems provide tools for linking to external repositories: svn:externals in Subversion, or submodules (.gitmodules) in Git.

If you have to use a Git dependency from SVN, or an SVN dependency from Git, you should consider creating a small script (Bash?) that checks out the external source into an ignored folder.
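Such a script could look like the following sketch, written here in Python rather than Bash. The function names are hypothetical; the target folder is assumed to be listed in your ignore rules, and the revision is passed explicitly so that the checkout is reproducible.

```python
import subprocess


def checkout_commands(kind, url, revision, target):
    """Build the commands that fetch an external source at a fixed revision.

    kind is "git" or "svn"; target should be a folder ignored by your own repository.
    """
    if kind == "git":
        return [
            ["git", "clone", url, target],
            # Move the fresh clone to the requested commit or tag.
            ["git", "-C", target, "checkout", str(revision)],
        ]
    if kind == "svn":
        return [["svn", "checkout", "-r", str(revision), url, target]]
    raise ValueError("unknown versioning system: " + kind)


def fetch_external(kind, url, revision, target):
    """Run the commands in order and stop at the first failure."""
    for command in checkout_commands(kind, url, revision, target):
        subprocess.run(command, check=True)
```

Separating the construction of the commands from their execution makes it possible to print them first and check what the script is about to do.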

It is also very important to pin the revision of the external library, because externals change often, and this could introduce bugs into your application. With Git submodules, this is enforced: a submodule always points to a specific commit. With Subversion, just set the svn:externals property to myvendor -r445 http://....
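For Subversion, where pinning is optional, a small check can list the externals that have no fixed revision. This is a minimal sketch with a hypothetical function name: it only looks for a -r flag in each definition of the property value and does not detect peg revisions written as URL@REV.

```python
import re

# Matches "-r445" as well as "-r 445".
REVISION_FLAG = re.compile(r"(^|\s)-r\s*\d+(\s|$)")


def unpinned_externals(property_value):
    """Return the svn:externals definitions that have no fixed revision."""
    unpinned = []
    for line in property_value.splitlines():
        line = line.strip()
        # Blank lines and comment lines are skipped here.
        if not line or line.startswith("#"):
            continue
        if not REVISION_FLAG.search(line):
            unpinned.append(line)
    return unpinned
```

The property value to check would typically come from svn propget svn:externals on the folder that declares the externals.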

You should always rely on stable versions, with a version number.

To conclude

  • Only version code generated with your brain
  • Small commits
  • Tag to keep a trace
  • Branch for unstable work
  • Pin your externals