This page is here to codify the development procedures we will employ for PySIT. I apologize in advance if anything here seems overly restrictive or annoying. A set policy is necessary on many fronts to enable smooth, quality development with multiple collaborators. I put this together not to be bossy, but so that this project can succeed. We all have different backgrounds and coding styles, but it is best to present the software with a united front. If there are any suggestions for modifications to these standards and practices, by all means bring them up. There is no 'one true right way,' and even if there is, I don't profess to know it.
The notes below are compiled from experience developing numerical code in a mixed Python and C++ environment, extensive reading on best practices (and reading arguments about best practices) in Python software development, an understanding of the conventions within the Python and numerical Python software communities, and ideas borrowed (and modified) from existing projects. By no means do I claim that these practices are the be-all and end-all, best (or only) way to do things, but they should work. If there are any problems or flaws, let's fix things!
We will be using the Mercurial (hg) package for source code revision control. It does not appear that hg is available by default in the math department computing environment, so you will have to install it to your home directory. Download the package and follow the installation directions. Make sure the hg binary is in your path.
Mercurial is a distributed version control system (DVCS; similar to git, if that is familiar). If you have used CVS or SVN in the past, a DVCS has the same core goal, but the philosophy is different. In either case, if you are new to hg, a great place to start understanding the software is Hg Init. There are plenty of other tutorials on the web, as well. Also, feel free to ask me any questions.
We will use a centralized repository to maintain a common version of the code. The central repository is hosted on BitBucket.org. The general idea is that each of us will have their own 'fork' of the central repository in which we will do our development. Once a feature or bug fix is complete, we will then issue a 'pull request' so that the change can be reviewed and merged into the central repository.
A major difference between classical version control (like SVN) and Mercurial is that in SVN a commit is immediately pushed to the central repository, while in hg a commit is a local operation. Commits/changes in hg are kept in the local repository until they are explicitly pushed to the central repository. This means that you can roll back local changes easily, without ever worrying about breaking the main code. You can even remove all of your changes by deleting your local clone, with no adverse effects on the central repository. You can also have multiple clones, each to work on different features. This last option may be useful if you are working on two large but independent additions to PySIT, though 'branching' is the preferred approach.
After you have cloned your fork of the repository using
hg clone https://YOUR_BITBUCKET_USERNAME@bitbucket.org/YOUR_BITBUCKET_USERNAME/pysit LOCAL_DIR
the development cycle will look something like this:
hg branch NEW_BRANCH_NAME
if it is a significant change.
hg push
hg pull upstream default
to get the latest changes from the central repository. This ensures that you pick up anything others have changed in the main repository since you last pulled from it.
Note: For the pull from upstream to work, you must add the line (under the [paths] heading)
upstream = https://rhewett@bitbucket.org/rhewett/pysit
to your .hg/hgrc file in the repository directory.
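For anyone who would rather not edit the file by hand, here is a minimal sketch that adds the entry using Python's standard configparser module. The helper name add_upstream_path is made up for this example. It assumes a simple .hg/hgrc that configparser can parse, and rewriting the file this way drops any comments it contains.

```python
import configparser


def add_upstream_path(hgrc_path, url):
    """Add (or overwrite) an 'upstream' entry under [paths] in an hgrc file.

    Hypothetical helper for illustration; it is not part of PySIT or Mercurial.
    """
    # interpolation=None keeps any '%' characters in URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(hgrc_path)  # a missing file is simply skipped
    if not parser.has_section("paths"):
        parser.add_section("paths")
    parser.set("paths", "upstream", url)
    with open(hgrc_path, "w") as handle:
        parser.write(handle)


# Example call, run from the top of your local clone:
# add_upstream_path(".hg/hgrc", "https://rhewett@bitbucket.org/rhewett/pysit")
```

Editing the file in a text editor is just as good; the sketch only shows where the line has to end up.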
We will deal with the procedure for releasing the software (i.e., major version releases) when we get there. No need to burn that bridge until we are standing on it.
We will be using the Sphinx Python package for automagic generation of documentation. Follow the example docstrings in the current version of PySIT for an idea of how we are documenting classes and function calls.
More details on this will be included here as the project evolves and we have a better idea of what we need. As a rule, it is best to start documenting early and to keep up with it as you go. It is much more difficult to handle documentation after code is written. For this reason, when you make a change to existing code (e.g., a function signature), be sure to update its global documentation as well, and be sure to do this before you push your changes to the central repository.
In addition to the global documentation, let's be consistent in our local code documentation too.
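As a rough illustration of documenting as you go, the sketch below pairs a docstring that Sphinx could pick up with a local comment. The NumPy-style section layout and the function ricker_wavelet are assumptions made for this example and are not taken from the PySIT source; the existing PySIT docstrings remain the reference for the actual format.

```python
import math


def ricker_wavelet(t, peak_frequency):
    """Evaluate a Ricker wavelet at a single time.

    Hypothetical example function, used only to show a docstring layout.

    Parameters
    ----------
    t : float
        Time, in seconds, measured from the wavelet peak.
    peak_frequency : float
        Peak frequency of the wavelet, in Hz. Must be positive.

    Returns
    -------
    float
        Wavelet amplitude at time `t`; equal to 1.0 at `t = 0`.

    Raises
    ------
    ValueError
        If `peak_frequency` is not positive.
    """
    if peak_frequency <= 0:
        raise ValueError("peak_frequency must be positive")
    # Local comment: compute the repeated (pi * f * t)**2 term once.
    a = (math.pi * peak_frequency * t) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)
```

If the signature changes (say, a new argument is added), the Parameters section should change in the same commit.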
We will use the Python package nose for unit testing. Briefly, the idea behind unit testing is that every bit of functionality (within reason, of course) should have a test that guarantees that it works correctly.
For example, if you wrote an iterative linear solver, you might have unit tests that confirm the output vector has the correct dimensions (you would be surprised how easy this is to screw up), that check, using a series of small examples, that the solution is within the specified tolerance, and that check that any error modes (e.g., the linear system should be SPD but is not) are properly accounted for.
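To make the solver example concrete, here is a small sketch under stated assumptions. The solver cg_solve and its helpers are hypothetical stand-ins written in pure Python (they are not PySIT code), and the tests are plain test_* functions with bare asserts, the form nose collects automatically.

```python
def _matvec(A, x):
    # Dense matrix-vector product on nested lists.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_solve(A, b, tol=1e-10, max_iter=100):
    """Hypothetical conjugate-gradient solver for a small SPD system."""
    n = len(b)
    # Error mode 1: reject a matrix that is not symmetric.
    for i in range(n):
        for j in range(i):
            if abs(A[i][j] - A[j][i]) > 1e-12:
                raise ValueError("matrix must be symmetric")
    x = [0.0] * n
    r = list(b)
    p = list(r)
    rs_old = _dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = _matvec(A, p)
        pAp = _dot(p, Ap)
        # Error mode 2: non-positive curvature means A is not positive definite.
        if pAp <= 0.0:
            raise ValueError("matrix must be positive definite")
        alpha = rs_old / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = _dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A_SPD = [[4.0, 1.0], [1.0, 3.0]]
B = [1.0, 2.0]


def test_output_has_correct_dimensions():
    assert len(cg_solve(A_SPD, B)) == len(B)


def test_solution_within_tolerance():
    x = cg_solve(A_SPD, B, tol=1e-10)
    residual = [bi - ri for bi, ri in zip(B, _matvec(A_SPD, x))]
    assert _dot(residual, residual) ** 0.5 < 1e-8


def test_non_spd_matrix_is_rejected():
    for bad in ([[1.0, 2.0], [0.0, 1.0]],    # not symmetric
                [[1.0, 0.0], [0.0, -1.0]]):  # symmetric but indefinite
        try:
            cg_solve(bad, B)
        except ValueError:
            continue
        raise AssertionError("expected ValueError")
```

Each test covers exactly one of the three checks named above, so a failure points straight at the behaviour that broke.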
With this development model, if I make a change to a separate piece of code, I can run the unit tests to verify that my change did not break something elsewhere. Such bugs can be hard to track down otherwise. Before a change is pushed to the central repository, it (likely) should have its own unit test and (definitely) should not cause any existing unit tests to fail.
As development of PySIT proceeds, examples of unit tests will be available. This aspect of the project we can develop as we go.
For consistency and ease of development, Python code should be written using the following conventions:
Python uses indentation to delineate code blocks. This has caused a bit of a holy war within the Python community. Some prefer to use a single tab character for indentation, some prefer a certain number of spaces (e.g., two or four). The former is preferred by some because different coders prefer different tab display width settings. The latter is preferred because it is easier to align code and it is more consistent across editors. Inconsistent indentation breaks Python scripts.
We will use a hybrid approach that resolves both of these issues. Tabs will be used for block indentation; spaces will be used for alignment. For example (where <tab> is a tab and ~ is a space) an if block should look like
if (x == 3):
<tab>print(x)
and inside a block, a function call with many arguments might look like
if (x == 3):
<tab>np.meshgrid( np.linspace(0,1,100),
<tab>~~~~~~~~~~~~~np.linspace(0,10,1000) )
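One way to catch mixed-up indentation early is a small check on leading whitespace. The sketch below is only an illustration (the function name indentation_violations is made up here): it flags lines where a tab follows a space in the leading whitespace, which the tabs-then-spaces rule never produces. It cannot tell whether spaces were used where a block-level tab belonged.

```python
import re

# Leading run of spaces and tabs at the start of a line.
_LEADING_WHITESPACE = re.compile(r"^[ \t]*")


def indentation_violations(source):
    """Return 1-based numbers of lines whose leading whitespace has a tab after a space."""
    bad_lines = []
    for number, line in enumerate(source.splitlines(), start=1):
        leading = _LEADING_WHITESPACE.match(line).group(0)
        # Tabs (indentation) must come first, spaces (alignment) last.
        if " \t" in leading:
            bad_lines.append(number)
    return bad_lines


good = "if x == 3:\n\tf(a,\n\t  b)\n"   # tab for the block, spaces to align
bad = "if x == 3:\n  \tprint(x)\n"      # spaces before a tab
print(indentation_violations(good), indentation_violations(bad))
```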
In keeping with the recommended Python style,
class names use UpperCamelCase (note that the first word is capitalized),
constants use CAPITALIZED_WITH_UNDERSCORES separating whole words,
and function, method, and variable names use lowercase_with_underscores separating whole words.
Use intelligent abbreviations if the variable name (or on occasion, the method name) is too long or if there is a well known abbreviation (e.g., PML). Good, descriptive variable names make for more readable code. Excessively long variable names hinder development.
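A short sketch of the three naming styles side by side, assuming the usual PEP 8 mapping (UpperCamelCase for classes, CAPITALIZED_WITH_UNDERSCORES for constants, lowercase_with_underscores for functions, methods, and variables). All names below are hypothetical and chosen only for illustration.

```python
PML_DEFAULT_SIZE = 10  # constant: CAPITALIZED_WITH_UNDERSCORES


class WaveSolver(object):  # class name: UpperCamelCase
    """Hypothetical class used only to illustrate naming."""

    def __init__(self, pml_size=PML_DEFAULT_SIZE):
        # 'pml' is a well known abbreviation, so it is kept short.
        self.pml_size = pml_size

    def padded_node_count(self, interior_node_count):
        # Method and variable names: lowercase_with_underscores,
        # descriptive without being excessively long.
        return interior_node_count + 2 * self.pml_size
```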
For consistency and ease of development, C/C++ code should be written using the following conventions: [defined later].