Developing in PySIT


An Apology

This page codifies the development procedures we will employ for PySIT. I apologize in advance if anything here seems overly restrictive or annoying. A set policy is necessary on many fronts to enable smooth, high-quality development with multiple collaborators. I put this together not to be bossy, but so that this project can succeed. We all have different backgrounds and coding styles, but it is best to present the software with a united front. If you have suggestions for modifications to these standards and practices, by all means bring them up. There is no 'one true right way,' and even if there is, I don't profess to know it.

The notes below are compiled from experience developing numerical code in a mixed Python and C++ environment, extensive reading on best practices (and arguments about best practices) in Python software development, familiarity with the conventions of the Python and numerical Python software communities, and ideas borrowed (and modified) from existing projects. By no means do I claim that this is the be-all and end-all, or the best (or only) way to do things, but these practices should work. If there are any problems or flaws, let's fix them!

Source Control

We will use Mercurial (hg) for source code revision control. Mercurial does not appear to be available by default in the math department computing environment, so you will have to install it in your home directory. Download the package, follow the installation directions, and make sure the hg binary is in your path.

Mercurial is a distributed version control system (DVCS), similar to git if that is familiar. If you have used CVS or SVN in the past, a DVCS has the same core goal but a different philosophy. If you are new to hg, a great place to start is the Hg Init tutorial. There are plenty of other tutorials on the web as well, and feel free to ask me any questions.

We will use a centralized repository to maintain a common version of the code. The central repository is hosted on BitBucket.org. The general idea is that each of us will have their own 'fork' of the central repository in which we will do our development. Once a feature or bug fix is complete, we will then issue a 'pull request' so that the change can be reviewed and merged into the central repository.

A major difference between classical version control (like SVN) and Mercurial is that in SVN a commit is immediately sent to the central repository, while in hg a commit is a local operation. Commits in hg stay in your local repository until you explicitly push them to the central repository. This means that you can roll back local changes easily, without ever worrying about breaking the main code. You can even discard all of your changes by deleting your local clone, with no adverse effects on the central repository. You can also keep multiple clones, each for a different feature. This last option may be useful if you are working on two large but independent additions to PySIT, though branching is the preferred approach.

After you have cloned your fork of the repository using

hg clone https://YOUR_BITBUCKET_USERNAME@bitbucket.org/YOUR_BITBUCKET_USERNAME/pysit LOCAL_DIR

the development cycle will look something like this:

1. Work on a change or bug fix in the local repository, perhaps first creating a branch with hg branch NEW_BRANCH_NAME if it is a significant change.
2. Commit changes to your clone frequently and on a per-change basis. If you change multiple files for the same purpose (e.g., renaming a common variable globally within the project), commit all of those files together.
3. Periodically push your changes up to your fork (for backup purposes) using hg push.
4. Once you are satisfied that the bug fix or feature is complete and that the new code passes the unit tests (more on this later), you are ready to send the changes to the central repository:
   a. First, run hg pull -b default upstream to get the latest changes to the default branch of the central repository. This ensures you have anything others have pushed to the main repository since you last pulled from it.
   b. If there have been changes and there are conflicts, follow the instructions for merging to resolve them.
   c. Once all conflicts have been resolved, go to your fork on BitBucket and issue a pull request to the default branch of the main repository.
   d. From there, we can all comment on and review the new code, and I can merge in completed code.
5. Continue developing your local version, and repeat from step 2 as necessary.

Note: For the pull from upstream to work, you must add the line

upstream = https://rhewett@bitbucket.org/rhewett/pysit

under the [paths] heading of the .hg/hgrc file in your repository directory.
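If you would rather script this edit than make it by hand, the sketch below does it with Python's standard configparser module. The helper add_upstream_path is hypothetical (not part of PySIT or Mercurial), and it assumes the hgrc file contains only plain INI-style sections: configparser does not understand Mercurial-specific syntax such as %include lines, and it discards comments when it rewrites the file. Editing the file in a text editor works just as well.

```python
import configparser
import os

# URL taken from the note above; change it if the central repository moves.
UPSTREAM_URL = "https://rhewett@bitbucket.org/rhewett/pysit"


def add_upstream_path(repo_dir, url=UPSTREAM_URL):
    """Set 'upstream' under [paths] in REPO_DIR/.hg/hgrc; return the file path."""
    hgrc_path = os.path.join(repo_dir, ".hg", "hgrc")
    # interpolation=None so a '%' in a URL is kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(hgrc_path)  # a missing hgrc is simply treated as empty
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "upstream", url)
    # Rewriting the file this way drops any comments it contained.
    with open(hgrc_path, "w") as hgrc_file:
        config.write(hgrc_file)
    return hgrc_path
```

After calling add_upstream_path("LOCAL_DIR") on your clone, running hg paths inside it should list both default (set by hg clone) and upstream.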

We will deal with the procedure for releasing the software (i.e., major version releases) when we get there. No need to cross that bridge until we come to it.

Documentation

Global Project Documentation

We will use the Sphinx Python package for automagic generation of documentation. Follow the example docstrings in the current version of PySIT to see how we document classes and function calls.
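As an illustration only, the sketch below uses the NumPy docstring convention (Parameters, Returns, Raises, and Examples sections), which Sphinx can render through the numpydoc or sphinx.ext.napoleon extensions. That PySIT follows exactly this layout is an assumption; the docstrings already in the code base remain the reference. The function grid_spacing is hypothetical.

```python
def grid_spacing(x_min, x_max, n_points):
    """Compute the uniform spacing of a 1D grid.

    Parameters
    ----------
    x_min : float
        Left endpoint of the domain.
    x_max : float
        Right endpoint of the domain; must be greater than `x_min`.
    n_points : int
        Number of grid points, including both endpoints (at least 2).

    Returns
    -------
    float
        Distance between adjacent grid points.

    Raises
    ------
    ValueError
        If the domain is empty or there are fewer than two points.

    Examples
    --------
    >>> grid_spacing(0.0, 1.0, 11)
    0.1
    """
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    return (x_max - x_min) / (n_points - 1)
```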

More details will be added here as the project evolves and we have a better idea of what we need. As a rule, it is best to start documenting early and to keep up with it as you go; documentation is much harder to write after the code is finished. For this reason, when you change existing code (e.g., a function signature), update its global documentation as well, and do so before you push your changes to the central repository.

Local Source Code Documentation

In addition to the global documentation, let's be consistent in our local code documentation too.

Use common sense and document code as it is written.
Avoid clutter: really obvious code should not need long explanations.
Using descriptive variable and function names goes a long way toward making code easier to read, and cuts down on the number of comments required.
Clever coding and numerical tricks should be documented so other developers (and users!) are not left guessing at what tricky code is doing. Explain the logic of the code if necessary.
References to outside sources (books, wiki, web pages, articles) are always welcome.
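For instance, a vectorized finite-difference stencil is compact but opaque without a comment. The function below is a hypothetical example (not taken from PySIT) showing the level of explanation intended: what the slicing does, why the boundary is left alone, and a pointer to where the formula comes from.

```python
import numpy as np


def second_derivative(u, dx):
    """Approximate d^2u/dx^2 on the interior of a uniform 1D grid."""
    d2u = np.zeros_like(u, dtype=float)
    # Second-order central difference: (u[i-1] - 2 u[i] + u[i+1]) / dx^2.
    # Slicing shifts the whole array at once instead of looping over i:
    # for interior i, u[:-2] holds u[i-1], u[1:-1] holds u[i], u[2:] holds u[i+1].
    # The two endpoints stay zero because the stencil needs a neighbor on
    # each side; boundary handling is left to the caller.
    # Reference: the Wikipedia article "Finite difference".
    d2u[1:-1] = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / dx**2
    return d2u
```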

Unit Tests (aka, Test Driven Development)

We will use the Python package nose for unit testing. Briefly, the idea behind unit testing is that every bit of functionality (within reason, of course) should have a test that checks that it works correctly.

For example, if you wrote an iterative linear solver, you might have unit tests that confirm the output vector has the correct dimensions (you would be surprised how easy this is to screw up), that the solution is within the specified tolerance on a series of small examples, and that any error modes (e.g., the linear system should be SPD but is not) are properly accounted for.
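To make the linear-solver example concrete, here is a minimal sketch: a hypothetical conjugate_gradient function (not PySIT code) followed by three tests, one for each check described above. The tests are plain module-level functions whose names start with test_ and which signal failure with a bare assert; nose's default collector picks such functions up, and pytest would too. The solver reports an indefinite matrix only when it actually meets a search direction with non-positive curvature, which is enough for the small case tested here.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite A (hypothetical solver)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or A.shape[0] != b.shape[0]:
        raise ValueError("A must be square and match the length of b")
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric")
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.copy()              # residual b - A x for the initial guess x = 0
    p = r.copy()              # first search direction
    rr = r @ r
    threshold = tol * np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= threshold:
            break
        Ap = A @ p
        pAp = p @ Ap
        if pAp <= 0.0:
            # Non-positive curvature along p: A is not positive definite.
            raise ValueError("A must be positive definite")
        alpha = rr / pAp
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def test_output_shape():
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    assert conjugate_gradient(A, b).shape == b.shape


def test_small_systems_within_tolerance():
    rng = np.random.default_rng(0)
    for n in (1, 2, 5, 10):
        M = rng.standard_normal((n, n))
        A = M @ M.T + n * np.eye(n)   # SPD by construction
        b = rng.standard_normal(n)
        x = conjugate_gradient(A, b, tol=1e-10)
        assert np.linalg.norm(A @ x - b) <= 1e-8 * np.linalg.norm(b)


def test_rejects_non_spd():
    A = np.array([[1.0, 0.0], [0.0, -1.0]])  # symmetric but indefinite
    try:
        conjugate_gradient(A, np.array([1.0, 2.0]))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an indefinite matrix")
```

Saved in a file whose name starts with test_ (say, a hypothetical test_solver.py), running nosetests in that directory would discover and run the three test_ functions.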

With this development model, if I make a change to a separate piece of code, I can run the unit tests to verify that my change did not break something elsewhere. Such bugs can be hard to track down otherwise. Before a change is pushed to the central repository, it (likely) should have its own unit test and (definitely) should not cause any existing unit tests to fail.

As development of PySIT proceeds, examples of unit tests will be available. This aspect of the project we can develop as we go.

Conventions

For consistency and ease of development, Python code should be written using the following conventions:

Python uses indentation to delineate code blocks, which has caused a bit of a holy war within the Python community. Some prefer a single tab character per indentation level; others prefer a fixed number of spaces (e.g., two or four). Tabs are preferred by some because different coders like different tab display widths. Spaces are preferred by others because they make it easier to align code and display consistently across editors. Inconsistent indentation breaks Python scripts.

We will use a hybrid approach that resolves both of these issues. Tabs will be used for indentation block purposes, spaces will be used for alignment. For example (where <tab> is a tab and ~ is a space) an if block should look like

if (x == 3):
<tab>print(x)

and inside a block, a function call with many arguments might look like

if (x == 3):
<tab>np.meshgrid( np.linspace(0,1,100),
<tab>~~~~~~~~~~~~~np.linspace(0,10,1000) )
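A mechanical check can catch the most common slip, a space typed before a tab. The sketch below is a hypothetical helper, not an existing PySIT tool: it only verifies that each line's leading whitespace is tabs followed by spaces. It cannot tell whether a run of spaces is alignment or a block mistakenly indented with spaces; that would require parsing the code.

```python
import re

# Leading whitespace allowed by the convention: tabs for block
# indentation first, then (optionally) spaces for alignment.
_ALLOWED_LEADING = re.compile(r"\t* *")


def find_indentation_violations(source):
    """Return 1-based numbers of lines whose leading whitespace has a tab after a space."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # skip blank and whitespace-only lines
        leading = line[: len(line) - len(stripped)]
        if _ALLOWED_LEADING.fullmatch(leading) is None:
            violations.append(lineno)
    return violations


# The aligned meshgrid call from above, with real tabs and spaces.
good = (
    "if (x == 3):\n"
    "\tnp.meshgrid( np.linspace(0,1,100),\n"
    "\t" + " " * 13 + "np.linspace(0,10,1000) )\n"
)
bad = "if (x == 3):\n    \tprint(x)\n"  # spaces typed before the tab

print(find_indentation_violations(good))  # []
print(find_indentation_violations(bad))   # [2]
```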

In keeping with the recommended Python style (PEP 8),
- class names should be in UpperCamelCase (note that the first word is capitalized),
- constants (used rarely) should be CAPITALIZED_WITH_UNDERSCORES separating whole words,
- and other variables, functions, and class members should be lowercase_with_underscores separating whole words.
Use intelligent abbreviations if a variable name (or, on occasion, a method name) is too long or if there is a well-known abbreviation (e.g., PML). Good, descriptive variable names make for more readable code; excessively long variable names hinder development.
More as I think of them...
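The sketch below puts these naming rules side by side. Everything in it (DEFAULT_PML_WIDTH, UniformGrid1D, and its methods) is hypothetical and exists only to show the naming patterns. It is indented with spaces for display; PySIT source files would use tabs for block indentation, as described above.

```python
# Constant: CAPITALIZED_WITH_UNDERSCORES (hypothetical value, for illustration).
DEFAULT_PML_WIDTH = 0.1


class UniformGrid1D:
    """Class name in UpperCamelCase; a hypothetical grid used only to show naming."""

    def __init__(self, x_min, x_max, n_points):
        # Arguments and members: lowercase_with_underscores.
        self.x_min = x_min
        self.x_max = x_max
        self.n_points = n_points

    def grid_spacing(self):
        """Method names are lowercase_with_underscores too."""
        return (self.x_max - self.x_min) / (self.n_points - 1)

    def n_pml_points(self, pml_width=DEFAULT_PML_WIDTH):
        """Number of grid points spanned by a PML of width pml_width.

        'pml' is a well-known abbreviation (perfectly matched layer),
        so it is preferred over spelling the phrase out in the name.
        """
        return int(round(pml_width / self.grid_spacing()))
```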

For consistency and ease of development, C/C++ code should be written using the following conventions: [defined later].