Title: Python Programming for Biology - Tim J. Stevens
File Size: 7.7 MB
Total Pages: 821
Table of Contents
Half title page
Title page
Copyright page
1 Prologue
	Python programming for biology
2 A beginners’ guide
	Programming principles
	Basic data types
	Program flow
3 Python basics
	Introducing the fundamentals
	Simple data types
	Collection data types
	Importing modules
4 Program control and logic
	Controlling command execution
	Conditional execution
	Error exceptions
	Further considerations
5 Functions
	Function basics
	Input arguments
	Variable scope
	Further considerations
6 Files
	Computer files
	Reading files
	File reading examples
	Writing files
	Further considerations
7 Object orientation
	Creating classes
	Further details
8 Object data modelling
	Data models
	Implementing a data model
	Refined implementation
9 Mathematics
	Using Python for mathematics
	Linear algebra
	NumPy package
	Linear algebra examples
10 Coding tips
	Improving Python code
	A compendium of tips
11 Biological sequences
	Bio-molecules for non-biologists
	Using biological sequences in computing
	Simple sub-sequence properties
	Obtaining sequences with BioPython
12 Pairwise sequence alignments
	Sequence alignment
	Calculating an alignment score
	Optimising pairwise alignment
	Quick database searches
13 Multiple-sequence alignments
	Multiple alignments
	Alignment consensus and profiles
	Generating simple multiple alignments in Python
	Interfacing multiple-alignment programs
14 Sequence variation and evolution
	A basic introduction to sequence variation
	Similarity measures
	Phylogenetic trees
15 Macromolecular structures
	An introduction to 3D structures of bio-molecules
	Using Python for macromolecular structures
	Coordinate superimposition
	External macromolecular structure modules
16 Array data
	Multiplexed experiments
	Reading array data
	The ‘Microarray’ class
	Array analysis
17 High-throughput sequence analyses
	High-throughput sequencing
	Mapping sequences to a genome
	Using the HTSeq library
18 Images
	Biological images
	Basic image operations
	Adjustments and filters
	Feature detection
19 Signal processing
	Fast Fourier transform
20 Databases
	A brief introduction to relational databases
	Basic SQL
	Designing a molecular structure database
21 Probability
	The basics of probability theory
	Restriction enzyme example
	Random variables
	Markov chains
22 Statistics
	Statistical analyses
	Simple statistical parameters
	Statistical tests
	Correlation and covariance
23 Clustering and discrimination
	Separating and grouping data
	Clustering methods
	Data discrimination
24 Machine learning
	A guide to machine learning
	k-nearest neighbours
	Self-organising maps
	Feed-forward artificial neural networks
	Support vector machines
25 Hard problems
	Solving hard problems
	The Monte Carlo method
	Simulated annealing
26 Graphical interfaces
	An introduction to graphical user interfaces
	Python GUI examples
27 Improving speed
	Running things faster
	Writing faster modules
Appendix 1 Simplified language reference
Appendix 2 Selected standard type methods and operations
Appendix 3 Standard module highlights
Appendix 4 String formatting
Appendix 5 Regular expressions
Appendix 6 Further statistics
Document Text Contents
Page 2

Python Programming for Biology

Bioinformatics and Beyond

Page 410

  grey = pixmap.mean(axis=2)
  pixmapEdge = convolveMatrix2D(grey, matrix)


The grey pixmap with the enhanced edges has its values centred on the average
brightness. So, for example, if the average brightness of pixmapEdge is 127, the range of
values changes from 0…255 to −127…128. These centred values, either side of zero,
represent how much adjustment we will apply to sharpen the original image. Before
making the adjustment pixmapEdge is stacked so that it is three layers deep, and thus will
operate on red, green and blue.

  pixmapEdge -= pixmapEdge.mean()
  pixmapEdge = dstack([pixmapEdge, pixmapEdge, pixmapEdge])
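The centring and stacking steps can be tried in isolation. The following is only a minimal sketch: the 2×2 array is made-up data standing in for pixmapEdge, and NumPy is imported as np rather than using the bare names of the book's code.

```python
import numpy as np

# Hypothetical 2x2 greyscale edge map, standing in for pixmapEdge
edge = np.array([[100.0, 150.0],
                 [200.0, 150.0]])

# Centre the values on zero by subtracting the mean brightness (150 here)
centred = edge - edge.mean()

# Stack three copies along the depth axis: one layer each for red, green, blue
stacked = np.dstack([centred, centred, centred])

print(centred)        # values now lie either side of zero
print(stacked.shape)  # (2, 2, 3)
```

Each depth layer of the stacked array holds the same adjustment, so adding it to an RGB pixmap changes all three colour components equally.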

The new, sharpened image is created by adding the pixmap edge adjustment to the
original pixmap. With the pixels adjusted, the clip function (inbuilt into NumPy arrays) is
used to make sure that the summed values do not go outside the limits of 0 and 255.

  pixmapSharp = pixmap + pixmapEdge
  pixmapSharp = pixmapSharp.clip(0, 255)

  return pixmapSharp
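As a small illustration of why the clip is needed, the sketch below adds an adjustment to a single RGB pixel. The pixel and adjustment values are invented for the example.

```python
import numpy as np

# Hypothetical 1x1 RGB pixmap and a matching adjustment layer
pixmap = np.array([[[250.0, 10.0, 128.0]]])
adjust = np.array([[[20.0, -30.0, 5.0]]])

# The raw sum would give 270 and -20, which are outside the valid range
sharp = (pixmap + adjust).clip(0, 255)

print(sharp)  # red capped at 255, green raised to 0, blue unchanged at 133
```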

The next example is the Gaussian filter, which blurs pixels with a weighting that has a
normal (‘bell curve’) distribution (see Figure 22.4 for an illustration). For this, two values
are passed in: r is the half-width of the filter excluding the centre and sigma is the amount
of spread in the distribution. These parameters respectively control the size and strength of
the blur. Larger filters with wider distributions (i.e. influence away from the centre) will
give more blurring. It is notable that the mgrid object is used to give a range of initial grid
values for the filter, specifying the separation of each point from the centre in terms of
rows and columns; this is similar to using range() to generate a list.

def gaussFilter(pixmap, r=2, sigma=1.4):

  x, y = mgrid[-r:r+1, -r:r+1]
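To see what mgrid produces, here is a short sketch using a smaller half-width of r=1 (chosen only to keep the output readable) and NumPy imported as np.

```python
import numpy as np

r = 1  # half-width, so the grid is 3x3
x, y = np.mgrid[-r:r+1, -r:r+1]

print(x)  # row offsets from the centre: constant along each row
print(y)  # column offsets from the centre: constant down each column
```

Here x holds [[-1, -1, -1], [0, 0, 0], [1, 1, 1]] and y holds [[-1, 0, 1]] repeated three times, so the centre element of both arrays is zero.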

The Gaussian function is applied by taking the row and column offsets (x and y),
squaring them, dividing by two times sigma squared and finally taking the exponential
of the negative of their sum. At the exact centre both offsets are zero, so the
exponential is at its maximum there; the further the x and y offsets are from the centre,
the smaller the value is. The matrix is then divided by its total so that the weights sum
to one and the blur does not change the overall brightness.

  s2 = 2.0 * sigma * sigma
  x2 = x * x / s2
  y2 = y * y / s2

  matrix = exp( -(x2 + y2))
  matrix /= matrix.sum()
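The kernel construction can be checked on its own. This sketch wraps the same arithmetic in a helper called gauss_kernel, a name introduced here for illustration only and not part of the book's code.

```python
import numpy as np

def gauss_kernel(r=2, sigma=1.4):
    """Build a normalised (2r+1) x (2r+1) Gaussian weighting matrix."""
    x, y = np.mgrid[-r:r+1, -r:r+1]          # row and column offsets
    s2 = 2.0 * sigma * sigma                 # two times sigma squared
    kernel = np.exp(-(x * x + y * y) / s2)   # largest at the centre
    return kernel / kernel.sum()             # weights now total one

kernel = gauss_kernel()
print(kernel.shape)  # (5, 5)
print(kernel.sum())  # one, to within floating point rounding
```

With the defaults the largest weight sits at the centre element kernel[2, 2], and the matrix is symmetric about both axes.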

Once the filter matrix is defined it is applied to the pixmap using convolution, to each
of the colour components.

  pixmap2 = convolveMatrix2D(pixmap, matrix)

  return pixmap2
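The convolveMatrix2D() function is defined elsewhere in the book and is not shown in this excerpt. As a rough idea of what applying a filter matrix to each colour component involves, here is a NumPy-only sketch. The names convolve2d_same and convolve_rgb, and the choice of edge-replication padding, are assumptions made for this illustration and may differ from the book's implementation.

```python
import numpy as np

def convolve2d_same(plane, kernel):
    """Convolve one 2D plane with an odd-sized kernel, keeping the same shape.

    Borders are handled by repeating the edge pixels (an assumption).
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(plane, ((ph, ph), (pw, pw)), mode='edge')
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    rows, cols = plane.shape
    out = np.zeros((rows, cols), dtype=float)
    # Accumulate one shifted, weighted copy of the image per kernel element
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + rows, j:j + cols]
    return out

def convolve_rgb(pixmap, kernel):
    """Apply the same kernel separately to each colour layer."""
    layers = [convolve2d_same(pixmap[:, :, c], kernel)
              for c in range(pixmap.shape[2])]
    return np.dstack(layers)

# A flat image should be unchanged by a kernel whose weights sum to one
flat = np.full((4, 5, 3), 100.0)
box = np.ones((3, 3)) / 9.0
blurred = convolve_rgb(flat, box)
print(blurred.shape)  # (4, 5, 3)
```

Looping over the kernel elements rather than the pixels keeps the Python-level loop short, since each step is a whole-array operation.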

The final filter example is for edge detection and uses what is known as the Sobel
operator. In essence this is a filter that detects the intensity gradient between nearby pixels
(see Figure 18.4f). It is applied horizontally, vertically or in both directions and gives
bright pixels at those edges. As can be seen in the Python code the filter is a 3×3 matrix
where there is a line of negative numbers, then zeros, then positive numbers. This matrix
is transposed to switch between horizontal and vertical operations. The matrix means that,
for a given orientation, a transformed pixel has none of its original value, but rather a
value which represents the difference between values on either side.

def sobelFilter(pixmap):

  matrix = array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]])
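To make the effect of the matrix and its transpose concrete, the sketch below computes the weighted sum for a single 3×3 neighbourhood. The patch values are made up, and this element-wise sum ignores the kernel flip of a true convolution, which would only change the sign of the result.

```python
import numpy as np

sobel = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]])

# Hypothetical 3x3 neighbourhood: dark on the left, bright on the right
patch = np.array([[0, 0, 10],
                  [0, 0, 10],
                  [0, 0, 10]])

# Weighted sums for the centre pixel in each orientation
across_columns = (sobel * patch).sum()  # responds to the left-to-right change
across_rows = (sobel.T * patch).sum()   # no top-to-bottom change in this patch

print(across_columns, across_rows)  # 40 0
```

The centre weight is zero in both orientations, which is why the transformed pixel keeps none of its original value.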

The Sobel filter matrix is applied to the grey average of the input pixmap. This is done
once for each orientation, so we get two edge maps.

  grey = pixmap.mean(axis=2)
  edgeX = convolveMatrix2D(grey, matrix)
  edgeY = convolveMatrix2D(grey, matrix.T)

The final pixmap of edges is then a combination of horizontal and vertical edge maps.
Taking the square root of the sum of the squares of the two edge maps means the values
will always be positive; the result does not distinguish between an edge going from light
to dark and one going from dark to light in an image. The edge-detected pixmap is also
normalised so we can see the full range of values and finally it is returned from the
function.

  pixmap2 = sqrt(edgeX * edgeX + edgeY * edgeY)
  normalisePixmap(pixmap2) # Put min, max at 0, 255

  return pixmap2
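The combination step can be illustrated with two tiny hand-written edge maps. In this sketch normalise() is a hypothetical stand-in for the book's normalisePixmap(), which is not shown in this excerpt. It returns a new array rather than modifying its argument, and the edge values are invented.

```python
import numpy as np

def normalise(values):
    """Hypothetical stand-in: linearly rescale so min -> 0 and max -> 255."""
    values = values - values.min()
    peak = values.max()
    if peak > 0:  # avoid dividing by zero for a completely flat map
        values = values * (255.0 / peak)
    return values

# Made-up edge responses with opposite signs on either side
edge_x = np.array([[-4.0, 0.0, 4.0]])
edge_y = np.array([[3.0, 0.0, -3.0]])

# The magnitude discards the sign, so both edge directions look the same
magnitude = np.sqrt(edge_x * edge_x + edge_y * edge_y)
scaled = normalise(magnitude)

print(magnitude)  # 5, 0, 5
print(scaled)     # 255, 0, 255
```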

The filter functions can all be tested with the example image, using the PIL image's
show() method to see the results, after the appropriate array conversions. Note that for
the sobelFilter() output we pass the ‘L’ mode to the PIL conversion function because it is
a greyscale image, not RGB.

from PIL import Image

img = Image.open('examples/Cells.jpg')
pixmap = imageToPixmapRGB(img)

pixmap = sharpenPixmap(pixmap)

pixmap = gaussFilter(pixmap)

Page 820

substitution matrices 215–216

	calculating 258

substitution rates 272

substrings 28

support vector machines 534, 541

sys module 640

sys.argv attribute 84

text functions 622

textual data see Strings: data type

thumbnail images 366

time module 641

timing performance 163

tkinter module 570

torsion angle 157

transcription 185

translate() function 70, 167

translation, of codons 185, 190

travelling salesman problem 552

trigonometric functions 139

triple-quoted string 25

True value 47

try statement 58

T-statistic 472

T-tests 472

tuples 32

	manipulation 34

	membership 35

	operations 621

two-tailed test 463

type checking 177

type classes 115

type() function 25, 115

Page 821

types module 177

undefined value see None value

urllib module 287, 345

urlopen() function 288, 345

variables 23

	as attributes 105

	naming 23

	naming convention 160

	scope 72

	self 106

	swapping 165

vector distance 516

vector space 144

Viterbi algorithm 443

while loops 52

whitespace

	in Python code 22

	removal 29

	splitting on 84

with statement 83

XML files 90, 229

zip() function 164, 169

Z-score 470
