COURSE ANNOUNCEMENT (Spring 2000):

Algorithms in biological sequence analysis


This course is about algorithms used in the manipulation and analysis of biological sequence data, which may be regarded as strings over a four-symbol alphabet (in the case of DNA) or a twenty-symbol alphabet (in the case of proteins).
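
As a small illustration of this point of view, here is a minimal Python sketch that treats a sequence as a string over a fixed alphabet. The names DNA_ALPHABET, PROTEIN_ALPHABET, and is_over_alphabet are made up for this example, and the one-letter codes are the standard ones for the four DNA bases and the twenty common amino acids.

```python
# Hypothetical names, chosen only for this illustration.
DNA_ALPHABET = frozenset("ACGT")                      # four bases
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")  # twenty amino acids


def is_over_alphabet(sequence, alphabet):
    """Return True if every symbol of sequence belongs to alphabet."""
    return all(symbol in alphabet for symbol in sequence)


if __name__ == "__main__":
    print(is_over_alphabet("GATTACA", DNA_ALPHABET))   # a DNA string
    print(is_over_alphabet("MKVLE", DNA_ALPHABET))     # not a DNA string
    print(is_over_alphabet("MKVLE", PROTEIN_ALPHABET)) # a protein string
```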

The subject has two related foci: from a biological standpoint, what are the sensible questions to ask, and from a computer science standpoint, how do you answer them? Treated in full, it could easily occupy four semesters and require extensive background from several fields. This course has the same foci, but it is intended to serve only as an introduction to the subject matter.

One way to view part of the course is as a study of finding exact or approximate matches between strings. Much of our time will be spent on a careful study of some of the algorithms for doing this: what they are, how fast they run, and how we know that they work (and run as quickly as claimed).
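
To make "exact" and "approximate" matching concrete, here is a minimal Python sketch of the simplest version of each. It assumes a naive scan for exact matching and the classical edit distance (insertions, deletions, substitutions, each of cost 1) as one possible measure of approximate matching. The function names are invented for this example, and these are not necessarily the algorithms the course will cover.

```python
def exact_occurrences(pattern, text):
    """Return every start index at which pattern occurs in text.

    Naive scan: compares pattern against each window of text,
    so it takes time proportional to len(pattern) * len(text).
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def edit_distance(a, b):
    """Return the minimum number of single-symbol insertions, deletions,
    and substitutions needed to turn string a into string b.

    Dynamic programming over prefixes, keeping only one row at a time.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(exact_occurrences("ACA", "ACACACA"))
    print(edit_distance("GATTACA", "GACTATA"))
```

The naive scan is easy to prove correct but slow on long texts. Faster methods give the same answers, and that is where the questions of speed and correctness become interesting.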

TEXTBOOKS

The text for the course is:
[R] Introduction to Computational Molecular Biology, by J. Setubal and J. Meidanis (PWS Publishing Co., 1997; $66.75.)
This will be greatly supplemented in class. One source I will use is the following superb book, which presents approximately the same material as [R], but in much greater depth:
[O] Algorithms on strings, trees, and sequences: computer science and computational biology, by Dan Gusfield (Cambridge Univ. Press, 1997; $64.95). [The author is a computer science professor at U.C. Davis who has worked on genome sequencing.]
You might wish to buy it. In addition, if you have no knowledge of molecular cell biology, I recommend that you purchase one of the following texts:
[B1] Essential Cell Biology: An Introduction to the Molecular Biology of the Cell, by B. Alberts (and six coauthors). ($70.95)
[B2] Molecular Biology of the Cell, by B. Alberts (and five coauthors). ($75.95)
Both of these books are very highly regarded. The first is essentially a simplified and condensed version of the second, so for the brave, I recommend the second. In any case, please understand two things. First, even [B1] has a vast amount of material that is not connected to the course (although it is good stuff to learn). Second, I do want you to understand the biology, but if you don't, you can still learn a lot from the course, and your evaluation will not suffer.

WHAT ARE THE PREREQUISITES?

The course requires no specific background knowledge from mathematics, computer science, or biology, but does require a willingness to grapple with the sometimes complicated details of algorithms and their analyses. It is my hope and belief that the material will be accessible to talented undergraduate and graduate students from several disciplines.

WHO SHOULD TAKE THE COURSE?

Anyone intrigued by the burgeoning field of bioinformatics, in which government and industry are investing billions. If you like the course, you might wish to consider further study in bioinformatics, which would necessarily involve taking many courses in biology, biochemistry, computer science, and mathematics. There may exist a bioinformatics major at UNL in the near future. A companion course is likely to be taught by Professor Stephen Scott (CSE) next semester.

Anyone interested in basic algorithms from computer science. Most of the topics we will consider have both biological and non-biological applications.

EVALUATION

Course grades will be based on homework assignments. Programming is not required.

LINKS

See my list and Stephen Scott's list.

INTERESTED?

Please let me know. Feel free to come talk to me if you have questions. The course is Math 939, call number 8749. It is a three-hour course. We will meet Monday, Thursday, and Friday from 4:30 to 5:20, in OldH 204.

David Jaffe
936 Oldfather Hall
Department of Mathematics and Statistics
University of Nebraska - Lincoln
e-mail: jaffe@cpthree.unl.edu
this page: http://www.math.unl.edu/~djaffe/bio.html
phone: 472-7253