18.335J/6.7310J: Introduction to Numerical Methods

This is the repository of course materials for the 18.335J/6.7310J course at MIT, taught by Dr. Andrew Horning, in Spring 2023.

Syllabus

Lectures: Tuesday/Thursday 11am–12:30pm in room 4-370

Office Hours: Wednesday and Thursday, 12:30–1:30pm, in room 2-238C.

Contact: horninga@mit.edu

Topics: Advanced introduction to numerical linear algebra and related numerical methods. Topics include direct and iterative methods for linear systems, eigenvalue decompositions and QR/SVD factorizations, stability and accuracy of numerical algorithms, the IEEE floating-point standard, sparse and structured matrices, and linear algebra software. Other topics may include memory hierarchies and the impact of caches on algorithms, nonlinear optimization, numerical integration, FFTs, and sensitivity analysis. Problem sets will involve use of Julia, a Matlab-like environment (little or no prior experience required; you will learn as you go).

Launch a Julia environment in the cloud: Binder

Prerequisites: Understanding of linear algebra (18.06, 18.700, or equivalents). 18.335 is a graduate-level subject, however, so much more mathematical maturity, ability to deal with abstractions and proofs, and general exposure to mathematics is assumed than for 18.06!

Textbook: The primary textbook for the course is Numerical Linear Algebra by Trefethen and Bau. For a classical (and rigorous) treatment, see Nick Higham's Accuracy and Stability of Numerical Algorithms.

Other Reading: Previous terms can be found in branches of the 18335 git repository. The course notes from 18.335 in much earlier terms can be found on OpenCourseWare. For a review of iterative methods, the online books Templates for the Solution of Linear Systems (Barrett et al.) and Templates for the Solution of Algebraic Eigenvalue Problems are useful surveys.

Grading: 40% problem sets (four psets due / released every other Friday), 30% take-home mid-term exam (second week of April), 30% final project (one-page proposal due Friday April 7, project due Tuesday May 16).

TA/grader: TBD

Collaboration policy: Talk to anyone you want to and read anything you want to, with three exceptions: First, you may not refer to homework solutions from the previous terms in which I taught 18.335. Second, make a solid effort to solve a problem on your own before discussing it with classmates or googling. Third, no matter whom you talk to or what you read, write up the solution on your own, without having their answer in front of you.

Final Projects: The final project will be an 8–15 page paper reviewing some interesting numerical algorithm not covered in the course. See the 18.335 final-projects page for more information, including topics from past semesters.

Assignments

  • Pset 1 is due on Friday, February 24 at 11:59pm.

  • Pset 2 is due on Friday, March 17 at 11:59pm.

  • Pset 3 is due on Friday, April 14 at 11:59pm.

Lecture Summaries and Handouts

Lecture 1 (February 7)

This course is about Numerical Linear Algebra (NLA) and related numerical methods. But why do we need NLA? How does it fit into other areas of computational science and engineering (CSE)? Three simple examples demonstrate how NLA problems arise naturally when solving problems drawn from across continuous applied mathematics.

  • Solving Poisson's equation: from charge density to electric potential. (Linear systems: LU and Cholesky, iterative methods.)
  • Dynamic Mode Decomposition: learning models from data. (Least squares: QR factorization, SVD, low-rank approximation.)
  • Charge density of interacting electrons: NLA in nonlinear problems. (Eigenvalue problem: QR algorithm, iterative methods)

NLA is often applied in tandem with tools from other fields of mathematics: approximation theory, functional analysis, and statistics, to name a few. We'll focus on NLA, which is a computational workhorse within CSE. The starting point is floating point: how do we represent real numbers on the computer?

Further Reading: L.N. Trefethen, The Definition of Numerical Analysis.

Lecture 2 (February 9)

  • Floating point arithmetic, exact rounding, and the "fundamental axiom"
  • Catastrophic cancellation, overflow, underflow
  • Forward and backward stability
  • Stability of summation algorithms

Further Reading: L. N. Trefethen, Lectures 13 and 14. Also, see the notebook about floating point.
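
The demo below is a small Julia sketch (not from the course notebooks) of two of these ideas: catastrophic cancellation in 1 - cos(x) near x = 0, and compensated (Kahan) summation. The helper name kahan_sum is just for illustration.

```julia
# Catastrophic cancellation: computing 1 - cos(x) near x = 0 loses all
# significant digits, while the equivalent 2*sin(x/2)^2 does not.
x = 1e-8
naive  = 1 - cos(x)          # 0.0: cos(x) rounds to 1, the digits cancel
stable = 2 * sin(x/2)^2      # ≈ 5.0e-17, accurate to full precision
println((naive, stable))

# Kahan (compensated) summation: the running compensation c recovers the
# low-order bits lost when adding a small term to a large partial sum.
function kahan_sum(xs)
    s, c = 0.0, 0.0
    for x in xs
        y = x - c          # correct the next term by the previous error
        t = s + y          # big + small: low-order bits of y may be lost...
        c = (t - s) - y    # ...but can be recovered algebraically
        s = t
    end
    return s
end

xs = fill(0.1, 10_000_000)
println(sum(xs))          # Base's pairwise summation is already accurate
println(kahan_sum(xs))    # compensated sum agrees closely with it
println(foldl(+, xs))     # naive left-to-right loop typically drifts further
```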

Lecture 3 (February 14)

  • Vector and matrix norms
  • Jacobian and condition numbers
  • Accuracy <= backward stable algorithms + well-conditioned problems

Further Reading: L. N. Trefethen, Lectures 12 and 15.
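
Here is a short Julia illustration (a sketch, not a course example) of the slogan above: Julia's backslash is a backward-stable solver, but applied to an ill-conditioned problem (the 10 x 10 Hilbert matrix) the forward error is still large, roughly cond(H) times machine epsilon.

```julia
using LinearAlgebra

# The 10×10 Hilbert matrix H[i,j] = 1/(i+j-1) is notoriously ill-conditioned.
n = 10
H = [1 / (i + j - 1) for i in 1:n, j in 1:n]
println(cond(H))            # ≈ 1.6e13: expect to lose roughly 13 digits

# Backslash (LU with partial pivoting) is backward stable, but accuracy also
# needs a well-conditioned problem; here the forward error is ≈ cond(H)*eps.
x_exact = ones(n)
b = H * x_exact
x = H \ b
println(norm(x - x_exact) / norm(x_exact))
```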

Lecture 4 (February 16)

  • Solving Ax = b
  • Condition number of A
  • Orthogonal/Unitary matrices
  • The singular value decomposition (SVD)

Further Reading: L. N. Trefethen, Lectures 4 and 5.

Lecture 5 (February 23)

  • Gaussian elimination and A = LU (no row interchanges)
  • Cost (flops) for A = LU and Ux = b
  • Solving Ax=b via A=LU: when to save L?
  • Partial pivoting and backward stability (counterexamples)

Further Reading: L. N. Trefethen, Lectures 20, 21, and 22. Can you bring GE's instability to life in this example?
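
One way to bring the instability to life is the classic 2 x 2 counterexample with a tiny pivot. The sketch below (Julia, with the elimination written out by hand for illustration) compares elimination without row interchanges to Julia's pivoted solve.

```julia
using LinearAlgebra

# A tiny pivot makes Gaussian elimination *without* row interchanges
# catastrophically unstable, even though A itself is well-conditioned.
ϵ = 1e-20
A = [ϵ 1.0; 1.0 1.0]
b = [1.0, 2.0]

# LU without pivoting, by hand: eliminate A[2,1] using the tiny pivot ϵ.
L = [1.0 0.0; 1/ϵ 1.0]
U = [ϵ 1.0; 0.0 1.0 - 1/ϵ]    # 1 - 1/ϵ rounds to -1/ϵ: the "1" is lost
x_nopivot = UpperTriangular(U) \ (LowerTriangular(L) \ b)

# LU *with* partial pivoting (what lu and \ actually do) is fine here.
x_pivot = A \ b
println(x_nopivot)   # ≈ [0.0, 1.0]: badly wrong in the first component
println(x_pivot)     # ≈ [1.0, 1.0]: the correct answer
```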

Lecture 6 (February 28)

This lecture is remote (MIT only): https://mit.zoom.us/j/96174583317

  • Partial pivoting and PA=LU
  • Solving Ax=b via PA=LU
  • Backward stability of PA=LU in theory and in practice
  • Sparse LU factors via row and column pivoting

Further Reading: L. N. Trefethen, Lectures 21 and 22. See the MATLAB docs for PA=LU for more examples of pivoting in action.

Lecture 7 (March 2)

  • The Cholesky decomposition: symmetric elimination for symmetric positive definite (SPD) matrices
  • Cost (1/2 of A=LU) and stability (backward stable, no pivoting) of Cholesky
  • Column and row pivoting to improve sparsity
  • Least-squares solutions to overdetermined linear systems

Further Reading: L. N. Trefethen, Lecture 23. See the MATLAB docs for Cholesky and, e.g., approximate minimum degree (AMD) reordering to reduce fill-in.
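
A quick Julia sketch of the facts above: Cholesky succeeds without pivoting on an SPD matrix, the backward error is on the order of machine epsilon, and the factor can be reused to solve Ax = b. The test matrix here is an ad hoc construction, not a course example.

```julia
using LinearAlgebra

# Cholesky on a symmetric positive definite matrix: A = R'R with R upper
# triangular, at roughly half the flops of LU and with no pivoting needed.
n = 500
B = randn(n, n)
A = B'B + n * I                    # SPD by construction
F = cholesky(Symmetric(A))         # throws PosDefException if A is not SPD
R = F.U
println(norm(R'R - A) / norm(A))   # backward error on the order of eps

# The factorization can be reused for many right-hand sides.
b = randn(n)
x = F \ b
println(norm(A * x - b) / (norm(A) * norm(x)))
```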

Lecture 8 (March 7)

Three ways to compute least-squares solutions to overdetermined linear systems

  • The normal equations
  • The singular value decomposition
  • The QR decomposition

All three formulations lead to the pseudoinverse of A and are mathematically equivalent, but not numerically equivalent. The SVD approach uses orthogonal projections to simplify the pseudoinverse, reducing the problem to an n x n diagonal system and avoiding the squared condition number incurred by the normal equations. While the SVD diagonalizes A by rotating/reflecting inputs and outputs, the QR decomposition triangularizes A by rotating/reflecting outputs only. The result is an orthogonal basis for the column space of A and a triangular n x n system to solve for the least-squares solution.

Further Reading: L. N. Trefethen, Lectures 11, 18, and 19.
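
The sketch below (Julia, illustrative only) compares the three formulations on an ill-conditioned Vandermonde least-squares problem: the normal equations suffer from the squared condition number, while QR and SVD do not.

```julia
using LinearAlgebra

# Polynomial fitting in the monomial basis gives an ill-conditioned A.
m, n = 100, 15
t = range(0, 1, length=m)
A = [tᵢ^j for tᵢ in t, j in 0:n-1]      # Vandermonde matrix
x_exact = ones(n)
b = A * x_exact

x_ne  = (A'A) \ (A'b)    # normal equations: cond(A'A) = cond(A)^2
x_qr  = qr(A) \ b        # Householder QR least-squares solve
x_svd = svd(A) \ b       # SVD / pseudoinverse

# Forward relative errors: expect the normal equations to be far worse.
for x in (x_ne, x_qr, x_svd)
    println(norm(x - x_exact) / norm(x_exact))
end
```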

Lecture 9 (March 9)

  • Sensitivity and conditioning of least-squares solutions
  • Gram-Schmidt orthogonalization and A = QR
  • Loss of orthogonality and modified Gram-Schmidt
  • Householder triangularization

Further Reading: L. N. Trefethen, Lectures 7, 8, and 9.
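
As a companion demo, here is a hand-rolled Julia sketch (the function name gram_schmidt is mine) showing how classical Gram-Schmidt loses orthogonality on ill-conditioned columns while modified Gram-Schmidt degrades far more gracefully.

```julia
using LinearAlgebra

# Classical vs. modified Gram-Schmidt: both produce A = QR in exact
# arithmetic, but they behave very differently in floating point.
function gram_schmidt(A; modified=false)
    m, n = size(A)
    Q = zeros(m, n); R = zeros(n, n)
    for k in 1:n
        v = A[:, k]
        for j in 1:k-1
            # classical GS projects the *original* column; modified GS
            # projects the running, partially orthogonalized vector v.
            R[j, k] = modified ? dot(Q[:, j], v) : dot(Q[:, j], A[:, k])
            v -= R[j, k] * Q[:, j]
        end
        R[k, k] = norm(v)
        Q[:, k] = v / R[k, k]
    end
    return Q, R
end

A = [1 / (i + j - 1) for i in 1:50, j in 1:10]   # ill-conditioned columns
Qc, _ = gram_schmidt(A)
Qm, _ = gram_schmidt(A; modified=true)
println(norm(Qc'Qc - I))   # classical GS: orthogonality largely lost
println(norm(Qm'Qm - I))   # modified GS: much closer to orthogonal
```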

Lecture 10 (March 14)

  • Triangularization vs orthogonalization
  • "Thin" and "full" QR decompositions
  • Computing and applying Householder reflectors
  • Householder QR algorithm
  • Flop count and backward stability

Further Reading: L. N. Trefethen, Lectures 10 and 16. For backward stability analysis, see Chapter 18.3 of Higham.

Lecture 11 (March 16)

  • Householder reflections and Givens rotations demos
  • Eigenvalues, eigenvectors, and diagonalizable matrices
  • Why do we need eigenvectors? (And not just the SVD)

Further Reading: L. N. Trefethen, Lecture 24. Explore the reflections-and-rotations notebook and the least-squares notebook for more computational demos with least-squares and A=QR in Julia.

Lecture 12 (March 21)

  • Eigensolvers must be iterative
  • Simultaneous power iterations
  • The QR iteration for eigenvalues
  • Convergence of QR iterations

Further Reading: L. N. Trefethen, Lectures 25 and 28. For the remarkable stories of John Francis and Vera Kublanovskaya, who independently proposed and analyzed the QR algorithm, see Gene Golub and Frank Uhlig's article.
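
A bare-bones Julia sketch of the unshifted QR iteration (illustrative only; no shifts, deflation, or Hessenberg reduction): factor A_k = Q_k R_k, reverse the factors, and watch the iterates approach a diagonal matrix for a symmetric test matrix.

```julia
using LinearAlgebra

# Unshifted QR iteration: A_{k+1} = R_k Q_k = Q_kᵀ A_k Q_k, so every iterate
# is orthogonally similar to A.
function qr_iteration(A; iters=200)
    Ak = copy(A)
    for _ in 1:iters
        F = qr(Ak)
        Ak = F.R * Matrix(F.Q)   # reverse the factors
    end
    return Ak
end

A = Symmetric(randn(6, 6))
Ak = qr_iteration(Matrix(A))
# Diagonal entries approach the eigenvalues (convergence can be slow when
# two eigenvalues have nearly equal magnitude).
println(sort(diag(Ak)))
println(sort(eigvals(A)))
```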

Lecture 13 (March 23)

  • Accelerating power iterations with shift-and-invert
  • Rayleigh quotient iterations
  • QR iterations with shifts
  • Tridiagonal reduction and QR iterations

Further Reading: L. N. Trefethen, Lectures 26, 27, and 29.
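
A minimal Julia sketch of Rayleigh quotient iteration (no safeguards; which eigenpair it finds depends on the starting vector):

```julia
using LinearAlgebra

# Rayleigh quotient iteration: shift-and-invert power iteration where the
# shift is the current Rayleigh quotient. For symmetric matrices it
# converges cubically to an eigenpair.
function rayleigh_quotient_iteration(A, x0; iters=10)
    x = normalize(x0)
    μ = x' * A * x
    for _ in 1:iters
        x = normalize((A - μ * I) \ x)       # one shift-and-invert step
        μ = x' * A * x                       # updated Rayleigh quotient
        norm(A * x - μ * x) < 1e-12 && break # stop once the residual is tiny
    end
    return μ, x
end

A = Symmetric(randn(8, 8))
μ, x = rayleigh_quotient_iteration(A, randn(8))
println(μ)
println(norm(A * x - μ * x))   # tiny residual: (μ, x) is an eigenpair of A
```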

Lecture 14 (April 4)

  • Wilkinson's condition number for simple eigenvalues
  • "Sine Theta" theorem for eigenvectors of Hermitian matrices
  • The pseudospectrum of a matrix

Further Reading: Check out the Pseudospectral Gateway for more about pseudospectral analysis and its remarkable power in applications across applied math. For more about the perturbation theory of invariant subspaces, see Prof. Ipsen's excellent summary of Pete Stewart's contributions to the area.

Lecture 15 (April 6)

  • Iterative methods and mat-vec paradigm
  • Build a subspace and extract approximations: Galerkin and Rayleigh-Ritz projections
  • Arnoldi iteration: power iteration meets modified Gram-Schmidt

Further Reading: L. N. Trefethen, Lectures 32 and 33. Check out the SuiteSparse Matrix Collection for four decades' worth of sparse, structured matrix examples.
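
Below is a plain Julia sketch of the Arnoldi iteration with modified Gram-Schmidt (no restarting and no handling of lucky breakdown), verifying the Arnoldi relation and peeking at a few Ritz values.

```julia
using LinearAlgebra

# Arnoldi: build an orthonormal basis Q of the Krylov subspace
# span{b, Ab, A²b, ...}, producing A*Q[:,1:k] = Q[:,1:k+1]*H with H
# (k+1)×k upper Hessenberg.
function arnoldi(A, b, k)
    n = length(b)
    Q = zeros(n, k + 1)
    H = zeros(k + 1, k)
    Q[:, 1] = b / norm(b)
    for j in 1:k
        v = A * Q[:, j]
        for i in 1:j                     # modified Gram-Schmidt step
            H[i, j] = dot(Q[:, i], v)
            v -= H[i, j] * Q[:, i]
        end
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]        # (no guard for lucky breakdown)
    end
    return Q, H
end

n, k = 200, 30
A = randn(n, n) / sqrt(n) + 2I           # shifted random test matrix
b = randn(n)
Q, H = arnoldi(A, b, k)
println(norm(A * Q[:, 1:k] - Q * H))     # Arnoldi relation holds to ~eps
# Ritz values (eigenvalues of the k×k Hessenberg block) tend to approximate
# the outer eigenvalues of A.
println(sort(real(eigvals(H[1:k, 1:k])), rev=true)[1:3])
```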

Lecture 16 (April 11)

  • Arnoldi decomposition
  • Approximating eigenvalues with Arnoldi
  • Convergence of Ritz values

Further Reading: L. N. Trefethen, Lectures 33 and 34. For a beautiful explanation of the Ritz values' behavior using potential theory, see the SIAM review paper by Arno Kuijlaars. You can also read about the implicitly restarted Arnoldi iteration and the bells and whistles that made it a standard iterative eigensolver for non-Hermitian matrices.

Lecture 17 (April 13)

  • Arnoldi for Ax=b
  • The method of Generalized Minimal Residuals (GMRES)
  • Convergence of GMRES

Further Reading: L. N. Trefethen, Lecture 35.

Lecture 18 (April 20)

  • Symmetric and symmetric positive definite matrices
  • The Lanczos iteration
  • The method of Conjugate Gradients (CG)

Further Reading: L. N. Trefethen, Lectures 36 and 38.
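
Here is a textbook-style CG sketch in Julia (unpreconditioned, dense storage, names mine) applied to an SPD test matrix.

```julia
using LinearAlgebra

# Conjugate gradients for SPD systems, following the standard recurrences.
function conjugate_gradient(A, b; tol=1e-10, maxiter=length(b))
    x = zero(b)
    r = copy(b)                     # residual r = b - A*x (x = 0 initially)
    p = copy(r)                     # first search direction
    rr = dot(r, r)
    for _ in 1:maxiter
        Ap = A * p
        α = rr / dot(p, Ap)         # step length minimizing along p
        x += α * p
        r -= α * Ap
        rr_new = dot(r, r)
        sqrt(rr_new) ≤ tol * norm(b) && break
        p = r + (rr_new / rr) * p   # A-conjugate update of the direction
        rr = rr_new
    end
    return x
end

n = 400
B = randn(n, n)
A = Symmetric(B'B + n * I)          # SPD test matrix
b = randn(n)
x = conjugate_gradient(A, b)
println(norm(A * x - b) / norm(b))
```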

Lecture 19 (April 25)

  • Convergence of CG
  • CG as an optimization algorithm
  • Gradient descent

Further Reading: L. N. Trefethen, Lecture 38.

Lecture 20 (April 27)

  • Convergence of gradient descent
  • CG vs gradient descent
  • Newton's method

Further Reading: See this excellent introduction to CG from first principles.

Lecture 21 (May 2)

  • Newton's method in higher dimensions
  • Secant and quasi-Newton methods
  • Broyden's updates and BFGS

Further Reading: See Prof. Johnson's notes on the BFGS update (find there also the original references of Broyden, Fletcher, Goldfarb, and Shanno). For a modern perspective on secant and multi-secant methods, see Feng and Saad.
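
For concreteness, a tiny Julia sketch of Newton's method in R^2 on a hypothetical test problem (intersecting the unit circle with a parabola); each step solves a linear system with the Jacobian, which is where the linear algebra enters.

```julia
using LinearAlgebra

# Newton's method for F(x) = 0 in Rⁿ: repeatedly solve J(x) Δ = F(x) and
# update x ← x − Δ. (Hand-coded residual and Jacobian for this toy problem.)
F(x) = [x[1]^2 + x[2]^2 - 1,          # unit circle
        x[2] - x[1]^2]                # parabola y = x²
J(x) = [2*x[1] 2*x[2]; -2*x[1] 1.0]   # Jacobian of F

function newton(F, J, x; iters=20, tol=1e-12)
    for _ in 1:iters
        Δ = J(x) \ F(x)               # the NLA workhorse: a linear solve
        x -= Δ
        norm(Δ) ≤ tol && break
    end
    return x
end

x = newton(F, J, [1.0, 1.0])
println(x, "  residual = ", norm(F(x)))   # quadratic convergence near the root
```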

Lecture 22 (May 4)

  • Many big data matrices are approximately low-rank
  • Optimal low-rank approximation and the SVD
  • Randomized range-finders

Further Reading: See the now classic review paper by Halko, Martinsson, and Tropp for an introduction to randomized range-finders and approximate matrix decompositions. This interesting paper by Udell and Townsend explores the origins of low-rank structure in big-data matrices.
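
A minimal Julia sketch of the Gaussian randomized range-finder in the spirit of Halko, Martinsson, and Tropp (the oversampling parameter p is my choice), tested on a synthetic matrix with geometrically decaying singular values.

```julia
using LinearAlgebra

# Randomized range-finder: sample the range of A with a Gaussian test
# matrix, orthonormalize, and use Q as a basis for the dominant left
# singular subspace of A.
function randomized_rangefinder(A, k; p=10)
    Ω = randn(size(A, 2), k + p)    # Gaussian test matrix
    Y = A * Ω                       # sample the range of A
    return Matrix(qr(Y).Q)          # orthonormal basis for the sampled range
end

# Synthetic test matrix with rapidly decaying singular values.
m, n, k = 500, 300, 20
U = Matrix(qr(randn(m, m)).Q)[:, 1:n]
V = Matrix(qr(randn(n, n)).Q)
σ = 2.0 .^ -(0:n-1)
A = U * Diagonal(σ) * V'

Q = randomized_rangefinder(A, k)
println(opnorm(A - Q * (Q' * A)))   # ‖(I − QQᵀ)A‖ ≈ σ_{k+1}, up to a modest factor
println(σ[k+1])
```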

Lecture 23 (May 9)

  • Accuracy of randomized range-finders
  • Oversampling and failure probability
  • Accelerating NLA via "sketch-and-solve"

Further Reading: Read, for example, the recent work of Nakatsukasa and Tropp to see how randomized "sketch-and-solve" techniques can accelerate established iterative methods for linear systems and the eigenproblem.

Lecture 24 (May 11)

  • Dynamic Mode Decomposition (DMD)
  • Procrustes problems over matrix manifolds
  • Physics-informed Dynamic Mode Decomposition (piDMD)

Further Reading: Check out Steven Brunton's YouTube channel for a variety of wonderful presentations on DMD and its applications, including a video on piDMD by Peter Baddoo. You can also find the piDMD paper here.
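
A compact Julia sketch of exact DMD (function and variable names are mine, not the course's): form snapshot pairs X1 and X2, project the best-fit linear operator onto a rank-r SVD basis of X1, and read off eigenvalues and modes. The synthetic data come from a known linear map, so the recovered eigenvalue magnitudes should match.

```julia
using LinearAlgebra

# Exact DMD: fit X2 ≈ A*X1 and extract eigenvalues/modes of the best-fit A
# through a rank-r SVD of the first snapshot matrix.
function dmd(X1, X2, r)
    U, S, V = svd(X1)
    Ur, Sr, Vr = U[:, 1:r], S[1:r], V[:, 1:r]
    Atil = Ur' * X2 * Vr * Diagonal(1 ./ Sr)   # A projected onto the rank-r basis
    λ, W = eigen(Atil)
    Φ = X2 * Vr * Diagonal(1 ./ Sr) * W        # (exact) DMD modes
    return λ, Φ
end

# Synthetic snapshots from a known linear map x_{k+1} = A x_k.
n, m = 10, 50
A = 0.95 * Matrix(qr(randn(n, n)).Q)           # scaled orthogonal dynamics
X = zeros(n, m)
X[:, 1] = randn(n)
for k in 2:m
    X[:, k] = A * X[:, k-1]
end

λ, Φ = dmd(X[:, 1:end-1], X[:, 2:end], n)
println(sort(abs.(λ)))               # all ≈ 0.95, matching |eigvals(A)|
println(sort(abs.(eigvals(A))))
```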

Lecture 25 (May 16)

  • Why (and when) do we count flops?
  • Memory hierarchy and the "ideal cache"
  • Cache-aware matrix-matrix multiplication

Further Reading: Check out these slides and experiments by Prof. Johnson from 2008 on the memory hierarchy.
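
The sketch below (Julia, purely illustrative; the block size of 64 is an assumption to be tuned per machine) contrasts a naive triple loop with a blocked version that reuses small tiles of A, B, and C while they sit in cache. The built-in * calls BLAS and is far faster than either; the point is only the effect of blocking, which depends on your cache sizes.

```julia
using LinearAlgebra

# Naive triple-loop matrix multiply.
function matmul_naive!(C, A, B)
    m, k = size(A); n = size(B, 2)
    fill!(C, 0)
    @inbounds for j in 1:n, p in 1:k, i in 1:m
        C[i, j] += A[i, p] * B[p, j]
    end
    return C
end

# Blocked (tiled) multiply: each bs×bs tile is reused while it is in cache.
function matmul_blocked!(C, A, B; bs=64)
    m, k = size(A); n = size(B, 2)
    fill!(C, 0)
    @inbounds for jj in 1:bs:n, pp in 1:bs:k, ii in 1:bs:m
        for j in jj:min(jj+bs-1, n), p in pp:min(pp+bs-1, k), i in ii:min(ii+bs-1, m)
            C[i, j] += A[i, p] * B[p, j]
        end
    end
    return C
end

n = 1024
A, B = randn(n, n), randn(n, n)
C1, C2 = zeros(n, n), zeros(n, n)
@time matmul_naive!(C1, A, B)
@time matmul_blocked!(C2, A, B)
println(maximum(abs.(C1 - C2)))   # both compute the same product
```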
