Vaclav Kosar's face photo
Vaclav Kosar
Software, Machine Learning, & Business

Sparse Matrix Why and When?

Sparce matrix formats like CSR, LOL, COO compress and speed up certain operations on mostly zero matrices

Hi, this is a draft of a post, but it could be already useful.

Dense Matrix Multiplication Example

The simplest the most common formats used.

Dense matrix multiplication example

CSR Sparse Matrix Multiplication Example

A very common sparse format that is useful in situations described in below.

Compressed Sparce Row sparse matrix matrix multiplication

Sparse Matrix Formats

Sparse matrix format compresses matrices with more than half zero values and speeds up certain operations on them. There are two main groups of the sparce matrix representations:

  • efficient for incremental construction and modification DOK, LIL, COO
  • efficient for access and operations CSR, CSC There are several types of sparse matrix representations, where each has an advantage in different situations.

DOK: Dictionary of Keys format:

  • row, column pairs map to value,
  • dict((i, j): matrix[i, j] for i in matrix.shape[0] for j in matrix.shape[1])

LOL: List of Lists format:

  • each row has one list of tuples of column and value
  • example [[(j, matrix[i,j]) for j in range(matrix.shape[1])] for i in range(matrix.shape[0])]

COO: COOrdinate format (aka IJV, triplet format):

  • sorted list of row, column, value tuples
  • [(i, j, matrix[i, j]) for i in range(matrix.shape[0]) for j in range(matrix.shape[1])]

CSR: Compressed Sparse Row format

  • stores matrix in 3 lists
    • “row list” for each row contains a number that is references a position in the value and column lists, where the row’s column and values start
    • “column list” contains column indexes and is indexed by the row list
    • “value list” contains values and is indexed by the row list
  • to access a row i retrieve indexes = range(row_list[i], row_list[i+1]), then build row’s list of list representation row_lol = [(i, column_list[j], value_list[j]) for j in indexes]
  • fast matrix to vector multiplication (product) thus useful for KNN
  • fast CSR + CSR, CSR * CSR, CSR * Dense
  • fast row slicing
  • slow transpose
  • converting to Scipy’s CSR with tocsr method

CSC: Compressed Sparse Column format

  • similar to CSR except columns and rows switch their role

Other formats

Block Sparse Row format, Diagonal, …

How to Use Sparse Matrices

You can use Scipy library for sparce matrixes. The main image is from this study.

Created on 20 Mar 2022.

Let's connect

Privacy Policy How many days left in this quarter? Twitter Bullet Points to Copy & Paste