Software, Machine Learning, & Business
Vaclav Kosar's Software, Machine Learning, & Business Blog
Transformer Positional Embeddings and Encodings
How do transformers encode information about token positions?
Transformer Embeddings and Tokenization
How transformers convert words and other objects to vectors and back.
Bits-Per-Byte and Bits-Per-Character
BPB and BPC are metrics used in compression and language modelling related to compression ratio.
Neural Data Compression
Lossless bit reduction with machine learning by minimizing cross-entropy. Examples: NNCP and TRACE models.
How Computers Understood Humans
Catch up with this 7-slide introduction to deep natural language processing as of 2022, featuring TF-IDF, Word2vec, knowledge graphs, and transformers.
OpenAI's DALL-E 2 and DALL-E 1 Explained
Compare the text-to-image generation models DALL-E 1 and 2, and understand the related models VQ-VAE, CLIP, and GLIDE.
Google's Pathways Language Model and Chain-of-Thought Prompting
PaLM, the largest model as of early 2022, outperforms the average human on grade-school logic and math (BIG-bench) by simulating reasoning steps.
Word Alignment for Sentence Similarity
Semantic similarity increases with similar semantic units of similar semantic contexts in the monolingual word alignment.
Sparse Matrix Why and When?
Sparse matrix formats like CSR, LIL, and COO compress mostly-zero matrices and speed up certain operations on them.
MassiveText Dataset introduced for pre-training of DeepMind's Gopher
Private, diverse, 10-lingual textual dataset composed of web, GitHub, news, Wikipedia, books, and C4.
Transformer's Self-Attention Mechanism Simplified
How do transformer models like BERT and GPT work?
SRU++ Model Speeds Up Transformer with Simple Recurrent Unit
Reducing compute by combining RNN with self-attention from Transformer architecture.
7 Powers' Moats Through the Lens of DiBello's Business Mental Model
An insight into Helmer's enterprise moats via dimensions of demand, supply, and capital.
DeepMind's RETRO Retrieval-Enhanced Transformer
Retrieval-Enhanced Language Model cross-attends trillions of tokens for SoTA on Wikitext103 and The Pile with 25x fewer parameters.
Cross-Attention in Transformer Architecture
Cross-attention is a way to merge two embedding sequences e.g. image with text.
Ten Commandments for Business Failure Book Summary
With a foreword from Warren Buffett, this short book is worthy of skimming.
Manipulate Item Attributes via Disentangled Representation
Using attribute-specific subspaces for image manipulation retrieval, outfit completion, conditional similarity retrieval.
ELECTRA - How to Train BERT 4x Cheaper
Reducing training FLOPs 4x with a GAN-like discriminative task, compared to the RoBERTa-500K transformer model.
Expire-Span: Scaling Transformer by Forgetting
Reducing computational costs by differentiably dropping memorized embeddings from self-attention context.
Scout Mindset Book Summary
Short summary of a book by Julia Galef on clear thinking.
Quilt Data Versioning Review & How-to
How to version data using Quilt data for Python on AWS S3 for machine learning.
Wav2vec: Semi and Unsupervised Speech Recognition
Audio Word2vec Guide - Quantizes phonemes, transforms, GAN trains on text and audio.
PID Controller: A Simple Control Loop Mechanism
Proportional–integral–derivative controller calculates feedback to reduce the error in the next step.
DreamCoder: Wake & Sleep Program Learning
Learning to code by growing function library, fantasising coding tasks, and training neural search.
Google Product Taxonomy Viewer
Interactively explore Google Shopping's and Shopify's categories to configure products in your feed.
Automatically Expanding Taxonomy
Pinterest's Arborist model finds parents for unseen textual nodes using triplet-loss, StarSpace embeddings, & shortest path.
Submodularity in Ranking, Summarization, and Self-attention
Diminishing returns with a budget constraint in problems of coverage and results diversification.
Feed-Forward, Self-Attention & Key-Value
Feed-forward layer is similar to cross-attention as observed in SwiGLU and All-attention.
Lambda Networks Transform Self-Attention
Is the Lambda Layer similar to self-attention in a Transformer? What gives LambdaNet its power? LambdaResNet beats EfficientNet, but does it lose to Performer?
Performers FAVOR+ Faster Transformer Attention
The Performer model's attention approximation has linear complexity, in contrast to quadratic, and outperforms Linformer.
Double Descent Contrary to Bias-Variance Trade-Off
Increasing a model's parameter count leads to multiple test-loss peaks, with the global minimum achieved in the overparameterized regime.
To What Python Number Types Does json.loads Parse?
JSON specifies only a number value, so how to infer the correct type between int and float? How are NaN and Infinity handled?
Brutalist and Modernist Architectures Collide at Sunshine Plaza
Take a tour of the surprising merger of modern and Soviet-era design on a public square in Prague with this photo album.
Word Mover's Embedding: Cheap WMD For Documents
What Word Mover's Embedding for documents is and how it approximates Word Mover's Distance between documents.
Transfigure Stress into Energy by Drawing on Research
Your pounding heart and blush will announce a flashing opportunity instead of an impending fight or flight after applying research from this post.
OpenAI's Glow - Flow-Based Model Teardown
Interpretable latent representations by composing non-linear invertible functions and maximizing the exact log-likelihood.
BentoML vs Cortex - ML Serving Showdown
To find the best model serving tool, compare open-source MLOps platforms BentoML and Cortex.
StarSpace - Embeddings For Documents, Users, and Words
Create vectors of various entities in a single space with this general-purpose embedding model from Facebook AI.
Thinkpad P52 Disassembly For Repaste, RAM Upgrade, Or Anything Else
Repaste, max RAM, or install antenna into Thinkpad P52 with these links and tips for full disassembly.
Python Context Manager Exception Handling and Retrying
Wrap your resource into a context manager with-statement to catch, handle exceptions, and close the resource.
Result Diversification in Web Search and Recommenders
Increase coverage in web search and recommendation via a re-ranking diversification factor.
I read papers on a podcast
To improve my pronunciation and speech, I read mostly Machine Learning scientific papers on a podcast.
Learn faster with a generated quiz
Reduce your effort of creating and revising learning material using a free AI-powered tool.
Thinkpad P53 vs P52 Thermals: Any Improvement?
Is Thinkpad P53 the cooler brother of P52?
Constant 1D Kalman Filter Is Exponential Or Cumulative Average
In one dimension, with constant measurement uncertainty and process noise, the filter converges to a cumulative average or an exponential average.
FastText Word Embeddings
How FastText works, word embeddings, ngrams, OOV words, and visualize embedding norms.
Highly Compressed Richard Hamming's Lectures
Get inspired by Hamming's lectures compressed into tiny downloadable files.
Thinkpad P52 vs ZBook 15 G5 vs Precision 7530
This is my experience working on the best mobile workstation of 2019 with specs matching Thinkpad P52 and ZBook 15 G5.
Spline: Data Lineage For Spark Structured Streaming (2018)
Vaclav Kosar and Marek Novotny's presentation at Spark + AI Summit 2018 of a POC of a Structured Streaming data lineage tool.
Debounce In Bash To Fix Lenovo Touchpad And Trackpoint Lost Sync
Another functional programming tip for Bash.
My First Contribution To A Major OSS Project Apache Spark
Finally my rather small pull request was merged into master of Apache Spark!
How To Create Custom Ubuntu Web Link App
Turn any web page into an Ubuntu application and prevent the web owner from tracking you around the web.
Modern Config Injection In Maven Plugins
Maven Mojo constructor injection of config parameters via Guice JSR-330 support.
Walking Desk: Cheap And Tiny
A review of my motor-less walking desk setup.
Boundary Control Entity Architecture Pattern
BCE is a source code structure pattern sometimes called ECB, EBC, Hexagonal, Onion, or Clean architecture.
Spring Integration Highlights - message driven architecture
Get familiar with Spring Integration implementation of Enterprise Integration Patterns and compare it to Java 8 Streams and RxJS.
Fish Roe vs Fish Oil
Healthiness and price of a salty delicacy versus oily softgels. EPA, DHA, Neu5Gc.
Restore Missing Punctuation with Keras Convolutional Text Punctuator
Simple neural network Android app for restoring punctuation in text, e.g. YouTube subtitles.
Easy Online Independence: Mail Backup, File Synchronization
Cheap way to increase your independence from the online giants with Syncthing, mbsync, Thunderbird.
Generic Class Name Signals Low Cohesion
Why and how to avoid non-specific class names like util, utils, or helper?
How to Structure Code
Localize Related, Inline over Extract, Specific over Generic. My view partially based on Carmack, Jonathan Blow, and Adam Bien's posts.
Is $15 USB Microscope Enough For You?
See yeast cells and pond water critters by paying just $15 for a USB microscope.
Linux Text To Speech Comparison: Flite Vs Pico2Wave Vs Festival
Comparison of open-source text to speech (TTS) software in terms of pleasantness, comprehensibility, and modularity.
Try This Sped-Up Classical Music Attuned For Today's Sped-Up Age
Are you attracted to complexity and nobility of classical music, but deterred by its slow pace?
Drone Detecting White Marker for Stabilization
At a hackathon, I implemented a trivial image-processing algorithm to locate a white piece of paper on a grey carpet floor, to be used for horizontal drone stabilization.
Functional ForEach In Bash
Don't you hate the verbosity of Bash's while-do statements when writing in-line scripts? No worries, you can improve on that!
Obsolete Git Branch Remover Maven Plugin
Do you have many branches left behind, abandoned, and never deleted? How do you deal with them? We had the same problem, so I developed an automated DevOps solution.
How To Boost Your Jog Morale Using Military Cadence And Run Farther
Run beyond your max with this professional mind hack.
GitFlow Incremental Builder - Speed up your multi-module Maven build
Incrementally build only those modules that changed compared to a reference Git branch and all their dependents with this open-source Maven plugin.
Hamiltonians with constant spectral intervals and time-dependent perturbation
On quantum systems determined by time-dependent Hamilton operators. Family of quantum systems, whose Hamilton operators take form H(t) = H_0 + V(t), where V(t) is perturbation and H_0 is self-adjoint with pure-point spectrum and constant gaps between eigenvalues in spectrum σ(H_0).
Feynman summation in finite-dimensional quantum mechanics
A summary and enhancement of existing literature regarding finite-dimensional quantum mechanics. In the later parts Feynman’s path summation is discussed.
Simulation of Soft Photon Calorimeter
Understand how an electromagnetic calorimeter works, as presented at JINR Dubna in 2011.
Transverse momentum spectra and correlations in the blast wave model with resonances
This work provides a review of theories of properties of high energy density matter originating in heavy-ion high energy collisions (GeV/nucleus).
How many days left in this quarter?
Twitter Bullet Points to Copy & Paste