Why Taxonomies?
- map concepts into a hierarchy, usually a tree
- parents are related but more general or more abstract
- multiple relation types are possible
- abstraction is information compression
- simplifies user navigation via categorization
Google Product Taxonomy
Examples:
- Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths
- Home & Garden > Kitchen & Dining > Tableware > Serveware Accessories > Punch Bowl Stands
- Sporting Goods > Outdoor Recreation > Outdoor Games > Platform & Paddle Tennis > Platform & Paddle Tennis Paddles
Shopify Taxonomy
- uses the Google Product Taxonomy
- published an approach for scalable product categorization
Pinterest Taxonomy
- Interests are organized into the Pinterest taxonomy
- advertisers use the taxonomy to target ad campaigns
- In 2020:
  - 11,000 nodes/edges
  - 7 levels deep
  - 100% expert curated
  - nodes have 300-dim feature embeddings
  - trained with a StarSpace-like model
- Examples:
  - Food & Drink > Drinks > Wines
- custom software is used for curation
- Pins and Pinners are categorized into the taxonomy to enable recommendations
  - together they are called the Taste Graph
Other papers on the use of taxonomies
Sherlock recommender
- the “casualness” of skirts looks visually different from the “casualness” of flip-flops
- taxonomy sub-trees use different projections of visual embeddings
- Visual projections are shared from parents to children via stacking
Taxonomy-Aware Multi-Hop Reasoning Networks for Sequential Recommendation
- uses taxonomy data as structural knowledge to guide the model's learning
- learns a distinct preference representation for each taxonomy level, derived from the user's overall sequential preference
Pinterest’s automatic taxonomy expansion
- Pinterest's motivation:
  - 8 curators added 6,000 new taxonomy nodes in a month
  - compared to 100M new pins every month
- Interests = textual phrases describing concepts (e.g., “mid-century architecture”)
  - textual embeddings are available
- automatically finds the parent of a new node
- handles multiple relationship types: is-type-of, is-in
- constructs embeddings for unseen nodes
- minimizes an upper bound on the shortest-path distance between the predicted and actual taxonomy parents
- SoTA results and an ablation study
- Code: available
- Data: no new dataset published
- Name: Arborist
Building the Taxonomy
- taxonomy expansion is a parent-retrieval problem
- given the taxonomy and a query node:
  - rank true parents high
  - also rank nodes a short path away from a true parent high
Parent Retrieval Examples
- luxor: africa travel, european travel, asia travel, greece
- 2nd month baby: baby stage, baby, baby names, preparing for baby
- depression: mental illness, stress, mental wellbeing, disease
- ramadan: hosting occasions, holiday, sukkot, middle east and african cuisine
- minion humor: humor, people humor, character humor, funny
Modeling
Setup
- each node \( u \) of the dataset has a feature vector \( e_u \in \mathbb{R}^{d} \)
- choose a natural-number hyper-parameter \( k \in \mathbb{N} \)
- \( k \) learnable linear relatedness matrices \( M_i \in \mathbb{R}^{d \times d} \), shared across all nodes
- to each node \( u \) assign \( k \) learnable weights \( w_{i, u} \in \mathbb{R} \)
- define the relatedness score from child \( u \) to candidate parent \( v \):
\( s(u, v) = e_u^\intercal \sum_{i = 1}^{k} w_{i, v} M_i e_v \)
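
A minimal PyTorch sketch of this bilinear score, assuming toy sizes and randomly initialized parameters (the names `score`, `M`, and `w` are illustrative, not from the paper):

```python
import torch

d, k, n_nodes = 300, 8, 1000         # toy sizes, assumed for illustration

e = torch.randn(n_nodes, d)          # fixed node feature vectors e_u
M = torch.randn(k, d, d) * 0.01      # k shared learnable relatedness matrices M_i
w = torch.randn(n_nodes, k)          # per-node learnable weights w_{i,u}

def score(u: int, v: int) -> torch.Tensor:
    """s(u, v) = e_u^T (sum_i w_{i,v} M_i) e_v."""
    Mv = torch.einsum("kij,j->ki", M, e[v])  # (k, d): row i is M_i e_v
    return e[u] @ (w[v] @ Mv)                # weight rows by w_{i,v}, dot with e_u

print(score(0, 1).item())
```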
Training
Loss
- similar to the triplet loss
- we want \( s(\mathrm{child}, \mathrm{parent}) > s(\mathrm{child}, \mathrm{nonparent}) + \gamma \)
- loss: \( \sum \max(0,\; s(\mathrm{child}, \mathrm{nonparent}) + \gamma - s(\mathrm{child}, \mathrm{parent})) \)
- how to choose the margin \( \gamma \), and how to keep the sum over non-parents scalable?
Dynamic Margin
- set \( \gamma \) to the shortest-path distance from the sampled non-parent to the true parent
- minimizing the loss then upper-bounds the shortest-path distance from the predicted parent to the true parents
- a curator then only needs to move a misplaced node within its local neighborhood
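
A sketch of the hinge loss above with dynamic margins, assuming scores and shortest-path distances for the sampled triples are already computed (all names and toy values are illustrative):

```python
import torch

def dynamic_margin_loss(s_parent: torch.Tensor,
                        s_nonparent: torch.Tensor,
                        gamma: torch.Tensor) -> torch.Tensor:
    """Sum of max(0, s(child, nonparent) + gamma - s(child, parent)),
    where gamma is the shortest-path distance from the sampled
    non-parent to the true parent."""
    return torch.clamp(s_nonparent + gamma - s_parent, min=0.0).sum()

# toy usage: scores for three (child, parent, non-parent) triples
s_parent    = torch.tensor([2.0, 1.5, 0.3])
s_nonparent = torch.tensor([1.0, 1.8, 0.1])
gamma       = torch.tensor([2.0, 3.0, 1.0])  # SPDist(non-parent, true parent)
print(dynamic_margin_loss(s_parent, s_nonparent, gamma).item())
```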
Negative sampling
- cannot sum over all non-parents
- easy samples slow down convergence
- use embedding-similarity-weighted negative sampling:
\( \mathrm{Pr}(v' \mid (u, v)) \propto e_u^\intercal e_{v'} \)
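
A sketch of this sampling scheme. The note only states proportionality to \( e_u^\intercal e_{v'} \); the softmax below is one assumed way to turn possibly negative dot products into a valid distribution:

```python
import torch

def sample_negatives(u: int, parents: set, e: torch.Tensor, n: int = 5) -> torch.Tensor:
    """Draw n hard negatives v', favoring nodes whose embedding is similar to e_u."""
    sims = e @ e[u]                           # dot product of e_u with every node
    sims[list(parents) + [u]] = float("-inf") # exclude true parents and u itself
    probs = torch.softmax(sims, dim=0)        # assumed normalization choice
    return torch.multinomial(probs, n, replacement=True)

e = torch.randn(1000, 300)                    # toy embeddings
print(sample_negatives(u=0, parents={1, 2}, e=e))
```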
Evaluation
Datasets:
- public: Mammal & SemEval
  - FastText embeddings
- Pinterest
  - custom 300-dim text embeddings (StarSpace-like)
Metrics:
- mean reciprocal rank (MRR)
  - the multiplicative inverse of the true parent's rank in the predicted-parents list
  - with multiple true parents, the reciprocal of the highest-ranked one
  - ranges from 0% (worst) to 100% (best)
- Recall@15
- SPDist = shortest-path distance in the taxonomy between the predicted and the true parent (lower is better)
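
For concreteness, a small plain-Python sketch of the two ranking metrics; the hit-based Recall@k definition is an assumption, since the note does not spell it out:

```python
def mrr(ranked_lists, true_parents):
    """Mean reciprocal rank: 1 / rank of the highest-ranked true parent
    per query (0 if none is retrieved), averaged over queries."""
    total = 0.0
    for ranked, parents in zip(ranked_lists, true_parents):
        total += next((1.0 / r for r, v in enumerate(ranked, 1) if v in parents), 0.0)
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, true_parents, k=15):
    """Share of queries with at least one true parent in the top k (assumed definition)."""
    hits = sum(bool(parents & set(ranked[:k]))
               for ranked, parents in zip(ranked_lists, true_parents))
    return hits / len(ranked_lists)

# toy usage: two query nodes with ranked parent candidates
ranked  = [["travel", "africa travel", "greece"], ["baby", "baby stage"]]
parents = [{"africa travel"}, {"baby stage"}]
print(mrr(ranked, parents), recall_at_k(ranked, parents))
```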
| Method | SemEval MRR (%) | SemEval Recall@15 (%) | SemEval SPDist | Mammal MRR (%) | Mammal Recall@15 (%) | Mammal SPDist | Pinterest MRR (%) | Pinterest Recall@15 (%) | Pinterest SPDist |
|---|---|---|---|---|---|---|---|---|---|
| Vec-Concat | 41.831 | 64.671 | 3.816 | 20.992 | 33.155 | 3.474 | 14.995 | 30.726 | 4.274 |
| Vec-Sum | 33.891 | 62.548 | 4.124 | 17.803 | 27.607 | 4.047 | 19.611 | 38.175 | 4.186 |
| Vec-Diff | 41.185 | 67.699 | 3.494 | 18.514 | 30.949 | 4.163 | 31.386 | 46.182 | 3.674 |
| Vec-Prod | 42.233 | 68.743 | 3.144 | 17.483 | 31.083 | 4.178 | 32.177 | 48.976 | 3.665 |
| CRIM | 53.223 | 79.325 | 2.393 | 41.691 | 62.064 | 2.743 | 21.345 | 52.700 | 4.080 |
| Arborist | 59.044 | 83.606 | 2.220 | 43.373 | 67.694 | 2.864 | 29.354 | 61.639 | 3.225 |