Vector embedding — Pith glossary


A vector embedding is a fixed-length array of numbers that represents the meaning of a piece of text (or an image, or audio), produced by a neural network so that semantically similar inputs produce numerically close vectors.

Why it matters

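Embeddings are the substrate of modern AI retrieval. OpenAI's text-embedding-3-large produces 3072-dimensional vectors by default; Cohere's embed-v3 models produce 1024-dimensional ones (384 for the light variants). Dimensionality is a tunable trade-off: smaller embeddings are cheaper to store and faster to compare, but usually lose some accuracy. The text-embedding-3 models, for example, accept a `dimensions` parameter that returns shorter vectors.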
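To make the size side of that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes each value is stored as a 32-bit float and ignores any index overhead. The collection size of 100,000 vectors and the 256-dimension row are illustrative choices, not figures from any particular product.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension


def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage needed for the vectors alone (no index overhead)."""
    return num_vectors * dimensions * bytes_per_value


# Hypothetical collection size, chosen only for illustration.
NUM_VECTORS = 100_000

for dims in (3072, 1024, 256):
    size_mib = index_size_bytes(NUM_VECTORS, dims) / (1024 ** 2)
    print(f"{dims:>5} dims: {size_mib:,.1f} MiB for {NUM_VECTORS:,} vectors")
```

Comparison cost scales the same way: a brute-force similarity check touches every dimension of every stored vector, so a vector one third the length takes roughly one third of the arithmetic.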

The useful property of embeddings is that geometric closeness tracks semantic relatedness. The usual measure is **cosine similarity**: the cosine of the angle between two vectors, which approaches 1 as their texts become more closely related. This is what powers semantic search, RAG retrieval, clustering (topic maps), and recommendation systems.
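As a minimal sketch of the idea, the snippet below computes cosine similarity in plain Python and uses it to rank a few candidates against a query. The three-dimensional vectors are made up for illustration, since real embeddings have hundreds or thousands of dimensions. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for this example; not output from a real model.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]

ranking = rank_by_similarity(query, toy_vectors)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Semantic search follows the same pattern at scale: embed the query, score it against the stored vectors, and return the highest-scoring items. Production systems typically replace the linear scan with an approximate nearest-neighbour index.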

How Pith relates

Every Pith bookmark is embedded once at save time. Embeddings drive the topic map's clustering, the wiki's RAG retrieval, the search's semantic layer, and the auto-tag service's similarity scoring. See the Topic Map feature for the visualisation.
