Prices: Croatian grocery price aggregation API
Data pipeline & API


A headless data service that aggregates daily-published Croatian grocery prices into a single queryable API. A Laravel ingestion pipeline normalises raw retailer price files into a canonical product catalog, tracks every price change over time, and answers basket, price-history and stores-near-me queries — the data backbone behind the Household app's price comparison.

Prices is built like infrastructure rather than a product: a Laravel 13 API with no consumer UI, backed by PostgreSQL and PostGIS. Under the 2025 Croatian price-publishing law, large retailers publish daily machine-readable price files; this service ingests them, reconciles them into one catalog, and serves the result over a small authenticated API.

The interesting work is in the pipeline and the data model. A per-chain adapter turns each retailer’s format into a common DTO, and an adapter-agnostic importer bulk-upserts into a canonical catalog — 23.6k products resolved by barcode across 21 chains, with a classification pass mapping messy chain names onto normalized categories. Every price change is appended to a month-partitioned history table, and 294 geocoded stores make “cheapest at my stores” a real PostGIS radius query rather than a guess.

The admin is deliberately utilitarian, a watch-it-yourself view of crawl runs and classification rather than a product. That’s the point: Prices exists to do one job well and to hand clean data to Household over an honest API boundary. The counts and claims above are read straight from the source and the production database, not estimated.

Role: Architecture + full-stack
Year: 2026
Core stack: Laravel 13 · PostgreSQL + PostGIS
Surface: API-only + utilitarian admin
Consumed by: Household

Figure: Utilitarian admin dashboard showing crawl runs, chain coverage and product classification counts.

Ingestion pipeline

Raw retailer files in, one canonical catalog out

Each chain publishes a different price-file format under the 2025 Croatian price-publishing law. A per-chain adapter parses its format into a common PriceRowDTO; an adapter-agnostic Importer then bulk-upserts those rows into products, chain_products, prices and price_history in chunked transactions.

  • PriceSource interface with Konzum, Studenac and Tommy adapters live; new chains are a new adapter, nothing else
  • Importer resolves products → chain_products → prices → history → yearly stats per ~1000-row batch
  • CrawlCommand orchestrates the daily run; CrawlRun + CrawlStatus enum record every import
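
The adapter-plus-importer split described above can be sketched roughly as follows. This is an illustration in Python rather than the service's Laravel code: PriceRow stands in for PriceRowDTO, and its field names, the SemicolonCsvSource adapter, the CSV column names and the in-memory store are all assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal
from itertools import islice
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class PriceRow:
    """Illustrative stand-in for PriceRowDTO; the field names are assumptions."""
    barcode: str
    name: str
    store_code: str
    price: Decimal


class PriceSource(Protocol):
    """One implementation per chain; only the adapter knows the file format."""
    def rows(self, raw: str) -> Iterator[PriceRow]: ...


class SemicolonCsvSource:
    """Hypothetical adapter for a chain publishing semicolon-separated CSV."""
    def rows(self, raw: str) -> Iterator[PriceRow]:
        for rec in csv.DictReader(io.StringIO(raw), delimiter=";"):
            yield PriceRow(
                barcode=rec["barcode"].strip(),
                name=rec["name"].strip(),
                store_code=rec["store"].strip(),
                # Assumes prices are written with a decimal comma.
                price=Decimal(rec["price"].replace(",", ".")),
            )


def chunked(rows: Iterable[PriceRow], size: int) -> Iterator[list[PriceRow]]:
    """Yield lists of at most `size` rows, one list per transaction."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def import_rows(source: PriceSource, raw: str, store: dict, batch_size: int = 1000) -> int:
    """Adapter-agnostic import; returns the number of batches processed."""
    batches = 0
    for batch in chunked(source.rows(raw), batch_size):
        # A real importer would issue one bulk upsert per batch inside a
        # database transaction; a dict keyed by (barcode, store) stands in here.
        for row in batch:
            store[(row.barcode, row.store_code)] = row.price
        batches += 1
    return batches
```

The importer never inspects the file format, which is why supporting another chain only means writing another class with a rows method.
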
Product identity

Matching the same product across 21 different catalogs

Every chain names and codes products its own way. Canonical products are keyed by barcode, with each chain's raw entry linked as a chain_product, and a classification pass maps messy chain names onto normalized categories — so a basket can compare like with like across retailers.

  • 23.6k canonical products resolved from per-chain product rows
  • ApplyClassificationCommand + batched SQL assign normalized categories with a confidence score
  • Manual-verification flag lets a human override low-confidence matches
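
As a rough illustration of the identity model above, the sketch below groups per-chain rows under a barcode and assigns a category with a confidence score, leaving human-verified rows untouched. It is Python rather than the service's batched SQL, and the keyword rules, category names and review threshold are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainProduct:
    """A chain's raw catalog row, linked to a canonical product by barcode."""
    chain: str
    raw_name: str
    barcode: str


@dataclass
class Classification:
    category: str
    confidence: float              # 0.0 to 1.0
    manually_verified: bool = False


# Hypothetical keyword rules and cut-off, for illustration only.
RULES = {"mlijeko": ("dairy", 0.9), "jogurt": ("dairy", 0.9), "kruh": ("bakery", 0.8)}
REVIEW_THRESHOLD = 0.5


def group_by_barcode(rows: list[ChainProduct]) -> dict[str, list[ChainProduct]]:
    """One canonical product per barcode, with every chain row attached."""
    canonical: dict[str, list[ChainProduct]] = {}
    for row in rows:
        canonical.setdefault(row.barcode, []).append(row)
    return canonical


def classify(raw_name: str) -> Classification:
    """Map a messy chain name onto a normalized category."""
    lowered = raw_name.lower()
    for keyword, (category, confidence) in RULES.items():
        if keyword in lowered:
            return Classification(category, confidence)
    return Classification("uncategorised", 0.0)


def apply_classification(current: Optional[Classification], raw_name: str) -> Classification:
    """Re-classify unless a human has already verified the category."""
    if current is not None and current.manually_verified:
        return current
    return classify(raw_name)


def needs_review(c: Classification) -> bool:
    """Low-confidence, unverified matches are queued for a human."""
    return not c.manually_verified and c.confidence < REVIEW_THRESHOLD
```
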
Price history

Every price change, kept and queryable

Current prices live in a hot prices table; every observed change is appended to a partitioned price_history. Monthly Postgres partitions keep the history table fast as it grows, and a rotation command provisions upcoming months and drops expired ones on a schedule.

  • price_history range-partitioned by month; RotatePartitionsCommand maintains the window
  • Per-product, per-store history powers the Household price-history view
  • Yearly stats rolled up on import for fast long-range queries
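
The rotation logic can be sketched as two small functions: one that emits the DDL for the current and upcoming monthly partitions, and one that picks out partitions older than the retention window. This is a Python illustration, not RotatePartitionsCommand itself; the partition naming scheme and the look-ahead and retention parameters are assumptions.

```python
from datetime import date


def month_start(d: date, offset: int = 0) -> date:
    """First day of the month `offset` months away from `d`."""
    index = d.year * 12 + (d.month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def partition_name(start: date) -> str:
    # Assumed naming scheme, e.g. price_history_2026_03.
    return f"price_history_{start.year:04d}_{start.month:02d}"


def create_statements(today: date, months_ahead: int) -> list[str]:
    """DDL for the current month plus `months_ahead` upcoming months."""
    statements = []
    for offset in range(months_ahead + 1):
        start = month_start(today, offset)
        end = month_start(today, offset + 1)  # upper bound is exclusive
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {partition_name(start)} "
            f"PARTITION OF price_history "
            f"FOR VALUES FROM ('{start}') TO ('{end}');"
        )
    return statements


def expired_partitions(existing: list[str], today: date, keep_months: int) -> list[str]:
    """Names of partitions whose month falls before the retention window."""
    cutoff = partition_name(month_start(today, -keep_months))
    # Zero-padded names sort chronologically, so string comparison suffices.
    return sorted(name for name in existing if name < cutoff)
```

Dropping a whole partition removes a month of history as a metadata operation instead of a large DELETE, which is the usual reason to partition an append-only table by time.
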
Geospatial

Stores near me, by real distance

Stores carry coordinates as a PostGIS point, so a stores/near query returns shops within a radius ordered by actual distance — not a bounding box. This is what lets a consumer ask 'cheapest at my stores' rather than 'cheapest nationally'.

  • ST_DWithin + ST_Distance over geom::geography (SRID 4326) for true metric distance
  • 294 stores geocoded; the law's coverage gap (small-format stores publish nothing) is surfaced honestly, not hidden
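
A radius query of the kind described above might be built like this. The sketch is Python that only assembles a parameterised SQL string; the stores table, its id and name columns, and the psycopg-style named placeholders are assumptions, while geom, ST_DWithin and ST_Distance come from the description above.

```python
def stores_near_query(lat: float, lon: float, radius_m: float, limit: int = 20):
    """Return (sql, params) for stores within `radius_m` metres, nearest first."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    # ST_MakePoint takes (x, y), i.e. longitude first, then latitude.
    # Casting to geography makes both functions work in metres.
    sql = """
        WITH origin AS (
            SELECT ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography AS g
        )
        SELECT s.id, s.name,
               ST_Distance(s.geom::geography, origin.g) AS distance_m
        FROM stores AS s, origin
        WHERE ST_DWithin(s.geom::geography, origin.g, %(radius_m)s)
        ORDER BY distance_m
        LIMIT %(limit)s
    """
    return sql, {"lat": lat, "lon": lon, "radius_m": radius_m, "limit": limit}
```

ST_DWithin does the filtering and can use a spatial index, while ST_Distance is only evaluated for the rows that survive the filter and supplies the ordering.
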
Consumer API

A small, honest API behind a single token

The whole service is API-first: nine authenticated endpoints covering chains, stores, basket, product search, current prices, price history and a status feed. Household is the primary consumer, calling it over an authenticated HTTP boundary with no shared database.

  • basket, products/search, prices/history and stores/near are the Household-facing core
  • auth.api middleware gates every route with a single service token
  • Admin controllers back a deliberately utilitarian Vue monitoring view
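
The single-token gate can be illustrated with a minimal check like the one below. It is a Python sketch, not the auth.api middleware: it assumes the token arrives as a Bearer value in the Authorization header, which the description above does not specify.

```python
import hmac
from typing import Optional


def bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' value."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorised(authorization_header: Optional[str], expected_token: str) -> bool:
    """Accept the request only if the presented token matches the service token."""
    token = bearer_token(authorization_header)
    if token is None or not expected_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```
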
Architecture

An ingestion pipeline feeding a partitioned Postgres store, exposed as a small authenticated API — no consumer UI by design.

Sources: Per-chain price files
Legally mandated daily CSV / XML / API feeds, one bespoke adapter per chain, normalised to a common DTO.

Ingestion: Laravel Scheduler + Importer
CrawlCommand pulls each source, the Importer bulk-upserts in chunked transactions, CrawlRun records the outcome.

Data: PostgreSQL 18 + PostGIS
Canonical catalog, hot prices, month-partitioned price_history, geocoded stores, jsonb raw payloads.

API: Authenticated REST + admin
Nine token-gated endpoints consumed by Household; a thin Vue admin for monitoring crawls and classification.

Challenges solved
01. One product, twenty-one different names

Problem

Each retailer publishes its own product names, codes and category labels. Without a shared identity, a 'cheapest basket' comparison would be meaningless — the same milk looks like 21 different products.

Solution

Products are keyed canonically by barcode, with each chain's raw row linked as a chain_product. A classification pass (batched SQL + an Artisan command) maps chain names onto normalized categories with a confidence score and a manual-override flag, so comparisons line up across chains while leaving low-confidence matches reviewable.
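
Once every chain's rows share a barcode, a basket comparison reduces to summing the same keys per chain. The sketch below shows that idea in Python; it is not the service's basket endpoint, and the input shapes and the rule of skipping chains that lack an item are assumptions.

```python
from decimal import Decimal
from typing import Optional


def cheapest_chain(
    basket: dict[str, int],
    prices: dict[str, dict[str, Decimal]],
) -> Optional[tuple[str, Decimal]]:
    """Find the cheapest chain that stocks every item in the basket.

    basket maps barcode -> quantity; prices maps chain -> {barcode: unit price}.
    Returns (chain, total), or None if no chain carries the full basket.
    """
    best: Optional[tuple[str, Decimal]] = None
    for chain, catalog in prices.items():
        if not all(barcode in catalog for barcode in basket):
            continue  # incomplete baskets are not comparable like with like
        total = sum((catalog[b] * qty for b, qty in basket.items()), Decimal("0"))
        if best is None or total < best[1]:
            best = (chain, total)
    return best
```
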

02. A price-history table that doesn't rot as it grows

Problem

Tracking every price change across thousands of products and hundreds of stores means a history table that grows without bound — and slows every query as it does.

Solution

price_history is range-partitioned by month in Postgres. A scheduled RotatePartitionsCommand provisions upcoming partitions and drops ones past the retention window, so reads stay fast and old data ages out cleanly without a manual migration.

03. Honest coverage on an island

Problem

The 2025 law only covers large-format stores. On Korčula, the physical Konzums are all small-format and publish nothing — so naive 'no data' answers would be both wrong and unhelpful.

Solution

Stores are geocoded and queried by real PostGIS distance, and the model keeps the format gap explicit: the service can answer 'nearest published store' rather than pretending coverage it doesn't have. The honesty is built into the data, not bolted on at the UI.

04. A service, not a second database

Problem

Household needs grocery prices, but a large, daily-updated price dataset has no business living inside a home-management app's schema.

Solution

Prices is a standalone Laravel + Postgres service with its own ingestion and lifecycle. Household reaches it only through an authenticated HTTP API (basket, search, history, stores-near), so the two stay independently deployable and neither owns the other's data.

How it's built
Backend & API: Laravel 13, PHP 8.3, Sanctum (token), API Resources, Form Requests, PHP enums
Data: PostgreSQL 18, PostGIS 3.6, monthly partitioning, jsonb raw payloads, ILIKE / pg_trgm matching
Ingestion: Adapter-per-chain, DTO mapping, chunked upserts, Laravel Scheduler, Artisan commands
Infra & quality: IONOS VPS, Pest, Laravel Pint, single-token auth, monitoring admin (Vue 3)

The unglamorous half of a system — a price engine that just keeps the data clean and the API honest.