#usda#postgres#bulk#fooddata-central#self-hosting

USDA FDC bulk CSV → Postgres for offline lookups

When the API isn't enough: load the entire FDC into Postgres in 20 minutes.

Why bulk

The USDA FDC API gives you 1,000 requests per hour per IP, free. Generous for one user. Not generous for:

  • A self-hosted tracker that does search-as-you-type.
  • A small homelab serving multiple users.
  • Anything offline.
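The budget math is easy to check. A back-of-envelope sketch (the keystroke and user counts are illustrative assumptions, not measurements):

```python
# Back-of-envelope: how long 1,000 requests/hour survives search-as-you-type.
HOURLY_BUDGET = 1_000

def minutes_until_exhausted(users: int, queries_per_min_per_user: int) -> float:
    """Minutes of sustained typing before the hourly budget is gone."""
    total_per_min = users * queries_per_min_per_user
    return HOURLY_BUDGET / total_per_min

# One user firing a query per keystroke at ~60 queries/min:
solo = minutes_until_exhausted(users=1, queries_per_min_per_user=60)

# A household of four on the same IP:
household = minutes_until_exhausted(users=4, queries_per_min_per_user=60)

print(f"solo: {solo:.1f} min, household: {household:.1f} min")
# → solo: 16.7 min, household: 4.2 min
```

At one query per keystroke, a single typist burns the hour's budget in under 17 minutes. The bulk load removes the ceiling entirely.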

The FDC bulk download gives you the entire database, refreshed monthly, free, no rate limit. About 6 GiB compressed, ~14 GiB uncompressed. Loads into Postgres in about 20 minutes on modest hardware.

What you download

From fdc.nal.usda.gov/download-datasets.html:

  • FoodData_Central_csv_2025-10-31.zip (or whatever the current dump is). Contains:
    • food.csv — base food records with FDC IDs, descriptions, data types
    • nutrient.csv — the master list of nutrients
    • food_nutrient.csv — the (food, nutrient, amount) tuples (the big one)
    • branded_food.csv — extra fields for branded products
    • Several smaller files for Foundation, Survey, etc.

The bulk dump is updated roughly monthly. You don’t need to refresh it daily — for our use case quarterly is fine.

Schema

A practical schema that maps the CSVs into Postgres:

CREATE TABLE food (
    fdc_id INTEGER PRIMARY KEY,
    data_type TEXT NOT NULL,
    description TEXT NOT NULL,
    food_category_id INTEGER,
    publication_date DATE
);

CREATE TABLE nutrient (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    unit_name TEXT NOT NULL,
    rank INTEGER
);

CREATE TABLE food_nutrient (
    id INTEGER PRIMARY KEY,
    fdc_id INTEGER NOT NULL,
    nutrient_id INTEGER NOT NULL,
    amount NUMERIC,
    FOREIGN KEY (fdc_id) REFERENCES food(fdc_id),
    FOREIGN KEY (nutrient_id) REFERENCES nutrient(id)
);

CREATE TABLE branded_food (
    fdc_id INTEGER PRIMARY KEY REFERENCES food(fdc_id),
    brand_owner TEXT,
    gtin_upc TEXT,
    serving_size NUMERIC,
    serving_size_unit TEXT,
    branded_food_category TEXT
);

That’s the minimum useful surface. The full schema has 40+ tables; we don’t need most of them.

Loading

Unzip the dump:

unzip FoodData_Central_csv_2025-10-31.zip -d /tmp/fdc/

COPY from CSV (run from psql):

\copy food                FROM '/tmp/fdc/food.csv'           WITH CSV HEADER
\copy nutrient            FROM '/tmp/fdc/nutrient.csv'       WITH CSV HEADER
\copy food_nutrient       FROM '/tmp/fdc/food_nutrient.csv'  WITH CSV HEADER
\copy branded_food        FROM '/tmp/fdc/branded_food.csv'   WITH CSV HEADER

One caveat: \copy expects the CSV's columns to line up with the target table. If the current dump's nutrient.csv, food_nutrient.csv, or branded_food.csv carries columns beyond the minimal schema above (recent dumps do), trim the files first or load into wider staging tables and INSERT ... SELECT just the columns you keep.

Sizes: food ~1.5M rows, nutrient ~700, food_nutrient ~28M, branded_food ~1.3M. The last two are the heavy ones.

Time: about 18 minutes total on a Pi 5 with NVMe storage. Closer to 8 minutes on a modest VPS.
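If a CSV has more columns than the table it feeds, one option is to project just the columns you keep before loading. A sketch that selects by header name rather than position, so it survives column reordering (the column names listed are assumptions to verify against the current dump's header row):

```python
import csv

def project_columns(src_path: str, dst_path: str, keep: list[str]) -> None:
    """Copy a CSV, keeping only the named columns, in the given order."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(keep)
        for row in reader:
            writer.writerow([row[col] for col in keep])

# Usage, with the paths from the unzip step above:
# project_columns("/tmp/fdc/food_nutrient.csv",
#                 "/tmp/fdc/food_nutrient_slim.csv",
#                 keep=["id", "fdc_id", "nutrient_id", "amount"])
```

Then point the \copy at the slim file. For a 28M-row file this is one streaming pass; it never holds the CSV in memory.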

Indexes

The defaults aren’t enough. Add at minimum:

CREATE INDEX idx_food_description_trgm ON food USING gin (description gin_trgm_ops);
CREATE INDEX idx_food_data_type ON food (data_type);
CREATE INDEX idx_food_nutrient_fdc ON food_nutrient (fdc_id);
CREATE INDEX idx_food_nutrient_nut ON food_nutrient (nutrient_id);
CREATE INDEX idx_branded_gtin ON branded_food (gtin_upc);

You’ll need pg_trgm (CREATE EXTENSION IF NOT EXISTS pg_trgm;) before the trigram index will build. It makes search-as-you-type viable on the description column.

Index build time: about 4 minutes.
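To build intuition for what the trigram index matches: pg_trgm lowercases the string, pads it with spaces, extracts three-character windows, and scores similarity as the shared fraction of distinct trigrams. A rough re-implementation (simplified: real pg_trgm also strips non-alphanumerics and pads each word separately):

```python
def trigrams(s: str) -> set[str]:
    """Distinct 3-grams of a lowercased, space-padded string (simplified pg_trgm)."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    """Shared fraction of distinct trigrams, roughly what pg_trgm's similarity() returns."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

This is why a typo like "roled oats" still finds "rolled oats": most trigrams survive. In Postgres, the % operator matches when similarity exceeds a threshold (default 0.3, tunable via SET pg_trgm.similarity_threshold).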

Sample queries

Barcode lookup

SELECT
    f.fdc_id,
    f.description,
    bf.brand_owner,
    bf.serving_size,
    bf.serving_size_unit,
    (SELECT amount FROM food_nutrient fn 
     JOIN nutrient n ON fn.nutrient_id = n.id 
     WHERE fn.fdc_id = f.fdc_id 
       AND n.name = 'Energy' 
       AND n.unit_name = 'kcal') AS kcal_100g
FROM food f
JOIN branded_food bf ON f.fdc_id = bf.fdc_id
WHERE bf.gtin_upc = '038000138416';

p99 latency on the Pi 5: about 4 ms.

Description search

SELECT fdc_id, description, data_type
FROM food
WHERE description % 'rolled oats'
  AND data_type IN ('Foundation', 'SR Legacy')
ORDER BY similarity(description, 'rolled oats') DESC
LIMIT 10;

The % operator uses the trigram index. p99 around 12 ms on our setup.

Macro panel for a food

SELECT n.name, n.unit_name, fn.amount
FROM food_nutrient fn
JOIN nutrient n ON fn.nutrient_id = n.id
WHERE fn.fdc_id = 173944
  AND n.name IN ('Energy', 'Protein', 'Carbohydrate, by difference', 'Total lipid (fat)')
ORDER BY n.rank;

p99 around 1 ms.
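FDC nutrient amounts are generally per 100 g of food, so a tracker scales by the logged portion. The arithmetic, as a sketch (the 389 kcal figure is illustrative, not a value from the database):

```python
def portion_amount(per_100g: float, grams: float) -> float:
    """Scale a per-100 g nutrient amount to an actual portion."""
    return per_100g * grams / 100.0

# 40 g of rolled oats at 389 kcal per 100 g:
kcal = portion_amount(389, 40)   # → 155.6
```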

Wrapping it as a service

A tiny Flask service in front of this gives you a USDA-equivalent API on your LAN:

from flask import Flask, jsonify
import psycopg

app = Flask(__name__)
DSN = "postgresql://usda:secret@localhost/usda"

@app.route("/v1/barcode/<gtin>")
def barcode(gtin):
    # One connection per request keeps the sketch simple; use a pool in production.
    with psycopg.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT f.fdc_id, f.description, bf.serving_size,
                       (SELECT amount FROM food_nutrient fn
                        JOIN nutrient n ON fn.nutrient_id = n.id
                        WHERE fn.fdc_id = f.fdc_id AND n.name = 'Energy'
                          AND n.unit_name = 'kcal')
                FROM food f JOIN branded_food bf ON f.fdc_id = bf.fdc_id
                WHERE bf.gtin_upc = %s
            """, (gtin,))
            row = cur.fetchone()
    if not row:
        return ("", 404)
    return jsonify({"fdc_id": row[0], "description": row[1],
                    "serving_size": row[2], "kcal_100g": row[3]})

Behind a Caddy reverse proxy, this gives the rest of your homelab a stable internal endpoint. Sub-10ms responses. No rate limits.

Refresh strategy

Quarterly is enough for most uses. The Branded sub-database changes the most (manufacturers add SKUs continuously), but for a personal tracker the catalog drifts more slowly than your own eating habits do.

We script refresh with:

  1. Download the new dump.
  2. Load into a _staging schema.
  3. Atomic schema swap (BEGIN; ALTER ...; COMMIT;).
  4. Drop the old.

Total downtime: under 30 seconds. Most of the work happens in the staging schema.
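The swap in step 3 can be two schema renames inside one transaction, so readers never see a half-loaded database. A sketch that just assembles the statements (the schema names fdc, fdc_staging, and fdc_old are assumptions, not names from the dump):

```python
def swap_statements(live: str = "fdc", staging: str = "fdc_staging",
                    retired: str = "fdc_old") -> list[str]:
    """SQL to atomically promote the staging schema over the live one."""
    return [
        "BEGIN;",
        f"ALTER SCHEMA {live} RENAME TO {retired};",
        f"ALTER SCHEMA {staging} RENAME TO {live};",
        "COMMIT;",
        # Step 4, run once nothing still references the old data:
        f"DROP SCHEMA {retired} CASCADE;",
    ]
```

Because both renames commit together, every query sees either the old dataset or the new one, never a mix; that is what keeps the downtime to the rename itself.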

When this isn’t worth it

If you’re a single user with your own API key, the API is fine for a personal tracker that does maybe 200 lookups a day. Bulk loading is worth it when:

  • Multiple devices share one server.
  • You want offline-first.
  • You’re building a tool that does heavier search.
  • You want to do bulk analysis (notebook-style queries against the whole DB).
