Amex Expense Processing · Automation Architecture

1. Requirements (credentials & access)

Everything I need handed over before development can start.

Data samples + tooling access

Item	Status	Notes
RPL sample (Kim, Cara, Garrett)	Received - `RPL 06 2026 AMEX Kim, Cara, Garrett.xlsx`	21 columns, ~126 rows. Multi-cardholder layout, one subsection per RPL cardholder. Includes raw Amex fields + Tiffany's enrichment (Vendor ID, GL Account, Description).
Wholesale sample (accounting-team-facing)	Received - `Amex Wholesale JUNE 2026.xlsx`	Three sheets, one per week (05.26-06.07, 06.13, 06.20). Sage-ready 5-column format: Invoice # (= Reference #), Invoice Date, Card holder name & Comments, Invoice Amount, GL Account.
Master vendor + GL reference	Received - `AMEX Where we book.xlsx`	THE booking bible. 8 sheets: Amex booking (354 vendor -> GL map), AP Expensing (main account list), Departments-Territories (territory codes), Original Account Cheat Sheet, Sage COA Export (11,507 active account-territory combos), Department Pivot, Sean File, No Longer in use. Effectively the entire lookup table needed to auto-classify.
Cardholder default mapping	Received - embedded in `AMEX Where we book.xlsx` - Amex booking sheet	60 cardholders with default account + territory suffix (e.g., AMY HARI -> 63000/AZ1, APRIL TAPLEY -> 63400/9340, ELLIOT -> his default). Falls back to sensible default when no vendor hit.
Sage chart of accounts	Received - Sage COA Export sheet	11,507 active account-territory combos in the format `60400-CO1` (Merchandising Expense : Colorado). Authoritative validation set for any GL code the classifier emits.
ChatGPT chat history export	Pending from Tiffany	Nice-to-have. Not blocking - we now have the underlying lookup logic that ChatGPT was proxying.
Raw Amex weekly statement (unprocessed)	Pending	One recent statement exactly as Karen delivers it Monday morning, before any of Tiffany's work. Anchors the input format. Blocker for wiring the ingest side end-to-end.
One past processed weekly workbook	Pending	A full week that Tiffany has run through ChatGPT + finalized, with Columns O, P, Q populated. Lets us verify the classifier matches her output exactly.
Payables VI job template	Pending if we pursue back-door import	Field mappings for the first Sage import step.
Cash Clearing VI job template	Pending if we pursue back-door import	Field mappings for the second Sage import step (this is where the bad-line collisions happen).
Sage sandbox / test environment	Pending	Or a documented dry-run procedure if no sandbox exists.
Amex direct-feed / API access	Only if we consider the QuickBooks path	Requires the credit-card module in the accounting system.

What the reference data unlocks

The AMEX Where we book.xlsx file is not just documentation - it is the actual classifier. Every classification decision comes down to three lookups against these tables:

Vendor match - the incoming charge's vendor string ("AMAZON MKTPL*Y23456", "FACEBK", "ADOBE") is matched against the 354 vendors in the Amex booking sheet, which yields the vendor ID + base GL account.
Cardholder default - the cardholder on the charge (AMY HARI, ELLIOT, TANNER, etc.) is matched against the 60 cardholder defaults in the same sheet, which yields the territory suffix (AZ1, 9340, 1000, etc.).
Sage COA validation - the combined account-territory string is validated against the 11,507 active combinations in the Sage COA Export sheet. Invalid combos get flagged before they can hit Sage.

Anything that fails all three lookups is a novel vendor and gets kicked into the Teams clarification queue for Tiffany to answer once, after which the new mapping is written back to the reference data and never asked again.

Decisions / sign-offs needed

Item	Why we need it
Sage back-door import - yes or no?	Same approach Nestor and Terran already use for another process. Skips the VI-job UI entirely, cutting the 2-hour posting step.
QuickBooks migration - in scope?	QuickBooks supports direct Amex feed + AI classification, which bypasses the whole VI/AP problem. But it forces a credit-card-module reconciliation change downstream.
Cardholder follow-up email tone + sender	Auto-emails go out asking employees to clarify unknown charges. Should they come from Tiffany, a shared inbox, or a system address?
Novel-vendor confidence threshold	Below what confidence score should a charge go to human review vs. be auto-classified? Sets the human-in-the-loop gate.
Matthew reconciliation - Phase 1 or Phase 2?	Phase 2 today per the meeting; confirm we defer or pull it forward.

Infrastructure on our side

Item	Status
Postgres database for classification history + vendor lookup + audit log	To provision
Object storage for the raw + processed CSVs (audit trail)	To provision
Secret Manager (Sage / Amex / SMTP credentials)	To configure
OpenAI / Anthropic API key for classification	Already have

2. Two workflows

What looks like one weekly Amex bill is actually two workflows running back-to-back on different people. Tiffany's processing happens first; Matthew's reconciliation cleans up after. Each has to be automated separately, and Phase 2 is deferred to the follow-up meeting.

Phase	Owner	Cadence	What it produces
1. Weekly Amex processing	Tiffany	Monday - Thursday, every week; whole thing in one day at month-end	Report posted into Sage via AP, plus RPL and retail sub-reports
2. Reconciliation	Matthew	Monthly	Reconciled Amex activity in the accounting books

Combined effort is roughly 8 hours per week across the two people (before Tiffany's ChatGPT layer trimmed some of hers). That is the number we are trying to bring down.

3. Phase 1 - Weekly Amex processing

Tiffany's current weekly cadence. Karen sends the statement Monday, Tiffany processes it Mon-Wed, chases clarifications by Thursday, then does the double Sage import (Payables VI, then Cash Clearing VI) with a manual bad-line fix in between. The Sage posting alone eats about two hours.

Phase 1 - step by step

The full 10-step flow from Karen's Monday email to files saved and everything reconciled.

Rendering…

Where the time actually goes

Step	What happens	Pain point
1. Receive statement	Karen pulls the report and sends it to Tiffany every Monday morning, with a 2-week rolling window so nothing falls through the cracks.	None currently. Dedup handles the overlap.
2. Dedup	Compare reference numbers against last week's report, delete duplicate charges.	Solved by Karen's consistent pull + Tiffany's formula.
3. Organize + ChatGPT classify	Organize the report, then run it through ChatGPT with the prompt: "Please fill in Columns O, P, and Q on file [name] the best you can from information you have from past workbooks uploaded." ChatGPT has all past workbooks in memory. On a recent run: 435 of 501 lines auto-filled (411 GL accounts correct, 394 descriptions correct).	Novel vendors trip it up. Different GL per cardholder for the same vendor (e.g., Facebook ads for Tanner vs. Jake vs. Karen) is the tricky pattern.
4. Double-check + highlight	Review all prefilled data, add anything missing, highlight anything new or questionable for the follow-up step.	Manual QA over every ChatGPT output. Depends heavily on Tiffany's memory of past classifications.
5. Teams clarification loop	By Wednesday, Tiffany sends Teams messages about the questionable charges. Most common: Amazon charges, plus many random new vendors. Responses expected by Thursday to post on time.	Waiting on humans. Some cardholders need 2-3 nudges. Bottleneck for the whole week.
6. Build the RPL sub-file	Locate all Cara, Garrett, and Kim charges and add them to Karen's spreadsheet (the accounting-team file in Teams). This is the RPL workbook.	Manual filter + copy step. Multi-cardholder layout (one subsection per person) means it can't be a straight paste.
7. Payables VI → Sage	Add the classified data to the Payables VI template, run it through Sage. This is the AP-side import.	First half of the ~2-hour Sage posting.
8. Build the retail sub-file	Separate all retail-related charges and copy them over to the retail spreadsheet in Teams/Accounting.	Second manual filter + copy pass on the same source file.
9. Cash Clearing VI → Sage	Copy the data over to the Cash Clearing VI and run it through Sage.	Second half of the ~2-hour Sage posting.
10. Bad-line fix + re-run	Sage uses only the last 5-6 digits of the Amex reference number as a check number. Collisions are frequent on 500+ transactions. Fix: replace the last digit with "A" to force uniqueness, then re-run the Cash Clearing VI.	Extends the 2-hour Sage step to 3+ hours whenever a collision hits. Also the reason the field format on the wholesale sheet has that `A320261520370314908`-style prefix - it is a Tiffany-side collision workaround, not the raw Amex reference.
11. Verify + save	Double-check everything ran, make sure both sub-files (RPL for Karen, retail for accounting) are saved in the correct Teams locations.	Housekeeping. Fine, but every extra manual save is a place a wrong version could ship.

What we now know about the output format

From the samples Tiffany sent, the wholesale output that goes into the Cash Clearing VI has exactly 5 columns:

Column	Example	Meaning
`Invoice #` (= Ref #)	`A320261520370314908`	Amex reference number, prefixed with "A" for Sage-side uniqueness handling.
`Invoice Date`	2026-05-31	Transaction date.
`Card holder name & Comments`	`AMY HARI_FLYERS`	Format is `NAME_TAG`. The tag after the underscore is Tiffany's category shorthand (`_FLYERS`, `_FOOD`, `_GIFT CARD FOR STORE`, `_COPIES`, `_SPROUTS`, etc.).
`Invoice Amount`	118.05	Dollar amount.
`GL Account`	`63000AZ1`	Base account code (5 digits) + location suffix (`AZ1` = Arizona location 1, etc.). The classifier must produce a valid GL + location combination.

The RPL file is richer (21 columns): it carries the raw Amex fields (Date, Card Member, Account #, Amount, Extended Details, Appears On Your Statement As, Reference, Category) plus Tiffany's enrichment (Vendor ID, GL Account posted to, Description). Multi-cardholder layout with one subsection per RPL cardholder.

4. Phase 2 - Reconciliation (deferred)

Once Tiffany posts, Matthew on the accounting team spends significant additional time reconciling everything against the statement in Sage. This was flagged as "a boatload of time" in the discovery meeting but was deferred to a dedicated follow-up with Matthew.

What we know so far: Matthew's work is monthly, happens after Tiffany posts, and mostly consists of reconciling entries against the actual Amex statement in Sage. Everything else about his process is TBD until the Matthew meeting.

Once we have Matthew's transcript, this section becomes a mirror of Phase 1: a step-by-step diagram, a table of pain points, and a mapping to what the automation can absorb.

5. V1 scope

Two essentials that go in no matter what, then a menu of optional pieces to pick from for V1 vs. V2.

The essentials (going in regardless)

Take the classification off ChatGPT. Move Tiffany's "upload workbook + say do the same thing you did last time" prompt into a controlled pipeline: raw statement in, classified statement out, historical workbook context injected as retrieval, novel-vendor cases flagged by confidence score. Output must exactly match today's format (5-column wholesale for the Cash Clearing VI, 21-column multi-cardholder RPL for Karen). Same output Tiffany produces today, but auditable, repeatable, and faster.
Pre-import collision detection + "A" prefix / suffix handling. Detect duplicate check-number collisions on the last 5-6 digits of the reference numbers before either VI runs, and auto-apply the "append A" workaround. Also emit the invoice numbers in the current A-prefixed format the wholesale VI already expects (A320261520370314908). No more re-running the Cash Clearing VI because of a single collision.

The optional pieces - which do we want in V1?

Automated Teams follow-up messages for unclassified or low-confidence charges - close the Wed-Thu clarification loop without Tiffany chasing anyone. Amazon and other novel vendors are the common case.
Auto-generate the RPL sub-file (Cara, Garrett, Kim) into Karen's Teams spreadsheet in the exact multi-cardholder layout she uses today.
Auto-generate the retail sub-file into the Teams/Accounting spreadsheet.
Sage back-door direct import (same approach as Nestor and Terran for the other process) - skip both VI-job UIs entirely, cut the 2-hour posting step down to a verify-and-confirm step. Covers both the Payables VI and the Cash Clearing VI.
GL + location constraint enforcement - classifier output is constrained to valid GL-account + location-suffix combinations (like 63000AZ1). No hallucinated codes.
New-vendor learning loop - when Tiffany corrects a novel-vendor classification manually, the fix gets folded back into the classifier so the next occurrence is auto-handled.
QuickBooks direct Amex feed evaluation - if we move to QuickBooks, the direct feed + built-in AI classification kills the VI path entirely. Big change downstream (credit-card module, reconciliation shifts).
Matthew's reconciliation automation (Phase 2) - defer until after his meeting, or pull forward if the shape becomes clear early.

Mark each as V1, V2, or skip entirely. That gives me the shape of what to price.

6. Can we start building now?

Yes - the middle piece (the classifier that fills Columns O, P, Q) can be built this week. Between the three files Tiffany already sent, we effectively have the entire brain of the operation.

What we have (enough to start)

354-vendor lookup table mapping vendor string -> vendor ID + base GL account.
60-cardholder default map giving each cardholder their default territory + account.
11,507-row Sage COA to validate any generated account-territory combo before it can hit Sage.
87-row territory catalog mapping suffix codes to descriptions + rollups + channels.
Two output-format samples: the 5-column wholesale sheet (Cash Clearing VI target) and the 21-column RPL sheet (Karen's target).
The exact classification prompt Tiffany has been running through ChatGPT.

What is still missing (nice to have, not blocking to start)

One raw Amex weekly statement as Karen delivers it - so we can wire the ingest side end-to-end. Until we have it we can build against a synthetic version derived from the wholesale sample.
One fully finalized past weekly workbook (Columns O, P, Q populated) - to regression-test our output against Tiffany's ground truth.
The two VI job templates (Payables + Cash Clearing) - only needed when we go beyond the middle piece into the Sage import step.

Proposed sequence to ship the middle piece

Day 1-2: Ingest the reference data into a small local database (Postgres or SQLite). Vendor table, cardholder-default table, Sage COA. Add fuzzy-match on vendor strings (Amex descriptors are noisy - AMZN MKTP US*Y23456 etc.) with confidence scoring.
Day 2-3: Build the classifier function - takes a row (cardholder, vendor string, amount, date) and returns (vendor ID, GL account, description, confidence). Runs against synthetic input derived from the wholesale sample.
Day 3-4: Wire the raw Amex statement in as soon as Tiffany sends it. Verify output against the finalized past workbook.
Day 4-5: Add the novel-vendor Teams follow-up loop and the reference-data write-back so the classifier learns from every clarification.

Rough shape: the middle piece alone is a 5-7 day build with what we have today. The other pieces (auto-generating the two sub-files, Sage back-door import, novel-vendor learning loop) sit on top of that classifier and are independent workstreams we can layer in based on the V1 checkbox choices in Section 5.

Do we need anything from anyone else to start? No. I can start on the classifier core today with the three files already in hand. Tiffany sending the raw weekly statement + one past finalized workbook is the fastest path to a real end-to-end test, but neither is blocking the build itself.