Amex Expense Processing · Automation Architecture


1. Requirements (credentials & access)

Everything I need handed over before development can start.

Data samples + tooling access

ItemStatusNotes
RPL sample (Kim, Cara, Garrett)Received - RPL 06 2026 AMEX Kim, Cara, Garrett.xlsx21 columns, ~126 rows. Multi-cardholder layout, one subsection per RPL cardholder. Includes raw Amex fields + Tiffany's enrichment (Vendor ID, GL Account, Description).
Wholesale sample (accounting-team-facing)Received - Amex Wholesale JUNE 2026.xlsxThree sheets, one per week (05.26-06.07, 06.13, 06.20). Sage-ready 5-column format: Invoice # (= Reference #), Invoice Date, Card holder name & Comments, Invoice Amount, GL Account.
Master vendor + GL referenceReceived - AMEX Where we book.xlsxTHE booking bible. 8 sheets: Amex booking (354 vendor -> GL map), AP Expensing (main account list), Departments-Territories (territory codes), Original Account Cheat Sheet, Sage COA Export (11,507 active account-territory combos), Department Pivot, Sean File, No Longer in use. Effectively the entire lookup table needed to auto-classify.
Cardholder default mappingReceived - embedded in AMEX Where we book.xlsx - Amex booking sheet60 cardholders with default account + territory suffix (e.g., AMY HARI -> 63000/AZ1, APRIL TAPLEY -> 63400/9340, ELLIOT -> his default). Falls back to sensible default when no vendor hit.
Sage chart of accountsReceived - Sage COA Export sheet11,507 active account-territory combos in the format 60400-CO1 (Merchandising Expense : Colorado). Authoritative validation set for any GL code the classifier emits.
ChatGPT chat history exportPending from TiffanyNice-to-have. Not blocking - we now have the underlying lookup logic that ChatGPT was proxying.
Raw Amex weekly statement (unprocessed)PendingOne recent statement exactly as Karen delivers it Monday morning, before any of Tiffany's work. Anchors the input format. Blocker for wiring the ingest side end-to-end.
One past processed weekly workbookPendingA full week that Tiffany has run through ChatGPT + finalized, with Columns O, P, Q populated. Lets us verify the classifier matches her output exactly.
Payables VI job templatePending if we pursue back-door importField mappings for the first Sage import step.
Cash Clearing VI job templatePending if we pursue back-door importField mappings for the second Sage import step (this is where the bad-line collisions happen).
Sage sandbox / test environmentPendingOr a documented dry-run procedure if no sandbox exists.
Amex direct-feed / API accessOnly if we consider the QuickBooks pathRequires the credit-card module in the accounting system.

What the reference data unlocks

The AMEX Where we book.xlsx file is not just documentation - it is the actual classifier. Every classification decision comes down to three lookups against these tables:

  1. Vendor match - the incoming charge's vendor string ("AMAZON MKTPL*Y23456", "FACEBK", "ADOBE") is matched against the 354 vendors in the Amex booking sheet, which yields the vendor ID + base GL account.
  2. Cardholder default - the cardholder on the charge (AMY HARI, ELLIOT, TANNER, etc.) is matched against the 60 cardholder defaults in the same sheet, which yields the territory suffix (AZ1, 9340, 1000, etc.).
  3. Sage COA validation - the combined account-territory string is validated against the 11,507 active combinations in the Sage COA Export sheet. Invalid combos get flagged before they can hit Sage.

Anything that fails all three lookups is a novel vendor and gets kicked into the Teams clarification queue for Tiffany to answer once, after which the new mapping is written back to the reference data and never asked again.

Decisions / sign-offs needed

ItemWhy we need it
Sage back-door import - yes or no?Same approach Nestor and Terran already use for another process. Skips the VI-job UI entirely, cutting the 2-hour posting step.
QuickBooks migration - in scope?QuickBooks supports direct Amex feed + AI classification, which bypasses the whole VI/AP problem. But it forces a credit-card-module reconciliation change downstream.
Cardholder follow-up email tone + senderAuto-emails go out asking employees to clarify unknown charges. Should they come from Tiffany, a shared inbox, or a system address?
Novel-vendor confidence thresholdBelow what confidence score should a charge go to human review vs. be auto-classified? Sets the human-in-the-loop gate.
Matthew reconciliation - Phase 1 or Phase 2?Phase 2 today per the meeting; confirm we defer or pull it forward.

Infrastructure on our side

ItemStatus
Postgres database for classification history + vendor lookup + audit logTo provision
Object storage for the raw + processed CSVs (audit trail)To provision
Secret Manager (Sage / Amex / SMTP credentials)To configure
OpenAI / Anthropic API key for classificationAlready have

2. Two workflows

What looks like one weekly Amex bill is actually two workflows running back-to-back on different people. Tiffany's processing happens first; Matthew's reconciliation cleans up after. Each has to be automated separately, and Phase 2 is deferred to the follow-up meeting.

PhaseOwnerCadenceWhat it produces
1. Weekly Amex processingTiffanyMonday - Thursday, every week; whole thing in one day at month-endReport posted into Sage via AP, plus RPL and retail sub-reports
2. ReconciliationMatthewMonthlyReconciled Amex activity in the accounting books
Combined effort is roughly 8 hours per week across the two people (before Tiffany's ChatGPT layer trimmed some of hers). That is the number we are trying to bring down.

3. Phase 1 - Weekly Amex processing

Tiffany's current weekly cadence. Karen sends the statement Monday, Tiffany processes it Mon-Wed, chases clarifications by Thursday, then does the double Sage import (Payables VI, then Cash Clearing VI) with a manual bad-line fix in between. The Sage posting alone eats about two hours.

Phase 1 - step by step

The full 10-step flow from Karen's Monday email to files saved and everything reconciled.

Rendering…

Where the time actually goes

StepWhat happensPain point
1. Receive statementKaren pulls the report and sends it to Tiffany every Monday morning, with a 2-week rolling window so nothing falls through the cracks.None currently. Dedup handles the overlap.
2. DedupCompare reference numbers against last week's report, delete duplicate charges.Solved by Karen's consistent pull + Tiffany's formula.
3. Organize + ChatGPT classifyOrganize the report, then run it through ChatGPT with the prompt: "Please fill in Columns O, P, and Q on file [name] the best you can from information you have from past workbooks uploaded." ChatGPT has all past workbooks in memory. On a recent run: 435 of 501 lines auto-filled (411 GL accounts correct, 394 descriptions correct).Novel vendors trip it up. Different GL per cardholder for the same vendor (e.g., Facebook ads for Tanner vs. Jake vs. Karen) is the tricky pattern.
4. Double-check + highlightReview all prefilled data, add anything missing, highlight anything new or questionable for the follow-up step.Manual QA over every ChatGPT output. Depends heavily on Tiffany's memory of past classifications.
5. Teams clarification loopBy Wednesday, Tiffany sends Teams messages about the questionable charges. Most common: Amazon charges, plus many random new vendors. Responses expected by Thursday to post on time.Waiting on humans. Some cardholders need 2-3 nudges. Bottleneck for the whole week.
6. Build the RPL sub-fileLocate all Cara, Garrett, and Kim charges and add them to Karen's spreadsheet (the accounting-team file in Teams). This is the RPL workbook.Manual filter + copy step. Multi-cardholder layout (one subsection per person) means it can't be a straight paste.
7. Payables VI → SageAdd the classified data to the Payables VI template, run it through Sage. This is the AP-side import.First half of the ~2-hour Sage posting.
8. Build the retail sub-fileSeparate all retail-related charges and copy them over to the retail spreadsheet in Teams/Accounting.Second manual filter + copy pass on the same source file.
9. Cash Clearing VI → SageCopy the data over to the Cash Clearing VI and run it through Sage.Second half of the ~2-hour Sage posting.
10. Bad-line fix + re-runSage uses only the last 5-6 digits of the Amex reference number as a check number. Collisions are frequent on 500+ transactions. Fix: replace the last digit with "A" to force uniqueness, then re-run the Cash Clearing VI.Extends the 2-hour Sage step to 3+ hours whenever a collision hits. Also the reason the field format on the wholesale sheet has that A320261520370314908-style prefix - it is a Tiffany-side collision workaround, not the raw Amex reference.
11. Verify + saveDouble-check everything ran, make sure both sub-files (RPL for Karen, retail for accounting) are saved in the correct Teams locations.Housekeeping. Fine, but every extra manual save is a place a wrong version could ship.

What we now know about the output format

From the samples Tiffany sent, the wholesale output that goes into the Cash Clearing VI has exactly 5 columns:

ColumnExampleMeaning
Invoice # (= Ref #)A320261520370314908Amex reference number, prefixed with "A" for Sage-side uniqueness handling.
Invoice Date2026-05-31Transaction date.
Card holder name & CommentsAMY HARI_FLYERSFormat is NAME_TAG. The tag after the underscore is Tiffany's category shorthand (_FLYERS, _FOOD, _GIFT CARD FOR STORE, _COPIES, _SPROUTS, etc.).
Invoice Amount118.05Dollar amount.
GL Account63000AZ1Base account code (5 digits) + location suffix (AZ1 = Arizona location 1, etc.). The classifier must produce a valid GL + location combination.

The RPL file is richer (21 columns): it carries the raw Amex fields (Date, Card Member, Account #, Amount, Extended Details, Appears On Your Statement As, Reference, Category) plus Tiffany's enrichment (Vendor ID, GL Account posted to, Description). Multi-cardholder layout with one subsection per RPL cardholder.


4. Phase 2 - Reconciliation (deferred)

Once Tiffany posts, Matthew on the accounting team spends significant additional time reconciling everything against the statement in Sage. This was flagged as "a boatload of time" in the discovery meeting but was deferred to a dedicated follow-up with Matthew.

What we know so far: Matthew's work is monthly, happens after Tiffany posts, and mostly consists of reconciling entries against the actual Amex statement in Sage. Everything else about his process is TBD until the Matthew meeting.

Once we have Matthew's transcript, this section becomes a mirror of Phase 1: a step-by-step diagram, a table of pain points, and a mapping to what the automation can absorb.


5. V1 scope

Two essentials that go in no matter what, then a menu of optional pieces to pick from for V1 vs. V2.

The essentials (going in regardless)

  1. Take the classification off ChatGPT. Move Tiffany's "upload workbook + say do the same thing you did last time" prompt into a controlled pipeline: raw statement in, classified statement out, historical workbook context injected as retrieval, novel-vendor cases flagged by confidence score. Output must exactly match today's format (5-column wholesale for the Cash Clearing VI, 21-column multi-cardholder RPL for Karen). Same output Tiffany produces today, but auditable, repeatable, and faster.
  2. Pre-import collision detection + "A" prefix / suffix handling. Detect duplicate check-number collisions on the last 5-6 digits of the reference numbers before either VI runs, and auto-apply the "append A" workaround. Also emit the invoice numbers in the current A-prefixed format the wholesale VI already expects (A320261520370314908). No more re-running the Cash Clearing VI because of a single collision.

The optional pieces - which do we want in V1?

Mark each as V1, V2, or skip entirely. That gives me the shape of what to price.


6. Can we start building now?

Yes - the middle piece (the classifier that fills Columns O, P, Q) can be built this week. Between the three files Tiffany already sent, we effectively have the entire brain of the operation.

What we have (enough to start)

What is still missing (nice to have, not blocking to start)

Proposed sequence to ship the middle piece

  1. Day 1-2: Ingest the reference data into a small local database (Postgres or SQLite). Vendor table, cardholder-default table, Sage COA. Add fuzzy-match on vendor strings (Amex descriptors are noisy - AMZN MKTP US*Y23456 etc.) with confidence scoring.
  2. Day 2-3: Build the classifier function - takes a row (cardholder, vendor string, amount, date) and returns (vendor ID, GL account, description, confidence). Runs against synthetic input derived from the wholesale sample.
  3. Day 3-4: Wire the raw Amex statement in as soon as Tiffany sends it. Verify output against the finalized past workbook.
  4. Day 4-5: Add the novel-vendor Teams follow-up loop and the reference-data write-back so the classifier learns from every clarification.

Rough shape: the middle piece alone is a 5-7 day build with what we have today. The other pieces (auto-generating the two sub-files, Sage back-door import, novel-vendor learning loop) sit on top of that classifier and are independent workstreams we can layer in based on the V1 checkbox choices in Section 5.

Do we need anything from anyone else to start? No. I can start on the classifier core today with the three files already in hand. Tiffany sending the raw weekly statement + one past finalized workbook is the fastest path to a real end-to-end test, but neither is blocking the build itself.