Log System
The backend tracks every user operation on a table through a two-layer log system. Logs are written per (dataset, table) pair and drive the dependency graph that the frontend uses to show operation history and handle cascaded deletes.
Overview
| Layer | File | Format | Purpose |
|---|---|---|---|
| Plain-text log | public/logs/logs-{datasetId}-{tableId}.log | One line per operation | Human-readable audit trail |
| JSON Lines log | public/logs/logs-{datasetId}-{tableId}.jsonl | One JSON object per line | Machine-readable; powers the dependency graph |
Both files are written in parallel for every logged operation. The .jsonl file is the authoritative source; the .log file is kept in sync mainly for quick manual inspection.
File locations
All log files live under public/logs/ relative to the backend process working directory. The directory is created automatically on first write.
public/
└── logs/
    ├── logs-1-3.log
    ├── logs-1-3.jsonl
    ├── logs-2-7.log
    └── logs-2-7.jsonl
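The naming convention can be captured in a small helper. This is a minimal sketch: logFilePaths and LOG_DIR are hypothetical names used for illustration, not identifiers taken from the codebase.

```javascript
// Hypothetical helper: derives both log file paths for a (dataset, table) pair.
// LOG_DIR is relative to the backend process working directory.
const LOG_DIR = "public/logs";

function logFilePaths(datasetId, tableId) {
  const base = `${LOG_DIR}/logs-${datasetId}-${tableId}`;
  return { text: `${base}.log`, json: `${base}.jsonl` };
}

console.log(logFilePaths(1, 3));
```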
Operation types
The two logging services described below, LoggerService and LoggerJsonService, share the same set of operation type constants:
| Constant | Value | Description |
|---|---|---|
| RECONCILIATION | "RECONCILIATION" | A column was reconciled against a service |
| EXTENSION | "EXTENSION" | One or more columns were extended |
| MODIFICATION | "MODIFICATION" | A column was modified |
| PROPAGATE_TYPE | "PROPAGATE_TYPE" | A cell annotation was propagated |
| EXPORT | "EXPORT" | The table was exported |
| SAVE_TABLE | "SAVE_TABLE" | The table was saved |
| GET_TABLE | "GET_TABLE" | The table was loaded |
SAVE_TABLE and GET_TABLE are lifecycle markers: they are logged but never assigned an opNumber and are excluded from the dependency graph nodes.
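A sketch of how the constants and the lifecycle-marker rule might look in code. OPERATION_TYPES and EXCLUDED_TYPES are names mentioned in the maintenance guide, but their exact shape here is an assumption, and receivesOpNumber is a hypothetical helper.

```javascript
// Assumed shape of the shared constants: a frozen map from name to string value.
const OPERATION_TYPES = Object.freeze({
  RECONCILIATION: "RECONCILIATION",
  EXTENSION: "EXTENSION",
  MODIFICATION: "MODIFICATION",
  PROPAGATE_TYPE: "PROPAGATE_TYPE",
  EXPORT: "EXPORT",
  SAVE_TABLE: "SAVE_TABLE",
  GET_TABLE: "GET_TABLE",
});

// Lifecycle markers: logged, but never given an opNumber (assumed contents).
const EXCLUDED_TYPES = new Set([
  OPERATION_TYPES.SAVE_TABLE,
  OPERATION_TYPES.GET_TABLE,
]);

// Hypothetical predicate: does an entry of this type get an opNumber?
function receivesOpNumber(operationType) {
  return !EXCLUDED_TYPES.has(operationType);
}

console.log(receivesOpNumber(OPERATION_TYPES.EXTENSION));
console.log(receivesOpNumber(OPERATION_TYPES.SAVE_TABLE));
```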
Services
LoggerService: plain-text logger
Path: src/api/services/logger/logger.service.js
Writes a single human-readable line per operation using fs.appendFileSync.
Log line format
[<ISO-timestamp>] -| OpType: <type> -| DatasetId: <id> -| TableId: <id> [-| ColumnName: <col>] [-| <ServiceLabel>: <service>] [-| AdditionalData: <json>]
Example:
[2025-05-10T14:32:01.123Z] -| OpType: RECONCILIATION -| DatasetId: 1 -| TableId: 3 -| ColumnName: country -| Reconciler: wikidata -| AdditionalData: {"serviceId":"wikidata"}
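The line layout can be reproduced with a short formatter. This is an illustrative sketch, not the service's actual code: formatLogLine and its parameter names are assumptions, and only the documented segments and the " -| " separator come from the format above.

```javascript
// Hypothetical formatter for the documented plain-text line layout.
// Optional segments are emitted only when a value is provided.
function formatLogLine({
  timestamp,
  opType,
  datasetId,
  tableId,
  columnName,
  serviceLabel, // e.g. "Reconciler"
  service,
  additionalData,
}) {
  const parts = [
    `[${timestamp}]`,
    `OpType: ${opType}`,
    `DatasetId: ${datasetId}`,
    `TableId: ${tableId}`,
  ];
  if (columnName !== undefined) parts.push(`ColumnName: ${columnName}`);
  if (serviceLabel && service !== undefined) parts.push(`${serviceLabel}: ${service}`);
  if (additionalData !== undefined) {
    parts.push(`AdditionalData: ${JSON.stringify(additionalData)}`);
  }
  return parts.join(" -| ");
}

console.log(
  formatLogLine({
    timestamp: "2025-05-10T14:32:01.123Z",
    opType: "RECONCILIATION",
    datasetId: 1,
    tableId: 3,
    columnName: "country",
    serviceLabel: "Reconciler",
    service: "wikidata",
    additionalData: { serviceId: "wikidata" },
  })
);
```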
Public API
LoggerService.logReconciliation({ datasetId, tableId, columnName, service, additionalData })
LoggerService.logExtension({ datasetId, tableId, columnName, service, additionalData })
LoggerService.logModification({ datasetId, tableId, columnName, service, additionalData })
LoggerService.logTypePropagation(datasetId, tableId, columnName, additionalData)
LoggerService.logExportTable(datasetId, tableId, format)
LoggerService.logSave({ datasetId, tableId, deletedCols })
LoggerService.logGetTable({ datasetId, tableId })
LoggerJsonService: JSON Lines logger
Path: src/api/services/logger/logger-json.service.js
Writes one JSON object per line (JSON Lines / .jsonl format). This is the file read by Log.js to build the dependency graph.
Schema entry
The first line of every .jsonl file is a special schema entry written once when the file is created:
{"type":"schema","datasetId":1,"tableId":3,"columns":["col1","col2"],"createdAt":"2025-05-10T14:00:00.000Z"}
It is never overwritten. All consumers skip lines where type === "schema".
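A consumer that honors the schema-entry rule could look like the sketch below. parseJsonl is a hypothetical name; the only behavior taken from the text is that lines with type === "schema" are not treated as operations.

```javascript
// Hypothetical parser: splits .jsonl content into the schema entry and operations.
function parseJsonl(text) {
  let schema = null;
  const operations = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines and a trailing newline
    const entry = JSON.parse(line);
    if (entry.type === "schema") {
      schema = entry; // kept separately, never counted as an operation
    } else {
      operations.push(entry);
    }
  }
  return { schema, operations };
}

const sample = [
  '{"type":"schema","datasetId":1,"tableId":3,"columns":["col1","col2"],"createdAt":"2025-05-10T14:00:00.000Z"}',
  '{"id":"a","opNumber":1,"operationType":"RECONCILIATION","columnName":"col1"}',
  "",
].join("\n");

console.log(parseJsonl(sample).operations.length);
```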
Operation entry
{
  "id": "uuid-v4",
  "opNumber": 1,
  "timestamp": "2025-05-10T14:32:01.123Z",
  "operationType": "RECONCILIATION",
  "datasetId": "1",
  "tableId": "3",
  "columnName": "country",
  "reconciler": "wikidata",
  "additionalData": { "serviceId": "wikidata" }
}
Key fields:
| Field | Present when | Description |
|---|---|---|
| id | Always | UUID uniquely identifying this log entry |
| opNumber | Operations only (not SAVE_TABLE/GET_TABLE) | Incremental integer; computed as max(existing opNumbers) + 1 |
| columnName | Reconciliation, extension, modification, type propagation | Target column |
| reconciler / extender / modifier | Service operations | Service identifier |
| createdColumns | Extension, modification | Columns created as a result of the operation |
| deletedCols | Save | Columns deleted during the save |
| additionalData | Service operations | Request body (with items stripped to keep logs small) |
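The opNumber rule, max(existing opNumbers) + 1, can be sketched as follows. nextOpNumber is a hypothetical helper, and starting at 1 for a log with no numbered entries is an assumption based on the example entry above.

```javascript
// Hypothetical helper: next opNumber given the entries already in the log.
// Entries without an integer opNumber (schema, SAVE_TABLE, GET_TABLE) are ignored.
function nextOpNumber(entries) {
  const highest = entries
    .map((entry) => entry.opNumber)
    .filter((n) => Number.isInteger(n))
    .reduce((max, n) => Math.max(max, n), 0); // 0 when nothing is numbered yet
  return highest + 1;
}

console.log(
  nextOpNumber([
    { type: "schema" },
    { operationType: "RECONCILIATION", opNumber: 1 },
    { operationType: "SAVE_TABLE" },
    { operationType: "EXTENSION", opNumber: 4 },
  ])
);
```

Because the value is derived from the maximum rather than from the entry count, deleting operations leaves gaps but never reuses a number that is still present in the file.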
Public API
Mirrors LoggerService, except that logExtension and logModification also accept createdColumns:
LoggerJsonService.logReconciliation({ datasetId, tableId, columnName, service, additionalData })
LoggerJsonService.logExtension({ datasetId, tableId, columnName, service, additionalData, createdColumns })
LoggerJsonService.logModification({ datasetId, tableId, columnName, service, additionalData, createdColumns })
LoggerJsonService.logTypePropagation(datasetId, tableId, columnName, additionalData)
LoggerJsonService.logExportTable(datasetId, tableId, format)
LoggerJsonService.logSave({ datasetId, tableId, deletedCols })
LoggerJsonService.logGetTable({ datasetId, tableId })
Middleware
logger-json.js
Path: src/api/middleware/logger-json.js
Intercepts incoming requests and delegates to LoggerJsonService based on the matched URL pattern:
| URL pattern | Handler |
|---|---|
| */api/reconcilers/* | handleReconciliationRoute |
| */api/extenders/* | handleExtenderRoute |
| */api/modifiers/* | handleModificationRoute |
| PUT /api/dataset/:id/table/:id | handleSaveRoute → logSave |
| GET /api/dataset/:id/table/:id | handleSaveRoute → logGetTable |
| */export* | handleExportOperation |
For service operations (reconciliation, extension, modification) the log is written after the response is sent and only on HTTP 2xx. This is implemented by monkey-patching res.json.
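The monkey-patching technique can be illustrated with a small wrapper. The maintenance guide mentions a function named interceptResponse, but the signature and body below are assumptions written for illustration, tested here against a fake response object rather than a real Express response.

```javascript
// Sketch: wrap res.json so a callback runs only when the response status is 2xx.
function interceptResponse(res, onSuccess) {
  const originalJson = res.json.bind(res);
  res.json = (body) => {
    const result = originalJson(body); // let the original method handle the response first
    if (res.statusCode >= 200 && res.statusCode < 300) {
      try {
        onSuccess(body);
      } catch (err) {
        // A failed log write should not surface as a request failure.
        console.error("log write failed:", err);
      }
    }
    return result;
  };
}

// Usage with a minimal fake response object.
const fakeRes = {
  statusCode: 200,
  json(body) {
    this.sent = body;
    return this;
  },
};
interceptResponse(fakeRes, (body) => console.log("would log:", body));
fakeRes.json({ ok: true });
```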
X-Table-Dataset-Info header
The frontend must send this header on every service request:
X-Table-Dataset-Info: tableId:<id>;datasetId:<id>;columnName:<col>
The middleware parses it to extract the context needed to write the log entry. If the header is absent, the request is passed through without logging. Set the flag skipLog=1 in this header to suppress logging for automated internal requests (e.g. redo-after-delete reconciliations).
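Parsing the header might look like the sketch below. parseTableDatasetInfo is a hypothetical name, and the placement of the skipLog flag as an extra semicolon-separated segment written skipLog=1 is an assumption; the text above only says the flag is set in this header.

```javascript
// Hypothetical parser for the X-Table-Dataset-Info header value.
// Returns null when the header is absent, so the caller can skip logging.
const KNOWN_KEYS = new Set(["tableId", "datasetId", "columnName"]);

function parseTableDatasetInfo(headerValue) {
  if (!headerValue) return null;
  const info = { skipLog: false };
  for (const segment of headerValue.split(";")) {
    const trimmed = segment.trim();
    if (trimmed === "") continue;
    if (trimmed === "skipLog=1") {
      info.skipLog = true; // assumed encoding of the flag
      continue;
    }
    const sep = trimmed.indexOf(":"); // split on the first colon only
    if (sep === -1) continue; // ignore malformed segments
    const key = trimmed.slice(0, sep).trim();
    if (KNOWN_KEYS.has(key)) info[key] = trimmed.slice(sep + 1).trim();
  }
  return info;
}

console.log(parseTableDatasetInfo("tableId:3;datasetId:1;columnName:country"));
```

Values are returned as strings. A column name containing a semicolon would be truncated by this sketch, so a real implementation would need an escaping rule for that case.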
dependencies.middleware.js
Path: src/api/middleware/dependencies.middleware.js
After a service response is produced, this middleware injects a dependencies field into the JSON response body. The field is built by instantiating Log and calling buildDependencyGraph(), so it always reflects the state of the log, including the operation that was just written.
Log class: dependency graph
Path: src/api/services/logger/Log.js
Reads the .jsonl file and builds an in-memory graph of operation dependencies for a given (dataset, table) pair.
Construction
import { Log } from "../services/logger/Log.js";
const log = new Log(datasetId, tableId);
log.buildDependencyGraph();
The constructor synchronously parses the .jsonl file and applies the consolidation window heuristic (see below). Call buildDependencyGraph() afterwards to populate the graph nodes.
Consolidation window
Operations are sliced from the log to include only the current editing session:
- The file lines are reversed (newest first).
- The first SAVE_TABLE entry is found; this is the last save point.
- Lines up to the first GET_TABLE entry that follows it are kept.
Operations that fall outside this window are considered non-consolidated and are pruned from the log on the next table load by calling pruneNonConsolidated().
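One literal reading of the steps above is sketched below, operating on already-parsed entries in file order (oldest first). consolidationWindow is a hypothetical name, and two details are assumptions not stated in the text: both boundary markers are included in the window, and the window is empty when no SAVE_TABLE entry exists.

```javascript
// Sketch of the consolidation window heuristic on parsed entries (oldest first).
function consolidationWindow(entries) {
  const reversed = [...entries].reverse(); // newest first
  const saveIdx = reversed.findIndex((e) => e.operationType === "SAVE_TABLE");
  if (saveIdx === -1) return []; // assumption: no save point, empty window

  // Look for the first GET_TABLE at or after the save point in reversed order.
  const getOffset = reversed
    .slice(saveIdx)
    .findIndex((e) => e.operationType === "GET_TABLE");
  const end = getOffset === -1 ? reversed.length : saveIdx + getOffset + 1;

  return reversed.slice(saveIdx, end).reverse(); // back to chronological order
}

const entries = [
  { id: "g1", operationType: "GET_TABLE" },
  { id: "op1", operationType: "RECONCILIATION", opNumber: 1 },
  { id: "op2", operationType: "EXTENSION", opNumber: 2 },
  { id: "s1", operationType: "SAVE_TABLE" },
  { id: "op3", operationType: "MODIFICATION", opNumber: 3 },
];
console.log(consolidationWindow(entries).map((e) => e.id));
```

In this sample, op3 was logged after the last save, so it falls outside the window.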
Graph structure
nodes = {
  root: { children: [...], parents: [...], supportChildren: [...], supportParents: [...] },
  "<opId>": { ... },
  ...
}
- children / parents: primary dependency edges (e.g. an extension built on a reconciled column).
- supportChildren / supportParents: secondary dependency edges from multi-column parameters (e.g. a modifier that reads from two columns).
Key methods
| Method | Description |
|---|---|
| buildDependencyGraph() | Builds nodes from parsed operations sorted by opNumber |
| getObject() | Returns a plain object with { datasetId, tableId, columns, operationsCount, latestTableData, nodes, operations } representing the whole log |
| getDownstreamDependencies(opId) | Returns all operation IDs that depend (directly or transitively) on opId |
| pruneNonConsolidated() | Deletes all non-consolidated operations from both log files |
| deleteOperationsFromLog(opIds) | Removes specific operations by ID from both the .jsonl and .log files |
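A transitive lookup such as getDownstreamDependencies can be expressed as a breadth-first traversal over the nodes structure. The standalone function below is a sketch, not the method from Log.js; it assumes edges are stored as arrays of operation IDs and that support edges count as dependencies.

```javascript
// Sketch: collect every operation reachable from opId through
// children and supportChildren edges (breadth-first).
function downstreamDependencies(nodes, opId) {
  const seen = new Set();
  const queue = [opId];
  while (queue.length > 0) {
    const node = nodes[queue.shift()];
    if (!node) continue; // unknown ID: nothing to follow
    const next = [...(node.children ?? []), ...(node.supportChildren ?? [])];
    for (const child of next) {
      if (child !== opId && !seen.has(child)) {
        seen.add(child);
        queue.push(child);
      }
    }
  }
  return [...seen];
}

const nodes = {
  root: { children: ["a"], parents: [], supportChildren: [], supportParents: [] },
  a: { children: ["b"], parents: ["root"], supportChildren: ["c"], supportParents: [] },
  b: { children: ["d"], parents: ["a"], supportChildren: [], supportParents: [] },
  c: { children: [], parents: [], supportChildren: [], supportParents: ["a"] },
  d: { children: [], parents: ["b"], supportChildren: [], supportParents: [] },
};
console.log(downstreamDependencies(nodes, "a"));
```

A result like this is what a cascaded delete would need: removing operation a implies removing everything returned for it.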
LogJson class
Path: src/api/services/logger/LogJson.js
A lighter read-only class that parses the .jsonl file and exposes the operations list. Used when the full dependency graph is not needed.
import { LogJson } from "../services/logger/LogJson.js";
const log = new LogJson(datasetId, tableId);
await log.getOperations(); // returns parsed operation objects
Maintenance guide
Adding a new operation type
- Add the constant to OPERATION_TYPES in both logger.service.js and logger-json.service.js.
- Add a public static method (e.g. logMyOperation) to both services following the existing pattern.
- Call both methods from the relevant controller or middleware. The two services must always be called together to keep the two files in sync.
- If the new type should appear in the dependency graph, handle it inside Log.js#appendOperationNode. Otherwise add it to the EXCLUDED_TYPES list in LoggerJsonService.#writeLog so no opNumber is assigned.
Adding a new route to log
- Add a URL pattern constant to ROUTE_PATTERNS in logger-json.js.
- Write a handler function (e.g. handleMyRoute) following the existing handlers.
- Add a branch in routeLogs that calls your handler.
- For routes where the log should be written only after a successful response, use interceptResponse.
Deleting log files
Log files are plain files under public/logs/. They can be deleted manually when disk space is a concern. The system will recreate them automatically on the next operation. Deleting a .jsonl file resets the operation history for that table; the dependency graph will be empty on next load.
Keeping files in sync
Always write to both LoggerService and LoggerJsonService for any new log point. The .log file is not the source of truth but is used by deleteOperationsFromLog to mirror deletions; if it drifts from the .jsonl file, deleted operations may reappear in the plain-text view.
Caution: never manually edit a .jsonl file unless you also update the opNumber values. The opNumber must remain unique and monotonically increasing; gaps are acceptable, but duplicates will corrupt the dependency graph.
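After a manual edit, the opNumber invariant can be checked with a small script before the backend reads the file again. findOpNumberProblems is a hypothetical helper, and it assumes that file order is chronological, so opNumber values should never decrease from one line to the next.

```javascript
// Sketch: report duplicate or out-of-order opNumber values in parsed entries.
// Entries without an opNumber (schema, SAVE_TABLE, GET_TABLE) are skipped.
function findOpNumberProblems(entries) {
  const problems = [];
  const seen = new Set();
  let highest = -Infinity;
  for (const entry of entries) {
    if (entry.type === "schema" || entry.opNumber === undefined) continue;
    if (seen.has(entry.opNumber)) {
      problems.push(`duplicate opNumber ${entry.opNumber} (id ${entry.id})`);
    } else if (entry.opNumber < highest) {
      problems.push(`opNumber ${entry.opNumber} is out of order (id ${entry.id})`);
    }
    seen.add(entry.opNumber);
    highest = Math.max(highest, entry.opNumber);
  }
  return problems;
}

console.log(
  findOpNumberProblems([
    { type: "schema" },
    { id: "a", opNumber: 1 },
    { id: "b", opNumber: 3 }, // a gap is acceptable
    { id: "c", opNumber: 3 }, // a duplicate is not
  ])
);
```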