Guide to Standardizing Ocean Freight Rates

The impact of unstructured data on turnaround times

The speed at which a freight forwarder can calculate and offer an ocean freight rate dictates the hit rate of their quotes. Shipping lines and NVOCCs distribute their freight rates in a multitude of formats: from structured Excel files to flat PDF documents and plain-text emails. Promptly processing ocean freight rates through the automated extraction, interpretation, and transcription of this data into a Transport Management System (TMS) prevents immediate delays in operational workflows. This translates into shorter turnaround times for commercial teams waiting on up-to-date procurement data.

The core of this delay lies in the complexity of surcharges. While the base ocean freight rate is generally straightforward, local surcharges such as Terminal Handling Charges (THC), ISPS fees, and road sharing levies fluctuate heavily per port and per carrier [1]. Manually deciphering the validity and specific conditions of these additional costs results in a high risk of calculation errors and margin erosion.

A hybrid processing model, combining Robotic Process Automation (RPA) with human quality control, reduces the processing time of a complex carrier sheet to a maximum of four hours.

Process Step (Batch of 500 rate lines)	Manual Processing (Hours)	Hybrid Processing Model (Hours)
Document receipt and triage	2.5	0.2
Data extraction of base rates	6.0	0.5
Interpretation and entry of surcharges	8.5	1.0
Validation and data cleansing	4.0	1.8
Export to TMS / Database	1.0	0.1
Total processing time	22.0	3.6
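
The totals in the table can be checked with a few lines of arithmetic. The sketch below simply re-adds the per-step hours from the table and derives the relative saving; the names `STEPS`, `totals`, and `reduction_pct` are illustrative and not part of any tool described here.

```python
# Hours per step for a batch of 500 rate lines, copied from the table above:
# (manual processing, hybrid processing model).
STEPS = {
    "Document receipt and triage": (2.5, 0.2),
    "Data extraction of base rates": (6.0, 0.5),
    "Interpretation and entry of surcharges": (8.5, 1.0),
    "Validation and data cleansing": (4.0, 1.8),
    "Export to TMS / Database": (1.0, 0.1),
}


def totals(steps):
    """Return (manual_total, hybrid_total) in hours."""
    manual = sum(m for m, _ in steps.values())
    hybrid = sum(h for _, h in steps.values())
    return manual, hybrid


def reduction_pct(manual, hybrid):
    """Share of manual hours saved by the hybrid model, in percent."""
    return (manual - hybrid) / manual * 100


manual_total, hybrid_total = totals(STEPS)
print(f"manual={manual_total:.1f} h, hybrid={hybrid_total:.1f} h, "
      f"saved={reduction_pct(manual_total, hybrid_total):.1f}%")
```

With the figures from the table, this comes to 22.0 hours versus 3.6 hours, a reduction of roughly 84 percent.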

Step 1: Classification of incoming file formats

Before automated data extraction begins, all incoming documents undergo stringent triage. Automation tools function primarily on predictability. Directly forwarding every incoming email attachment to an Optical Character Recognition (OCR) engine leads to corrupted data and system failures.

Triage is the process of pre-sorting documents based on their file structure. An employee or a classification algorithm scans the inbox and separates PDF matrices from irregular Excel files. Once the structure is determined, the file is assigned to the appropriate extraction method. Structured documents flow straight into the RPA process. Files without a logical layout revert to a specialized BPO team for manual preprocessing or direct entry.

Criteria for sorting files for direct extraction:

Type of file extension (.xlsx, .csv, recurring .pdf layouts)
Presence of vector text versus raster images
Sender recognition (linked to known carrier templates)
Consistent row and column structure without varying merges per page
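
The criteria above can be sketched as a small routing function. This is a minimal illustration, not the classification algorithm itself: the `IncomingFile` fields and the route names "rpa" and "manual" are assumptions, and the sketch treats raster PDFs as not eligible for direct extraction.

```python
from dataclasses import dataclass

# Extensions considered for direct extraction (from the criteria list).
STRUCTURED_EXTENSIONS = {".xlsx", ".csv", ".pdf"}


@dataclass
class IncomingFile:
    """Hypothetical triage record for one email attachment."""
    extension: str           # e.g. ".pdf"
    has_text_layer: bool     # vector text (True) vs. raster image (False)
    sender_known: bool       # sender is linked to a known carrier template
    consistent_layout: bool  # same rows/columns on every page, no varying merges


def triage(f: IncomingFile) -> str:
    """Return "rpa" for direct extraction, otherwise "manual" (BPO team)."""
    ext = f.extension.lower()
    if ext not in STRUCTURED_EXTENSIONS:
        return "manual"
    # The text-layer criterion only matters for PDFs.
    if ext == ".pdf" and not f.has_text_layer:
        return "manual"
    if not (f.sender_known and f.consistent_layout):
        return "manual"
    return "rpa"


print(triage(IncomingFile(".xlsx", True, True, True)))   # rpa
print(triage(IncomingFile(".pdf", False, True, True)))   # manual
```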

Categorization of incoming source files

Different document types introduce their own technical challenges during extraction. A .csv or clean Excel sheet contains data in well-defined cells, allowing a script to read these columns directly. A vector PDF (generated straight from a digital system) contains text that software recognizes as a digital text layer. Here, an extraction tool can read the characters and pinpoint their coordinates with high accuracy.

Technically speaking, a scanned PDF (raster) is an image. Extracting data from this format requires an extra translation step where the software converts pixels into characters, increasing the error margin for small fonts or compression artifacts. Unformatted email texts—where rates are embedded as plain text in varying paragraphs—lack the anchor points required for templates, blocking regular pattern recognition entirely.

Checklist: Evaluation criteria for OCR readiness

A carrier sheet must meet specific conditions to be processed by a text-mining application without requiring manual corrections.

Minimum resolution of 300 DPI: Necessary for scanned documents to prevent character confusion (such as mistaking the letters ‘rn’ for the letter ‘m’).
Selectable text (Vector-based): Text layers must be digitally generated, ensuring extraction software reads characters rather than guessing pixels.
Consistent table structure: Columns for ‘Origin’, ‘Destination’, ‘20FT’, and ‘40FT’ must align on the exact same horizontal axis on every subsequent page.
No nested or merged data fields: Cell merging, where a single port is linked to multiple prices across different rows, disrupts the linear data output.
Standardized typography: The use of standard fonts without handwritten annotations, watermarks, or overlapping carrier logos in the data fields.
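
The checklist can be expressed as a simple pre-flight check that returns the reasons a sheet would need manual correction. The sketch assumes the document properties have already been measured upstream; `SheetProfile` and its fields are hypothetical names. It applies the DPI rule to scans and the text-layer rule to digitally generated files, which is one possible reading of the checklist.

```python
from dataclasses import dataclass
from typing import List

MIN_DPI = 300  # minimum scan resolution from the checklist


@dataclass
class SheetProfile:
    """Hypothetical, pre-measured properties of one carrier sheet."""
    is_scanned: bool
    dpi: int
    has_selectable_text: bool
    consistent_table_structure: bool
    has_merged_cells: bool
    has_overlays: bool  # handwriting, watermarks, or logos over data fields


def readiness_issues(p: SheetProfile) -> List[str]:
    """Return the checklist items the sheet fails; an empty list means ready."""
    issues = []
    if p.is_scanned and p.dpi < MIN_DPI:
        issues.append(f"scan resolution {p.dpi} DPI is below {MIN_DPI} DPI")
    if not p.is_scanned and not p.has_selectable_text:
        issues.append("no selectable text layer")
    if not p.consistent_table_structure:
        issues.append("table columns shift between pages")
    if p.has_merged_cells:
        issues.append("merged or nested cells")
    if p.has_overlays:
        issues.append("annotations, watermarks, or logos over data fields")
    return issues


clean = SheetProfile(False, 0, True, True, False, False)
print(readiness_issues(clean))  # []
```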

Step 2: Data extraction and configuring validation rules

Following classification, OCR-ready data flows into the RPA infrastructure. An RPA bot reads documents using predefined extraction rules (templates). These templates contain coordinates and ‘Regular Expressions’ (RegEx) that hunt for specific text patterns. The bot identifies a column header, searches for the corresponding data sets, and extracts the value to a temporary staging database.
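
A regular-expression template for one row layout might look like the sketch below. The row format (origin, destination, 20FT rate, 40FT rate, currency, separated by whitespace) and the sample values are assumptions made for illustration; real carrier sheets vary, and each layout would need its own pattern.

```python
import re

# Assumed row layout: "<origin> <destination> <20FT> <40FT> <currency>".
# Ports are matched as five-character UN/LOCODE-style codes.
ROW_PATTERN = re.compile(
    r"^(?P<origin>[A-Z]{2}[A-Z2-9]{3})\s+"
    r"(?P<destination>[A-Z]{2}[A-Z2-9]{3})\s+"
    r"(?P<rate_20ft>\d+(?:\.\d+)?)\s+"
    r"(?P<rate_40ft>\d+(?:\.\d+)?)\s+"
    r"(?P<currency>[A-Z]{3})$"
)


def extract_rows(text):
    """Return one dict per line that matches the template; skip all other lines."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            row = match.groupdict()
            row["rate_20ft"] = float(row["rate_20ft"])
            row["rate_40ft"] = float(row["rate_40ft"])
            rows.append(row)
    return rows


# Illustrative sample, not real carrier data.
SAMPLE = """Origin Destination 20FT 40FT Currency
NLRTM CNSHA 1250 2100 USD
Rates subject to local charges
"""
print(extract_rows(SAMPLE))
```

Lines that do not match, such as the header and the free-text remark in the sample, are skipped rather than guessed at, which keeps the staging data clean for the validation step.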

During this stage, the software performs validations based on master data. The system cross-references extracted date fields with logical parameters (for instance, a validity period cannot be in the past). Currency codes are evaluated against the ISO 4217 standard. If a shipping line lists ‘USD’ as ‘$’ or ‘US Dollars’, the script normalizes this string back to the database standard ‘USD’. This strict normalization is a prerequisite for executing calculations in the FMS (Freight Management System) without a hitch.
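
The two validations described above can be sketched as follows. The alias table is a small illustrative subset (the EUR entries are an added assumption), and the function names are hypothetical.

```python
from datetime import date

# Illustrative alias table; a production list would cover every ISO 4217 code in use.
CURRENCY_ALIASES = {
    "USD": "USD",
    "$": "USD",
    "US DOLLARS": "USD",
    "EUR": "EUR",  # assumption: added to show a second currency
    "€": "EUR",
}


def normalize_currency(raw: str) -> str:
    """Map a carrier's currency notation to its ISO 4217 code."""
    key = raw.strip().upper()
    if key not in CURRENCY_ALIASES:
        raise ValueError(f"unknown currency notation: {raw!r}")
    return CURRENCY_ALIASES[key]


def validity_is_plausible(valid_from: date, valid_to: date, today: date) -> bool:
    """A validity period must be ordered and must not have expired already."""
    return valid_from <= valid_to and valid_to >= today


print(normalize_currency("US Dollars"))  # USD
print(validity_is_plausible(date(2025, 1, 1), date(2025, 3, 31), date(2025, 2, 1)))  # True
```

Raising an error on an unknown notation, rather than passing the raw string through, forces the line into review instead of letting an unnormalized value reach the rate calculation.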

Extraction rules for port terminology

Shipping lines utilize internal abbreviations for the same logistical hubs. To accurately link the Port of Loading and Port of Discharge, an RPA script relies on a fixed data dictionary, typically based on the UN/LOCODE standard.

A template uniformly translates ‘RTM’, ‘NLROT’, or ‘Rotterdam Port’ to ‘NLRTM’. The extraction rule first searches for the closest UN/LOCODE in the string. If a match isn’t found, the system looks for geographical anchors within the cell. This type of mapping prevents a rate for the ‘Port of Shanghai’ from failing just because a carrier spells it as ‘CN SHA’ while the target system exclusively accepts ‘CNSHA’.
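
The lookup order described above (code match first, geographical anchor second) can be sketched like this. The dictionary contents are a tiny illustrative subset, the function name `to_locode` is hypothetical, and the sketch stores codes in the compact five-character form that the target system in the ‘CNSHA’ example expects.

```python
import re
from typing import Optional

# Illustrative subset of a data dictionary based on UN/LOCODE.
KNOWN_LOCODES = {"NLRTM", "CNSHA"}

# Carrier abbreviations and place names mapped to the standard code.
ALIASES = {
    "RTM": "NLRTM",
    "NLROT": "NLRTM",
    "ROTTERDAM": "NLRTM",
    "SHANGHAI": "CNSHA",
}


def to_locode(raw: str) -> Optional[str]:
    """Return the standard code for a port cell, or None if nothing matches."""
    text = raw.upper()
    compact = re.sub(r"[^A-Z0-9]", "", text)  # "CN SHA" -> "CNSHA"
    # 1) Direct code match once spaces and punctuation are removed.
    if compact in KNOWN_LOCODES:
        return compact
    # 2) Known carrier abbreviation for the whole cell.
    if compact in ALIASES:
        return ALIASES[compact]
    # 3) Geographical anchor: any single word that names a known port.
    for word in re.findall(r"[A-Z]+", text):
        if word in ALIASES:
            return ALIASES[word]
    return None


for cell in ("RTM", "NLROT", "Rotterdam Port", "CN SHA", "Port of Shanghai"):
    print(cell, "->", to_locode(cell))
```

Returning `None` for an unknown port, instead of a best guess, lets the line be escalated for review rather than silently linked to the wrong hub.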

Mapping complex surcharges

The core ocean freight rate is generally static during its validity period, but surcharges are not. Configuring logic for these costs requires conditional instructions within the extraction platform.

The Bunker Adjustment Factor (BAF) is a surcharge for fluctuating fuel prices. The Flexport glossary defines BAF as: “a fee to adjust for the fluctuating costs of fuel”. Extraction software must recognize whether a BAF is listed as inclusive or exclusive of the base rate on the sheet. By establishing If-Then logic, the system reads the footnotes of the carrier sheet. If a note reads “BAF subject to monthly review”, the script assigns a restricted validity period (valid to) to this specific data field, entirely separate from the base rate’s validity. This partitioned data is then injected via an API into the proper cost lines of the internal platform.
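
The If-Then logic for BAF footnotes might be sketched as below. Two points are assumptions rather than rules from this guide: the keyword list used to detect an inclusive BAF, and the choice to cap a "monthly review" surcharge at the end of the month in which the sheet was issued. The `Surcharge` record is a hypothetical stand-in for a cost line in the internal platform.

```python
import calendar
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Surcharge:
    """Hypothetical cost line with its own validity, separate from the base rate."""
    code: str
    included_in_base: bool
    valid_to: date


def end_of_month(d: date) -> date:
    """Last calendar day of the month that contains d."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def map_baf(footnote: str, base_valid_to: date, sheet_date: date) -> Surcharge:
    """Derive the BAF cost line from a carrier footnote."""
    note = footnote.lower()
    # Assumed wording for an inclusive BAF: "incl", "inclusive", or "included".
    included = bool(re.search(r"\bbaf\b.*\b(incl|inclusive|included)\b", note))
    if "subject to monthly review" in note:
        # Assumption: the surcharge is only trusted until the end of the sheet's month.
        valid_to = min(base_valid_to, end_of_month(sheet_date))
    else:
        valid_to = base_valid_to
    return Surcharge("BAF", included, valid_to)


baf = map_baf("BAF subject to monthly review", date(2025, 6, 30), date(2025, 4, 10))
print(baf.valid_to)  # 2025-04-30
```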

Image: Logistics specialist comparing carrier sheets with TMS data to standardize ocean freight rates on a dark screen.

Step 3: Applying hybrid quality control

A fully automated solution quickly collides with the volatile reality of global supply chains. RPA algorithms operate purely on programmed rules and lack contextual intelligence. Shipping lines frequently alter their documentation layouts, add new or temporary local levies without notice (like a waiting time surcharge due to port congestion), or unexpectedly change the currency for a specific route.

In moments like these, an algorithm loses its way. The system might read data from a misaligned column or ignore a new surcharge entirely. This results in invisible calculation errors. This is where the hybrid model steps in: the algorithm is programmed with confidence levels. If a cell value or pattern drops below a predetermined certainty threshold, the bot halts processing for that specific line and escalates the issue to a dashboard.
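
The escalation rule can be sketched as a simple split of extracted lines by their weakest field. The threshold value of 0.90 and the shape of the line records are assumptions; in practice, the threshold would be tuned per carrier template.

```python
from typing import Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.90  # assumed value, not taken from this guide


def route_lines(
    lines: List[Dict], threshold: float = CONFIDENCE_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Split rate lines into (accepted, escalated) by their lowest field confidence."""
    accepted, escalated = [], []
    for line in lines:
        lowest = min(line["confidence"].values())
        if lowest >= threshold:
            accepted.append(line)
        else:
            escalated.append(line)  # goes to the review dashboard
    return accepted, escalated


# Illustrative records: one confidence score per extracted field.
batch = [
    {"id": 1, "confidence": {"origin": 0.99, "rate_20ft": 0.97}},
    {"id": 2, "confidence": {"origin": 0.98, "rate_20ft": 0.62}},
]
ok, review = route_lines(batch)
print([l["id"] for l in ok], [l["id"] for l in review])  # [1] [2]
```

Using the lowest field score, rather than an average, means a single doubtful cell is enough to stop the line, which matches the goal of avoiding invisible calculation errors.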

Trained data experts handle these exceptions immediately. Housing these specialists within the European time zone ensures that corrections occur during business hours, making tight SLA frameworks possible. This model guarantees that operations don’t grind to a halt due to delays outside regular hours, and inherently safeguards the required EU compliance for data management. In this way, ocean freight rate processing teams can validate the stalled lines, correct them directly in the source system, and complete the data upload within the targeted four hours.

Data expert verification of exceptional surcharges

Edge cases, such as a manual addition in a PDF stating “Surcharge Y does not apply to client X”, fall entirely outside regular pattern recognition. The data expert reviews the source in the validation dashboard alongside the extracted fields. The specialist interprets the commercial implication of the note, enters the appropriate exception via a manual correction (human-in-the-loop), and, in many cases, immediately trains the underlying algorithm for its next encounter with this specific exception.

Image: RPA workflow for standardizing ocean freight rates, shown as digital process blocks and automation on a screen.

When full standardization falls short

For processing long-term contracts with monthly updates, automation offers unprecedented efficiency. However, there are scenarios where pure data standardization is economically irrational.

A freight forwarder receives daily ad-hoc requests and spot rates for occasional shipments. These rates generally reach the procurement department via platforms like WhatsApp, WeChat, or highly fragmented, short-lived email exchanges. Building, testing, and implementing an RPA script or a complex OCR template for a one-off format consumes hours of development time.

When a freight request lacks a fixed, repetitive layout or is delivered in extremely low volumes, the operational cost of automation outweighs the benefits. The interpretation of a single, unstructured email and subsequent data entry into the TMS is completed by a specialized employee in half the time—and at a fraction of the programming costs. An efficient back-office strategy channels massive, repetitive rate sheets through the RPA pipeline, and expressly reserves human capacity for the dynamic, unstructured spot market.
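
The trade-off can be made concrete with a break-even estimate: how many documents of the same layout are needed before the time spent building a template is recovered. The function and the figures in the usage line are hypothetical and only illustrate the reasoning.

```python
import math
from typing import Optional


def break_even_documents(
    build_hours: float,
    manual_minutes_per_doc: float,
    automated_minutes_per_doc: float,
) -> Optional[int]:
    """Documents needed before a template pays back its build time.

    Returns None when automation saves no time per document.
    """
    saved = manual_minutes_per_doc - automated_minutes_per_doc
    if saved <= 0:
        return None
    return math.ceil(build_hours * 60 / saved)


# Hypothetical figures: 6 hours to build, 20 minutes manual, 2 minutes automated.
print(break_even_documents(6, 20, 2))  # 20
```

A one-off spot rate never reaches that count, which is why it stays with a specialized employee.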

Immediate implementation for scalable rate management

Deploying strict file classification alongside configured extraction rules transforms a sluggish manual process into a predictable data stream. By isolating exceptional surcharges and employing targeted human quality control on these exceptions, you secure data accuracy within your TMS without sacrificing procurement flexibility. Structured data processing translates immediately into scalability: your pricing desk can handle higher volumes of quotes with reduced operational risk and lower costs through nearshoring and smart BPO.

Stop letting account managers waste hours waiting on basic data entry. To strengthen your competitive position, you can now streamline your ocean freight rate processing through DataMondial’s hybrid solution, where sophisticated RPA systems work hand-in-hand with certified data experts from our Operations Center in Romania. Our approach empowers freight forwarders to structurally process and update incoming rate sheets within four hours, fully compliant with European data standards (GDPR). Request a process analysis today or download our latest whitepaper on seamlessly integrating extracted freight data into your existing TMS architecture.

Sources

1. https://www.datamondial.nl
