Data Matching Wizard

Find hidden duplicates in your data — upload a CSV or TSV file and see matching clusters in seconds

The Data Matching Wizard is a browser-based, step-by-step tool that uses Interzoid's AI-powered similarity algorithms to identify and cluster matching records within your data files. Whether you need to deduplicate company names, match individual names across datasets, or find address variations, the wizard walks you through the entire process without writing a single line of code.

The wizard supports 17 languages and can be used from any modern browser. It works with both CSV and TSV files and provides a downloadable match report showing all identified clusters of similar records.

1. Prerequisites

Before using the Data Matching Wizard, you will need:

An Interzoid API Key: Register for an account and obtain your unique API license key. This key authenticates your requests and tracks usage credits.
A CSV or TSV Data File: Prepare a comma-separated (CSV) or tab-separated (TSV) file containing the records you want to match. The file should have at least one column of data suitable for matching (company names, individual names, or street addresses).
Available Credits: Each record processed consumes one API credit. Ensure your account has sufficient credits for the number of records in your file.
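To estimate how many credits a job will need before you upload, you can count the records in your file locally. The following is a minimal Python sketch, not part of the wizard. It assumes that every non-empty row counts as one record; the guide does not say whether a header row is billed, so treat the result as an estimate. The function name `count_records` is hypothetical.

```python
import csv
import io


def count_records(text: str, delimiter: str = ",") -> int:
    """Count non-empty rows in delimited text.

    Assumption: each non-empty row consumes one credit.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip rows that are blank or contain only whitespace fields.
    return sum(1 for row in reader if any(field.strip() for field in row))


sample = "IBM Corporation,Armonk\nI.B.M. Corp,Armonk\n\nMicrosoft Inc.,Redmond\n"
print(count_records(sample))
```

For a TSV file, pass `delimiter="\t"`. To check a file on disk, read its contents with `open(path, newline="").read()` first.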

2. Launch the Wizard and Enter Your API Key

Open the Data Matching Wizard in your browser. Before beginning, enter your Interzoid API key in the top-right area of the header bar. Your key will be saved in your browser for future sessions.

API Key Field: Type or paste your API key into the input field in the header. Click the lock/eye icon to toggle visibility.
Check Credits: Click the Credits button to verify your current credit balance before starting a job.
Language Selection: Click the language dropdown in the navigation bar to switch between any of the 17 supported languages. The entire wizard interface will update immediately. You can also set the language via URL parameter: ?lang=fr for French, ?lang=ja for Japanese, etc.

Once your API key is entered, click Get Started on the introduction screen to begin the wizard.

3. Select a Matching Function

The wizard presents six matching functions. Choose the one that matches your data and use case. Each function card shows a description and the column parameters it requires.

Single-Column Functions

These functions analyze one column of data to find matches:

Function	Use Case	Column Required
Company Name Matching	Match variations like "IBM", "I.B.M. Corp", "International Business Machines"	Company Name
Individual Name Matching	Match "James Johnston", "Jim Johnston", "J. Johnston" as the same person	Full Name
Street Address Matching	Match "400 E Broadway St" with "400 East Broadway Street"	Address

Combination Functions

These functions use two columns together for higher matching precision:

Function	Use Case	Columns Required
Company + Address	Higher precision matching using both company name and street address	Company Name, Address
Company + Full Name	Contact deduplication using company and individual name	Company Name, Full Name
Address + Full Name	Person-at-address matching using address and individual name	Address, Full Name

Tip: Combination functions require your file to have at least two columns. If you select a combination function and upload a single-column file, the wizard will display a warning and prevent you from proceeding until you upload a file with enough columns or switch to a single-column function.

Click on the card for your chosen function, then click Next to proceed.

4. Upload Your Data File

Click the upload area to browse for a local CSV or TSV file on your computer. The wizard will:

Auto-detect the file format based on the file extension (.csv, .tsv, or .txt).
Upload the file securely to cloud storage so the matching engine can access it.
Display a preview of the first several rows so you can verify the data and column structure before proceeding.

File Requirements: Your file should be a properly formatted CSV (comma-separated with optional quoting per RFC 4180) or TSV (tab-separated). All rows should have the same number of columns. Each job supports a maximum of 500,000 records.
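If you want to catch formatting problems before uploading, the requirements above can be checked locally. This is a hedged Python sketch rather than the wizard's own validation: the function name `check_file` is hypothetical, and it assumes blank lines are not counted as records.

```python
import csv
import io

MAX_RECORDS = 500_000  # per-job limit stated in the file requirements


def check_file(text: str, delimiter: str = ",") -> list[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    # Parse with the csv module so quoted fields containing the delimiter
    # are handled correctly. Blank lines are dropped.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return ["file is empty"]

    problems = []
    if len(rows) > MAX_RECORDS:
        problems.append(f"{len(rows)} records exceeds the limit of {MAX_RECORDS}")

    # Every row should have as many columns as the first row.
    expected = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(
                f"row {number}: expected {expected} columns, found {len(row)}"
            )
    return problems


print(check_file('a,b\n"c,d",e,f\n'))
```

Row numbers in the messages count non-empty rows, so they can differ from line numbers in a file that contains blank lines.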

Once the upload completes and you see the preview, click Next to continue.

5. Select Columns and Options

Specify which column numbers from your file correspond to the matching parameters. The column preview table from the previous step is shown here for reference.

Column Numbers: Enter the 1-based column number for each required field. For example, if company names are in the third column of your file, enter 3.
Combination Functions: For two-column functions, both column numbers must be provided and must be different columns.
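Because column numbers are 1-based, they are off by one from the 0-based indexes most programming languages use. The sketch below illustrates the two rules above (1-based numbering, and different columns for combination functions) in Python. The helper name `to_indexes` is hypothetical and is not part of the wizard or the API.

```python
def to_indexes(column_numbers: list[int], column_count: int) -> list[int]:
    """Validate 1-based column numbers and convert them to 0-based indexes."""
    # Combination functions need two different columns.
    if len(set(column_numbers)) != len(column_numbers):
        raise ValueError("column numbers must be different")
    for number in column_numbers:
        if not 1 <= number <= column_count:
            raise ValueError(
                f"column {number} is out of range (file has {column_count} columns)"
            )
    return [number - 1 for number in column_numbers]


row = ["1001", "NY", "IBM Corporation"]
# Company names are in the third column, so the wizard value is 3.
(company_index,) = to_indexes([3], len(row))
print(row[company_index])  # IBM Corporation
```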

Output Options

Show Similarity Keys: When enabled (default), each output record includes the generated similarity key as the last column. Records with the same key are matches. Disable this if you want clean output with only the original data fields.
Matches Only: When enabled (default), only records that have at least one other matching record are shown. Disable this to see every record in the file after processing, sorted by similarity key.
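To make the effect of the two options concrete, here is an illustrative Python sketch of the behavior described above. It is not the wizard's implementation. The function name `build_report` and the keys `K1` and `K2` are placeholders; real similarity keys are generated by the matching service.

```python
from collections import Counter


def build_report(keyed_rows, show_keys=True, matches_only=True):
    """Apply the two output options to (row, similarity_key) pairs."""
    # Count how many records share each key.
    counts = Counter(key for _, key in keyed_rows)
    # Sort by key so that matching records end up next to each other.
    ordered = sorted(keyed_rows, key=lambda pair: pair[1])

    report = []
    for row, key in ordered:
        # Matches Only: drop records whose key appears only once.
        if matches_only and counts[key] < 2:
            continue
        # Show Similarity Keys: append the key as the last column.
        report.append(row + [key] if show_keys else list(row))
    return report


rows = [
    (["Acme Inc"], "K1"),
    (["Zeta LLC"], "K2"),
    (["ACME Incorporated"], "K1"),
]
print(build_report(rows))
print(build_report(rows, show_keys=False, matches_only=False))
```

With both options enabled, the unmatched "Zeta LLC" record is dropped. With both disabled, all three records appear without keys, still ordered by key.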

Click Next when your column selections and options are configured.

6. Review and Run

The final screen shows a summary of all your selections: matching function, file name, format, column assignments, and output options. Review these carefully before proceeding.

Click the green Run Match button to start processing. The wizard will:

Validate your API key and check that your account has sufficient credits for the job.
Process each record through the selected matching algorithm using concurrent workers for performance.
Generate the match report with records sorted and grouped into clusters of matching entries.

A progress indicator is shown while the job runs. Processing time depends on the number of records — most files complete within seconds, while very large files may take a minute or more.

Note: If the matching engine encounters too many errors (for example, due to malformed data), the job will stop early and display an error message. Check your data file for formatting issues and try again.

7. Interpret the Results

The match report appears in the results panel at the bottom of the screen. Records are organized into clusters — groups of records that the AI has determined to be matches. Each cluster is separated by a blank line for readability.

Example Output

For a company name match on a CSV file with similarity keys enabled:

IBM Corporation,1 New Orchard Rd,Armonk,NY,d477E1d7sG6dja3hDNsk9P
I.B.M. Corp,1 New Orchard Road,Armonk,NY,d477E1d7sG6dja3hDNsk9P

Microsoft Inc.,1 Microsoft Way,Redmond,WA,k8Rp2mNx4wQjL9vB3cYh7T
Microsoft Corporation,One Microsoft Way,Redmond,WA,k8Rp2mNx4wQjL9vB3cYh7T
MSFT Corp,1 Microsoft Way,Redmond,WA,k8Rp2mNx4wQjL9vB3cYh7T

In this example, the first cluster contains two records identified as variations of IBM, and the second cluster contains three records identified as variations of Microsoft. The last column in each row is the similarity key — all records sharing the same key are considered matches.
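If you process the report in a script, the blank-line separators make it straightforward to split into clusters. The following Python sketch parses the example output above. The function name `parse_clusters` is hypothetical, and the sketch assumes no field contains an embedded line break, since it reads the report one line at a time.

```python
import csv


def parse_clusters(report: str, delimiter: str = ","):
    """Split a match report into clusters, using blank lines as separators."""
    clusters, current = [], []
    for line in report.splitlines():
        if line.strip():
            # Parse a single line with the csv module to respect quoting.
            current.append(next(csv.reader([line], delimiter=delimiter)))
        elif current:
            # A blank line closes the current cluster.
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


report = """IBM Corporation,1 New Orchard Rd,Armonk,NY,d477E1d7sG6dja3hDNsk9P
I.B.M. Corp,1 New Orchard Road,Armonk,NY,d477E1d7sG6dja3hDNsk9P

Microsoft Inc.,1 Microsoft Way,Redmond,WA,k8Rp2mNx4wQjL9vB3cYh7T
Microsoft Corporation,One Microsoft Way,Redmond,WA,k8Rp2mNx4wQjL9vB3cYh7T
MSFT Corp,1 Microsoft Way,Redmond,WA,k8Rp2mNx4wQjL9vB3cYh7T
"""

for cluster in parse_clusters(report):
    key = cluster[0][-1]  # the similarity key is the last column
    print(key, [row[0] for row in cluster])
```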

8. Save Your Results

Click the Save Results button above the results panel to download the match report as a file. On supported browsers, a save dialog will appear allowing you to choose the file name and location. On other browsers, the file will download automatically.

The saved file preserves the same CSV or TSV format as your original input, making it easy to import into spreadsheets, databases, or other data processing tools for further analysis.

Data Pipeline Integration: The match report output is clean, delimited text with no metadata or headers — just data rows and blank-line cluster separators. This makes it suitable for direct use in automated data pipelines.
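Some downstream tools, such as spreadsheets or database loaders, handle an explicit cluster number more easily than blank-line separators. As one possible pipeline step, the Python sketch below rewrites a saved report so that each row starts with a cluster number. The function name `number_clusters` and the short keys in the sample are hypothetical, and the sketch assumes no field contains an embedded line break.

```python
import csv
import io


def number_clusters(report: str, delimiter: str = ",") -> str:
    """Replace blank-line separators with a leading cluster-number column."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    cluster_id, in_cluster = 0, False
    for line in report.splitlines():
        if not line.strip():
            # A blank line ends the current cluster.
            in_cluster = False
            continue
        if not in_cluster:
            # First row after a separator starts a new cluster.
            cluster_id += 1
            in_cluster = True
        row = next(csv.reader([line], delimiter=delimiter))
        writer.writerow([cluster_id] + row)
    return out.getvalue()


saved = "Acme Inc,K1\nACME Incorporated,K1\n\nZeta LLC,K2\nZeta L.L.C.,K2\n"
print(number_clusters(saved), end="")
```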

9. API Access for Developers

The Data Matching Wizard is powered by a REST API that can also be called directly from your own applications, scripts, or data pipelines. This is useful for automating matching jobs or integrating matching into larger workflows.

Example API Call

$ curl "https://match.interzoid.com/match?connection=https://your-file-url/data.csv&filetype=csv&function=company-name-only&apikey=YOUR_API_KEY&company_column=3&showkeys=true&matchesonly=true"

API Parameters

Parameter	Required	Description
`connection`	Yes	URL or path to the data file
`filetype`	Yes	`csv` or `tsv`
`function`	Yes	One of: `company-name-only`, `fullname-only`, `address-only`, `company-and-address`, `company-and-fullname`, `address-and-fullname`
`apikey`	Yes	Your Interzoid API license key
`company_column`	When applicable	1-based column number for company name
`fullname_column`	When applicable	1-based column number for individual name
`address_column`	When applicable	1-based column number for street address
`showkeys`	No	`true` (default) or `false` — append similarity key to output
`matchesonly`	No	`true` (default) or `false` — show only matching clusters

The API returns plain text output identical to what the wizard displays, making it suitable for piping into other tools or storing directly as a file.
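For scripted use, the request URL can be assembled from the parameters in the table. The Python sketch below uses only the standard library and makes several assumptions: the mapping from each function to its required column parameters is inferred from the function names, the service is assumed to accept a percent-encoded `connection` value (the curl example passes it unencoded), and the response is assumed to be UTF-8 text. The names `build_match_url`, `fetch_report`, and `REQUIRED_COLUMNS` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://match.interzoid.com/match"  # endpoint from the curl example

# Assumption: required column parameters inferred from the function names.
REQUIRED_COLUMNS = {
    "company-name-only": ("company",),
    "fullname-only": ("fullname",),
    "address-only": ("address",),
    "company-and-address": ("company", "address"),
    "company-and-fullname": ("company", "fullname"),
    "address-and-fullname": ("address", "fullname"),
}


def build_match_url(connection, filetype, function, apikey, columns,
                    showkeys=True, matchesonly=True):
    """Build the request URL; `columns` maps e.g. "company" to a 1-based number."""
    missing = [name for name in REQUIRED_COLUMNS[function] if name not in columns]
    if missing:
        raise ValueError(f"{function} needs column numbers for: {missing}")
    params = {
        "connection": connection,
        "filetype": filetype,
        "function": function,
        "apikey": apikey,
        **{f"{name}_column": number for name, number in columns.items()},
        "showkeys": str(showkeys).lower(),        # "true" or "false"
        "matchesonly": str(matchesonly).lower(),  # "true" or "false"
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_report(url: str) -> str:
    """Send the request and return the plain-text report (assumed UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


url = build_match_url(
    "https://your-file-url/data.csv", "csv", "company-name-only",
    "YOUR_API_KEY", {"company": 3},
)
print(url)
```

Calling `fetch_report(url)` with a real file URL and API key sends the request and consumes credits, so the example only prints the URL.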

The Data Matching Wizard makes it easy to discover hidden duplicates and matching records across your datasets. Whether you use it interactively through the browser-based wizard or programmatically through the API, it delivers clean, actionable match reports that help you improve the quality, consistency, and value of your data assets. If you have any questions or need assistance, don't hesitate to reach out to our support team.