Data Transformations Catalog
List of Transformations
This document lays out the input and output schema for the full list of Transformations provided in the Catalog. Note that for every output, Refuel also produces a confidence score.
Staffing, Recruiting and HRTech
Resume Parsing
Input:
resume_link
(str): Either a publicly readable URL, or a path to S3 or GCS that can be read by Refuel through our integration.
Output:
candidate_name
(str): Name of the candidate.
contact_info
(json): A JSON object containing any physical addresses, email addresses, phone numbers or web addresses (LinkedIn, GitHub, personal websites, etc.) for the candidate.
education
(list(json)): A list of JSON objects, where each JSON object contains the school, major, degree, start year, end year and other information about a specific educational degree for the candidate.
work_history
(list(json)): A list of JSON objects, where each JSON object contains the job title, company, start month, start year, end month, end year and description of a specific job held by the candidate.
skills
(str): List of skills demonstrated by the candidate based on evidence in their resume.
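As a rough illustration of the output shape above, the sketch below models a Resume Parsing result in Python. The top-level field names come from the schema; the `ResumeParsingOutput` type and the `missing_fields` helper are hypothetical names introduced here, and the keys inside each nested JSON object are left untyped because this catalog does not fix them.

```python
from typing import Any, TypedDict


class ResumeParsingOutput(TypedDict):
    """Top-level fields listed in the Resume Parsing output schema."""

    candidate_name: str
    contact_info: dict[str, Any]          # nested keys are not fixed by the catalog
    education: list[dict[str, Any]]       # one object per degree
    work_history: list[dict[str, Any]]    # one object per job
    skills: str


def missing_fields(record: dict[str, Any]) -> list[str]:
    """Return the schema fields absent from a record, in schema order."""
    return [name for name in ResumeParsingOutput.__annotations__ if name not in record]
```

A check like `missing_fields(row)` can be run on each returned row before downstream code indexes into it.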
Job Description Parsing
Input:
text
(str): Raw text from a job description.
Output:
company
(str): The company or organization offering this job.
title
(str): The job title for this job.
location
(str): Location where the job is based.
pay
(json): A JSON object containing information about the pay period (hourly, weekly, monthly, etc.), minimum and maximum amounts, any bonuses, etc.
skills
(str): List of skills required by the job description.
Skills Extraction and Mapping
Input:
link
(str): Either a publicly readable URL, or a path to S3 or GCS for a resume, job description or other document from which skills need to be extracted.
Output:
skills
(str): List of skills demonstrated by the candidate based on evidence in their resume, and mapped against a taxonomy.
Job Title Normalization
Input:
title
(str): Job title to be normalized.
Output:
normalized_title
(str): Job title as a string, with typos corrected, short forms expanded (for example, Sr to Senior), and unnecessary modifiers or adjectives removed.
Job Title Seniority Classification
Input:
title
(str): Job title.
Output:
seniority
(str): The job title will be categorized against the following taxonomy:
- Owner
- Founder
- C Suite
- Partner
- VP
- Head
- Director
- Manager
- Senior
- Entry
- Intern
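Because `seniority` is drawn from a closed list, a consumer may want to check returned values against it. A minimal sketch, assuming the labels are returned exactly as written above; `SENIORITY_LEVELS` and `is_known_seniority` are hypothetical names, not part of Refuel's API.

```python
# Labels copied from the taxonomy above, in the order the catalog lists them.
SENIORITY_LEVELS = (
    "Owner", "Founder", "C Suite", "Partner", "VP", "Head",
    "Director", "Manager", "Senior", "Entry", "Intern",
)


def is_known_seniority(value: str) -> bool:
    """True if the value (ignoring surrounding whitespace) is a taxonomy label."""
    return value.strip() in SENIORITY_LEVELS
```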
Sales Data and SalesTech
Headquarters or Physical Address for a Business
Input:
business_name
(str): Name of the business.
Output:
address
(str): The physical address or headquarters for the business name supplied. If no physical address is found, “Not Found” is returned.
Revenue Estimate
Input:
business_name
(str): Name of the business.
website
(str): Business website (domain).
address
(str): Complete address of the Business HQ.
Output:
revenue
(str): The latest estimated revenue of the business. If a revenue number cannot be extracted, “Not Found” is returned.
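Fields such as `address` and `revenue` use the string "Not Found" in place of a missing value. One way to handle this on the consuming side is to convert the sentinel to `None`. This is a sketch only: the `found_or_none` helper is hypothetical, and it assumes the sentinel is returned with exactly this casing and plain ASCII characters.

```python
from typing import Optional

# Sentinel described in the catalog; exact casing is an assumption.
NOT_FOUND = "Not Found"


def found_or_none(value: str) -> Optional[str]:
    """Map the "Not Found" sentinel to None; otherwise return the trimmed value."""
    cleaned = value.strip()
    return None if cleaned == NOT_FOUND else cleaned
```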
Lead Scoring
Input:
business_description
(str): Description of the business that is qualifying leads.
icp_description
(str): Description of the ideal customer profile for the business.
customer_title
(str): Job title of the lead at the lead’s company.
customer_company
(str): Company of the lead.
customer_name
(str): Name of the lead.
Output:
lead_score
(str): A score between 0 and 100, indicating the likelihood of the lead being a good fit for the business.
lead_score_rationale
(str): A rationale for the lead score, explaining the reasoning behind the score.
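Since `lead_score` is typed as a string but described as a number between 0 and 100, downstream code will usually convert it before comparing or sorting. A minimal sketch, assuming the string holds a plain decimal number; `parse_lead_score` is a hypothetical helper, not something Refuel provides.

```python
def parse_lead_score(raw: str) -> float:
    """Convert a lead_score string to a float and enforce the documented 0-100 range."""
    score = float(raw.strip())  # raises ValueError on non-numeric text
    if not 0 <= score <= 100:   # also rejects NaN, which fails both comparisons
        raise ValueError(f"lead_score out of range: {raw!r}")
    return score
```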
Get Phone Numbers for Business
Input:
business_name
(str): Name of the business to extract the phone number for.
website
(str): Website of the business to extract the phone number for.
address
(str): Address of the business to extract the phone number for.
Output:
phone_number
(str): Phone number of the business.
Domain Name Extraction
Input:
business_name
(str): The name of the business for which the domain name needs to be extracted.
address
(str): The address of the business for which the domain name needs to be extracted.
Output:
domain_name
(str): The domain name/website of the business.
ICP Fit Classification
Input:
business_name
(str): The name of the business looking for potential customers.
business_description
(str): A text description of the business looking for potential customers and its offerings.
icp_description
(str): A detailed description of the ideal customer profile (ICP) of the business looking for potential customers. It is matched against information extracted about the potential customer.
customer_company
(str): The name of the company that is being evaluated for potential fit with the business.
customer_website
(str): The website of the company that is being evaluated for potential fit with the business.
Output:
icp_fit
(str): The ICP fit of the customer: how good a fit the customer is for the business, based on the ICP description. This will be one of the following values: High, Medium, or Low.
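The three `icp_fit` values are ordered, so results can be ranked with a simple lookup. The sketch below is one way to do that; `ICP_FIT_RANK` and `sort_by_icp_fit` are hypothetical names, and the row dictionaries stand in for whatever record type holds the transformation output.

```python
# Lower rank sorts first, so "High" fits come out on top.
ICP_FIT_RANK = {"High": 0, "Medium": 1, "Low": 2}


def sort_by_icp_fit(rows: list[dict]) -> list[dict]:
    """Return rows ordered from best to worst fit; raises KeyError on an unknown label."""
    return sorted(rows, key=lambda row: ICP_FIT_RANK[row["icp_fit"]])
```

Python's `sorted` is stable, so rows with the same fit keep their original relative order.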
SIC Classification
Input:
business_name
(str): The name of the business for which the SIC code needs to be found.
website
(str): The website of the business.
address
(str): The address of the business.
Output:
sic_code
(str): The relevant SIC codes of the business. The full list of possible codes can be found here.
NAICS Industry Classification
Input:
business_name
(str): The name of the business.
website
(str): The website of the business.
address
(str): The address of the business.
Output:
naics_sector
(str): The NAICS sector of the business. The sector is the first two digits of the 6-digit NAICS code.
naics_industry
(str): One or more 6-digit NAICS codes under which the business is categorized. If the business has multiple industries, the codes will be returned as a semicolon-separated list.
The full NAICS taxonomy can be found here.
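To make the `naics_industry` format concrete, the sketch below splits a semicolon-separated value into individual codes and derives the two-digit sector prefix of each, following the rule stated above. The helper names `split_codes` and `sectors_from_codes` are hypothetical, and the sample codes in the usage note are illustrative only.

```python
def split_codes(raw: str) -> list[str]:
    """Split a semicolon-separated code list, dropping blanks and extra whitespace."""
    return [code.strip() for code in raw.split(";") if code.strip()]


def sectors_from_codes(codes: list[str]) -> list[str]:
    """Return the first two digits of each 6-digit code, de-duplicated in input order."""
    sectors: list[str] = []
    for code in codes:
        if len(code) != 6 or not code.isdigit():
            raise ValueError(f"not a 6-digit NAICS code: {code!r}")
        if code[:2] not in sectors:
            sectors.append(code[:2])
    return sectors
```

For example, `split_codes("541511; 541512;722511")` yields three codes, and `sectors_from_codes` on that result gives the prefixes `54` and `72`. The same `split_codes` helper would apply to any other field returned as a semicolon-separated list.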
MCC Industry Classification
Input:
business_name
(str): The name of the business.
website
(str): The website of the business.
address
(str): The address of the business.
Output:
mcc_categories
(str): One or more Merchant Category Codes (MCC) under which the business is categorized. If the business has multiple MCC codes, the codes will be returned as a semicolon-separated list.
The full list of possible codes can be found here.
Number of Employees
Input:
business name
(str): The name of the business.
website
(str): The website of the business.
address
(str): The address of the business.
Output:
Number of employees
(str): The number of employees who work at the business.
Address Cleaning and Normalization
Input:
address
(str): The unformatted address to be cleaned and normalized.
Output:
clean addresses
(str): The clean address in a standard format.