Data Transformations Catalog
List of Transformations
This document lays out the input and output schema for each Transformation in the Catalog. For every output, Refuel also produces a confidence score.
Staffing, Recruiting and HRTech
Resume Parsing
Input:
resume_link
(str): Either a publicly readable URL, or a path to S3 or GCS that can be read by Refuel through our integration.
Output:
candidate_name
(str): Name of the candidate.
contact_info
(json): A JSON object containing any physical addresses, email addresses, phone numbers or web addresses (LinkedIn, Github, personal websites, etc) for the candidate.
education
(list(json)): A list of JSON objects, where each JSON contains the school, major, degree, start year, end year and other information about a specific educational degree for the candidate.
work_history
(list(json)): A list of JSON objects, where each JSON contains the job title, company, start month, start year, end month, end year and description about a specific job held by the candidate.
skills
(str): List of skills demonstrated by the candidate based on evidence in their resume.
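As a rough illustration of the output shape described above, the following Python sketch checks that a parsed-resume record carries the five documented fields with the documented top-level types. The helper `check_resume_output` and the sample record are hypothetical and are not part of any Refuel client. The key names inside the nested JSON objects are not specified by this catalog, so those shown here are assumptions.

```python
from typing import Any

# Top-level output fields for Resume Parsing and their documented types.
# json -> dict, list(json) -> list of dict, str -> str.
RESUME_OUTPUT_TYPES = {
    "candidate_name": str,
    "contact_info": dict,
    "education": list,
    "work_history": list,
    "skills": str,
}


def check_resume_output(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed-resume record (empty if none)."""
    problems = []
    for field, expected in RESUME_OUTPUT_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
        elif expected is list and not all(isinstance(x, dict) for x in record[field]):
            problems.append(f"{field}: every item should be a JSON object")
    return problems


# Hypothetical sample record; nested key names are assumptions, not the real schema.
sample = {
    "candidate_name": "Jane Doe",
    "contact_info": {"email_addresses": ["jane@example.com"]},
    "education": [{"school": "Example University", "degree": "BSc"}],
    "work_history": [{"job_title": "Analyst", "company": "Example Corp"}],
    "skills": "Python, SQL",
}

print(check_resume_output(sample))
```

A check like this is useful mainly as a guard before loading results into a downstream system; it deliberately ignores the per-output confidence score, whose field name is not given in this document.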
Job Description Parsing
Input:
text
(str): Raw text from a job description.
Output:
company
(str): The company or organization offering this job.
title
(str): The job title for this job.
location
(str): Location where the job is based.
pay
(json): A JSON object containing information about the pay period (hourly, weekly, monthly, etc), minimum and maximum amounts, any bonuses, etc.
skills
(str): List of skills required by the job description.
Skills Extraction and Mapping
Input:
link
(str): Either a publicly readable URL, or a path to S3 or GCS, for a resume, job description or other document from which skills need to be extracted.
Output:
skills
(str): List of skills evidenced in the supplied document (resume, job description or other), mapped against a taxonomy.
Job Title Normalization
Input:
title
(str): Job title to be normalized.
Output:
normalized_title
(str): Job title as a string, with typos corrected, short forms expanded (e.g. Sr to Senior), and unnecessary modifiers or adjectives removed.
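To make the expected input and output concrete, here is a toy, rule-based sketch of two of the behaviors described: expanding short forms and dropping unnecessary modifiers. It is not how Refuel implements this Transformation, and it does not attempt typo correction. The function `normalize_title` and both lookup tables are hypothetical and deliberately tiny.

```python
import re

# Hypothetical lookup tables, for illustration only.
SHORT_FORMS = {"sr": "Senior", "jr": "Junior", "mgr": "Manager", "eng": "Engineer"}
FILLER_WORDS = {"rockstar", "ninja", "awesome"}


def normalize_title(title: str) -> str:
    """Expand known short forms and drop filler modifiers from a job title."""
    words = re.findall(r"[A-Za-z]+", title)  # strips punctuation such as "Sr."
    out = []
    for word in words:
        key = word.lower()
        if key in FILLER_WORDS:
            continue
        if key in SHORT_FORMS:
            out.append(SHORT_FORMS[key])
        elif word.isupper():
            out.append(word)  # keep acronyms such as "QA" as written
        else:
            out.append(word.capitalize())
    return " ".join(out)


print(normalize_title("Sr. Software Eng"))
print(normalize_title("rockstar QA mgr"))
```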
Job Title Seniority Classification
Input:
title
(str): Job title.
Output:
seniority
(str): The job title will be categorized against the following taxonomy:
- Owner
- Founder
- C Suite
- Partner
- VP
- Head
- Director
- Manager
- Senior
- Entry
- Intern
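Because the output is drawn from a fixed taxonomy, downstream code can treat it as a closed set. The sketch below, with hypothetical helper names `is_valid_seniority` and `tally_seniority`, validates labels against the list above and tallies them, setting aside anything outside the taxonomy. It assumes the labels are returned exactly as spelled in this catalog.

```python
from collections import Counter

# Seniority taxonomy, copied from the list in this catalog.
SENIORITY_LEVELS = (
    "Owner", "Founder", "C Suite", "Partner", "VP", "Head",
    "Director", "Manager", "Senior", "Entry", "Intern",
)


def is_valid_seniority(label: str) -> bool:
    """True if the label is one of the documented taxonomy values."""
    return label in SENIORITY_LEVELS


def tally_seniority(labels):
    """Count valid labels; collect anything outside the taxonomy separately."""
    counts = Counter()
    unexpected = []
    for label in labels:
        if is_valid_seniority(label):
            counts[label] += 1
        else:
            unexpected.append(label)
    return dict(counts), unexpected


counts, unexpected = tally_seniority(["VP", "Manager", "VP", "Chief"])
print(counts, unexpected)
```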
Sales Data and SalesTech
Headquarters or Physical Address for a Business
Input:
business_name
(str): Name of the business.
Output:
address
(str): The physical address or headquarters for the business name supplied. If no physical address is found, “Not Found” is returned.
Revenue Estimate
Input:
business_name
(str): Name of the business.
website
(str): Business website (domain).
address
(str): Complete address of the Business HQ.
Output:
revenue
(str): The latest estimated revenue of the business. If a revenue number cannot be extracted, “Not Found” is returned.
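Both the address and revenue outputs use the string "Not Found" as a sentinel when no value is available. A minimal sketch of handling it downstream follows; the helper `value_or_none` and the sample rows are hypothetical. The format of a found revenue value is not specified in this catalog, so the sketch leaves it as an unparsed string.

```python
from typing import Optional

# Sentinel string documented for the address and revenue outputs.
NOT_FOUND = "Not Found"


def value_or_none(value: str) -> Optional[str]:
    """Map the "Not Found" sentinel to None; pass any other string through."""
    return None if value.strip() == NOT_FOUND else value


# Hypothetical results; the revenue strings are made up for illustration.
rows = [
    {"business_name": "Example Corp", "revenue": "Not Found"},
    {"business_name": "Sample LLC", "revenue": "10 million USD"},
]

with_revenue = [r["business_name"] for r in rows if value_or_none(r["revenue"]) is not None]
print(with_revenue)
```

Converting the sentinel to `None` early keeps later code from mistaking "Not Found" for a real address or revenue figure.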
Domain Name Extraction
Input:
business_name
(str): The name of the business for which the domain name needs to be extracted.
address
(str): The address of the business for which the domain name needs to be extracted.
Output:
domain_name
(str): The domain name/website of the business.