Meet BESEE: The Business Email Scanning & Extraction Engine
What It Does
Behind Amitree’s Business Organizers and Insights Engines is proprietary, patent-pending technology that enables a user to plug in their email inbox and immediately begin sorting and organizing emails, deriving insights, and gaining visibility into their business from previously unstructured data in their inbox.
The technology works by identifying business objects that are meaningful to the user's particular business, classifying related content under those objects, and extracting the data that content contains to provide easy, contextual access. These business objects are domain-specific - e.g., they can be real estate transactions, client relationships, loan originations, or other key units of business activity.
How It Works
When a user connects their inbox to Amitree and grants explicit permissions using OAuth 2.0, the industry-standard protocol for authorization, our technology scans incoming and outgoing emails in real time.
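As a rough illustration of the permission step, an OAuth 2.0 authorization-code flow begins by sending the user to the provider's authorization endpoint. The sketch below only builds that URL; the endpoint, client ID, redirect URI, and scope are hypothetical placeholders, not Amitree's or any provider's actual values.

```python
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state):
    """Build an OAuth 2.0 authorization-code request URL.

    The parameter names are the standard OAuth 2.0 ones; every value passed in
    below is a placeholder chosen for illustration.
    """
    params = {
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,            # the permissions the user is asked to grant
        "state": state,            # opaque value echoed back, used to prevent CSRF
    }
    return authorize_endpoint + "?" + urlencode(params)


url = build_authorization_url(
    "https://auth.example.com/authorize",  # hypothetical endpoint
    client_id="hypothetical-client-id",
    redirect_uri="https://app.example.com/oauth/callback",
    scope="mail.read",                     # hypothetical scope name
    state="random-anti-csrf-token",
)
print(url)
```

The user approves or declines the requested scope on the provider's own page, so the application never handles the user's password.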
The scan analyzes each email, using Amitree’s machine learning algorithms to assess the likelihood that it represents a business object of interest to the user.
Identify & Sort
If the business object is already known, the content of the email will be associated with the existing business object.
All data contained within the email will be extracted and associated with the business object for easy retrieval by the user.
Identifying & Sorting Business Objects
The purpose of these inbox scans is to identify new business objects and to associate data with known business objects. We use Google's and Microsoft's search platforms to narrow the candidate set of emails that may concern a new business object by running queries that match known heuristics. We also scan attachments and run OCR on them. These queries are constructed using machine learning based on input and feedback from tens of thousands of users every day.
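To make the idea of a heuristic query concrete, here is a minimal sketch that assembles a Gmail-style search string from a list of keywords. The keywords and the function name are hypothetical, and in the real system the queries are learned rather than hand-written. Microsoft's search platform uses a different query syntax, which this sketch does not cover.

```python
def build_candidate_query(keywords, require_attachment=False, newer_than_days=None):
    """Assemble a Gmail-style search string from heuristic keywords.

    This is an illustrative helper, not Amitree's query builder.
    """
    # Quote multi-word phrases so they are matched as exact phrases.
    terms = ['"%s"' % k if " " in k else k for k in keywords]
    parts = []
    if terms:
        parts.append("subject:(" + " OR ".join(terms) + ")")
    if require_attachment:
        parts.append("has:attachment")
    if newer_than_days is not None:
        parts.append("newer_than:%dd" % newer_than_days)
    return " ".join(parts)


# Hypothetical heuristics for a real estate transaction.
query = build_candidate_query(
    ["purchase agreement", "escrow"],
    require_attachment=True,
    newer_than_days=30,
)
print(query)
```

Pushing this filtering to the mail provider's search index means that only a small candidate set has to be fetched and scored.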
Once a new business object is identified, our technology identifies email messages that are related to that object, classifies them accordingly, and extracts and associates the data contained within those emails so they are easily accessible to the user.
Extracting Business Data
Organizing email around a business object is only one part of the value our technology delivers.
To streamline business workflows and deliver insights and visibility to our users, our technology extracts key data contained within emails and their attachments and organizes it in structured form for contextually relevant display in one of our user interfaces.
Below are examples of business data that we extract:
Amitree’s extraction engine identifies sender / recipient email addresses and syncs them with the user’s existing address book.
Attachments are downloaded, fingerprinted, and recorded for easy recall and organization for the user.
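The two extraction steps above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the sample message is invented, and SHA-256 is one plausible fingerprinting choice rather than the hash Amitree is known to use.

```python
import hashlib
from email import message_from_string
from email.utils import getaddresses

# Invented sample message, used only for illustration.
RAW_MESSAGE = "\n".join([
    "From: Alice Agent <alice@example.com>",
    "To: Bob Buyer <bob@example.com>, carol@example.com",
    "Cc: Dan Lender <dan@example.com>",
    "Subject: Closing documents",
    "",
    "See attached.",
])


def extract_addresses(raw_message):
    """Return the sorted, de-duplicated addresses found in From, To, and Cc."""
    msg = message_from_string(raw_message)
    header_values = []
    for name in ("From", "To", "Cc"):
        header_values.extend(msg.get_all(name, []))
    # getaddresses yields (display_name, address) pairs.
    return sorted({addr.lower() for _, addr in getaddresses(header_values) if addr})


def fingerprint(attachment_bytes):
    """Content hash of an attachment, so identical files can be recognized."""
    return hashlib.sha256(attachment_bytes).hexdigest()


addresses = extract_addresses(RAW_MESSAGE)
print(addresses)
print(fingerprint(b"example attachment bytes"))
```

Because the fingerprint depends only on the bytes of the file, the same document forwarded in several emails maps to a single record.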
Verticalized Business Data
The unstructured data extracted from emails and attachments for a particular business object is customized by vertical.
The Machine Learning Algorithm Behind BESEE
Amitree’s machine learning algorithm works by running a model on an email and producing a score indicating how likely a user would be to confirm that the email is associated with a relevant business object (e.g., a real estate transaction or a loan origination). The model is constructed and updated by evaluating the textual content of emails, drawn from fields including sender and recipient information, subject line, email body, attachment file names, and text within attachments. Attachment text is read directly from the attachment where it is stored as text and, when it is not, derived by running OCR on the images stored within the document.
The textual content is then correlated with confirmation and rejection data provided by users through the interface and used to predict how any user will evaluate messages with similar textual content. The evaluation of the textual content uses a “bag of words / TF-IDF” approach, which involves creating dictionaries of all terms used in each of the above textual fields, computing term and document frequencies, and transforming these into document-term matrices. The matrices are then used as inputs in “ensemble model” construction (i.e., creating multiple models). These models include simple log-linear models, random forests, and neural networks. The outputs of these models are then combined using a weighting scheme based on the effectiveness of each model. The result is a system that takes an email and produces a score.
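The pipeline described above can be illustrated in miniature. The sketch below builds a TF-IDF document-term matrix for a single text field and then combines per-model scores with a weighted average. It is a simplified stand-in, not Amitree's implementation: the smoothed IDF formula is one common variant, the sample texts are invented, and the model scores and weights are hypothetical numbers standing in for trained log-linear, random forest, and neural network models.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_tfidf(docs):
    """Return (vocabulary, document-term matrix) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    matrix = []
    for doc in tokenized:
        term_counts = Counter(doc)
        length = len(doc) or 1
        matrix.append([term_counts[t] / length * idf[t] for t in vocab])
    return vocab, matrix


def ensemble_score(model_scores, weights):
    """Weighted average of per-model scores; weights reflect model effectiveness."""
    return sum(s * w for s, w in zip(model_scores, weights)) / sum(weights)


# Invented subject lines standing in for one textual field.
vocab, matrix = build_tfidf(["escrow closing escrow", "lunch friday"])

# Hypothetical outputs of three models for one email, with hypothetical weights.
score = ensemble_score([0.9, 0.6, 0.8], [1.0, 1.0, 2.0])
print(vocab, round(score, 3))
```

In the full approach, one such matrix would be built per textual field (sender, subject, body, attachment names, attachment text) and fed to each member model during training.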
The email scanner then uses this score to determine whether or not the business object is valid, and whether to create a folder to begin organizing email and extracting key business data around that business object.
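A minimal sketch of that final decision, assuming the score is a probability-like value between 0 and 1 and that a fixed cutoff is used. Both the cutoff value and the names below are hypothetical.

```python
from dataclasses import dataclass

CREATE_THRESHOLD = 0.8  # hypothetical cutoff, not a documented value


@dataclass
class ScanDecision:
    score: float
    create_folder: bool


def decide(score, threshold=CREATE_THRESHOLD):
    """Treat the business object as valid when the score reaches the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return ScanDecision(score=score, create_folder=score >= threshold)


print(decide(0.93))
print(decide(0.41))
```

A higher threshold creates fewer folders the user would reject, at the cost of missing some genuine business objects.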