Architecture and Concepts
A high-level overview of a production architecture is presented below. This is an ideal deployment scenario, with dedicated machines for the Explorer (Web Console), Data Workers, Job Workers, Queues and Data Ingestion.
There are different types of BigConnect nodes:
    Data Worker nodes usually handle data enrichment
    Long Running Process Worker nodes usually handle long running processes (such as running Machine Learning models or Graph algorithms) or scheduled tasks
    External Worker nodes that handle data ingestion or processing from potentially rate-limited external sources
    Explorer nodes serve the BigConnect Explorer application
    The Graph Engine is BigGraph, which stores and queries the data
    Bolt Server nodes execute Cypher queries
Each running node can be a combination of one or more types. This means that in a very simple deployment, one running node can be used to serve the Explorer, to run data worker plugins and job worker plugins, whereas in a typical production deployment you will have nodes that handle specific tasks and are also distributed across many machines.
Because nodes need to talk to each other, each node instance can connect to a WorkQueue to publish and subscribe to messages. The current implementation offers an in-memory work queue for simple deployments and a RabbitMQ work queue for production deployments. Other types of WorkQueues can also be implemented (e.g. Apache Kafka, AWS Simple Queue Service etc.).
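To make the publish/subscribe idea concrete, here is a minimal sketch of what an in-memory work queue could look like. The names WorkQueue, InMemoryWorkQueue, publish and subscribe are assumptions made for this illustration and are not taken from the BigConnect API; a production backend such as RabbitMQ would implement the same contract over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch; these types are not part of the BigConnect API.
final class WorkQueueSketch {

    /** Minimal publish/subscribe contract that a queue backend would implement. */
    interface WorkQueue {
        void publish(String queueName, String message);

        void subscribe(String queueName, Consumer<String> listener);
    }

    /** In-memory variant: delivers each message synchronously to local subscribers. */
    static final class InMemoryWorkQueue implements WorkQueue {
        private final Map<String, List<Consumer<String>>> listeners = new ConcurrentHashMap<>();

        @Override
        public void publish(String queueName, String message) {
            // Queues without subscribers simply drop the message in this sketch.
            for (Consumer<String> listener : listeners.getOrDefault(queueName, List.of())) {
                listener.accept(message);
            }
        }

        @Override
        public void subscribe(String queueName, Consumer<String> listener) {
            listeners.computeIfAbsent(queueName, k -> new CopyOnWriteArrayList<>()).add(listener);
        }
    }

    public static void main(String[] args) {
        WorkQueue queue = new InMemoryWorkQueue();
        List<String> received = new ArrayList<>();
        queue.subscribe("dataWorker", received::add);
        queue.publish("dataWorker", "vertex-1 changed");
        System.out.println(received);
    }
}
```

Because everything happens inside one JVM, a queue like this only suits a single-node deployment where one process plays every node role.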

The Graph Engine

The Graph Engine handles the persistence of data and provides operations for adding, updating, deleting and querying stored data. It provides a graph-like structure based on Vertices, Edges and multi-valued Properties that can be applied to Vertices or Edges.
Property Graphs have been around for a while, and the range of available technologies has grown steadily in recent years. BigConnect, however, brings two important new features: multi-valued properties and fine-grained security.
The Graph Engine makes a clear distinction between storing data and querying data, because not all storage backends can also retrieve the data in a timely manner. This is a common architecture for Property Graphs, where a search index is used to perform fast queries.
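As a rough illustration of the structure described in this section, the sketch below models vertices and edges that can each hold several values under one property name. All class and method names are hypothetical and chosen for this example; they are not the BigGraph classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not the BigGraph classes.
final class PropertyGraphSketch {

    /** A vertex or edge that can hold several values under one property name. */
    static class Element {
        final String id;
        private final Map<String, List<Object>> properties = new LinkedHashMap<>();

        Element(String id) {
            this.id = id;
        }

        /** Adds a value without replacing the values already stored under the name. */
        void addPropertyValue(String name, Object value) {
            properties.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        /** Returns an immutable copy of all values for the name (empty if none). */
        List<Object> getPropertyValues(String name) {
            return List.copyOf(properties.getOrDefault(name, List.of()));
        }
    }

    static final class Vertex extends Element {
        Vertex(String id) {
            super(id);
        }
    }

    /** An edge connects an out-vertex to an in-vertex under a label. */
    static final class Edge extends Element {
        final String label;
        final Vertex out;
        final Vertex in;

        Edge(String id, String label, Vertex out, Vertex in) {
            super(id);
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    public static void main(String[] args) {
        Vertex person = new Vertex("v1");
        person.addPropertyValue("phoneNumber", "555-0100");
        person.addPropertyValue("phoneNumber", "555-0199");
        Vertex company = new Vertex("v2");
        Edge worksAt = new Edge("e1", "worksAt", person, company);
        worksAt.addPropertyValue("startDate", "2020-01-01");
        System.out.println(person.getPropertyValues("phoneNumber"));
    }
}
```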

Data Model (Ontology)

The core foundation of BigGraph is the dynamic, semantic data model, called the Ontology. It represents the way you store, correlate and query all information, and it is a conceptual data model that expresses information in a factual way: objects, relations and attributes.
Typically, the ontology is used to provide meaning to the information stored in the system, either at ingestion time or later on, during the life-cycle of the data. It can be defined at the beginning or at any later time, to adapt to any data structure, type or meaning. Any information stored in the system can be mapped to an entity (which has a concept) and can have relations and attributes.
Concepts can be anything you can think of: person, vehicle, bank transaction, phone call, equipment, company, location, event, network packet, log file etc. They are linked to each other using meaningful relations like “works at”, “lives in”, “has friend”, “is brother of”, “sent from IP”, “source file” etc.
Concepts and relations can also have attributes. For a company these can be the company name, formation date, address etc. For a person these can be their first name, last name, birth date, phone number, email address etc. For a “works at” relation we can have the “start date” and “end date” attributes to denote the period during which the person worked at the company. For a “sent from IP” relation we can have the “timestamp” and the “user” etc.
The ontology defines what Concepts, Relations and Properties are available. Each ontology item can also have several meta-properties such as: searchable, deletable, color etc.

Concepts

Concepts are hierarchical and inheritable. This means that child concepts inherit the properties of their parent concepts. The root concept at the top of the hierarchy is called thing. It is a system-level concept that cannot be removed. Any first-level concept must inherit from the thing root concept.
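The inheritance rule can be sketched with a small class in which each concept keeps a reference to its parent and collects inherited properties on demand. The Concept class, its fields and the example concepts person and employee are assumptions made for this illustration, not BigConnect types.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of concept inheritance; not the BigConnect ontology classes.
final class ConceptSketch {

    static final class Concept {
        final String title;
        final Concept parent; // null only for the root concept
        private final Set<String> ownProperties = new LinkedHashSet<>();

        Concept(String title, Concept parent, String... properties) {
            this.title = title;
            this.parent = parent;
            ownProperties.addAll(List.of(properties));
        }

        /** Own properties plus everything inherited from ancestors, root first. */
        Set<String> allProperties() {
            Set<String> result = new LinkedHashSet<>();
            if (parent != null) {
                result.addAll(parent.allProperties());
            }
            result.addAll(ownProperties);
            return result;
        }
    }

    public static void main(String[] args) {
        Concept thing = new Concept("thing", null, "title");
        Concept person = new Concept("person", thing, "firstName", "lastName");
        Concept employee = new Concept("employee", person, "badgeNumber");
        System.out.println(employee.allProperties());
    }
}
```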
Concepts also have meta-properties that describe how the concept should be treated in the system and what actions are available for them:
    Meta-Property - Description
    title - Unique identifier for the concept
    displayName - The human-readable name that should be displayed in the platform
    icon - The image to use to display the entity
    userVisible - If the concept is visible in the Web Console
    searchable - If the concept should show in the Concept type search filter
    deleteable - If the delete button should show in the Web Console
    updateable - If the concept's properties can be updated
    intents - What the concept is for (optional)
    displayType - Specifies how the UI should display the entity: audio, image, video or document
    color - The color to use on the graph and when underlining the concept in a document text section
    titleFormula - A JavaScript snippet used to display the title of the entity. The snippet can be a single expression, or multiple lines with a return. All formulas have access to:
        vertex: The JSON vertex object (if the element is a vertex)
        edge: The JSON edge object (if the element is an edge)
        ontology: The JSON ontology object (concept/relation)
            id: The IRI
            displayName: The display name of the type
            parentConcept: The parent IRI
            pluralDisplayName: The plural display name of the type
            properties: The property IRIs defined on this type
        prop: Function that accepts a property IRI and returns the display value.
        props: Function that accepts a property IRI and returns a list of all matching properties.
        propRaw: Function that accepts a property IRI and returns the raw value.
    subtitleFormula - A JavaScript snippet used to display additional information in the search results.
    timeFormula - A JavaScript snippet used to display additional information in the search results.
Out of the box, the following concept hierarchy is provided:
    Thing
      Event
      Raw
        Document
        Audio
        Image
        Video

Relations

Relations are hierarchical and inheritable, like concepts. This means that child relations inherit the properties of their parent relations. There is a root relation at the top of the hierarchy. It is a system-level relation that cannot be removed. Any first-level relation must inherit from the root relation.
Relations also have meta-properties that describe how they should be treated in the system and what actions are available for them:
    Meta-Property - Description
    title - Unique identifier for the relation
    displayName - The human-readable name that should be displayed in the platform
    domainConcepts - The source concepts
    rangeConcepts - The destination concepts
    inverseOfs - A list of relations that this relation is the inverse of
    userVisible - If the relation is visible in the Web Console
    deleteable - If the delete button should show in the Web Console
    updateable - If the relation can be updated
    intents - What the relation is for (optional)
    color - The color to use on the graph
    titleFormula - A JavaScript snippet used to display the title of the entity. The snippet can be a single expression, or multiple lines with a return. All formulas have access to:
        vertex: The JSON vertex object (if the element is a vertex)
        edge: The JSON edge object (if the element is an edge)
        ontology: The JSON ontology object (concept/relation)
            id: The IRI
            displayName: The display name of the type
            parentIri: The parent IRI (if the element is an edge and is a child type)
            properties: The property IRIs defined on this type
        prop: Function that accepts a property IRI and returns the display value.
        props: Function that accepts a property IRI and returns a list of all matching properties.
        propRaw: Function that accepts a property IRI and returns the raw value.
    subtitleFormula - A JavaScript snippet used to display additional information in the search results.
    timeFormula - A JavaScript snippet used to display additional information in the search results.
Out of the box, the following relation hierarchy is provided:
    Root Relationship
      Has Entity
      Has Source
      Contains image of
      Has image
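The domainConcepts and rangeConcepts meta-properties restrict which concepts a relation may connect. The sketch below shows one way such a check could work. The Relation class and its allows method are hypothetical, and the check uses exact concept titles only; whether child concepts of a domain or range concept are also accepted is not stated here, so the sketch does not assume it.

```java
import java.util.Set;

// Hypothetical sketch of a domain/range check; not the BigConnect ontology classes.
final class RelationSketch {

    static final class Relation {
        final String title;
        final Set<String> domainConcepts; // allowed source concepts
        final Set<String> rangeConcepts;  // allowed destination concepts

        Relation(String title, Set<String> domainConcepts, Set<String> rangeConcepts) {
            this.title = title;
            this.domainConcepts = Set.copyOf(domainConcepts);
            this.rangeConcepts = Set.copyOf(rangeConcepts);
        }

        /** True when an edge of this relation may go from sourceConcept to destConcept. */
        boolean allows(String sourceConcept, String destConcept) {
            return domainConcepts.contains(sourceConcept) && rangeConcepts.contains(destConcept);
        }
    }

    public static void main(String[] args) {
        Relation worksAt = new Relation("worksAt", Set.of("person"), Set.of("company"));
        System.out.println(worksAt.allows("person", "company"));
        System.out.println(worksAt.allows("company", "person"));
    }
}
```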

Properties

Properties apply to both concepts and relations. They also have meta-properties that describe how they should be treated in the system and what actions are available for them:
    Meta-Property - Description
    title - Unique identifier for the property
    displayName - The human-readable name that should be displayed in the platform
    dataType - The type of the property: string, integer, double, currency, date, boolean, geoLocation. The property is rendered in the Web Console based on its type.
    userVisible - If the property is visible in the Web Console
    searchable - If the property should show in the Filter by Property list in the Web Console
    searchFacet - If the property should be displayed as a Search Facet in the Web Console
    textIndexHints - Specifies how text is indexed in the full text search:
        NONE - Do not index this property (default)
        ALL - Combination of FULL_TEXT and EXACT_MATCH
        FULL_TEXT - Allow full text searching. Good for large text.
        EXACT_MATCH - Allow exact matching. Good for multi-word known values.
    deleteable - Whether the delete button should show in the Web Console and deleting properties is allowed in REST calls
    addable - Whether the add property list should show this property and creating property values is allowed in REST calls
    updateable - Whether the edit button should show in the UI and updating property values is allowed in REST calls
    intents - See the Intents section
    displayType - Specifies how the UI should display the value. Plugins can add new display types; see the Ontology Property Display Types section in Front-end Plugins.
        bytes: Show the value in a human-readable size unit based on size. Assumes the value is in bytes.
        dateOnly: Remove the time from the property value and stop timezone shifting on display (the date will be the same regardless of the user's timezone).
        geoLocation: Show the geolocation using its description (if available) and truncated coordinates.
        heading: Show a direction arrow. Assumes the value is a number in degrees.
        link: Show the value as a link (assumes the value is a valid href)
        longtext: Show the value using multiline whitespace, and allow editing in a multi-line text area instead of a single-line input
    propertyGroup - Allows multiple properties to be included under a unified collapsible header in the Inspector. All properties that match the value (case-sensitive) will be placed in one section.
    possibleValues - Creates a pick list in the Web Console. The value is a JSON document describing the possible values. In this example, F is the raw value saved in the property, but Female is displayed to the user in the pick list and in the Inspector:
        { "M": "Male", "F": "Female" }
    displayFormula - A JavaScript snippet used to display the value of the property. The snippet can be a single expression, or multiple lines with a return. All formulas have access to:
        vertex: The JSON vertex object (if the element is a vertex)
        edge: The JSON edge object (if the element is an edge)
        ontology: The JSON ontology object
        prop: Function that accepts a property IRI and returns the display value.
        props: Function that accepts a property IRI and returns a list of all matching properties.
        propRaw: Function that accepts a property IRI and returns the raw value.
    validationFormula - A JavaScript snippet used to validate the value of the property. The snippet can be a single expression, or multiple lines with a return of true or false.
    aggType - How the property should be aggregated in Elasticsearch: none, Histogram, GeoHash, Statistics, Calendar
    aggInterval - The aggregation interval for the Histogram aggregation:
        For date fields: year, quarter, month, week, day, hour, minute, second
        For numeric fields: a positive decimal
    aggCalendarField - For the Calendar aggregation: DAY_OF_MONTH, DAY_OF_WEEK, HOUR_OF_DAY, MONTH, YEAR
    aggTimeZone - The timezone to use in the Calendar aggregation
    aggPrecision - The GeoHash precision to use for the GeoHash aggregation
Out of the box, the following properties are provided:
    Title - the title of the object
    Text - any text contents
    Source - the origin
    ...
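The possibleValues meta-property described above maps a stored raw value to the label shown to the user. A minimal sketch of that lookup follows. The displayValue helper is hypothetical, and falling back to the raw value when no label is defined is an assumption of this sketch rather than documented behaviour.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the BigConnect API.
final class PossibleValuesSketch {

    /**
     * Returns the label to show for a raw value. Falling back to the raw
     * value when no label is defined is an assumption of this sketch.
     */
    static String displayValue(Map<String, String> possibleValues, String rawValue) {
        return possibleValues.getOrDefault(rawValue, rawValue);
    }

    public static void main(String[] args) {
        // Same pick list as the possibleValues example: raw value -> displayed label.
        Map<String, String> gender = Map.of("M", "Male", "F", "Female");
        System.out.println(displayValue(gender, "F"));
    }
}
```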

Intents

The ontology defines concepts, relationships, and properties. During data processing, BigConnect needs to know what type of concept, relationship, or property to assign when it finds one. For example, if BigConnect is scanning a document and finds a phone number, it needs to assign a concept to that phone number. This is where intents come in.
Intents can be defined in the ontology and overridden in the configuration. Out of the box, the following intents are provided, and they are used in the provided Data Workers:
    entityImage
    artifactContainsImage
    artifactTitle
    artifactHasEntity
    artifactContainsImageOfEntity
    entityHasImage
    media.duration
    media.dateTaken
    media.deviceMake
    media.deviceModel
    media.width
    media.height
    media.metadata
    media.fileSize
    media.description
    media.imageHeading
    media.yAxisFlipped
    media.clockwiseRotation
    bankAccount
    phoneNumber
    pageCount
    documentAuthor
    audioDuration
    videoDuration
    geoLocation
There are a lot of system properties, concepts and relations used internally by BigConnect.
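The intent lookup described in this section, where the configuration can override what the ontology defines, can be sketched as a two-step resolution. The IntentResolverSketch class, its method name and the example concept titles are hypothetical; only the intent name phoneNumber comes from the list above.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver for illustration; not the BigConnect implementation.
final class IntentResolverSketch {
    private final Map<String, String> ontologyIntents; // intent -> concept title, from the ontology
    private final Map<String, String> configOverrides; // intent -> concept title, from configuration

    IntentResolverSketch(Map<String, String> ontologyIntents, Map<String, String> configOverrides) {
        this.ontologyIntents = Map.copyOf(ontologyIntents);
        this.configOverrides = Map.copyOf(configOverrides);
    }

    /** Configuration wins over the ontology; empty when neither defines the intent. */
    Optional<String> resolveConcept(String intent) {
        String override = configOverrides.get(intent);
        if (override != null) {
            return Optional.of(override);
        }
        return Optional.ofNullable(ontologyIntents.get(intent));
    }

    public static void main(String[] args) {
        IntentResolverSketch resolver = new IntentResolverSketch(
                Map.of("phoneNumber", "phone", "bankAccount", "account"),
                Map.of("phoneNumber", "contactNumber"));
        System.out.println(resolver.resolveConcept("phoneNumber"));
        System.out.println(resolver.resolveConcept("bankAccount"));
    }
}
```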

Security

An important feature of the Graph Engine is that it includes a layer of fine-grained data security: every operation on the data structure is made using a set of Authorizations and each piece of data has a Visibility label attached to it.
The Visibility is used to determine whether a given user meets the security requirements to read the value. This enables data of various security levels to be stored in the same element (vertex or edge) and users of varying degrees of access to query the data, while preserving data confidentiality.
When changes to the graph are made, users can specify a visibility label for each value. These labels consist of a set of user-defined tokens that are required to read the value the label is associated with. The set of tokens required can be specified using a syntax that supports logical AND (&) and OR (|) combinations of terms, as well as grouping terms together with parentheses ().
Each term consists of one or more alphanumeric characters, hyphens, underscores or periods. Optionally, a term may be wrapped in quotation marks, which removes the restriction on valid characters. In quoted terms, quotation marks and backslash characters can be used as characters in the term by escaping them with a backslash.
For example, suppose within our organization we want to label our data values with security labels defined in terms of user roles. We might have tokens such as: admin, audit, system
    // Users must have admin privileges
    admin
    // Users must have admin AND audit privileges
    admin & audit
    // Users with either admin OR audit privileges
    admin | audit
    // Users must have audit and one or both of admin or system
    (admin|system)&audit
When both | and & operators are used, parentheses must be used to specify precedence of the operators.
When clients attempt to read data from BigConnect, any security labels present are examined against the set of authorizations passed by the client code. If the authorizations are determined to be insufficient to satisfy the visibility label, the value is suppressed from the set of results sent back to the client.
Authorizations are specified as a comma-separated list of tokens the user possesses:
Authorizations auths = new Authorizations("admin", "system");
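To show how a visibility label is matched against a set of authorizations, here is a small recursive-descent evaluator for the syntax described above. It is a sketch written for this page, not the BigGraph implementation: the class and method names are hypothetical, authorizations are passed as a plain Set of strings, and skipping spaces around operators is an assumption based on the examples.

```java
import java.util.Set;

// Hypothetical evaluator for illustration; it is not the BigGraph implementation.
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    /** Returns true when the authorizations satisfy the visibility label. */
    static boolean canRead(String visibility, Set<String> authorizations) {
        VisibilitySketch parser = new VisibilitySketch(visibility, authorizations);
        boolean result = parser.parseExpression();
        parser.skipSpaces();
        if (parser.pos != parser.expr.length()) {
            throw new IllegalArgumentException("Unexpected character at " + parser.pos);
        }
        return result;
    }

    // expression := operand ('&' operand)* | operand ('|' operand)*
    private boolean parseExpression() {
        boolean result = parseOperand();
        char operator = 0;
        while (true) {
            skipSpaces();
            if (pos >= expr.length()) return result;
            char c = expr.charAt(pos);
            if (c != '&' && c != '|') return result;
            if (operator != 0 && operator != c) {
                throw new IllegalArgumentException("Mixing & and | requires parentheses");
            }
            operator = c;
            pos++;
            boolean right = parseOperand(); // always parse so the position advances
            result = (c == '&') ? (result && right) : (result || right);
        }
    }

    // operand := '(' expression ')' | term
    private boolean parseOperand() {
        skipSpaces();
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            skipSpaces();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ) at " + pos);
            }
            pos++;
            return inner;
        }
        return auths.contains(parseTerm());
    }

    // term := unquoted characters, or a quoted string with backslash escapes
    private String parseTerm() {
        StringBuilder term = new StringBuilder();
        if (pos < expr.length() && expr.charAt(pos) == '"') {
            pos++;
            while (pos < expr.length() && expr.charAt(pos) != '"') {
                if (expr.charAt(pos) == '\\') pos++; // escaped quote or backslash
                if (pos >= expr.length()) break;
                term.append(expr.charAt(pos++));
            }
            if (pos >= expr.length()) throw new IllegalArgumentException("Unterminated quoted term");
            pos++; // closing quote
        } else {
            while (pos < expr.length() && isTermChar(expr.charAt(pos))) {
                term.append(expr.charAt(pos++));
            }
        }
        if (term.length() == 0) throw new IllegalArgumentException("Expected a term at " + pos);
        return term.toString();
    }

    private static boolean isTermChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.';
    }

    private void skipSpaces() {
        while (pos < expr.length() && expr.charAt(pos) == ' ') pos++;
    }

    public static void main(String[] args) {
        System.out.println(canRead("(admin|system)&audit", Set.of("system", "audit")));
        System.out.println(canRead("admin & audit", Set.of("admin")));
    }
}
```

A label such as admin&audit|system is rejected with an exception, which mirrors the rule that parentheses are required whenever both operators appear.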

Storage and Search

In BigGraph the data is stored in a Graph by using Vertices, Edges, Properties and ExtendedData. There are three implementations of the Graph store:
    AccumuloGraph - this is the "production-grade" storage type. It's currently being used by large customers to store billions of vertices and edges. It uses Apache Accumulo as the underlying NoSQL data store and is designed to run in clusters of hundreds of nodes.
    RocksDBGraph - this is the "small-grade" storage type designed for local, desktop use. It should have no problem storing around 1 million vertices and edges, but performance starts to degrade once you pass that boundary. It cannot be distributed due to the nature of RocksDB.
    InMemoryGraph - built mostly for development purposes, it stores the data only in memory. Use this for development and keep in mind that restarting the JVM will lose all data.
BigGraph also needs a search index to run queries and aggregations against properties of vertices and edges. There are two implementations for the search index:
    Elasticsearch5SearchIndex - this is the "production-grade" implementation that uses Elasticsearch 7
    DefaultSearchIndex - built mostly for development purposes, it stores the data only in memory. Use this for development and keep in mind that restarting the JVM will lose all data.