Download as pdf or txt
Download as pdf or txt
You are on page 1of 71

A Primer on Query Performance Optimization

in ArangoDB
Query Analysis. Indexes. Multi-Model.
Welcome
In this course you will learn how to achieve speed ups for your queries in ArangoDB.
We are going to cover fundamental techniques, with a focus on different indexes and
when to use which. We assume that you have written a few AQLs already.

As example dataset to try out queries and see the differences in the execution plans
and query performance, a modified version of the Amazon meta dataset featuring
product information is provided by us. Feel free to experiment with this dataset
following the course and enjoy!

Special thanks to @kavin and @chintanmishra for their detailed feedback to the beta version of this course!
What you will learn
‣ Profiling of queries
‣ Analyzing queries manually
‣ Usage of different index types
‣ Computational complexity of queries
‣ Query caching
‣ Storage engine differences

If you are completely new to ArangoDB, check out our other training material:

ArangoDB Training Center – Getting Started, Graph Course, New Features in v3.2 and more

3
Table of Content
‣ Preparations for this course (p.5) ‣ Search value range / sorting (p.42)

‣ Starting with the dataset (p.8) ‣ Combine hash and skiplist index? (p.45)
‣ Importing with arangoimp (p.9)
‣ Dataset structure (p.13) ‣ When to use a graph data model? (p.49)
‣ Category relations (p.50)
‣ Queries blocking collections (p.19) ‣ similar products graph (p.53)

‣ Find exact matches (p.23) ‣ Substring search in attribute (p.58)


‣ Document key (p.24)
‣ other attributes (p.27) ‣ Other (p.65)
‣ Vertex-centric indexes (p.66)
‣ Fast joins (p.33) ‣ Query caching (p.67)

‣ Hidden lookups (p.37) ‣ Final Tasks (p.68)

4
Preparations for this course

Requirements
&
Download and Install ArangoDB

5
Requirements
To follow this course and perform the described exercises, you will need:

‣ 4GB of RAM
‣ some free disk space
‣ an internet connection
‣ a web browser which supports Canvas or WebGL
‣ max. two hours of time

Note:

Click these symbols to get more details on a topic

This icon marks queries which will take a long time if run.
We recommend to only execute these queries with indexes in place,
in order to not interrupt your reading and learning flow.
6
Download and install ArangoDB
‣ Go to arangodb.com/download/ to find the latest Community or Enterprise Edition for your
operating system. Follow the instructions on how to download and install it for your OS. Further
details can be found here:
docs.arangodb.com/latest/Manual/GettingStarted/Installing/

‣ Once the server booted up, open http://localhost:8529 in your browser to access Aardvark,
the ArangoDB WebUI

‣ Login as "root", leave the password field empty and Login.


After that select "_system" as your database.

7
Starting with the dataset

Importing with Arangoimp


&
Dataset structure

8
Download and Import the Dataset
What we provide is a modified version of an Amazon dataset, comprised of product meta data.
Please download the training dataset zip ArangoDB_PerformanceCourse_DemoData.zip (46MB)
and unpack it to a folder of your choice. After unpacking you should see two newline-delimited
JSON files named products.jsonl and categories.jsonl.
JSON Lines website

Let’s import the data with ArangoDB's import tool arangoimp.


‣ Run the following in your console (single line):
arangoimp --file path to products.jsonl on your machine --collection
products --create-collection true
‣ Repeat this for the other file:
arangoimp --file path to categories.jsonl --collection categories
--create-collection true

Remember that we didn’t define a password for the user "root" so please just hit return when
asked for a password during import!
9
Download and Import the Dataset
You should see something like this in your console after putting in the import command:
Please specify a password:
Connected to ArangoDB 'http+tcp://127.0.0.1:8529', version 3.2.3, database: '_system', username: 'root'
----------------------------------------
database: _system
collection: products
create: yes
source filename: /path/to/products.jsonl
file type: json
threads: 2
connect timeout: 5
request timeout: 1200
----------------------------------------
Starting JSON import...
2017-09-21T05:00:24Z [10364] INFO processed 49838607 bytes (6%) of input file
2017-09-21T05:00:25Z [10364] INFO processed 99677214 bytes (12%) of input file
2017-09-21T05:00:25Z [10364] INFO processed 149483054 bytes (18%) of input file
2017-09-21T05:00:26Z [10364] INFO processed 199321661 bytes (24%) of input file
2017-09-21T05:00:27Z [10364] INFO processed 249127501 bytes (30%) of input file
2017-09-21T05:00:28Z [10364] INFO processed 298966108 bytes (36%) of input file
2017-09-21T05:00:29Z [10364] INFO processed 348771948 bytes (42%) of input file
2017-09-21T05:00:30Z [10364] INFO processed 398610555 bytes (48%) of input file
2017-09-21T05:00:31Z [10364] INFO processed 448416395 bytes (54%) of input file
2017-09-21T05:00:32Z [10364] INFO processed 498255002 bytes (60%) of input file
2017-09-21T05:00:33Z [10364] INFO processed 548093609 bytes (66%) of input file
2017-09-21T05:00:34Z [10364] INFO processed 597899449 bytes (72%) of input file
2017-09-21T05:00:35Z [10364] INFO processed 647738056 bytes (78%) of input file
2017-09-21T05:00:36Z [10364] INFO processed 697543896 bytes (84%) of input file
2017-09-21T05:00:38Z [10364] INFO processed 747382503 bytes (90%) of input file
2017-09-21T05:00:39Z [10364] INFO processed 797188343 bytes (96%) of input file
created: 548552
warnings/errors: 0
updated/replaced: 0
ignored: 0

10
What did arangoimp do?
‣ Created two new document collections (products and categories)
‣ Created one document for each line in the source files
‣ Imported around 550,000 product and 14,500 category documents

‣ Please note:
‣ The product ID of each original Amazon record is used as document _key in the edited dataset
we provide for this course
‣ Product reviews were removed, because they are not used in the course
‣ The categories documents were generated from the "categories" attribute in the products
documents and are referenced by their _key in the "categoryKeys" attribute of each product, to
demonstrate another data model.

11
In ArangoDB WebUI
‣ Go to ArangoDB WebUI
(http://localhost:8529 in
your browser) and choose
"COLLECTIONS" in the
menu
‣ Collection "products" and
"categories" should be
there now
‣ The document icon
indicates that both are
vertex collections, not
edge collections
‣ Click on a collection to
browse its documents

12
Starting with the dataset

Importing with Arangoimp


&
Dataset structure

13
Data structure of products documents
Example product as shown in the
Attribute Description document editor of the web interface:
_key a unique product identifier (as string)

_id collection name + "/" + _key (computed property)

ASIN Amazon Standard Identification Number

title product title

group generic product type (e.g. Book)

salesrank numeric value, lower means more units sold

similar array of similar products (ASINs as strings)

categories array of product categories

categoryKeys array of document keys (categories collection)

14
Hands on: First AQL Queries
You may run below queries to get an impression of the data. We will later show and explain other
queries in more detail. Copy below queries into the query editor and
- click Execute or hit
- Ctrl + Return, respectively
- Cmd + Return (Mac keyboard)

‣ Return ten products


FOR prod IN products
LIMIT 10
RETURN prod
‣ Return the titles of the most popular DVDs
FOR prod IN products
FILTER prod.group == 'DVD' AND prod.salesrank > 0
SORT prod.salesrank
RETURN prod.title

15
Hands on: First AQL Queries
The query results appear below the query
editor. Click on next to the total query
runtime in the header of the result box to
display detailed profiling information.

Click the Explain button if you want to see


the execution plan for the query. This does
not run your query, but merely analyzes
and optimizes the query statement by
checking for opportunities to use indexes
and applying other optimization rules that
aid query execution speed.

By the way: Query run times mentioned in this course are based
on the RocksDB storage engine and a recent desktop computer.

16
Hands on: First AQL Queries
The simple product query explained:
We can see the query string, how it translates to the execution plan and that there are no indexes
used and no optimization rules applied so far:

17
Hands on: First AQL Queries
‣ Return the ten most used categories
FOR prod IN products
FOR cat IN prod.categories OR []
COLLECT category = cat COLLECT in AQL documentation
WITH COUNT INTO count
SORT count DESC
LIMIT 10
RETURN { category, count }

prod.categories OR [] is used to fallback to an empty array in case of no categories


(some products don't have any) in order to prevent an error with the FOR loop.

By the way:
Keywords like FOR, COLLECT and RETURN are written all upper case in the code examples,
but it is merely a convention. You may also write them all lower case or in mixed case.
Names of variables, attributes and collections are case-sensitive however!

18
Queries blocking collections

Memory-mapped files
vs.
RocksDB

19
Queries blocking collections
While running a data modification query that inserts new documents into a collection for
instance, other queries that try to read from the same collection are blocked – if you use
ArangoDB with the memory mapped files (MMFiles) storage engine.

Using this mostly-memory engine, writes block reads and locking is on collection level, i.e. the
entire collection is locked for the duration of the query, even if only single document is created or
modified. It guarantees, that no simultaneous query interferes with the query, or sees a not yet
committed change (ACID transaction).

With the RocksDB storage engine however, document-level locks allow for concurrent writes.
Writes do not block reads. Reads do not block writes. You can test this with the following query:

FOR i IN 1..100
INSERT {value: i} INTO products Storage Engine documentation
RETURN SLEEP(1)

20
Queries blocking collections
The query inserts documents into the products collection we
already have, and waits for a second between each write. This
gives us enough time to try to view the products collection in
the web interface after starting the query. You will see the
"Fetching documents" animation and the documents will not
show before the slow query completed (after 100 seconds)
unless you are using the RocksDB storage engine.

If you are unsure which storage engine you are currently


using, check the dashboard in the web interface (versions
before 3.2 support MMFiles only and do not show an engine):

Switching the storage engine

21
Queries blocking collections
If you compare the query execution
MMFiles RocksDB
times of the examples presented in this
course, performance will probably be in default optional
favor of MMFiles. However, this is under
dataset needs to fit into work with as much data
laboratory conditions. You will likely see memory as fits on disk
a notable performance benefit with hot set in memory,
indexes in memory
RocksDB in production with many data and indexes on disk
slow restart due to index fast startup
concurrent requests.
rebuilding (no rebuilding of indexes)
volatile collections collection data always
See the key differences between both (only in memory, optional) persisted
storage engines in alongside table. If collection level locking concurrent reads and
(writes block reads) writes
RocksDB is suitable for your use case,
consider switching from MMFiles.
Comparing new RocksDB and MMFiles storage engines

22
Find exact matches

Document key
&
other attributes

23
Find exact match – document key
The probably most common operation you want to perform in a
database system is to retrieve one particular document.

If you know the primary key (or document key in ArangoDB terms)
of that document, you can do a lookup in a key-value store fashion.
What that means is, that the database system returns a value – the
desired document – upon your request, in which you tell it the key
for that value. What is in that document doesn’t matter, the
database system treats it as opaque.

It’s like a book in a shelf with a number on it and you ask the
librarian for that book, and he brings it without opening and
reading in it.

Let’s do such a request in AQL using the example dataset.

24
Find exact match – document key
RETURN DOCUMENT('products/31931')
{
"_key": "31931",
"_id": "products/31931",
"ASIN": "0851993575",
"title": "Avocado: Botany, Production and Uses",
"group": "Book",

}

We request the document with key "31931" from the products collection, which happens to be a
book about Avocado. The query time should be in the sub milliseconds area.

There is a primary index on the _key attribute of every collection in ArangoDB, which is used in
lookups by _key or _id and it can not be removed.

25
Find exact match – document key
Here is an equivalent AQL query, which also returns this document:

FOR prod IN products


FILTER prod._key == '31931'
RETURN prod

It is a bit more to type yet the query achieves the same result and is equally fast.

The benefit of above query is the flexibility of the FOR loop-style approach.
We can easily replace _key by any other attribute key like ASIN or group.

The DOCUMENT function on the other hand can only retrieve documents via the document ID
or a combination of collection name and document key.

DOCUMENT function in AQL documentation

26
Find exact matches

Document key
&
other attributes

27
Find exact match – other attribute
Instead of filtering by the document key, we could use the Amazon Standard Identification
Number (ASIN) to find the document:

FOR prod IN products


FILTER prod.ASIN == '0851993575'
RETURN prod

The result is identical, but wait, why does it suddenly take seconds than milliseconds to compute?

Well, to get back to the library metaphor: the book doesn’t have the ASIN written on the cover.
And there’s also no rule where to find it based on what the ASIN is, unlike the indexed document
key. This leads to the librarian opening every book in the library to check what ASIN is written
inside, which takes way longer than to locate a book by its key. He has to perform what is called a
full collection scan in database terms.

28
Find exact match – other attribute
For a full collection scan, we start at the first document in the
collection and check the ASIN attribute if it matches our condition.
Then we check the second document and so on until the last
document in the collection.

If we know that every value exists once at most (which is the case
for ASIN), we can stop scanning after the first match because there
can't be another match. To express this in a query, we can do:

FOR prod IN products


FILTER prod.ASIN == '0851993575'
LIMIT 1 // stop after first match
RETURN prod

The limit operation can be omitted if an index with the unique


option exists for the attribute.
29
Find exact match – other attribute
How can we avoid the full collection scan? By creating an index for the ASIN, so that we can do a
fast lookup. Since we are looking for an exact match of the ASIN, the correct choice is a hash
index. The primary index on the key attribute is also a hash index by the way.

30
Find exact match – other attribute
Now let’s re-run the same query from before:

FOR prod IN products


FILTER prod.ASIN == '0851993575'
RETURN prod

It should take like a millisecond, so about the same time as looking up the document by its key.
Here is a comparison of the execution plans before and after adding the index (using the Explain
feature):

31
Find exact match – other attribute

You can see that the hash index is used in the middle section of the explain output, as well as by
the optimization rule "use-indexes" listed at the bottom. The execution plan looks different as
well, there is no FILTER statement anymore. Instead of a full collection scan, the hash index is
consulted to retrieve the right document.

The selectivity of the hash index is 100%, which means there is at most one document for a given
ASIN. This is always the case for indexes with the unique option enabled. If there were duplicate
ASIN values, the index creation would had failed with a unique constraint violation error.

32
Joins

Fast joins with hash indexes

33
Fast joins
For some queries, the performance gain achieved with a hash index can be higher by orders of
magnitude. This is usually the case if you use JOINs to match up documents of multiple
collections.

FOR prod IN products


LET categories = (
FOR catKey IN prod.categoryKeys Long running query!
FOR cat IN categories You may add e.g. LIMIT 10 after first line.
FILTER cat._key == catKey
RETURN cat.label
)
RETURN MERGE(prod, {categories: categories})

If there are category keys stored per product and we want to resolve those keys to their readable
name, then we would need to iterate over the categoryKeys and join the category labels.
Without the primary index on _key, the computational complexity would be really high in the
worst case - for each key all documents in "categories" would be checked until a match is found.
34
Fast joins
For 5 categoryKeys and with 14,500 categories in total and
if we assume that we find a match after checking 50% of
all categories, that would sum up to 36,250 document
reads - for a single product. To do that for all 550,000
products would mean almost 20 billion document reads.

This insanely high computational complexity makes such a


query a no-go. No matter how much hardware you throw
at it, you can not perform this query in production in a
reasonable manner.

35
Fast joins
Because there is a hash index on _key that can be used for
matching up the categoryKeys however, there is no need
to check a lot of documents in categories. Instead, the
correct document can be looked up immediately. So for 5
categoryKeys, it takes 5 lookups in the hash index for one
product, or 2.75 million for all products.

As a rule of thumb:
If you write queries that contain nested FOR loops, make
sure to check if indexes can be and are actually utilized by
inspecting the query explain output.

36
Hidden lookups

Lookups in UPSERT operations

37
Hidden lookups
The previous example had nested FOR loops in the query, which are easy to spot and should
make you think for a second whenever ever you see such a construct. Check the explain output
to verify that some index is used for good performance.

Sometimes it is less obvious. An UPSERT statement inside a FOR loop can also have a high
runtime complexity. If we iterate over each product and either INSERT or UPDATE a document
based on whether or not a matching document already exists, then there's an obfuscated lookup
to check for the existence of a matching document – which can be another full collection scan if
no index can be used.

Create a new document collection "ranks" if you want to execute the query:
FOR prod IN products
UPSERT { salesrank: prod.salesrank }
INSERT { salesrank: prod.salesrank, count: 1 } Extremely long running query!
UPDATE { count: OLD.count + 1} IN ranks

38
Hidden lookups
The query creates one document per salesrank value and counts the number of occurrences.

The query explain reveals that there is a nested full collection scan, which is the representation of
the UPSERT { salesrank: prod.salesrank } expression to check for an existing document.
No indexes are used.

39
Hidden lookups
If we create a unique hash index for the "salesrank" attribute for the ranks collection and explain
the query, we can see the index being used for the lookup:

The query takes about a minute to complete with the hash index scan, and a long time without.

40
Hidden lookups
The better choice to compute the same FOR prod IN products
COLLECT salesrank = prod.salesrank
results would be a COLLECT query however,
WITH COUNT INTO count
which does not require any index being set INSERT { salesrank, count }
up before the query execution and takes INTO ranks
only around 15 seconds to complete.
Note: Remove the index on salesrank and truncate the ranks collection before the query.

41
Search value range / sorting

Using skiplist indexes

42
Search a value range / sorting
We have a "salesrank" attribute in the documents, which we could use to sort the products and
return the bestsellers. Let's try that without any preparations:

FOR prod IN products


SORT prod.salesrank
FILTER prod.salesrank > 0
LIMIT 100
RETURN prod

We sort by salesrank in ascending order (default), filter out some oddball values (-1, 0 and no
salesrank value) and return the top 100 products.

The query is kind of slow and actually sorts all products over and over again if you run the query
multiple times. We could create a hash index on the attribute, but it wouldn't be used at all –
because hash indexes exclusively support exact matching and can't compare the indexed values.

43
Search a value range / sorting
What we need is a skip list index,
which is a sorted list of values with
shortcuts to find and return a range
of values pretty fast:

3 milliseconds with the index, and


around 6 seconds without.

44
Combine hash and skip list indexes?

Multiple indexes
&
indexes over multiple fields

45
Combine hash and skip list indexes?
An actually more common query in a web shop however is to return bestsellers for a certain
category. What we want to check is that the category a user is interested in is contained in the list
of categories, then sort by salesrank and return the top products.

It sounds like we should use a hash array index to exactly match the category, and a skip list index
to sort by rank. This isn't possible however. Indexes can only be utilized in combination in a few
cases.
In a case like ours, we could either pre-filter products with a hash index, then sort the documents
without index, or go through all products from best to least selling via the skip list index and post-
filter on the category.

FOR prod IN products


SORT prod.salesrank
FILTER 'Study & Teaching[13985]' IN prod.categories
LIMIT 100
RETURN CONCAT(prod.salesrank, '. ', prod.title)

46
Combine hash and skip list indexes?
There is one more option however: a skip list
index, that indexes the array elements of
"categories" and also the salesrank. This creates
a rather large index, but it is the most efficient
way to return the top products per category:

FOR prod IN products


FILTER prod.salesrank > 0 AND prod.salesrank < 100000
SORT prod.salesrank
FILTER 'Wild West[513098]' IN prod.categories
LIMIT 100
RETURN prod

47
Combine hash and skip list indexes?
The upper boundary for salesrank is optional. Any range could be retrieved quickly. If there was a
price attribute, we could index that as well and define the relevant price range in the query.

As you can see in the query explainer, the index is used to restrict the result to the Wild West
category as well as to limit the salesrank.

48
Excursion:
When to use a graph data model?

Category relations
&
similar products graph

49
Excursion: When to use a graph data model?
We worked with the directly embedded categories as well as with the categoryKeys, which we used
to join the category labels.

The former approach is the fastest if we query for


products, but at the cost of redundant category data,
which may become inconsistent over time (i.e. if you
want to rename a category, you need to update all
product documents which have this category).
Moreover, you can't quickly return a list of all distinct
categories because you need to check each product
document and only return unique category entries.

The latter approach using joins overcomes these drawbacks at the cost of a slightly slower query
execution speed if the category labels are joined into the result.

50
Excursion: When to use a graph data model?
Thanks to ArangoDB's multi-model nature, there is a third approach to consider.
Products and categories can be linked by edges and make up a graph:

The graph traversal to get the category labels for a product is a bit slower than both other
approaches however.
Explanation and benchmark at StackOverflow

So when should you use a graph instead of joins from a performance perspective?

51
Excursion: When to use a graph data model?
If categories were not only connected with products, but also among each other like in the
example below, you could easily find all sub-categories of any main category at a variable depth.
This is easily possible with a graph data model without changing the product documents.

Rule of thumb: If you do not know the depth of your search a priori, a graph traversal is likely
to be faster compared to joins – and queries tend to be simpler and easier to maintain.

More graph use cases:

52
Excursion:
When to use a graph data model?

Category relations
&
similar products graph

53
Excursion: When to use a graph data model?
Another example of nested loops, which tremendously benefit from a hash index is if we want to
create edges from the "similar" attribute, to link products with products via their ASIN for a nice
graph of related products. Although this being a one-off task, we can vastly accelerate the query.

First off, create a new edge collection "similar". Make sure there is a hash index on the ASIN
attribute in the products collection. Then run the following query:

FOR p IN products
FOR s IN p.similar OR []
LET to = FIRST(FOR p2 IN products
FILTER p2.ASIN == s
LIMIT 1
RETURN p2._id)
FILTER to != null
INSERT {_from: p._id, _to: to} INTO similar

54
Excursion: When to use a graph data model?
For every product, we iterate over the elements To visualize the data, a graph definition can be
of the "similar" array and join the products by created like this in the GRAPHS section:
matching ASIN to determine the _id of the
similar product, which we need to create an
edge that links both products. If an ASIN can't
be found, the result in the "to" variable is null.
We need to filter it out to avoid query errors.

55
Excursion: When to use a graph data model?
Click on the graph to bring up the graph viewer, which chooses a random start vertex and depicts
its neighborhood, with a search depth of 2 and a limit of 250 by default. You can change the
settings in the side panel (click at the top right)

If we had tried to match up


the ASIN numbers listed in
the "similar" attribute without
the help of an index, it would
had taken forever to rescan
the products collection over
and over again to find the
matching document for each
element in the "similar" array.

56
Excursion: When to use a graph data model?
A possible query for the graph: find a product by ASIN, then start a graph traversal from there and
return similar products linked by edges pointing in any direction, at a depth between 1 and 2.
The result can be sorted by rank and limited to the most popular products.

FOR prod IN products


FILTER prod.ASIN == '0140314504'
FOR v, e, p IN 1..2 ANY prod GRAPH 'similar_products'
OPTIONS {uniqueVertices: 'global', bfs: true}
SORT v.salesrank
LIMIT 10
RETURN {
title: v.title,
parent: p.vertices[-2].title,
rank: v.salesrank
}

57
Substring search in attribute

From no index
to full-text index

58
Substring search in attribute
To let your customers find products in your web shop, you want to let them search the products
titles. One way to translate the search to an AQL query would be:

FOR prod IN products


FILTER LIKE(prod.title, '%avocado%', true)
RETURN prod.title

It matches all products which contain "avocado" (case-insensitive) in the title. The LIKE function
constructs a regular expression internally with wildcards on both ends of the search phrase.
While regular expressions are very flexible, they can't be accelerated by indexes.

For this particular query, it is better to use a specialized function that tests if a substring is
contained in a string, namely CONTAINS.
LIKE in AQL documentation

59
Substring search in attribute
CONTAINS does not have an option for case-insensitive comparison, but we can convert the
product titles on-the-fly with LOWER:

FOR prod IN products


FILTER CONTAINS(LOWER(prod.title), 'avocado')
RETURN prod.title

We can't make use of any indexes this way either, but performance is better than with LIKE.

CONTAINS in AQL documentation

60
Substring search in attribute
We can't use a hash index, because one would only find a product if the entire title is entered by
the user like it is stored in the database, to find the exact match.

A skiplist index can't be used to find a substring anywhere in an attribute, but it is possible to find
prefix matches. There are two products that start with "Avoc":

‣ Avocado: Botany, Production and Uses


‣ Avocado Recipes Etc

We can retrieve them from the alphabetically sorted index by limiting the range to everything
greater or equal to "Avoc" as lower boundary and everything smaller than "Avod" as upper
boundary, which is the lower boundary with the last letter replaced by the alphabetically next
letter (hence "c" becomes "d").

61
Substring search in attribute
FOR prod IN products
FILTER prod.title >= 'Avoc' AND prod.title < 'Avod'
RETURN prod.title

Don't forget to create a skiplist index on the title attribute for a sub-millisecond response.

This approach can be used to implement auto-complete suggestions on the client side.

Cookbook recipe: Populating an auto-complete textbook

A limitation of this approach is that the index lookup is case-sensitive. The index can no longer be
utilized if you extend the query with calls to LOWER to convert case. It would be possible however
to store an all lower case copy of the title in another attribute and create the index on this
instead.

62
Substring search in attribute
The by far best solution to tackle the challenge
of finding "avocado" case-insensitive in the
product title is to use a full-text index however.

The FULLTEXT function has to be called with the


collection name, the indexed attribute name, the
search expression and optionally a limit of how
many documents should be returned at most:

FOR prod IN FULLTEXT(products, 'title', 'avocado')


RETURN prod.title

FOR prod IN FULLTEXT(products, 'title', 'complete:avocado', 10)


RETURN prod.title

63
Substring search in attribute
The search expression "avocado" is the same as "complete:avocado" and finds documents
that contain the word "avocado" anywhere in the string.

Prefix matching is also possible with an expression like "prefix:avoc". In contrast to the skiplist
index approach, it finds the prefix "avoc" case-insensitive and for every word in the full string.

You can also search for records that contain multiple words or prefixes and at the same time do
not contain certain other words.

FULLTEXT in AQL documentation

64
Other

Vertex-centric indexes
&
Query caching

65
Vertex centric indexes
There are special indices designed to speed
up graph traversals, namely vertex-centric
indexes. They yield better query performance
than the default edge index especially if the
graph contains supernodes (vertices that have
an exceptionally high amount of connected
edges) by utilizing edge attributes.

You can find an example in our graph course


for freshers including a dataset to try it out.

ArangoDB Graph Course for Freshers

66
Query caching
If you run ArangoDB in production, different users may issue the same queries, which will be
computed over and over again. This can pose a performance problem in particular if some of the
often run queries with identical results are rather long running queries.

Instead of recomputing the same results, the query result can be stored after the first run and
subsequent requests can be served from the query cache (as long as the collection data doesn't
change), which will only take a fraction of the time of the original query.

Check out the AQL documentation how to configure the query cache and test it with your typical
workload to see if you can gain performance.

Caching query results in AQL documentation

67
Final tasks :)

Add-ons
&
Support ArangoDB

68
Add-ons

We hope you enjoyed the course and it helped to improve your


query performance.

What would you like to learn next ?


Tell us with 3 clicks:

69
Support ArangoDB

You can support ArangoDB in multiple ways…

Feedback to the course Add "ArangoDB" to your skills Tweet about ArangoDB

[email protected]

70
Your ArangoDB team

You might also like