A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering strong-typed and generic APIs. Built on top of Apache Lucene.
<dependency>
    <groupId>org.brutusin</groupId>
    <artifactId>flea-db</artifactId>
</dependency>
See the Maven Central Repository for the latest available released version.
If you are not using Maven and need help, you can ask here.
All flea-db functionality is defined by the FleaDB interface.
The library provides two implementations of it:
- GenericFleaDB.
- ObjectFleaDB, built on top of the previous one.
GenericFleaDB is the lowest-level flea-db implementation. It defines the database schema using a JSON schema, and stores and indexes records of type JsonNode. It uses the Apache Lucene APIs and the org.brutusin:json SPI to maintain two different indexes (one for the terms and another for the taxonomy, see index structure), hiding the underlying complexity from the user.
This is how it works:
- On instantiation, a JsonSchema and an index folder are passed, depending on whether the database is new and/or persistent. Then the JSON schema (passed, or read from the flea.json descriptor file of the existing database) is processed, looking for its index properties, and finally a database schema is created.
- On storing, the JsonNode record is validated against the JSON schema. Then a JsonTransformer instance (making use of the processed database schema) transforms the record into terms understandable by Lucene (documents, fields, facet fields ...) and finally the storage is delegated to the Lucene API.
- On querying, the Query and Sort objects are transformed into terms understandable by Lucene, making use of the database schema. The returned paginator is basically a wrapper around the underlying Lucene IndexSearcher and Query objects that lazily (on demand) performs searches against the index.
ObjectFleaDB is built on top of GenericFleaDB.
Basically, an ObjectFleaDB delegates all its functionality to a wrapped GenericFleaDB instance, making use of org.brutusin:json to perform the POJO<->JsonNode and Class<->JsonSchema transformations. This is the reason why all flea-db databases can be used with GenericFleaDB.
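The delegation idea can be sketched in plain Java. This is only an illustration of the wrapping pattern, not flea-db code: RecordStore, GenericStore and TypedStore are hypothetical names, a Map stands in for JsonNode, and the two converter functions stand in for the transformations that org.brutusin:json performs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the common database interface.
interface RecordStore<E> {
    void store(E record);
    List<E> all();
}

// Hypothetical generic store: keeps records as Maps (standing in for JsonNode).
final class GenericStore implements RecordStore<Map<String, Object>> {
    private final List<Map<String, Object>> records = new ArrayList<>();

    @Override
    public void store(Map<String, Object> record) {
        records.add(new HashMap<>(record)); // defensive copy
    }

    @Override
    public List<Map<String, Object>> all() {
        return new ArrayList<>(records);
    }
}

// Hypothetical typed store: converts each object and delegates to the generic store.
final class TypedStore<T> implements RecordStore<T> {
    private final RecordStore<Map<String, Object>> delegate;
    private final Function<T, Map<String, Object>> toGeneric;
    private final Function<Map<String, Object>, T> fromGeneric;

    TypedStore(RecordStore<Map<String, Object>> delegate,
               Function<T, Map<String, Object>> toGeneric,
               Function<Map<String, Object>, T> fromGeneric) {
        this.delegate = delegate;
        this.toGeneric = toGeneric;
        this.fromGeneric = fromGeneric;
    }

    @Override
    public void store(T record) {
        delegate.store(toGeneric.apply(record));
    }

    @Override
    public List<T> all() {
        List<T> out = new ArrayList<>();
        for (Map<String, Object> generic : delegate.all()) {
            out.add(fromGeneric.apply(generic));
        }
        return out;
    }
}
```

Because everything written through the typed wrapper ends up in the generic store, the same data stays readable through the generic interface.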
As mentioned before, this library makes use of org.brutusin:json, so a JSON service provider like json-provider is needed at runtime. The chosen provider will determine JSON serialization, validation, parsing, schema generation and expression semantics.
The standard JSON schema specification has been extended to declare indexable properties (the "index":"index" and "index":"facet" options). See the annotations section for more details.
Example:
{
    "type": "object",
    "properties": {
        "age": {
            "type": "integer",
            "index": "index"
        },
        "category": {
            "type": "string",
            "index": "facet"
        }
    }
}
- "index":"index": Means that the property is indexed by Lucene under a field whose name is set according to the rules explained in the nomenclature section.
- "index":"facet": Means that the property is indexed as in the previous case, and a facet is also created with this field name.
See the JSON SPI documentation for the supported annotations used in the strong-typed scenario.
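As a rough illustration of what "looking for index properties" could involve, the following sketch walks a schema held as nested Maps and collects every property that declares an "index" option. It is not flea-db's implementation: IndexPropertyScanner is a hypothetical name, and addressing nested fields as `$.parent.child` is an assumption made for this example (the nomenclature section defines the actual rules).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of flea-db.
final class IndexPropertyScanner {

    // Returns field id -> index mode ("index" or "facet"), in declaration order.
    static Map<String, String> scan(Map<String, Object> schema) {
        Map<String, String> result = new LinkedHashMap<>();
        collect("$", schema, result);
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String path, Map<String, Object> schema, Map<String, String> out) {
        Object mode = schema.get("index");
        if (mode instanceof String) {
            out.put(path, (String) mode);
        }
        Object properties = schema.get("properties");
        if (properties instanceof Map) {
            for (Map.Entry<String, Object> e : ((Map<String, Object>) properties).entrySet()) {
                if (e.getValue() instanceof Map) {
                    // Assumed naming: a child field is addressed as <parent>.<name>
                    collect(path + "." + e.getKey(), (Map<String, Object>) e.getValue(), out);
                }
            }
        }
    }
}
```

Applied to the example schema above (built as nested Maps), `scan` would report `$.age` with mode `index` and `$.category` with mode `facet`.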
Databases are self-descriptive: they provide information about their schema and indexed fields (via Schema).
Field semantics are inherited from the expression semantics defined in org.brutusin:json-provider.
Suppose a JsonNode node is to be stored, and let fieldId be the expression identifying a database field, according to the previous section.
Expression exp = JsonCodec.getInstance().compile(fieldId);
JsonSchema fieldSchema = exp.projectSchema(rootSchema);
JsonNode fieldNode = exp.projectNode(node);
Then, the following rules apply to extract index and facet values for that field:
fieldSchema | index:index | index:facet |
---|---|---|
String | fieldNode.asString() | fieldNode.asString() |
Boolean | fieldNode.asString() | fieldNode.asString() |
Integer | fieldNode.asLong() | Unsupported |
Number | fieldNode.asDouble() | Unsupported |
Object | each of its property names | each of its property names |
Array | recurse for each of its elements | recurse for each of its elements |
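The rules in the table can be mirrored in a small plain-Java sketch. It is a simplification under stated assumptions: FieldValueExtractor is a hypothetical name, plain Java values (String, Boolean, Integer/Long, other Number, Map, List) stand in for JsonNode, and the dispatch is done on the runtime value type rather than on the projected field schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the extraction table, not flea-db code.
final class FieldValueExtractor {

    // Values that would be indexed under "index":"index".
    static List<Object> indexValues(Object node) {
        List<Object> out = new ArrayList<>();
        collect(node, false, out);
        return out;
    }

    // Values that would be used as facet categories under "index":"facet".
    static List<Object> facetValues(Object node) {
        List<Object> out = new ArrayList<>();
        collect(node, true, out);
        return out;
    }

    private static void collect(Object node, boolean facet, List<Object> out) {
        if (node instanceof String || node instanceof Boolean) {
            out.add(String.valueOf(node));                      // String, Boolean rows
        } else if (node instanceof Integer || node instanceof Long) {
            if (facet) {
                throw new UnsupportedOperationException("Integer facets are unsupported");
            }
            out.add(((Number) node).longValue());               // Integer row
        } else if (node instanceof Number) {
            if (facet) {
                throw new UnsupportedOperationException("Number facets are unsupported");
            }
            out.add(((Number) node).doubleValue());             // Number row
        } else if (node instanceof Map) {
            for (Object key : ((Map<?, ?>) node).keySet()) {
                out.add(String.valueOf(key));                   // Object row: property names
            }
        } else if (node instanceof List) {
            for (Object element : (List<?>) node) {
                collect(element, facet, out);                   // Array row: recurse
            }
        } else {
            throw new IllegalArgumentException("Unsupported value: " + node);
        }
    }
}
```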
Databases can be created in RAM or on disk, depending on the characteristics of the problem being addressed (performance, dataset size, indexing time ...).
In order to create a persistent database, a constructor with a File argument has to be chosen:
FleaDB<JsonNode> db1 = new GenericFleaDB(indexFolder, jsonSchema);
// or
FleaDB<Record> db2 = new ObjectFleaDB(indexFolder, Record.class);
NOTE: Multiple instances can be used to read the same persistent database (for example different concurrent JVM executions), but only one can hold the writing file-lock (claimed the first time a write method is called).
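For readers unfamiliar with file locks, the single-writer behaviour described in the note can be illustrated with the standard java.nio API. This is a generic sketch, not how flea-db or Lucene implement their lock: WriterLockSketch is a hypothetical class, and the lock file path is whatever the caller passes in.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of a lazily claimed, exclusive writer lock.
final class WriterLockSketch implements AutoCloseable {
    private final FileChannel channel;
    private FileLock lock;

    WriterLockSketch(Path lockFile) throws IOException {
        // Opening the file does not claim the lock yet.
        channel = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Claims the lock on first use; returns false if another writer holds it.
    boolean tryAcquire() throws IOException {
        if (lock != null) {
            return true;
        }
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            return false;             // held through another channel in this JVM
        }
        return lock != null;
    }

    @Override
    public void close() throws IOException {
        if (lock != null) {
            lock.release();
            lock = null;
        }
        channel.close();
    }
}
```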
Otherwise, if a constructor without a File argument is used, the database is kept in RAM and lost at the end of the JVM execution:
FleaDB<JsonNode> db1 = new GenericFleaDB(jsonSchema);
// or
FleaDB<Record> db2 = new ObjectFleaDB(Record.class);
The following operations perform modifications on the database.
In order to store a record, the store(...) method has to be used:
db1.store(jsonNode);
// or
db2.store(record);
Internally, this ends up calling addDocument on the underlying Lucene IndexWriter.
The API allows deleting a set of records using delete(Query q).
NOTE: Due to Lucene facet internals, categories are never deleted from the taxonomy index, even when they become orphaned.
Previous operations (store and delete) are not (and won't ever be) visible until commit() is called. On commit, the underlying searchers and writers are released, to be lazily created again in subsequent read or write operations.
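The visibility rule can be pictured with a tiny in-memory model. This is only a conceptual sketch and says nothing about how Lucene buffers changes internally: CommitVisibilitySketch is a hypothetical class, and a Predicate stands in for the Query argument of delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: writes are queued and only applied on commit().
final class CommitVisibilitySketch<E> {
    private final List<E> committed = new ArrayList<>();
    private final List<Runnable> pending = new ArrayList<>();

    void store(E record) {
        pending.add(() -> committed.add(record));
    }

    void delete(Predicate<E> matches) {
        pending.add(() -> committed.removeIf(matches));
    }

    // Applies the queued operations in the order they were issued.
    void commit() {
        for (Runnable operation : pending) {
            operation.run();
        }
        pending.clear();
    }

    // Readers only ever see committed state.
    List<E> query() {
        return new ArrayList<>(committed);
    }
}
```

In this model a record stored and then queried before commit() is simply not found, which is the behaviour the paragraph above describes.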
Databases can be optimized in order to achieve better performance by using optimize(). This method triggers a highly costly (in terms of free disk space and computation) merge of the Lucene index segments into a single one.
Nevertheless, this operation is useful for immutable databases, which can be optimized once prior to their usage.
Two kinds of read operations can be performed, both accepting a Query argument that defines the search criteria.
Record queries can be paginated and the ordering of the results can be specified via a Sort argument.
public E getSingleResult(final Query q)
public Paginator<E> query(final Query q)
public Paginator<E> query(final Query q, final Sort sort)
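To make the page arithmetic concrete, here is a minimal in-memory sketch with the same method names and 1-based page numbering used in the examples of this document. It is not the library's Paginator: ListPaginator is a hypothetical class that holds all hits in a list, whereas the real paginator runs searches lazily.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory paginator, for illustration only.
final class ListPaginator<E> {
    private final List<E> hits;

    ListPaginator(List<E> hits) {
        this.hits = new ArrayList<>(hits);
    }

    // Number of pages needed to show all hits (ceiling division).
    int getTotalPages(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (hits.size() + pageSize - 1) / pageSize;
    }

    // Returns the requested page; pages are numbered from 1.
    List<E> getPage(int pageNum, int pageSize) {
        int totalPages = getTotalPages(pageSize);
        if (pageNum < 1 || pageNum > totalPages) {
            throw new IllegalArgumentException("pageNum out of range: " + pageNum);
        }
        int from = (pageNum - 1) * pageSize;
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }
}
```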
FacetResponse represents the faceting info returned by the database.
public List<FacetResponse> getFacetValues(final Query q, FacetMultiplicities activeFacets)
public List<FacetResponse> getFacetValues(final Query q, int maxFacetValues)
public List<FacetResponse> getFacetValuesStartingWith(String facetName, String prefix, Query q, int max)
public int getNumFacetValues(Query q, String facetName)
public double getFacetValueMultiplicity(String facetName, String facetValue, Query q)
Faceting is provided by lucene-facet.
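Conceptually, faceting returns each value of a facet together with the number of matching records that carry it. The sketch below computes that over an in-memory list; it is an illustration of the idea only, not of lucene-facet or of FacetResponse: FacetCountSketch is a hypothetical class and records are modelled as simple String maps.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of facet value counting.
final class FacetCountSketch {

    // Returns facet value -> number of records having that value, sorted by value.
    static Map<String, Integer> countValues(List<Map<String, String>> records, String facetName) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> record : records) {
            String value = record.get(facetName);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```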
Databases must be closed after use, via the close() method, in order to free the resources and locks held. Closing a database makes it no longer usable.
Both implementations are thread safe and can be shared across multiple threads.
Persistent flea-db databases create the following index structure:
/flea-db/
|-- flea.json
|-- record-index
| |-- ...
|-- taxonomy-index
| |-- ...
where flea.json is the database descriptor containing its schema, and the record-index and taxonomy-index subfolders hold the underlying Lucene index structures.
flea-db offers the following ACID properties, inherited from Lucene:
Generic API:
// Generic interaction with a previously created database
FleaDB<JsonNode> db = new GenericFleaDB(indexFolder);
// Store records
JsonNode json = JsonCodec.getInstance().parse("...");
db.store(json);
db.commit();
// Query records
Query q = Query.createTermQuery("$.id", "0");
Paginator<JsonNode> paginator = db.query(q);
int totalPages = paginator.getTotalPages(pageSize);
for (int i = 1; i <= totalPages; i++) {
    List<JsonNode> page = paginator.getPage(i, pageSize);
    for (int j = 0; j < page.size(); j++) {
        JsonNode record = page.get(j);
        System.out.println(record);
    }
}
db.close();
Strong-typed API:
// Create object database
FleaDB<Record> db = new ObjectFleaDB(indexFolder, Record.class);
// Store records
for (int i = 0; i < REC_NO; i++) {
    Record r = new Record();
    // ... populate record
    db.store(r);
}
db.commit();
// Query records
Query q = Query.createTermQuery("$.id", "0");
Paginator<Record> paginator = db.query(q);
int totalPages = paginator.getTotalPages(pageSize);
for (int i = 1; i <= totalPages; i++) {
    List<Record> page = paginator.getPage(i, pageSize);
    for (int j = 0; j < page.size(); j++) {
        Record r = page.get(j);
        System.out.println(r);
    }
}
db.close();
See available test classes for more examples.
This module would not be possible without:
4.10.3
(Dec, 2014)
https://github.com/brutusin/flea-db/issues
Contributions are always welcome and greatly appreciated!
Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0