Oracle Blockchain Platform tutorial, Part 2 of 3: The Rich History Database


This is the second in a three-part series on Oracle's blockchain-as-a-service cloud offering. In the first article I walked through Oracle's blockchain platform and how to build and deploy a Hyperledger Fabric network on Oracle Blockchain Platform (OBP). In this second article I'm going to focus on the OBP Rich History Database (RHDB), why I think it is an important feature, and how to configure it to stream data from the Hyperledger Fabric network created in the first post. If you haven't read part one, I recommend taking a look before reading this post so the examples make sense.

The rich history database is external to Oracle Blockchain Platform and contains data about the blockchain ledger’s transactions on the channels you select. You can use this database to create analytics, reports, and visualizations about your ledger’s activities. It can also be useful for audit and archive needs.

Internally, Oracle Blockchain Platform uses the Hyperledger Fabric history database to manage the ledger and present ledger transaction information to you in the console. Only the chaincodes can access this history database, and you can’t expose the Hyperledger Fabric history database as a data source for analytical queries. The rich history database uses an external Oracle database and contains details about every transaction committed on a channel.

Currently, you can only use an Oracle database such as Oracle Autonomous Data Warehouse or Oracle Database Cloud Service with Oracle Cloud Infrastructure to create your rich history database. You use the Oracle Blockchain Platform console to provide the connection string and credentials to access and write to the Oracle database. Note that the credentials you provide are the database’s credentials and Oracle Blockchain Platform doesn’t manage them. After you create the connection, you’ll select the channels that contain the ledger data that you want to include in the rich history database.

The RHDB is a very useful feature in OBP. As noted above, without the RHDB there is no straightforward way to report on-chain data to end users. If you were to implement your own Hyperledger Fabric network, you would have to develop a custom data pump process to do what Oracle is doing here. Having this feature out of the box has been extremely useful in our proofs of concept. Setting up an Autonomous Data Warehouse instance takes a few minutes, and configuring the RHDB takes only a few moments more.

One drawback is that in the current release I find the RHDB configuration very limited. The interface between OBP and the RHDB is not robust, and once it is set up it is hard to go back and see what was done and whether the connection is still up and running. I've submitted some enhancement requests to make the interface more robust so there is a clearer presentation of when the connection was set up, who set it up, and what is actually configured at any given moment.


Let's take a look at how to configure the RHDB using an ADW instance.

First, let's take a quick run through setting up our ADW instance. Note that there are many resources available online on this topic, so I'm not going to go into a lot of detail here. Fortunately, the setup is pretty easy.

Setting up an Oracle Autonomous Data Warehouse instance

Log in to your Oracle Cloud console, click the hamburger menu icon in the upper left, and navigate to Autonomous Data Warehouse.



Next, click the Create Autonomous Database button.



Select an appropriate Compartment from the drop-down menu, then choose a name for the DB instance.



Configure the database resources. For this example I am going to use the "Always Free" option.
Enter a password for the ADMIN account.



Tag the instance, if you wish, and click Create Autonomous Database.



After a few moments your new database is ready for use.



Click on DB Connection to download your client credentials (wallet).

You can use this to set up a connection in SQL Developer to access the database, or you can use the cloud-based version of SQL Developer by clicking Service Console, then Development, and then SQL Developer Web.



You now have a fresh Oracle database instance to work with.


Configuring Rich History

Now that we have a database available we can connect it to our blockchain instance and begin streaming our data from the blockchain into our schema.

The founder and all participant nodes on the blockchain network can configure their own rich history database. This makes sense if we consider that our participants are typically from different organizations; we are not going through the trouble of creating a decentralized network only to rely on a centralized data warehouse. Each participant in the network can set up its own ADW instance/schema to store its blockchain activity. RHDB tables are based on HLF channels, so participants can only stream channels they participate in into their RHDB.

For our example, we're going to configure the RHDB for the Founder.

Open the OBP founder console. Click the hamburger menu in the upper right, below your user profile name, and select 'Configure Rich History'.


The rich history wizard opens. Enter the schema name in your ADW (for our example we are using ADMIN), enter your password, and upload the wallet file you downloaded earlier. In a perfect world the wizard would extract the connection string from the wallet file you uploaded; it seems like a waste of effort to upload the file and still have to supply the connection string. Perhaps this will be enhanced in the future. To get the connection string, you can unzip your wallet file (make a copy first) and copy the connection string from the tnsnames.ora file. Alternatively, go back to your OCI administrator console, navigate to your ADW instance, and click the DB Connection button; from the pop-up window, copy the connection string you want to use.

Be aware that this has been a bit buggy for me and doesn't always work the way you would expect, or the way the screen tip suggests. If you paste the entire connection string, you are going to get a driver error (at least in the version I am currently using). To get this to work, enter the schema name, the password, and the last part of the connection string that defines the service level you want to use (low, medium, or high), then attach your wallet file. If you do it correctly you will get a success message when you save.



We're not done yet. So far we have only configured our database connection; next we need to configure our channel for the RHDB.

From the OBP console, click the Channels tab. Open the hamburger menu on the far right for testch1 and click Configure Rich History.



Here, simply check the Enable Rich History box and click Save.

Now open SQL Developer or SQL Developer Web and navigate to the ADMIN schema. Review the tables and you should see three new tables. Note that it may take a few minutes for OBP to create the objects and sync the data.
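If you prefer to check from SQL rather than the table browser, a quick data-dictionary query will list the new tables once they exist. This is a sketch; it assumes your channel is named testch1, as in this tutorial.

```sql
-- List the rich history tables OBP created for channel testch1.
-- Table names are prefixed with the OBP instance name (TESTFOUNDER in this series).
SELECT table_name
FROM   user_tables
WHERE  table_name LIKE '%testch1%';
```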

This is an area where I have had some difficulty; getting the RHDB sync to begin seems to be a bit buggy. If after a few minutes you still don't see the tables, try the following steps.
  1. Go back to the Channels tab and toggle the Enable Rich History check box off and on, clicking Save between clicks.
  2. Go to the Nodes tab and restart your nodes by clicking the hamburger menu for each node and selecting Restart.
Eventually it kicked in for me and my tables were created. Once it gets going it seems pretty stable, but the first sync may require a little playing around.

You should now see three tables in your schema:
  1. testch1_hist
  2. testch1_last
  3. testch1_state
Click on each table to view its structure and the data currently in it; the data should look familiar from when we ran our chaincode.



Let's explore these tables a little deeper.

testch1_hist

This is the main table I've used in my projects so far. As the name suggests, this is the history table, and it will contain the majority of the content you are going to want, especially for reporting. The history table is made up of 10 fields. Let's look at the table DDL to learn more.

CREATE TABLE "TESTFOUNDER_testch1_hist"
    (
     CHAINCODEID  VARCHAR2 (256) ,
     KEY          VARCHAR2 (1024) ,
     TXNISVALID   NUMBER (1)  NOT NULL ,
     VALUE        VARCHAR2 (4000) ,
     VALUEJSON    CLOB ,
     BLOCKNO      NUMBER ,
     TXNNO        NUMBER ,
     TXNID        VARCHAR2 (128)  NOT NULL ,
     TXNTIMESTAMP TIMESTAMP ,
     TXNISDELETE  NUMBER (1)
    )
        TABLESPACE DATA
        LOGGING
    COLUMN STORE COMPRESS FOR QUERY HIGH
    NO INMEMORY
;

ALTER TABLE "TESTFOUNDER_testch1_hist"
    ADD CONSTRAINT TESTFOUNDER_TESTFOUNDER_TESTCH1_HIST_JS
    CHECK ( valueJson IS JSON)
;

CREATE UNIQUE INDEX TESTFOUNDER_TESTFOUNDER_TESTCH1_HIST_PK ON "TESTFOUNDER_testch1_hist"
    (
     CHAINCODEID ASC ,
     KEY ASC ,
     BLOCKNO ASC ,
     TXNNO ASC
    )
    TABLESPACE DATA
    LOGGING
;

ALTER TABLE "TESTFOUNDER_testch1_hist"
    ADD CONSTRAINT TESTFOUNDER_TESTFOUNDER_TESTCH1_HIST_PK PRIMARY KEY ( CHAINCODEID, KEY, BLOCKNO, TXNNO )
    USING INDEX TESTFOUNDER_TESTFOUNDER_TESTCH1_HIST_PK ;


You can see in this DDL the 10 fields and their data types. Most of the fields are informational, with VALUEJSON being the one that stores the actual transaction in JSON format. Take note that the table has a composite primary key made up of four fields (CHAINCODEID, KEY, BLOCKNO, TXNNO). This will be important in the next section when I discuss how to normalize the data in the JSON structures.
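Because BLOCKNO and TXNNO are part of the key, the history table can replay the full lifecycle of a single asset. Here is a sketch against our sample data; the key a00001 and the owner field come from the Car Dealer chaincode example in Part 1.

```sql
-- Every version of asset a00001, oldest change first
SELECT blockno,
       txnno,
       txnisvalid,
       txnisdelete,
       JSON_VALUE(valuejson, '$.owner') AS owner
FROM   ADMIN."TESTFOUNDER_testch1_hist"
WHERE  key = 'a00001'
ORDER  BY blockno, txnno;
```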

testch1_last

The second table is the "last" table, which is very simple, with only three fields. At any given time this table will have only one record in it. Its purpose is to record the current height of the blockchain, which tells OBP whether the RHDB has all the data it should have.

CREATE TABLE "TESTFOUNDER_testch1_last"
    (
     ID      VARCHAR2 (16) ,
     BLOCKNO NUMBER  NOT NULL ,
     TXNNO   NUMBER  NOT NULL
    )
        TABLESPACE DATA
        LOGGING
    COLUMN STORE COMPRESS FOR QUERY HIGH
    NO INMEMORY
;

CREATE UNIQUE INDEX SYS_C0015672 ON "TESTFOUNDER_testch1_last"
    (
     ID ASC
    )
    TABLESPACE DATA
    LOGGING
;

ALTER TABLE "TESTFOUNDER_testch1_last"
    ADD PRIMARY KEY ( ID )
    USING INDEX SYS_C0015672 ;



  

The 'last' table has a single-field primary key based on the ID field.

So far my blockchain has a height of 2, which can be seen by querying the 'last' table.
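The query itself is trivial; since the table only ever holds one row, no predicate is needed:

```sql
-- Current channel height as recorded by the RHDB sync
SELECT id, blockno, txnno
FROM   ADMIN."TESTFOUNDER_testch1_last";
```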


testch1_state

The state table contains data values replicated from the state database. You’ll query the state table when you create analytics about the state of the ledger.

CREATE TABLE "TESTFOUNDER_testch1_state"
    (
     CHAINCODEID VARCHAR2 (256) ,
     KEY         VARCHAR2 (1024) ,
     VALUE       VARCHAR2 (4000) ,
     VALUEJSON   CLOB ,
     BLOCKNO     NUMBER  NOT NULL ,
     TXNNO       NUMBER  NOT NULL
    )
        TABLESPACE DATA
        LOGGING
    COLUMN STORE COMPRESS FOR QUERY HIGH
    NO INMEMORY
;

ALTER TABLE "TESTFOUNDER_testch1_state"
    ADD CONSTRAINT TESTFOUNDER_TESTFOUNDER_TESTCH1_STATE_JS
    CHECK ( valueJson IS JSON)
;

CREATE UNIQUE INDEX TESTFOUNDER_TESTFOUNDER_TESTCH1_STATE_PK ON "TESTFOUNDER_testch1_state"
    (
     CHAINCODEID ASC ,
     KEY ASC
    )
    TABLESPACE DATA
    LOGGING
;

ALTER TABLE "TESTFOUNDER_testch1_state"
    ADD CONSTRAINT TESTFOUNDER_TESTFOUNDER_TESTCH1_STATE_PK PRIMARY KEY ( CHAINCODEID, KEY )
    USING INDEX TESTFOUNDER_TESTFOUNDER_TESTCH1_STATE_PK ;

Similar to the history table, the state table has a composite primary key, but this one is made up of only two fields (CHAINCODEID and KEY).
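Because the state table holds exactly one row per key, it is the natural source for "current value" queries. Here is a minimal sketch using the Car Dealer sample data from Part 1 (the docType, owner, and recall fields come from that chaincode):

```sql
-- Current owner and recall status of every vehicle part on the channel
SELECT key,
       JSON_VALUE(valuejson, '$.owner')  AS owner,
       JSON_VALUE(valuejson, '$.recall') AS recall
FROM   ADMIN."TESTFOUNDER_testch1_state"
WHERE  JSON_VALUE(valuejson, '$.docType') = 'vehiclePart';
```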


A deep dive into normalizing JSON data structures in a relational database

In this next section I want to walk through how to normalize a JSON structure into a relational view. In order to make use of the data in the RHDB for reporting and analytic purposes you're going to need to put it into a structured format. Since JSON is only semi-structured data, we need to do some processing to make it useful in a relational database.

The first thing to clarify is that I am not a DBA. I am an architect and know enough to get in there and do what I need to do, but the methods described here to explore these features may not be what is best in a fully scaled enterprise solution. Topics like staging the JSON data into a relational format or using materialized views are beyond the scope of this tutorial. One could envision that a database with enough data in it may not perform well using a dynamic SQL query in a view to parse the JSON. For our purposes these methods will be sufficient, and they can certainly serve as the basis for more in-depth scaling efforts.

As mentioned above, the actual transaction information that will be needed for most reporting efforts will be found in the history table, within the VALUEJSON field. Depending on how complex your JSON structure is, parsing this data may be quite simple or it could be a bit more complicated. Let's start with simple.

If you read Part 1 of this tutorial, you will recall that we wrapped up by executing one of the sample chaincodes to populate the blockchain with some test data. After running the Car Dealer chaincode we wound up with the following data on our blockchain.

{
  "docType": "vehiclePart",
  "serialNumber": "a00001",
  "assembler": "docbrown",
  "assemblyDate": 1580428800000,
  "name": "fluxcapacitor",
  "owner": "michaelj",
  "recall": false,
  "recallDate": 0
}


This is a very simple JSON structure consisting of eight fields. Each data point in a JSON structure is a key-value pair with the field name and the value, as seen above. While this is a very efficient way of capturing data, we want to be able to use it with traditional reporting tools using standard SQL. Fortunately, Oracle SQL has JSON functions to help with this.

For simple JSON structures, where you know the fields you want to return and each JSON document will produce a single record, you can use the JSON_VALUE function.

SELECT
CHAINCODEID||KEY||BLOCKNO||TXNNO AS SRCTBLPK,
JSON_VALUE(VALUEJSON, '$.docType') AS DOCTYPE,
JSON_VALUE(VALUEJSON, '$.serialNumber') AS SERIALNUMBER,
JSON_VALUE(VALUEJSON, '$.assembler') AS ASSEMBLER,
JSON_VALUE(VALUEJSON, '$.assemblyDate') AS ASSEMBLYDATE,
JSON_VALUE(VALUEJSON, '$.name') AS NAME,
JSON_VALUE(VALUEJSON, '$.owner') AS OWNER,
JSON_VALUE(VALUEJSON, '$.recall') AS RECALL,
JSON_VALUE(VALUEJSON, '$.recallDate') AS RECALLDATE
FROM
ADMIN."TESTFOUNDER_testch1_hist";



This SQL statement queries the VALUEJSON column in the history table and parses the fields into separate columns. Notice that I reconstructed the primary key by concatenating its fields together. I recommend doing this so you have the option to denormalize the data set after everything is parsed out; this will be more relevant in the next example.
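One refinement to consider (my own habit, not something the RHDB requires): plain concatenation can let two different keys produce the same string (for example, 'ab'||'1' and 'a'||'b1'), so inserting a delimiter that is unlikely to appear in the values makes the reconstructed key unambiguous:

```sql
-- Same reconstructed key, delimited to avoid accidental collisions
SELECT chaincodeid || '|' || key || '|' || blockno || '|' || txnno AS srctblpk,
       JSON_VALUE(valuejson, '$.serialNumber') AS serialnumber
FROM   ADMIN."TESTFOUNDER_testch1_hist";
```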

Running the SQL statement above gives you the following results.



This works well enough for simple JSON structures. One of the great things about JSON, though, is that you don't have to work with simple structures; in fact, JSON structures can get quite complex, with multiple levels of nested information. This further reinforces why it is necessary to normalize these data sets so each section of a JSON document can be broken out into a separate view. Then, using the primary key field, you can join your data sets back together, as needed, in your reporting environment.

One particularly common and challenging thing you may come across is a nested JSON structure with more than one record. A good example is an invoice. For every invoice I could have a single JSON record with information about the invoice, perhaps the supplier and the buyer, but I could also have multiple line items within that invoice. The challenge, which makes the JSON_VALUE function ineffective here, is that each invoice could have multiple line items and there is no way to know in advance how many: some invoices will have just one line item, others two, three, a dozen, even a hundred or more. For this we need the JSON_TABLE function.

The sample chaincodes provided in OBP do not have complex JSON structures, so for this example let's create our own table with our own JSON structure, then load some data so we can see how this works.

First create a new table in your ADW schema using this code

CREATE TABLE "TEST_JSON"
    (
     CHAINCODEID  VARCHAR2 (256) ,
     KEY          VARCHAR2 (1024) ,
     TXNISVALID   NUMBER (1)  NOT NULL ,
     VALUE        VARCHAR2 (4000) ,
     VALUEJSON    CLOB
    )
;

ALTER TABLE "TEST_JSON"
    ADD CONSTRAINT TEST_JSON_JS
    CHECK ( valueJson IS JSON)
;

CREATE UNIQUE INDEX TEST_JSON_PK ON "TEST_JSON"
    (
     CHAINCODEID ASC ,
     KEY ASC
    )
;

ALTER TABLE "TEST_JSON"
    ADD CONSTRAINT TEST_JSON_PK PRIMARY KEY ( CHAINCODEID, KEY )
    USING INDEX TEST_JSON_PK ;


Once the table is created go ahead and load data into the table using this code.

INSERT INTO ADMIN.TEST_JSON
(CHAINCODEID, KEY, TXNISVALID, VALUE, VALUEJSON)

VALUES (
    'testjson',
    '1001-CompanyA-CompanyB',
    1,
    '600',
 '{"invoiceNumber":"1001","invoiceDate":"20200330","seller":"CompanyA","buyer":"CompanyB","totalInvoiceAmount":"600","lineItems": [{"lineAmount":"100","lineItemNumber":"0001","lineItemDescription":"Hand sanitizer","lineItemTextData":"testInv"},{"lineAmount":"200","lineItemNumber":"0002","lineItemDescription":"Toilet paper","lineItemTextData":"testInv"},{"lineAmount":"300","lineItemNumber":"0003","lineItemDescription":"Paper towels","lineItemTextData":"testInv"}],"buyerCountry":"US","sellerCountry":"CA","invoiceCurrency":"USD"}');


Now that we have something to test with, let's first take a look at our JSON structure. Assume this is what your chaincode is working with and that the record in our test table represents what you would find in the history table.

{
    "invoiceNumber": "1001",
    "invoiceDate": "20200330",
    "seller": "CompanyA",
    "buyer": "CompanyB",
    "totalInvoiceAmount": "600",
    "lineItems": [
        {
            "lineAmount": "100",
            "lineItemNumber": "0001",
            "lineItemDescription": "Hand sanitizer",
            "lineItemTextData": "testInv"
        },
        {
            "lineAmount": "200",
            "lineItemNumber": "0002",
            "lineItemDescription": "Toilet paper",
            "lineItemTextData": "testInv"
        },
        {
            "lineAmount": "300",
            "lineItemNumber": "0003",
            "lineItemDescription": "Paper towels",
            "lineItemTextData": "testInv"
        }
    ],
    "buyerCountry": "US",
    "sellerCountry": "CA",
    "invoiceCurrency": "USD"
}

Note that the JSON structure looks similar to the previous example except for lineItems, where we have another JSON structure nested within the main one. Working with a nested structure wouldn't necessarily rule out JSON_VALUE; the problem is that we have multiple line items, and while this particular record has three, the next record could have one, two, or many more.

Therefore, to parse this out we will use the JSON_TABLE function, which creates a separate record in our normalized table/view for each line item. Once that is done we can use the primary key field to join the line items back to the main invoice record. This is why I said earlier that it is important to have the primary key field represented and consistent in your normalized data set. Consider that ten invoices would result in ten records in the invoice table/view we are going to create, similar to what we did above with JSON_VALUE. But if each of those invoices had ten line items, the line-item detail table/view would have 100 records, and when we join it back to the invoices to denormalize the set we will end up with 100 records. Each row repeats the invoice header information relevant to that line item, setting you up to run accurate and detailed reports.

Let's go ahead and create two views to work through this example. The first view will be similar to the previous example where we will pull the invoice data. The second view will parse the line items.

Invoice View

CREATE VIEW "ADMIN"."TEST_INV_V" ("SRCTBLPK", "INVOICENUMBER", "INVOICEDATE", "SELLER", "BUYER", "TOTALINVOICEAMOUNT", "BUYERCOUNTRY", "SELLERCOUNTRY", "INVOICECURRENCY") AS
  SELECT
    a.SRCTBLPK,
    a.INVOICENUMBER,
    a.INVOICEDATE,
    a.SELLER,
    a.BUYER,
    a.TOTALINVOICEAMOUNT,
    a.BUYERCOUNTRY,
    a.SELLERCOUNTRY,
    a.INVOICECURRENCY
 
  FROM (
    SELECT
        CHAINCODEID||KEY AS SRCTBLPK,
        JSON_VALUE(VALUEJSON, '$.invoiceNumber') AS INVOICENUMBER,
        JSON_VALUE(VALUEJSON, '$.invoiceDate') AS INVOICEDATE,
        JSON_VALUE(VALUEJSON, '$.seller') AS SELLER,
        JSON_VALUE(VALUEJSON, '$.buyer') AS BUYER,
        JSON_VALUE(VALUEJSON, '$.totalInvoiceAmount') AS TOTALINVOICEAMOUNT,
        JSON_VALUE(VALUEJSON, '$.buyerCountry') AS BUYERCOUNTRY,
        JSON_VALUE(VALUEJSON, '$.sellerCountry') AS SELLERCOUNTRY,
        JSON_VALUE(VALUEJSON, '$.invoiceCurrency') AS INVOICECURRENCY
    FROM TEST_JSON) a
WHERE a.INVOICENUMBER IS NOT NULL
;


Line item view

CREATE VIEW "ADMIN"."TEST_LINEITEMS_V" ("SRCTBLPK", "LINEAMOUNT", "LINEITEMNUMBER", "LINEITEMDESCRIPTION", "LINEITEMTEXTDATA") AS
  SELECT
    a.CHAINCODEID||a.KEY AS SRCTBLPK,
    jt.LINEAMOUNT,
    jt.LINEITEMNUMBER,
    jt.LINEITEMDESCRIPTION,
    jt.LINEITEMTEXTDATA
FROM TEST_JSON a,
JSON_TABLE(
        VALUEJSON, '$'
            COLUMNS
                (
                    NESTED PATH '$.lineItems[*]'
                        COLUMNS (
                            lineAmount VARCHAR2(4000) PATH '$.lineAmount',
                            lineItemNumber VARCHAR2(4000) PATH '$.lineItemNumber',
                            lineItemDescription VARCHAR2(4000) PATH '$.lineItemDescription',
                            lineItemTextData VARCHAR2(4000) PATH '$.lineItemTextData',
                            "LINEITEM" FOR ORDINALITY
                                )
                )
            ) jt
WHERE JSON_VALUE(VALUEJSON, '$.invoiceNumber') IS NOT NULL;


Now query each view and look at the results. The first view has a single record showing the invoice we loaded; the second view has three records, one for each line item in the invoice.




Now run a query to denormalize the two views into a single result set.

SELECT
    a.SRCTBLPK,   
    a.INVOICENUMBER,
    a.INVOICEDATE,
    a.TOTALINVOICEAMOUNT,
    b.LINEITEMNUMBER,
    b.LINEAMOUNT,
    a.SELLER,
    a.BUYER,
    a.BUYERCOUNTRY,
    a.SELLERCOUNTRY,
    a.INVOICECURRENCY,
    b.LINEITEMDESCRIPTION,
    b.LINEITEMTEXTDATA
FROM
ADMIN.TEST_INV_V a
    INNER JOIN ADMIN.TEST_LINEITEMS_V b on a.SRCTBLPK = b.SRCTBLPK;



This is the denormalized invoice data set that you can now use for reporting. Notice how the data associated with the invoice (invoicenumber, invoicedate, etc.) is repeated for each line item.
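A quick sanity check on the joined set is to confirm the line amounts roll up to the invoice total. Note that LINEAMOUNT is stored as VARCHAR2, so Oracle implicitly converts it for the SUM; in a production model you would likely cast these to NUMBER explicitly.

```sql
-- Line amounts should sum to the invoice total for each invoice
SELECT a.srctblpk,
       a.totalinvoiceamount,
       SUM(b.lineamount) AS linetotal
FROM   ADMIN.TEST_INV_V a
       INNER JOIN ADMIN.TEST_LINEITEMS_V b
               ON a.srctblpk = b.srctblpk
GROUP  BY a.srctblpk, a.totalinvoiceamount;
```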


Closing

That's it for this post. Hopefully I have conveyed the value and importance of the Rich History Database in Oracle Blockchain Platform. In my next post, closing out this series, we'll look at Oracle Analytics Cloud and how you can use it to report on and visualize this data now that we have parsed it into views that can be consumed by tools that work with structured data.


Comments

  1. Hello,

    What happened to the final video?
    I mean, I don't see video 3 of 3

    Thank you for your time
    Best Regards

    1. Hey Eduardo - Part 3 isn't done yet. I hope to have it posted in the next week or two. Was waiting to get through the OAC Summit. Please check back or subscribe to get an update when it posts.

    2. Part 3 of the series is now available here http://coding-around-the-block.blogspot.com/2020/05/oracle-blockchain-platform-tutorial.html

      Best regards -Gary

