IDs, history tables, timestamps, etc

ID formats

Object instances in the database have unique IDs. All IDs generated internally by the web service, e.g. voucher-ID, user-ID, transaction-ID, JR-ID, etc., are to be treated as opaque strings, not as integers.

The rules for ID syntax are:

  • opaque, meaningless string
  • variable length, between 1 and 50 octets long
  • one word, i.e. no spaces or special characters, only letters, digits and hyphens (the hyphen is permitted so that UUIDs can be used as IDs)
  • case sensitive
  • may start with a digit or a letter
  • equality test possible, i.e. if two ID strings are identical as strings and refer to objects of the same class, then they refer to the same instance
  • guaranteed unique, i.e. if two separate object instances of the same class exist in the system, they are guaranteed to have distinct ID strings
  • unique in time, i.e. an object ID is never re-used
  • ordering is meaningless, i.e. "greater than" and "less than" operations are meaningless

Note that this means that the web service may use integers as IDs, and all these rules will still be fulfilled. But when represented in JSON, the IDs will always be enclosed in double-quotes, even if they "look like" integers.

Examples of IDs:

  • "a0ae48b16f90db8f3c542f44f8103701"
  • "5"
  • "750294852039"

Examples of strings which are not IDs:

  • "a0ae48b16f90db8f3c542f4/f8103701" (there's a special character in it)
  • 5 (a digit without quotes is not a string)
  • "776,245,664" (commas are not permitted in IDs)
  • "62.6" (the period is not permitted)

Change logging and audit trails

Critical business applications must be able to trace the changes made to each item in a database. This is called an audit trail or change log. It must be possible to trace the changes to any entity in the database from the time it was created, through all changes made, to its current avatar. When changes are traced, the before-value and after-value of each field must be recorded so that they may be examined during forensic analysis.

There are various ways to do this. At Remiges (before we were called Remiges), this was earlier done using a history table or tables. Today, we have a fantastic and scalable logging system which uses its own private database; therefore we now do it using Remiges LogHarbour.

Change logging using history tables

Transaction tables which receive updates will also have history tables which "shadow" the main tables. This will ensure that, taken in an overall database context, all updates are non-destructive, because all before-images of records can be reconstructed by analysing the history tables.

History tables will contain:

  • previous images of updated records
  • deleted records

The following policies will be applied for maintenance of history information:

  • When a record is inserted in a main table, nothing is done to the history table. (This means that there is no history-related overhead of compute power or disk space for tables which are not modified much.)
  • When a record is updated in a main table, its previous image is first inserted into the history table and then the data is updated in the main table
  • When a record is physically deleted (i.e. using the SQL DELETE operation) from a main table, the to-be-deleted record is first copied into the history table
  • When a TRUNCATE operation is performed on a main table, all the rows in the table are first inserted into its history table

These are the specifications for implementing the history feature:

  • All main tables which are supposed to get the benefit of history tracking will have corresponding history tables to "shadow" them. If the main table is called X, its shadow table will be called X_hist.
  • The history maintenance feature will be implemented using database triggers.
  • The shadow table will have identical schema to the main table, with two additional fields:

    • hist_when: timestamp specifying when this record was inserted in the shadow table
    • hist_op: will contain either "U", "D" or "T", indicating that this history replica is the result of an u(pdate), d(elete) or t(runcate)
  • The server-side application code will connect to the database using a specific database username and password meant for business operation. (Let's say that this username is appcode.) This username will not have any rights on the shadow tables other than INSERT rights. A separate, second database user (let's say, apphist) will have SELECT rights on the shadow table, for reports. And only the administrator will be able to modify the contents of shadow tables.

There are ready-made examples on the Net of how to implement this sort of history table with a Postgres database. One well-known example is https://wiki.postgresql.org/wiki/Audit_trigger. Most of these examples actually use a single history table to store before-images from all primary tables. That approach works well too -- you don't need separate shadow tables for each primary table. In particular, a single combined history table is easy to implement in Postgres because of its powerful jsonb column type, which lets us store an arbitrary data structure in a single column.
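
For the per-table X/X_hist approach described above, here is a minimal sketch, in Go with embedded SQL, of what the shadow table and triggers could look like for a hypothetical main table called voucher. All names (voucher, voucher_hist_fn, the connection string) are illustrative, and the DDL assumes Postgres 11 or later. In practice this would be applied by an administrator, since appcode has no rights to create or modify shadow tables.

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq"
    )

    // Hypothetical DDL for a main table "voucher" and its shadow table
    // "voucher_hist", following the X / X_hist convention described above.
    const historyDDL = `
    CREATE TABLE IF NOT EXISTS voucher_hist (
        LIKE voucher,
        hist_when timestamptz NOT NULL DEFAULT now(),
        hist_op   char(1)     NOT NULL CHECK (hist_op IN ('U', 'D', 'T'))
    );

    CREATE OR REPLACE FUNCTION voucher_hist_fn() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'UPDATE' THEN
            -- before-image of the row goes into the shadow table
            INSERT INTO voucher_hist SELECT OLD.*, now(), 'U';
            RETURN NEW;
        ELSIF TG_OP = 'DELETE' THEN
            INSERT INTO voucher_hist SELECT OLD.*, now(), 'D';
            RETURN OLD;
        ELSIF TG_OP = 'TRUNCATE' THEN
            -- statement-level trigger: copy all rows before they vanish
            INSERT INTO voucher_hist SELECT v.*, now(), 'T' FROM voucher v;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER voucher_hist_row
        BEFORE UPDATE OR DELETE ON voucher
        FOR EACH ROW EXECUTE FUNCTION voucher_hist_fn();

    CREATE TRIGGER voucher_hist_trunc
        BEFORE TRUNCATE ON voucher
        FOR EACH STATEMENT EXECUTE FUNCTION voucher_hist_fn();
    `

    func main() {
        // Connection string is a placeholder for an administrator login.
        db, err := sql.Open("postgres", "postgres://admin@localhost/appdb?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        if _, err := db.Exec(historyDDL); err != nil {
            log.Fatal(err)
        }
    }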

Change logging using LogHarbour

Remiges LogHarbour is a logging system which allows very high-speed logging into a central repository from a distributed application. LogHarbour supports structured logging, i.e. each log entry has a JSON structure, and many of the fields in the entry are standardised and have well-known meanings, independent of the business logic of the application.

LogHarbour supports three types of logs: (i) debug logs, for developers to debug their code, (ii) activity logs, which allow developers and system managers to trace activities and stitch them into threads of execution, and (iii) change logs. Change logs are designed to record change history. Developers need to write calls to LogHarbour in their business logic code whenever the business logic performs any change to the application data store. Creation of a new instance of any entity, changes to field values, and deletion of instances must all be logged.

A log entry in the change-history log will identify (see the sketch after this list):

  • the class of entity which the entry refers to, e.g. a voucher, or a vendor record in the vendor master, or an invoice, etc
  • the unique ID or primary key of the instance to which this log entry refers
  • a list of change-log objects, each of which will specify
    • the field name whose value is changing or being added or removed
    • the old value of the field if any
    • the new value of the field
  • various contextual details like the user who performed this change, the IP address or client from where she performed it, the timestamp, etc
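
As an illustration only -- the actual LogHarbour client API and field names may differ -- a change-log entry carrying the information above could be modelled in Go like this:

    package changelog

    import "time"

    // FieldChange records one field-level change. These names are
    // illustrative, not the actual LogHarbour schema.
    type FieldChange struct {
        Field  string `json:"field"`
        OldVal any    `json:"old_value,omitempty"` // absent when a field is added
        NewVal any    `json:"new_value,omitempty"` // absent when a field is removed
    }

    // ChangeLogEntry identifies what changed, in which instance of which
    // entity class, and the context in which the change was made.
    type ChangeLogEntry struct {
        Entity     string        `json:"entity"`      // e.g. "voucher", "vendor", "invoice"
        InstanceID string        `json:"instance_id"` // unique ID or primary key
        Changes    []FieldChange `json:"changes"`
        Who        string        `json:"who"`         // user who performed the change
        RemoteIP   string        `json:"remote_ip"`   // client address
        When       time.Time     `json:"when"`        // timestamp, in UTC
    }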

If LogHarbour is integrated with the business application, this audit trail capability is available out of the box. LogHarbour also provides a GUI to allow authorised users to pull data out of the change-log repository and browse change history.

Logs stored in LogHarbour are safe from tampering by the business application code, because it runs as a separate service with its own data repository, its own web service call interface, and its own access controls. Business application code can insert log entries into the data repository but cannot modify or read this data without proper authorisation. Raw access to the repository is not permitted.

Timestamps

For timestamp values, we will always use the format defined in RFC 3339, as per the JSON Schema specification for date-time, e.g. 1990-12-31T23:59:50Z for a UTC timestamp or 2015-12-31T23:59:50+05:30 for a timestamp with a timezone offset. All timestamp information in web service parameters will be communicated in these formats. If an application is used by clients across various timezones, all timestamps sent to or received from web services must be in UTC, and must be stored on the servers in UTC. Any conversion of those timestamps to local timezones, e.g. to make them easier for humans to read, must be done in the client software.
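
A minimal sketch of this convention in Go, using the standard library's time.RFC3339 layout (the sample values are taken from the examples above):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Parse an RFC 3339 timestamp with a +05:30 offset, as a client
        // in India might produce it.
        t, err := time.Parse(time.RFC3339, "2015-12-31T23:59:50+05:30")
        if err != nil {
            panic(err)
        }

        // Convert to UTC before sending to a web service or storing it.
        fmt.Println(t.UTC().Format(time.RFC3339)) // 2015-12-31T18:29:50Z
    }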

Error checking when calling web services

When we write code to call web services, we need to take care of a few separate levels of checks. This has nothing to do with the programming language used, or the place from where we're calling. We may be making web service calls from one server-side daemon to another, or from a mobile app to a server. These checks need to be done nevertheless.

  • 1: Call timeout. Check if the call has timed out. Your code needs to have an explicit check for timeout of a request.
  • 2: Call network error. This happens if the server crashed while it was processing your request, or if the network connection from your mobile phone or browser was lost after the request was sent out, or a host of other network-related issues occurred. This error needs to be tested for separately. For each programming language and framework, you need to read up and find out how that framework returns network errors, and check for that error. With mobile phones and browsers, it's easy to test this by making the request and bringing down wifi just before or just after the request goes out.
  • 3: Return status. All web service calls return some sort of status. In our Remiges standard designs, we return a status attribute in the JSON data structure from the server, and this status may be either ok or error. The caller will get this only if there was no timeout or network error, and if the server was functioning correctly and gave a meaningful response with a status attribute. Therefore, this error needs to be checked after the first two.
  • 4: Data check. If the previous three checks are all ok and you have a valid response data structure from the server, then you need to check if the data in that response matches what you expect to get. The need for this is not obvious to beginners, but we often have a perfectly functioning server returning data which does not match what your calling code expects. This may happen due to (i) a bug in the server code, or (ii) a transient error in the server system which the server-side code is not detecting correctly, like a database failure, or (iii) a difference between what you think the web service spec says and what it actually says. All three are common reasons. Therefore, your code needs to check that the data has all the attributes which you need, and that the values in the data are not garbage (like a zero-length string) or in some different format (like getting a space-separated list of integers when you were expecting only one integer).

This is the list of four checks which your calling code needs to do, in this sequence, each time it calls a web service. This is the only way to build code which works reliably across network interfaces.
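
To make these four checks concrete, here is a minimal sketch in Go. The response shape and field names are illustrative assumptions, not any specific Remiges API; the sequence of checks is the point:

    package wscall

    import (
        "encoding/json"
        "errors"
        "fmt"
        "net"
        "net/http"
        "time"
    )

    // wsResponse follows the Remiges convention of a "status" attribute in
    // the response; the "data" shape here is a made-up example.
    type wsResponse struct {
        Status string `json:"status"`
        Data   struct {
            VoucherID string `json:"voucher_id"`
        } `json:"data"`
    }

    func callService(url string) (*wsResponse, error) {
        // The timeout value should come from configuration, not be
        // hard-coded like this.
        client := &http.Client{Timeout: 10 * time.Second}

        resp, err := client.Get(url)
        if err != nil {
            // Check 1: call timeout.
            var netErr net.Error
            if errors.As(err, &netErr) && netErr.Timeout() {
                return nil, fmt.Errorf("timeout: %w", err)
            }
            // Check 2: network error (connection refused or reset, DNS
            // failure, lost connectivity, and so on).
            return nil, fmt.Errorf("network error: %w", err)
        }
        defer resp.Body.Close()

        var r wsResponse
        if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
            return nil, fmt.Errorf("malformed response: %w", err)
        }

        // Check 3: return status, as reported by the server itself.
        if r.Status != "ok" {
            return nil, fmt.Errorf("server reported status %q", r.Status)
        }

        // Check 4: data check -- does the payload match what we expect?
        if r.Data.VoucherID == "" {
            return nil, errors.New("data check failed: empty or missing voucher_id")
        }
        return &r, nil
    }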

It is possible that you do not know what code to write to check items 1 and 2. Please ask a senior person like a Technical Architect or Solution Architect. Your tech lead or project manager may not know, especially if they haven't read and understood this page. If you have doubts about item 3, make test calls to the web service using some debug framework like Postman, check the data structure being returned, and go back and read the specs. Don't trust your colleagues, or code borrowed from random sources on the Internet.

What to do after this? Once you have made these checks in your code, you will be ready to detect these errors. That's great. What do you do when you detect each type of error? The answer depends on context. Therefore, discuss each type of error in detail with your tech lead.

  • Network errors may require your code to make a retry, or may require you to push this error up from your low-level code to the UI, so that the UI may display a meaningful message like "Uh-oh! It seems you've lost Internet connectivity. Shall we try this later?" with an "Ok" button.
  • Timeout errors may require your code to make a few repeat attempts in a loop (see the retry sketch after this list), so that if the timeout was transient, the next attempt may let you proceed. Sometimes, a fault in the connectivity will give you a timeout error the first time and a network error or a success the second time. Therefore, what you want to do with a timeout error is an important design decision. How long the timeout should be is a second important decision, and must be specified in a configuration file or constants file, not hard-coded in your code.
  • Return status checks are quite straightforward: it often means that there is a sensible cause for the error, e.g. the user has entered an invalid ID or key or password, and you need to take action accordingly.
  • Data check failures are the most dicey of all. If you find this error, your code must log everything so that you can escalate it to your seniors for debugging. If you catch this during development and testing, it is easily solved with a discussion. If this error happens in production, then the must-log-everything approach is the only way to debug this and fix the problem.
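
As mentioned in the timeout item above, a retry loop is one common response to transient timeouts. A minimal, self-contained sketch follows; the function and its parameters are illustrative, and in real code maxAttempts and delay would come from configuration:

    package wscall

    import (
        "errors"
        "net"
        "time"
    )

    // withRetry retries call() a few times, but only when the failure is a
    // timeout; network and other errors are pushed straight up to the caller.
    func withRetry(maxAttempts int, delay time.Duration, call func() error) error {
        var lastErr error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            err := call()
            if err == nil {
                return nil
            }
            var netErr net.Error
            if !errors.As(err, &netErr) || !netErr.Timeout() {
                return err // not a timeout: do not retry blindly
            }
            lastErr = err
            time.Sleep(delay)
        }
        return lastErr
    }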

The key to good code which works over the network is: do not trust anything about your communication with the other party. Check everything.
