Figure 1. The design of the SHDB Application Programming Interface (API)
Figure 2. Query processing in the SHDB
RESEARCH ON SEMANTIC HIDING DATABASES (SHDB) IN CLOUDS
Research Adviser: Jyh-haw Yeh, Computer Science, Boise State University.
Graduate Students: Andres Campossainz, Archana Nanjundarao, Fiona Yan Lee.
Undergraduate Students: Thomas Green.
PROJECT MOTIVATION:
In the current cloud computing setting, customers who outsource their data to clouds
are effectively forced to trust that the service providers will do their best
to protect the customers' data privacy. However, from the customers' perspective,
why should they trust service providers? Or, more specifically, why should they trust the DBAs inside the service
providers who are authorized to access their data? The trust issue is one of
the most difficult problems to overcome in cloud computing.
PROJECT OBJECTIVE:
This project aims to develop a semantic hiding database system that protects data privacy
when data is outsourced to clouds. In other words, we try to resolve the trust issue in cloud computing.
Rather than creating cleartext database instances in clouds, customers create and outsource privacy-protected
SHDBs to clouds through a user-friendly API, named the SHDB tool. As a result, even if the DBAs in clouds are
malicious, they cannot easily infer any meaningful information from the outsourced SHDB instances.
SEMANTIC HIDING DATABASES:
In a relational database, a single data item by itself may not reveal much semantics.
When more correlated data items are available, the semantics become more obvious.
The following two records show how semantic hiding can be done.
(A) In a PAYROLL database of an XYZ company, a record in the EMPLOYEE table shows that
John Smith's SALARY is 75,000.
(B) In a ? database of an ? company, a record in the ? table shows that ?'s ? is 75,000.
Each ? mark represents the ciphertext of the corresponding semantics-telling
data (string type).
Line A is not encrypted, so if the data is outsourced to clouds, malicious or
curious cloud DBAs can see the secret.
In line B, all the semantics-telling data, including the identities/names
(such as the data owner, database, table and attribute names) and string-typed data
(such as John Smith), are encrypted.
The cloud DBAs can only see the number 75,000 and have only a limited capability
of guessing the meaning of that number.
The number is not encrypted because no efficient fully homomorphic
encryption algorithm exists that allows both multiplication and addition to be applied directly
to numeric ciphertexts.
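The step from record (A) to record (B) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: HMAC-SHA-256 stands in for a deterministic algorithm, the key and the helper names `hide` and `hide_record` are hypothetical, and string values are hidden the same way as names for simplicity. It is not the SHDB tool's actual code.

```python
import hashlib
import hmac

# Hypothetical secret that stays with the data owner and is never sent to the cloud.
SECRET_KEY = b"demo-key-kept-by-the-data-owner"


def hide(text):
    """Deterministically map a semantics-telling string to an opaque token."""
    digest = hmac.new(SECRET_KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return "t_" + digest[:16]  # truncated only to keep the demo output short


def hide_record(owner, database, table, row):
    """Hide every name and string value; leave numeric values in the clear."""
    hidden_row = {}
    for attribute, value in row.items():
        hidden_value = hide(value) if isinstance(value, str) else value
        hidden_row[hide(attribute)] = hidden_value
    return hide(owner), hide(database), hide(table), hidden_row


# Record (A): everything is readable.
record_a = ("XYZ", "PAYROLL", "EMPLOYEE", {"NAME": "John Smith", "SALARY": 75000})
# Record (B): only the number 75000 remains readable.
record_b = hide_record(*record_a)
print(record_b)
```

Because the mapping is deterministic, the same name always yields the same token, so the cloud server can still match identifiers without learning what they mean.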
The semantic hiding database we propose applies several encryption algorithms
together to encrypt the database so that the cloud server can still execute SQL queries over
the encrypted database.
The encryption strategy is to encrypt everything except certain numeric data.
The encryption algorithms used include:
- Deterministic Encryption Algorithm (DEA):
Encrypts the names (identities) of databases, tables and attributes. AES with a fixed IV is an example of such an algorithm.
Hash functions such as MD5 and SHA are also deterministic, but they are one-way and their outputs cannot be decrypted.
- Order-Preserving Encryption (OPE):
Encrypts string-type data if
the data is not subject to substring matching operations such as the SQL LIKE operator.
- Order-Preserving Encryption in Word-by-Word mode (OPE-WbW):
Encrypts string-type data word by word if
the data is subject to substring matching operations.
- Multiplicative Homomorphic Encryption (MHE):
Encrypts numeric data if the data is subject to multiplication only.
The RSA algorithm (in its textbook, unpadded form) is a typical example of such an algorithm.
- Additive Homomorphic Encryption (AHE):
Encrypts numeric data if the data is subject to addition only.
- For numeric data subject to both addition and
multiplication, a less secure polynomial-based homomorphic encoding
(developed by the project adviser, Dr. Jyh-haw Yeh) is applied.
The encoding is not as secure as an encryption algorithm and is subject to known-plaintext attacks,
but it does provide privacy protection in several aspects.
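The two homomorphic properties named in the list above can be checked with toy numbers. The sketch below uses textbook RSA for the multiplicative case, as the text suggests. For the additive case it uses the Paillier cryptosystem, which is an assumption made here for illustration, since the project page does not name a specific AHE scheme. The primes are far too small for real use and all function names are hypothetical.

```python
import math
import secrets

# Toy primes, chosen only so the numbers stay readable; never use sizes like this.
P, Q = 61, 53
N = P * Q

# --- Multiplicative homomorphism: textbook (unpadded) RSA ---
E = 17
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def rsa_encrypt(m):
    return pow(m, E, N)


def rsa_decrypt(c):
    return pow(c, D, N)


# --- Additive homomorphism: Paillier with generator g = N + 1 (assumed scheme) ---
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def paillier_encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def paillier_decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


# The server multiplies RSA ciphertexts; the owner decrypts the product 7 * 12.
product = rsa_decrypt(rsa_encrypt(7) * rsa_encrypt(12) % N)

# The server multiplies Paillier ciphertexts; the owner decrypts the sum 1500 + 250.
total = paillier_decrypt(paillier_encrypt(1500) * paillier_encrypt(250) % N2)

print(product, total)
```

Each scheme supports only one of the two operations on ciphertexts, which is why the list above assigns MHE and AHE to different kinds of numeric columns. Results are only correct while they stay below the modulus `N`.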
To further hide the semantics of unencrypted numeric data, SHDB uses the following two techniques:
- Data obfuscation:
When an SHDB instance is created, each unencrypted numeric data column is obfuscated by
injecting some false columns (not real data) with a similar data range and format.
- Query obfuscation:
For legitimate queries that users frequently issue on those unencrypted numeric data columns, the SHDB tool is designed to
periodically send similar but false queries (automatically generated, not user-issued) to the false data columns.
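The two obfuscation techniques can be sketched as follows. This is a minimal illustration rather than the SHDB tool's implementation: it assumes integer columns only, a single hypothetical query shape (`QUERY_TEMPLATE`), and plain column names such as `salary` and `decoy_0` for readability, whereas in an SHDB instance these names would be encrypted. The seeded `random.Random` is used only to make the demo repeatable.

```python
import random

# Hypothetical query shape shared by real and decoy queries.
QUERY_TEMPLATE = "SELECT AVG({column}) FROM {table} WHERE {column} > {bound}"


def make_decoy_column(real_values, rng):
    """Generate fake integers with the same count and range as the real column."""
    low, high = min(real_values), max(real_values)
    return [rng.randint(low, high) for _ in real_values]


def make_decoy_query(table, decoy_columns, rng):
    """Build a query with the real query's shape, aimed at a randomly chosen decoy column."""
    name = rng.choice(sorted(decoy_columns))
    values = decoy_columns[name]
    bound = rng.randint(min(values), max(values))
    return QUERY_TEMPLATE.format(column=name, table=table, bound=bound)


rng = random.Random(2024)  # fixed seed for a repeatable demo only
real_salaries = [75000, 62000, 88000, 54000]

# Data obfuscation: inject three false columns next to the real one.
decoys = {f"decoy_{i}": make_decoy_column(real_salaries, rng) for i in range(3)}

# Query obfuscation: a real query and an auto-generated look-alike.
real_query = QUERY_TEMPLATE.format(column="salary", table="employee", bound=60000)
fake_query = make_decoy_query("employee", decoys, rng)

print(real_query)
print(fake_query)
```

An observer of the stored table and the query log then sees several numeric columns and several similar queries, without an obvious way to tell which ones carry real data.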
CURRENT STATUS:
A graduate student, Andres Campossainz,
has developed a prototype API, the cloud SHDB tool, which can create, load and query
SHDB instances stored in clouds.
The API is a user-friendly interface between customers and cloud database servers.
The API is responsible for performing all underlying encryption/decryption operations.
Thus the existence of semantic hiding database instances is transparent to
both customers and cloud database servers.
The overall architecture design of the SHDB application programming interface (API)
and how query processing works in the SHDB are illustrated in Figures 1 and 2 at the top of the page.
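The transparency described above can be illustrated with a small client-side wrapper. The sketch below is hypothetical throughout: the class `ShdbClientSketch` and its methods are invented for illustration, an in-memory SQLite database stands in for the cloud database server, HMAC-SHA-256 tokens stand in for the encryption modules, and a client-side dictionary replaces real decryption. It supports only equality lookups and is not the prototype's actual API.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key-kept-on-the-client"  # hypothetical client-side secret


class ShdbClientSketch:
    """Hides names and strings before they reach the server, restores them on the way back."""

    def __init__(self, connection):
        self.conn = connection
        self.plain_of = {}  # token -> plaintext, kept on the client only

    def token(self, text):
        digest = hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()
        tok = "t_" + digest[:16]
        self.plain_of[tok] = text
        return tok

    def create_table(self, table, columns):
        cols = ", ".join(self.token(c) for c in columns)
        self.conn.execute(f"CREATE TABLE {self.token(table)} ({cols})")

    def insert(self, table, row):
        names = ", ".join(self.token(c) for c in row)
        marks = ", ".join("?" for _ in row)
        values = [self.token(v) if isinstance(v, str) else v for v in row.values()]
        self.conn.execute(
            f"INSERT INTO {self.token(table)} ({names}) VALUES ({marks})", values
        )

    def select_equal(self, table, wanted, where_column, where_value):
        cols = ", ".join(self.token(c) for c in wanted)
        sql = f"SELECT {cols} FROM {self.token(table)} WHERE {self.token(where_column)} = ?"
        param = self.token(where_value) if isinstance(where_value, str) else where_value
        rows = self.conn.execute(sql, (param,)).fetchall()
        # Map tokens back to plaintext; numbers pass through unchanged.
        return [tuple(self.plain_of.get(v, v) for v in row) for row in rows]


server = sqlite3.connect(":memory:")  # stand-in for the cloud database server
client = ShdbClientSketch(server)
client.create_table("EMPLOYEE", ["NAME", "SALARY"])
client.insert("EMPLOYEE", {"NAME": "John Smith", "SALARY": 75000})

result = client.select_equal("EMPLOYEE", ["NAME", "SALARY"], "NAME", "John Smith")
print(result)
```

The customer works with plain names and values, while the server only ever stores and evaluates opaque tokens plus the unencrypted number.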
Some snapshots of the API dialogs are shown below.
Figure 3. The SHDB API login dialog
Figure 4. The SHDB API login dialog after a wrong user name/password
Figure 5. The SHDB API after a successful login
Figure 6. The SHDB API main GUI - creating a new database
Figure 7. The SHDB API main GUI - creating tables. Users select an SHDB data type for each attribute
Figure 8. The SHDB API main GUI - loading data
Figure 9. The SHDB API main GUI - executing a query
PROJECT TASKS:
The remaining tasks are:
- Implement the remaining underlying encryption algorithm modules, such as OPE and additive homomorphic encryption. Others, such as AES, RSA and the homomorphic encoding, have been implemented.
- Extend the SHDB API to auto-generate the obfuscated data columns when creating an SHDB instance and to periodically auto-generate obfuscated queries.
- Obtain a real application dataset to test the SHDB design.
- Evaluate performance: compare the time and storage efficiency of an SHDB instance against the baseline (clear) database instance.
- Make the software available as an open source project.