Data Security Strategies

Arun Rajeevan
4 min read · May 28, 2020

Understand Data Life Cycle Phases:

1) Create
New digital content is generated or acquired, or existing content is altered or updated, during the creation phase.
2) Store
Digital data is committed to a storage repository simultaneously with creation in this phase.
3) Use
Data is viewed, processed, or used in other activities in this phase.
4) Share
Data is exchanged among users, customers, and partners in the sharing phase.
5) Archive
Data is stored in long-term storage for future use.

Data Masking

Data masking is the process of hiding, replacing, or omitting sensitive information from a specific data set.

Different approaches:

Random substitution: The value is replaced (or appended to) with a randomly generated value.

Algorithmic substitution: The value is replaced (or appended to) with an algorithm-generated value. This typically allows for two-way substitution.

Shuffle: Different entries from within the same data set are used to represent the data. This has the obvious drawback of using actual production data.

Masking: Specific characters are used to hide certain parts of the data. It is commonly applied to credit card numbers, for example showing only the last four digits.
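As a rough sketch (not tied to any particular masking product), here is what a few of these approaches can look like in Python; the sample values and formats are made up for illustration.

```python
import random
import string

def mask_card_number(card_number: str) -> str:
    """Masking: hide all but the last four digits behind a fixed character."""
    return "X" * (len(card_number) - 4) + card_number[-4:]

def random_substitution(value: str) -> str:
    """Random substitution: replace the value with random characters of the same length."""
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=len(value)))

def shuffle_column(values: list) -> list:
    """Shuffle: reuse real entries from the same data set, just in a different order."""
    shuffled = list(values)
    random.shuffle(shuffled)
    return shuffled

print(mask_card_number("4111111111111111"))        # XXXXXXXXXXXX1111
print(random_substitution("Alice"))                # e.g. 'P7Q2K'
print(shuffle_column(["Alice", "Bob", "Carol"]))   # e.g. ['Carol', 'Alice', 'Bob']
```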

Data anonymization

Data anonymization is a type of information sanitization whose intent is privacy protection. Data generally has direct and indirect identifiers: direct identifiers (such as a name or national ID number) point to a person on their own, whereas indirect identifiers are attributes such as demographic and location data. Used together, indirect identifiers can still reveal the exact identity of an individual. Data anonymization is the process of encrypting or removing these identifiers from data sets, so that the people whom the data describes remain anonymous.
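A minimal sketch of the idea, assuming simple dictionary records with made-up field names: direct identifiers are dropped outright, and indirect identifiers such as age and postal code are generalized so they no longer pinpoint one person.

```python
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}      # assumed field names for this example

def anonymize(record: dict) -> dict:
    """Remove direct identifiers and coarsen indirect ones for a single record."""
    safe = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in safe:                              # generalize age into a 10-year band
        safe["age"] = f"{(safe['age'] // 10) * 10}s"
    if "zip" in safe:                              # keep only a coarse prefix of the postal code
        safe["zip"] = safe["zip"][:3] + "XX"
    return safe

print(anonymize({"name": "Alice", "email": "a@example.com",
                 "age": 34, "zip": "94107", "diagnosis": "flu"}))
# {'age': '30s', 'zip': '941XX', 'diagnosis': 'flu'}
```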

Tokenization

Tokenization is the process of substituting a sensitive data element with a nonsensitive equivalent, referred to as a token.
It is the practice of having two distinct databases: one with the live, actual sensitive data and one with nonrepresentational tokens mapped to each piece of that data. The token is usually a collection of random values with the shape and form of the original data placeholder, and it can be mapped back to the original data by the tokenization application or solution.
Tokenization is able to assist with each of these:
a) Complying with regulations or laws
b) Reducing the cost of compliance
c) Mitigating risks of storing sensitive data and reducing attack vectors on that data

Tokenization process:

1) An application collects or generates a piece of sensitive data.
2) The data is not stored locally and is sent to the tokenization server.
3) The tokenization server generates the token. The sensitive data and the token are stored in the token database.
4) The tokenization server returns the token to the application.
5) The application stores the token rather than the original data.
6) When the sensitive data is needed, an authorized application or user can request it.
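The toy code below walks through those six steps with an in-memory dictionary standing in for the token database; a real tokenization server would sit behind a network boundary, enforce authorization on detokenization requests, and often generate format-preserving tokens.

```python
import secrets

class TokenizationServer:
    """Toy tokenization server: maps random tokens to the real sensitive values."""

    def __init__(self):
        self._vault = {}                              # step 3: the token database

    def tokenize(self, sensitive: str) -> str:
        token = secrets.token_hex(8)                  # step 3: generate a random token
        self._vault[token] = sensitive                # store token -> sensitive data
        return token                                  # step 4: return the token

    def detokenize(self, token: str) -> str:
        return self._vault[token]                     # step 6: authorized lookup of the original

server = TokenizationServer()

card_number = "4111111111111111"                      # step 1: application collects sensitive data
token = server.tokenize(card_number)                  # step 2: data sent to the tokenization server
application_db = {"order_42": token}                  # step 5: application stores only the token

print(application_db["order_42"])                     # e.g. '3f9c1a7b2d4e6f08'
print(server.detokenize(token))                       # 4111111111111111
```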

Bit Splitting

Bit splitting involves encrypting data, then splitting the encrypted data into smaller data units and distributing those smaller units to different storage locations, and then further encrypting the data at its new location.
With this process, the data is protected from security breaches, because even if an intruder is able to retrieve and decrypt one data unit, the information would be useless unless it can be combined with decrypted data units from the other locations.
Benefits:
a) Data security is enhanced due to the use of stronger confidentiality mechanisms.
b) Bit splitting between different geographies and jurisdictions makes it harder to gain access to the complete data set via a subpoena or other legal process.
c) It can be scalable, can be incorporated into secured cloud storage API technologies, and can reduce the risk of vendor lock-in.

Challenges of Bit Splitting:
a) Processing and reprocessing the information to encrypt and decrypt the bits is a CPU-intensive activity.
b) The whole data set is not necessarily stored and processed by the CSP within a single geography, which in turn leads to the need to ensure data security on the wire as part of the security architecture for the system.
c) Storage requirements and costs are usually higher with a bit splitting system.
d) Bit splitting can introduce availability risks, because all parts of the data must be available in order to decrypt and reconstruct the information.
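A highly simplified sketch of the encrypt-then-split idea, assuming the third-party cryptography package and plain lists standing in for separate storage locations; the second round of per-location encryption described above is left out for brevity.

```python
from cryptography.fernet import Fernet

def bit_split(data: bytes, locations: int, key: bytes) -> list:
    """Encrypt the data, then split the ciphertext into chunks for separate storage locations."""
    ciphertext = Fernet(key).encrypt(data)
    chunk_size = -(-len(ciphertext) // locations)                 # ceiling division
    return [ciphertext[i:i + chunk_size] for i in range(0, len(ciphertext), chunk_size)]

def reassemble(chunks: list, key: bytes) -> bytes:
    """Recombine every chunk and decrypt; a missing chunk makes decryption fail."""
    return Fernet(key).decrypt(b"".join(chunks))

key = Fernet.generate_key()
chunks = bit_split(b"patient record #1234", locations=3, key=key)
print(len(chunks))                      # 3 chunks, each destined for a different provider/region
print(reassemble(chunks, key))          # b'patient record #1234'
```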

Avoid Data Remanence

Data Remanence is the ability of computer memory to retain previously stored information beyond its intended lifetime.
With many data storage techniques, information can be recovered using specialized techniques and equipment even after it has been overwritten.

Note: Any critical data must not only be protected against unauthorized access and distribution, but also securely deleted at the end of its life-cycle.

For organizations storing health, financial, or defense-related information, it is mandatory to ensure that no data is left on disks where it is exposed to the risk of being recovered by malicious users.

The technique of overwriting file sectors does not work without the collaboration of the cloud provider.
You are not given access to the physical device, but only to higher level abstractions like file-systems (e.g. Amazon EBS) or key-value based APIs (e.g. Amazon S3).
In SaaS/PaaS environments, access only happens at the data level.
Until cloud providers start paying attention to this issue and offer secure deletion as a feature of their services, there is only one solution that already works today, at least on IaaS platforms: strongly encrypt your data and keep the key in a safe place, i.e. outside the cloud where your data is stored.
Secure deletion then becomes nothing more than destroying the key.
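A minimal sketch of that crypto-shredding approach, again assuming the cryptography package: only ciphertext ever reaches the cloud, the key stays outside it, and "secure deletion" is simply discarding the key.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()                 # kept in a safe place outside the cloud (e.g. an HSM)
cloud_object = Fernet(key).encrypt(b"sensitive customer record")  # only ciphertext goes to the cloud

print(Fernet(key).decrypt(cloud_object))    # readable as long as the key exists

key = None                                  # "secure deletion": destroy the key
# whatever copies of the ciphertext remain in the cloud are now unreadable, regardless of remanence
```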
