Learn

File Size Conversions

What does file size have to do with compression? Before you look at compression, you will need some practical examples of file size. Look at the chart below for file size conversions.

| Name | Symbol | Binary Measurement | Decimal Measurement | Number of Bytes | Equal to |
|---|---|---|---|---|---|
| kilobyte | KB | 2^10 | 10^3 | 1,024 | 1,024 bytes |
| megabyte | MB | 2^20 | 10^6 | 1,048,576 | 1,024 KB |
| gigabyte | GB | 2^30 | 10^9 | 1,073,741,824 | 1,024 MB |
| terabyte | TB | 2^40 | 10^12 | 1,099,511,627,776 | 1,024 GB |
| petabyte | PB | 2^50 | 10^15 | 1,125,899,906,842,624 | 1,024 TB |
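The conversions in the chart can be checked with a few lines of Python. This is a minimal sketch that uses the binary measurements (each unit is 1,024 times the one before it); the function names `to_bytes` and `describe` are made up for this illustration.

```python
# Binary unit sizes from the chart: each step up is a factor of 1,024 (2**10).
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to a number of bytes."""
    power = UNITS.index(unit)  # KB -> 1, MB -> 2, and so on
    return value * 1024 ** power

def describe(num_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    unit = UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = next_unit
    return f"{value:.2f} {unit}"

print(to_bytes(1, "MB"))      # 1048576
print(describe(1073741824))   # 1.00 GB
```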

Examples of Files Associated with each Size

  • KB is typically associated with a text file type, such as a simple email.
  • MB is typically associated with a media file type, such as a high-quality digital picture or an audio file.
  • GB is typically associated with a video file type, such as 2 GB per hour of video, or with the capacity of hardware components.
  • TB is typically associated with the hard drive of a modern computer or server.
  • PB is typically associated with a large database, such as the storage behind an online social media site.

Compression

Notice the example associated with PB. Social media sites need enormous amounts of space to store all the images and videos that are created each day. As file sizes and storage needs grow, techniques are needed to store and transfer data efficiently. One technique is compression. Watch the Compression video (11:44) for an introduction to compression.

When sending or sharing files, you will encounter two different kinds of compression: lossy and lossless.

Lossy Compression

Lossy compression reduces the size of a file by discarding its less important information. Once discarded, that information cannot be recovered from the compressed file.

Lossy compression can free up a device’s storage space by decreasing the file size of the data on the device. In images, it removes colors that are outside the range of human vision; in audio files, it removes pitches that are outside the range of human hearing. Most of what is removed will not affect how we interact with the file.
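The idea of dropping detail that is hard to notice can be illustrated with a toy example. The sketch below rounds each 8-bit color value down to one of 16 levels, so nearly identical colors collapse into one. The pixel values and the `quantize` helper are assumptions made for this illustration; real formats such as JPEG use far more sophisticated methods.

```python
def quantize(value, levels=16):
    """Round an 8-bit value (0-255) down to one of `levels` evenly spaced steps."""
    step = 256 // levels
    return (value // step) * step

# Three slightly different RGB colors that look almost the same to the eye.
pixels = [(200, 13, 97), (203, 12, 99), (201, 15, 96)]
compressed = [tuple(quantize(c) for c in pixel) for pixel in pixels]

print(compressed)  # [(192, 0, 96), (192, 0, 96), (192, 0, 96)]
print(len(set(pixels)), "distinct colors before,", len(set(compressed)), "after")
```

Fewer distinct values means less information to store, but the original colors cannot be recovered from the rounded ones. That is what makes the technique lossy.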

Examples of Lossy Compression

Here are lossy compression examples:

  • Text Example – Abbreviated text in SMS texting, which is a form of compressed text
  • Audio Example – MP3 (MPEG Audio Layer III) is a compressed music format
  • Images Example – JPEG (Joint Photographic Experts Group) is a compressed image format
  • Video Example – MPEG (Moving Picture Experts Group) is a compressed video format

Lossless Compression

Lossless compression reduces the size of a file without losing any information, so the original file can be reconstructed exactly from the compressed version. It works by finding patterns in the data and representing them in a shorter form.

In the Compression video, you saw the example of color encoding. When a repetitive pattern is found, lossless compression can replace each repetition with a shorter code, reducing the number of bytes. A library (a lookup table of the codes and the patterns they stand for) is needed to reconstruct the data, but the file is usually still smaller than the original. In a large data set, image, or video, the savings can be substantial even with the library included.
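One simple way to combine repetitive data is run-length encoding, which stores each run of repeated values as a value and a count. The sketch below is only an illustration of that one technique, not necessarily the method shown in the video; the row of "W" (white) and "B" (black) pixels is made-up sample data.

```python
def rle_encode(text):
    """Collapse runs of a repeated character into (character, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Rebuild the original text from (character, count) pairs."""
    return "".join(ch * count for ch, count in runs)

row = "W" * 8 + "B" * 3 + "W" * 12   # 23 pixels in one row of an image
encoded = rle_encode(row)

print(encoded)                     # [('W', 8), ('B', 3), ('W', 12)]
print(rle_decode(encoded) == row)  # True
```

Decoding gives back exactly the original row, which is what makes the scheme lossless.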

Examples of Lossless Compression

Here are lossless compression examples:

  • Text Example – ZIP, or using a library of symbols to represent words; the compressed files need unzipping before use
  • Audio Example – WAV (waveform audio) is uncompressed sound, so nothing is lost; FLAC (Free Lossless Audio Codec) is losslessly compressed sound
  • Images Example – PNG (Portable Network Graphics) or GIF (Graphics Interchange Format) are for web graphics
  • Video Example – AV1 (AOMedia Video 1) is for very large video files; it offers a lossless mode, although it is most often used for lossy compression
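The round trip of a lossless format can be tried with Python's standard `zlib` module, which implements DEFLATE, the compression method most ZIP files use. This is a rough sketch; the repetitive sample text is invented for the illustration, and real savings depend on how much repetition the data contains.

```python
import zlib

# Highly repetitive sample data compresses well.
original = b"blue blue blue green green blue " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)  # True: nothing was lost
```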

Encryption

Now that you have a better understanding of compression, let’s look at another way that data is manipulated. When data is intended for a certain recipient, the sender wants to make sure that the file is not read by anyone else. Programs do this through encryption.

Encryption is the process of changing a message so that the original is protected from others. This is a hot topic when discussing banking and security. Data breaches and financial hacking are very prevalent in our modern society. Commercial transactions need strong encryption to protect consumers. Encryption is essential for every part of our digital life.

Encryption Process

There are two parts to the encryption process:

  • Encryption – encoding the message to keep it secure
  • Decryption – reversing the encryption to turn the secure message back into the original

You don’t need to be a hacker to enjoy encryption. Here are some simple types of encryption that can be performed by a beginning user:

  • Caesar Cipher – shifts every letter of the alphabet by the same number of positions, creating a secret alphabet that only the sender and the recipient know through a shared key
  • Random Substitution Cipher – maps each letter of the alphabet to a random other letter of the alphabet

Caesar Cipher

Caesar Cipher Wheel

A Caesar cipher is a simple encryption technique where letters are shifted to encode a message. The cipher is named after Julius Caesar who used the technique to encode messages. A Caesar cipher wheel is a device used to create an encrypted message by rotating the wheel to encode the original text. Each letter is “shifted” or rotated by the same amount. Let’s look at Caesar cipher examples.

A rotation of 1 would turn A into B, B into C, and so on. If the original word is hello, what is the encrypted message?

A rotation of 3 would turn A into D, B into E, C into F, and so on. If the original word is hello, what is the encrypted message?
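The rotation can also be written as a short program. The sketch below assumes lowercase English letters and leaves every other character unchanged; the function name `caesar` is chosen for this illustration. Shifting by the negative of the rotation decrypts the message.

```python
import string

ALPHABET = string.ascii_lowercase

def caesar(text, shift):
    """Rotate each lowercase letter by `shift` positions, wrapping past z."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces, digits, and punctuation unchanged
    return "".join(result)

secret = caesar("xyz abc", 3)
print(secret)              # abc def
print(caesar(secret, -3))  # xyz abc
```

Try calling `caesar("hello", 1)` and `caesar("hello", 3)` to check your answers to the questions above.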

Random Substitution Cipher

Original Text (Alphabet) A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext (Randomized)  E L V W C A R M B P X Q T H D J U F Z I K N Y S O G

The first row is the original alphabet. The second row is the substitution alphabet, produced by shuffling the letters at random, so each original letter maps to one ciphertext letter. Each time the alphabet is re-randomized, the mapping changes, and so does the ciphertext. Ciphertext is the encrypted text produced from the original text, or plaintext (WhatIs.com).

It isn’t shown in this example, but a letter can be mapped to itself (e.g., E could map to E). Because the mapping is randomly generated, it is very unlikely that every letter will map to itself.

If we encrypt A B C, then the encrypted message, or ciphertext, should be E L V. Map the word HELLO to its ciphertext. What is the encoded word?

Using the ciphertext to replace the original text is called encoding. A message is encoded by replacing the letters of the original alphabet with ciphertext letters.

If given an encoded message, a recipient would need the ciphertext key to decode the message. A message is decoded by using the ciphertext key to map the cipher letters back to the original text.

If the message VZ FDVXZ is decoded using the ciphertext key above, what does the message decode to?
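Encoding and decoding with a substitution key can be sketched in Python. The `KEY` string below is copied from the ciphertext row on the assumption that the mapping is one-to-one, with S in the X column, so that every ciphertext letter decodes to exactly one original letter. The helper names `substitute` and `random_key` are made up for this illustration.

```python
import random
import string

PLAIN = string.ascii_uppercase
KEY = "ELVWCARMBPXQTHDJUFZIKNYSOG"  # assumed one-to-one; S in the X column

ENCODE = dict(zip(PLAIN, KEY))                            # original -> cipher
DECODE = {cipher: plain for plain, cipher in ENCODE.items()}  # cipher -> original

def substitute(message, table):
    """Replace each letter using the table; other characters pass through."""
    return "".join(table.get(ch, ch) for ch in message)

def random_key(seed=None):
    """Build a new randomized alphabet (a shuffled copy of A-Z)."""
    letters = list(PLAIN)
    random.Random(seed).shuffle(letters)
    return "".join(letters)

ciphertext = substitute("ABC", ENCODE)
print(ciphertext)                      # ELV
print(substitute(ciphertext, DECODE))  # ABC
print(random_key())                    # a different key each run
```

The same `substitute` function handles both directions; only the table changes. Pass `ENCODE` to encode a message and `DECODE` to decode one.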

Metadata

One other factor to consider is the data stored with your file that gives information about the file. This additional data is called metadata.

Metadata is data that describes or gives information about other data, or simply data about data. It is usually hidden from view, but it still takes up space, so it must be accounted for in file size and when compressing and encrypting files. Metadata is easiest to understand through a familiar example.

Metadata Example

When you take a photo on your phone, metadata is saved behind the scenes along with the picture. The metadata gathered and saved is usually:

  • date/time,
  • photo name,
  • settings, and
  • possible geolocation (the actual geographical location of a person or a device, identified using digital information such as GPS coordinates or an IP address).

The picture is the data, and the information describing it is the metadata.

Zebra photo (data): [image: zebra]

Metadata of the photo:

  • Photo name: Zebra.png
  • Date: August 5, 2020
  • Camera: Nikon D3500
  • Geolocation: [image: geolocation of the zebra image]
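Some metadata is kept by the computer's file system and can be read with Python's standard library. The sketch below creates a placeholder file named Zebra.png in a temporary folder (the file contents are dummy bytes, not a real picture) and reads its name, size, and modification time. Camera settings and geolocation are usually stored inside the image file itself and are not read here.

```python
import os
import tempfile
from datetime import datetime

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Zebra.png")
    with open(path, "wb") as f:
        f.write(b"\x00" * 2048)  # placeholder bytes standing in for the picture data

    info = os.stat(path)  # ask the file system for the file's metadata
    metadata = {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
    }

print(metadata)
```

The 2,048 bytes written to the file are the data; the name, size, and timestamp that describe it are the metadata.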