MCA Data Structures With Algorithms 15


UNIT

15 Files and Files Organisation

Names of Sub-Units

Introduction, Data Hierarchy, File Attributes, Text Files, Binary Files, Basic File Operations, File
Organizations and Indexing

Overview

This unit begins by discussing the concept of files, file organization and the data hierarchy.
Next, it covers file attributes, text files and binary files, and then explains the basic file
operations. Towards the end, the unit discusses file organizations and indexing.

Learning Objectives

In this unit, you will learn to:

• Discuss the concept of files and file organization
• Explain the concept of data hierarchy and file attributes
• Describe text files and binary files
• Explain the significance of basic file operations
• Discuss the importance of file organizations and indexing

Learning Outcomes

At the end of this unit, you will be able to:

• Evaluate the concept of files and file organization
• Assess the concept of data hierarchy and file attributes
• Evaluate the importance of text files and binary files
• Determine the significance of basic file operations
• Assess the importance of file organizations and indexing

Pre-Unit Preparatory Material

• http://cre8te.co.uk/wp-content/uploads/2014/07/Files-And-Folders-Updated-June-2015.pdf

15.1 INTRODUCTION
A file is a collection of records. It is used to store large amounts of information on devices other
than the computer’s internal memory; that is, the records are kept in secondary storage. A file should
be organized so that operations on it can be carried out efficiently, given the characteristics of the
secondary storage device that holds it. Typical operations on a file are inserting and deleting records,
updating or processing records, and searching for records. The same operations apply to lists, trees,
arrays and more complex list structures. The time spent keeping the information organized pays off in
faster operations, so this efficiency must be weighed against the effort needed to maintain the
organized data structure.

When a high-level language program runs, its operations treat files as data. These operations include
traversing and processing all the records, as well as processing individual records chosen in random
order. The primary function of a file system is to provide storage and to make files easy to search, so
that records can be retrieved sequentially.

15.2 DATA HIERARCHY


Data hierarchy is the systematic organisation of data, typically in a hierarchical form. Data
organisation involves characters, fields, records, files and so on. The concept is a useful starting
point for understanding what data is made of and whether it has a structure. For example, how can
someone make sense of items such as ‘employee’, ‘name’, ‘department’, ‘Marcy Smith’ and ‘Sales
Department’, assuming they are all connected? It helps to think of them as smaller or larger components
in a hierarchy: ‘Marcy Smith’ is the name of one employee, and that employee is one instance of a Sales
Department employee.

All data has its own hierarchy, starting at a broad top level and continuing down to a specific bottom
level. Suppose, for example, that someone is looking for a video game title in a database. The console
type comes first, followed by the game creator, the genre, the first letter of the game’s name and
finally the game itself. Cataloguing data in this way makes it easier to locate. It also makes it easier
for the database to process new data, because each datum is recorded only in the appropriate category.

15.2.1 Components of Data Hierarchy


The following are the components of the data hierarchy:
• Bits: The smallest data item in a computer can have the value 0 or 1. Such a data item is called a
bit (short for “binary digit”, a digit that can take one of two values). Computer circuitry performs
simple bit manipulations, such as examining the value of a bit, setting the value of a bit and
reversing the value of a bit (from 1 to 0 or from 0 to 1).
• Characters: Working with data in the low-level form of bits is inconvenient for programmers.
Instead, they prefer to work with decimal digits (0–9), letters (A–Z and a–z) and special symbols
(e.g., $, @, %, &, *, (, ), –, +, ", :, ? and /). Digits, letters and special symbols are known as
characters. The character set of a computer is the collection of all the characters that can be used
to write programs and represent data. Because computers can process only 1s and 0s, the character set
represents each character as a pattern of 1s and 0s. In Java, a char is a 16-bit Unicode (UTF-16) code
unit, that is, two bytes of eight bits each. Java also provides the byte data type, which can be used
to represent byte data.
• Fields: Fields are made up of characters or bytes. A field is a group of characters or bytes that
conveys meaning. A person’s name, for example, can be represented by a field of uppercase and
lowercase letters.
• Records and files: A record is a collection of related fields. A file is a collection of related
records. More generally, a file can store any type of data in any format; some operating systems view
a file simply as a collection of bytes.
• Record keys: At least one field in each record is chosen as a record key so that specific records
can be retrieved from a file. A record key is a unique identifier that marks a record as belonging to
a specific person or entity.
• Sequential files: A file can be organised in a number of different ways. The most common is the
sequential file, in which records are stored in the order defined by the record-key field.
• Database: A collection of related files is known as a database, and the collection of programs
designed to create and manage databases is known as a database management system (DBMS).
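
The relationship between fields, records, record keys and a sequential file can be sketched in a few lines of Python. This is only an illustration: the `Employee` type, the `emp_id` key, the helper `find_by_key` and the second employee are hypothetical names introduced here, and an in-memory list stands in for a file on secondary storage.

```python
from dataclasses import dataclass


@dataclass
class Employee:          # one record: a group of related fields
    emp_id: int          # record key (assumed to be unique)
    name: str            # field made up of characters
    department: str      # another field


# A "file" modelled as a collection of records, kept in record-key order
# in the way a sequential file would be. The second employee is invented.
employee_file = sorted(
    [
        Employee(2, "Marcy Smith", "Sales"),
        Employee(1, "Alan Jones", "Accounts"),
    ],
    key=lambda record: record.emp_id,
)


def find_by_key(records, emp_id):
    """Scan the records in order and return the one whose key matches."""
    for record in records:
        if record.emp_id == emp_id:
            return record
    return None


print(find_by_key(employee_file, 2))
```

Looking up key 2 returns the record for Marcy Smith, while a key that is not present returns `None`.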

15.3 FILE ATTRIBUTES


A file can be a “free-formed”, “indexed” or “organised” collection of related bytes whose meaning is
defined by the person or program that created it. In other terms, a file is an entry in a directory. A
file may carry attributes such as its name, creator, date, type and permissions.

A file is a data structure that holds a sequence of records in a logical order. Files are kept in a file
system, which may reside on a drive or in main memory. Files may be simple (plain text) or complex
(specially formatted). The term “directory” refers to a group of files.

The file system is a collection of directories organised at various levels, as shown in Figure 1:

Data → Files → Directory → File System

Figure 1: Levels in File System


Some of the attributes of a file are as follows:
• Name: Every file has a name that identifies it in the file system. Two files with the same name
cannot exist in the same directory.
• Identifier: In addition to its name, each file has an extension that indicates the kind of file. A
text file, for example, has the extension .txt, whereas a video file may have the extension .mp4.
(Operating system texts often use “identifier” for the unique internal tag, usually a number, by which
the file system tracks the file.)
• Type: A file system categorises files into several types, such as video files, audio files, text
files and executable files.
• Location: A file can be stored in various places in the file system. The location of each file is
stored as an attribute.
• Size: One of the most important characteristics of a file is its size, that is, the number of bytes
the file occupies on storage.
• Protection: The computer’s administrator may want different safeguards for different files. Each
file therefore has its own set of permissions for each user group.
• Time and date: Every file has a time stamp that records the time and date when it was last modified.
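
Most of these attributes can be read back from the operating system. The sketch below uses Python’s standard `os.stat` call on a temporary file; the helper name `describe` and the file name `notes.txt` are illustrative choices, and the permission bits reported depend on the platform.

```python
import os
import tempfile
import time


def describe(path):
    """Return a few attributes that the file system records for `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1],
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": info.st_size,
        "permissions": oct(info.st_mode & 0o777),
        "last_modified": time.ctime(info.st_mtime),
    }


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "notes.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("hello")          # five ASCII characters, five bytes
    attributes = describe(sample)

print(attributes)
```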

15.3.1 Advanced File Attributes


The permissions that can be granted on folders and files depend on how they are accessed. On Windows,
these permissions are referred to as “advanced” permissions because they appear in the Advanced
Security Settings dialogue box. To reach them, open the Security tab of the Properties dialogue box and
select the Advanced option.
The advanced permissions for files and folders, with a brief description of each, are as follows:
• Traverse folder: This allows or denies moving through a restricted folder in the folder hierarchy
to reach files and folders beneath it. Traverse Folder takes effect only when the group or user has
not been granted the “Bypass traverse checking” user right in the Group Policy snap-in. Setting this
permission on a folder does not automatically allow the executable files inside it to be run.

• Execute file: This allows or denies running executable files.
• List folder: This allows or denies viewing the names of the files and subfolders in a folder. List
Folder affects only the contents of the folder; it has no bearing on whether the folder on which the
permission is set will itself be listed.
• Read data: This allows or denies viewing the data in files.
• Read attributes: This allows or denies viewing the attributes of a file or folder, such as
“read-only” and “hidden”.
• Read extended attributes: This allows or denies viewing the extended attributes of a file or folder.
Extended attributes are defined by programs and may vary by program.
• Create files: This allows or denies creating files within the folder.
• Write data: This allows or denies making changes to a file and overwriting existing content.
• Create folders: This allows or denies creating subfolders within the folder.
• Append data: This allows or denies adding data to the end of a file, without changing, deleting or
overwriting the existing data.
• Write attributes: This allows or denies changing the attributes of a file or folder, for example,
“read-only” or “hidden”.
• Write extended attributes: This allows or denies changing the extended attributes of a file or
folder. Extended attributes are defined by programs and may vary by program. The Write Extended
Attributes permission does not grant the ability to create or delete files or folders; it covers only
changes to the extended attributes of an existing file or folder.
• Delete subfolders and files: This allows or denies deleting subfolders and files, even if the Delete
permission has not been granted on the subfolder or file itself.
• Delete: This allows or denies deleting the file or folder. Even without the Delete permission on a
file or folder, a user who has the Delete Subfolders and Files permission on the parent folder can
still delete it.
• Read permissions: This allows or denies reading the permissions of a file or folder.
• Change permissions: This allows or denies changing the permissions of a file or folder.
• Take ownership: This allows or denies taking ownership of a file or folder. The owner of a file or
folder can always change its permissions, regardless of any existing permissions that protect it.
• Synchronise: This allows or denies different threads waiting on the handle for the file or folder
and synchronising with another thread that may signal it. This permission applies only to
multithreaded, multiprocess programs.

15.4 TEXT FILES


A text file is a non-executable digital file that contains only text. It can hold numbers, letters,
symbols or any combination of these, but no special formatting such as italic, bold or underlined text,
and no graphics. On a Microsoft Windows computer, text files are identified by the .txt file extension.
A text file, often known as an ASCII file or a flat file, is used to hold structured, standard textual
data that humans can read. A text file can be stored in a number of encodings, including ANSI on
Windows-based operating systems and ASCII for cross-platform use.
On Windows, a text file with the extension .txt is created with a text editor such as Notepad; a word
processor such as Word can also produce one if the document is saved as plain text. Nearly all
programming languages, including PHP and Java, use text files to write and store source code. Changing
the file extension from .txt to, say, .php or .cpp marks the file as source code for the corresponding
language; the content itself remains plain text.

15.5 BINARY FILES


A binary file is any file that is not used to store purely textual material. A binary file can be used
to build any custom file type, as long as the information needed to read the file is stored in the file.
Several types of data, such as images, video and audio, can be stored in the same file. The only
requirement is an application that can read this type of data. The PNG file format is a good example.
Most image viewers can read PNG files, which contain graphical data. If a PNG file is opened in a text
editor, most of it appears as unrecognisable characters, but readable text fragments are scattered
through the file. This is because a PNG file includes small sections for storing textual data in
addition to the graphical data. Other file formats allow this too, which is possible because of the
flexible nature of binary files.
Many binary files begin with a header. The header is the key to the file: it holds the data that
identifies the file’s content. An example of a binary file is shown in Figure 2:

Figure 2: Binary File


In Figure 2, the first column gives the starting address of each line, and * indicates repetition. A
binary file is a sequence of bytes, that is, binary digits grouped in eights. The bytes are meant to be
interpreted as something other than text characters. Compiled computer programs are a typical example;
they are sometimes referred to as binaries. Binary files can also contain images, sounds, compressed
versions of other files and, in short, any type of file content.
Some binary files are composed of blocks of data preceded by headers and metadata that a program uses
to interpret the data in the file. A header often contains a signature or magic number that identifies
the format. A GIF file, for example, can contain multiple images, and headers are used to identify and
describe each block of image data. A binary file that has no headers at all is called a flat binary
file.
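
As a rough illustration of how a signature identifies a format, the sketch below compares the first bytes of some data with a small table of well-known magic numbers. The table is deliberately tiny, and the names `SIGNATURES` and `identify` are hypothetical helpers, not part of any library.

```python
# Leading byte sequences ("magic numbers") of a few common formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
}


def identify(data):
    """Guess a format from the first bytes of `data`."""
    for signature, label in SIGNATURES.items():
        if data.startswith(signature):
            return label
    return "unknown (possibly a flat binary file or plain text)"


print(identify(b"GIF89a" + bytes(10)))
print(identify(b"just some text"))
```

In practice the bytes would come from the start of a real file, for example `open(path, "rb").read(8)`.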
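To send binary files through systems that do not allow all data values (email is a common example), the
files are often converted into a plain-text representation, for example with Base64. The encoding has
two drawbacks: it increases the file size during transfer (Base64 turns every 3 bytes into 4
characters, an increase of about one third), and the data must be translated back into binary after
receipt. The increased size may be offset by lower-level link compression, because the encoded text has
correspondingly lower entropy per character.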
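
The size cost of a binary-to-text encoding is easy to measure. The sketch below uses Base64 from Python’s standard library as one example of such an encoding; the sample data is arbitrary.

```python
import base64

raw = bytes(range(256)) * 12          # 3072 bytes covering every byte value
encoded = base64.b64encode(raw)       # plain-text (ASCII) representation
decoded = base64.b64decode(encoded)   # translation back into binary

print(len(raw), len(encoded))         # 3072 4096
print(decoded == raw)                 # True
```

Every 3 input bytes become 4 output characters, so 3072 bytes grow to 4096 characters, and decoding restores the original bytes exactly.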
On Microsoft Windows, the standard libraries let the programmer specify, when opening a file, whether
the file is expected to be binary or plain text. On Unix-like systems, the standard libraries also
accept such a parameter, but it makes no difference to how the file is read or written.
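
Python exposes the same choice through the mode argument of `open`. The sketch below writes a short string in text mode and reads it back in both modes; the file name and sample text are arbitrary, and `newline="\n"` is passed on writing so that the stored bytes are the same on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")

    # Text mode: strings go in, and the library encodes them to bytes.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("café\n")

    # Binary mode: the raw bytes come back untouched.
    with open(path, "rb") as handle:
        raw_bytes = handle.read()

    # Text mode again: the bytes are decoded back into characters.
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()

print(raw_bytes)                  # b'caf\xc3\xa9\n'
print(len(raw_bytes), len(text))  # 6 5
```

The accented letter occupies two bytes in UTF-8, which is why the binary view is one byte longer than the text view.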

Viewing
A hex viewer or hex editor displays the data of a binary file as a sequence of hexadecimal values, one
per byte. If a binary file is opened in a text editor, each group of eight bits is typically translated
into a single character, and the user sees a mostly unintelligible run of textual characters. If the
file is opened in some other application, that application has its own use for each byte; it may, for
instance, treat the file as a stream of numbers between 0 and 255. Another kind of viewer replaces the
unprintable characters with spaces, revealing only the human-readable text. Such a view is useful for a
quick inspection of a binary file, for example to find passwords in games, to find hidden text in
non-text files and to recover corrupted documents. It can also be used to inspect suspicious files for
unwanted effects. If the file is itself treated as an executable and run, the operating system attempts
to interpret the file as a sequence of instructions in its machine language.
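
Both kinds of viewer described above can be approximated in a few lines. In this sketch, `hex_dump` and `extract_words` are hypothetical helper names, the sample bytes are invented, and the output layout only loosely follows that of common hex viewers.

```python
def hex_dump(data, width=16):
    """Format bytes as lines of: offset, hexadecimal values, printable text."""
    pad = width * 3
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}} {text_part}")
    return lines


def extract_words(data):
    """Replace unprintable bytes with spaces, keeping the readable text."""
    return "".join(chr(b) if 32 <= b < 127 else " " for b in data)


sample = b"\x89PNG\r\n\x1a\nkey=secret\x00\x01"
for line in hex_dump(sample):
    print(line)
print(extract_words(sample).split())   # ['PNG', 'key=secret']
```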

Interpretation
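Standards are very important to binary files. A binary file interpreted with the ASCII character set,
for example, is displayed as text. A custom application may interpret the file differently: a byte may
represent a pixel, a sound or even an entire word. Binary data is meaningless until an executed
algorithm defines what should be done with each bit, byte or word. Consequently, merely examining the
binary and trying to match it against known formats can lead to a wrong conclusion about what it
represents. This fact is exploited in steganography, where an algorithm interprets a binary file
differently to reveal hidden content.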

15.6 BASIC FILE OPERATIONS


A file is a collection of logically related data recorded on secondary storage as a sequence of bytes
or records. The creator of the file determines its content. The various actions that can be performed
on a file, such as read, write, open and close, are referred to as file operations. The user performs
these tasks with the help of commands provided by the operating system. Some common operations are as
follows:
• Create operation: This operation creates a file in the file system and is one of the most frequently
used. An application program asks the file system to create a new file of a certain type. The file
system allocates space to the file and, because it knows the format of the directory structure, places
an entry for the new file in the appropriate directory.
• Open operation: This is one of the most common operations performed on a file. A file must be opened
before any processing can be done on it. To open a file, the user supplies its name; the open system
call passes the file name to the file system, which locates the file.
• Write operation: This operation writes information into a file. The write system call specifies the
file and the length of the data to be written. The file length is increased by that amount, and the
file pointer is moved to just after the last byte written.
• Read operation: This operation reads the contents of a file. The operating system maintains a read
pointer that marks the position up to which the data has been read.
• Re-position or seek operation: The seek system call moves the file pointer forward or backward within
the file, as the user requires. It is typically used with file management systems that provide direct
access to files.
• Delete operation: Deleting a file removes all the data it contains and frees the disk space it
occupied. To delete a file, the directory is searched for its entry. Once the entry is found, all the
associated file space and the directory entry itself are released.
• Close operation: When processing is complete, the file should be closed so that all changes are made
permanent and all resources in use are released. Closing a file deallocates the internal descriptors
that were created when it was opened.
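
The operations listed above map closely onto the file API of most languages. The sketch below walks through them with Python’s built-in `open` and the `os` module; the file name `records.dat` and the twelve sample bytes are arbitrary.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "records.dat")

handle = open(path, "w+b")        # create the file and open it for read/write
handle.write(b"AAAABBBBCCCC")     # write: the file pointer moves to offset 12
end_position = handle.tell()

handle.seek(4)                    # re-position: jump to the second 4-byte item
second_item = handle.read(4)      # read: returns b"BBBB", pointer now at 8
handle.close()                    # close: flush changes, release the descriptor

os.remove(path)                   # delete: remove the file and free its space
os.rmdir(folder)
still_exists = os.path.exists(path)

print(end_position, second_item, still_exists)   # 12 b'BBBB' False
```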

15.7 FILE ORGANIZATION AND INDEXING


File organization determines how the records of a file are arranged so that they are ready for
processing. It is used to work out how to organise the files of each base relation efficiently.
For example, suppose we wish to list employee information alphabetically by name. Sorting the file by
employee name is a good way to organise it for that purpose. A file organised by employee name, on the
other hand, is not ideal for finding all employees whose grades fall in a certain range.
File organization can be classified into three types:
• Sequential access file organization
• Direct access file organization
• Indexed sequential access file organization
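
The difference between sequential and direct access can be sketched with fixed-length records. The example below is an illustration under stated assumptions: an in-memory `io.BytesIO` buffer stands in for a file on disk, the record layout (id, 10-byte name, age) is invented for the example, and the helper names are hypothetical.

```python
import io
import struct

RECORD = struct.Struct("<i10si")   # id, name padded to 10 bytes, age (18 bytes)


def pack(emp_id, name, age):
    return RECORD.pack(emp_id, name.encode("ascii"), age)


def unpack(raw):
    emp_id, name, age = RECORD.unpack(raw)
    return emp_id, name.rstrip(b"\x00").decode("ascii"), age


rows = [(1, "John", 25), (2, "Jack", 24), (3, "Amey", 18)]
data_file = io.BytesIO(b"".join(pack(*row) for row in rows))


def sequential_find(handle, emp_id):
    """Sequential access: read records in order until the key matches."""
    handle.seek(0)
    while True:
        raw = handle.read(RECORD.size)
        if len(raw) < RECORD.size:
            return None
        record = unpack(raw)
        if record[0] == emp_id:
            return record


def direct_read(handle, position):
    """Direct access: compute the byte offset and jump straight to it."""
    handle.seek(position * RECORD.size)
    return unpack(handle.read(RECORD.size))


print(sequential_find(data_file, 3))   # (3, 'Amey', 18)
print(direct_read(data_file, 1))       # (2, 'Jack', 24)
```

An indexed sequential organization combines the two ideas: the records stay in key order, and an index supplies the position to jump to.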

15.7.1 Indexing
Indexing is a data structure technique that speeds up data retrieval. Because it lets us locate and
access data in a database quickly, it is essential to database optimization. Indexing minimizes the
number of disk accesses required when a query is processed. An index is made up of two columns:
• First column: The first column holds the search key. It contains a copy of the table’s primary key
or candidate key. The values in this column may or may not be sorted; if they are sorted, the related
data can be accessed more easily.
• Second column: The second column holds the data reference or pointer. It contains the address of
the disk block where the record with that key value can be found. Figure 3 depicts the structure of an
index:

Search Key | Data Reference

Figure 3: Structure of Index

15.7.2 Types of Indexing


Indexing is classified into the following four types:
• Primary indexing
• Secondary indexing
• Clustered indexing
• Multilevel indexing

Primary Indexing
There are only two columns in primary indexing. The first column holds the primary key values, which
are the search keys. The second column holds pointers, each containing the address of the data block
that matches the search key value. The table should be sorted on the search key, and there should be a
one-to-one relationship between the records in the index file and the data blocks. This is a
traditional yet fast mechanism, because the keys are kept in sorted order. Primary indexing is further
classified into two types:
• Dense index: There is an index record, containing a search key and a pointer, for every search key
value in the data file. A dense index gives fast lookups, but it needs more memory because an index
record is stored for each key value. Figure 4 depicts a dense index:

Index record    Data block
1  →            1  John    25
2  →            2  Jack    24
3  →            3  Amey    18
4  →            4  Ellena  29
5  →            5  Kate    31
6  →            6  Will    26

Figure 4: Dense Index

• Sparse index: Index records exist for only some of the search key values. To find a record, the
search starts at the index record with the largest key that does not exceed the search key, follows
its pointer into the data file and then scans sequentially until the required record is found. Sparse
indexing makes searches slower, but it needs less memory because there are fewer index records.
Figure 5 depicts a sparse index:

Index record    Data block
1  →            1  John    25
                2  Jack    24
                3  Amey    18
4  →            4  Ellena  29
                5  Kate    31
6  →            6  Will    26

Figure 5: Sparse Index
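
The two lookups can be compared on the six records used in Figures 4 and 5. This is a minimal sketch: list positions stand in for disk block addresses, and `dense_lookup` and `sparse_lookup` are hypothetical helper names.

```python
from bisect import bisect_right

# Data file sorted on the search key: (id, name, age), as in the figures.
data_file = [
    (1, "John", 25), (2, "Jack", 24), (3, "Amey", 18),
    (4, "Ellena", 29), (5, "Kate", 31), (6, "Will", 26),
]

# Dense index: one entry per search key, mapping key -> position.
dense_index = {record[0]: position for position, record in enumerate(data_file)}

# Sparse index: entries only for keys 1, 4 and 6, as (key, position) pairs.
sparse_index = [(1, 0), (4, 3), (6, 5)]


def dense_lookup(key):
    position = dense_index.get(key)
    return None if position is None else data_file[position]


def sparse_lookup(key):
    keys = [entry_key for entry_key, _ in sparse_index]
    slot = bisect_right(keys, key) - 1      # last index entry with key <= target
    if slot < 0:
        return None
    position = sparse_index[slot][1]
    while position < len(data_file) and data_file[position][0] <= key:
        if data_file[position][0] == key:
            return data_file[position]
        position += 1                       # sequential scan from the anchor
    return None


print(dense_lookup(5))    # (5, 'Kate', 31)
print(sparse_lookup(5))   # (5, 'Kate', 31)
```

Both lookups return the same record, but the sparse index holds three entries instead of six and pays for that with a short sequential scan.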

Secondary Indexing (Non-Clustered Indexing)


In secondary indexing, the index column holds the values of a candidate key, and each value is paired
with a pointer that holds the address of the corresponding data.
An intermediate node acts as a link between the index and the data, as shown in Figure 6:

Index file → Intermediate node → Data block
(The data block holds the records in unsorted order: 2 Jack 24, 1 John 25, 6 Will 26, 3 Amey 18,
4 Ellena 29, 5 Kate 31.)

Figure 6: Secondary Indexing
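
One way to read this arrangement is sketched below: the data block is stored in no particular key order, the index file holds the sorted keys, and an intermediate level holds one pointer per key. This is an interpretation rather than a reconstruction of the figure; list positions stand in for addresses, and the names are hypothetical.

```python
from bisect import bisect_left

# Data block stored in no particular key order (order taken from Figure 6).
data_block = [
    (2, "Jack", 24), (1, "John", 25), (6, "Will", 26),
    (3, "Amey", 18), (4, "Ellena", 29), (5, "Kate", 31),
]

# Intermediate level: one pointer (list position) per record, sorted by key.
intermediate = sorted(range(len(data_block)), key=lambda pos: data_block[pos][0])
# Index file: the sorted keys, aligned entry by entry with `intermediate`.
index_keys = [data_block[pos][0] for pos in intermediate]


def secondary_lookup(key):
    slot = bisect_left(index_keys, key)          # binary search in the index
    if slot == len(index_keys) or index_keys[slot] != key:
        return None
    return data_block[intermediate[slot]]        # follow the pointer


print(intermediate)          # [1, 0, 3, 4, 5, 2]
print(secondary_lookup(6))   # (6, 'Will', 26)
```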

Clustered Indexing
In clustered indexing, the table is kept in an ordered form. When an index is created on
non-primary-key columns, whose values are not guaranteed to be unique, two or more columns are combined
to identify each record uniquely, and the index is built on that combination, as shown in Figure 7:

Index file (Sub_id, Pointer) → Clustered data file
(The clustered data file groups records that share a key: 1 John 25, 1 Mill 19, 1 Gim 20;
4 Ellena 29, 4 Ronald 19; 6 Will 26, 6 Ruby 22.)

Figure 7: Clustered Indexing

Multilevel Indexing
Multilevel indexing is used when the primary index does not fit in main memory. The number of index
entries grows with the size of the database, so even a single-level index can become too large to hold
in main memory. In multilevel indexing, the index is broken down into smaller blocks, with an outer
index pointing to inner index blocks, so that each block is small enough to be held in main memory.
Multilevel indexing is further classified into two methods:
• B+ tree indexing
• B-tree indexing
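
The idea of splitting an index into levels can be shown with a static two-level sketch. It is not a B-tree or B+ tree, which also handle insertions and deletions; `BLOCK_SIZE`, the key range 1 to 9 and the helper name `multilevel_lookup` are assumptions made for the example.

```python
from bisect import bisect_right

BLOCK_SIZE = 3   # assumed number of index entries that fit in one block

# Single-level index: sorted (key, data_position) pairs for keys 1..9.
full_index = [(key, key - 1) for key in range(1, 10)]

# Inner level: the index split into fixed-size blocks.
inner_blocks = [full_index[i:i + BLOCK_SIZE]
                for i in range(0, len(full_index), BLOCK_SIZE)]
# Outer level: the first key of each inner block.
outer_index = [block[0][0] for block in inner_blocks]


def multilevel_lookup(key):
    block_number = bisect_right(outer_index, key) - 1   # pick the inner block
    if block_number < 0:
        return None
    for entry_key, position in inner_blocks[block_number]:
        if entry_key == key:
            return position
    return None


print(outer_index)            # [1, 4, 7]
print(multilevel_lookup(5))   # 4
```

Only the small outer index and one inner block need to be in memory for any single lookup.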

15.8 CONCLUSION

• Data hierarchy is the systematic organisation of data, typically in a hierarchical form.
• A file can be a “free-formed”, “indexed” or “organised” collection of related bytes whose meaning is
defined by the person or program that created it.
• A text file is a non-executable digital file that contains only text.
• File organization determines how records are arranged so that they are ready for processing.
• Indexing is a data structure technique that speeds up data retrieval.
• In secondary indexing, the index holds the values of a candidate key, each paired with a pointer to
the address of the corresponding data.
• In primary indexing, the index has only two columns.
• In clustered indexing, the table is kept in an ordered form.
• Multilevel indexing is used when the primary index does not fit in main memory.

15.9 GLOSSARY

• Data hierarchy: The systematic organisation of data, typically in a hierarchical form.
• File: A “free-formed”, “indexed” or “organised” collection of related bytes whose meaning is defined
by the person or program that created it.
• Text file: A non-executable digital file that contains only text.
• File organization: The arrangement of the records in a file so that they are ready for processing.
• Indexing: A data structure technique that speeds up data retrieval.
• Secondary indexing: Indexing in which the index holds the values of a candidate key, each paired
with a pointer to the address of the corresponding data.
• Clustered indexing: Indexing in which the table is kept in an ordered form.
• Multilevel indexing: Indexing used when the primary index does not fit in main memory.

15.10 SELF-ASSESSMENT QUESTIONS

A. Essay Type Questions


1. Describe the concept of a file.
2. “All data has its own hierarchy.” Discuss.
3. A text file is often known as an ASCII file or a flat file. Explain the significance of text files.
4. Describe the significance of binary files.
5. Indexing minimizes the number of disk accesses required when a query is processed. What is the
concept of indexing?

15.11 ANSWERS AND HINTS FOR SELF-ASSESSMENT QUESTIONS

A. Hints for Essay Type Questions


1. A file is a collection of records. It is used to store large amounts of information on devices other
than the computer’s internal memory. Refer to Section 15.1 Introduction.
2. Data hierarchy is the systematic organisation of data, typically in a hierarchical form. Refer to
Section 15.2 Data Hierarchy.
3. A text file is a non-executable digital file that contains only text. It can hold numbers, letters,
symbols or any combination of these, but no special formatting such as italic, bold or underlined
text, and no graphics. Refer to Section 15.4 Text Files.
4. A binary file is any file that is not used to store purely textual material. A binary file can be
used to build any custom file type, as long as the information needed to read the file is stored in
the file. Refer to Section 15.5 Binary Files.
5. Indexing is a data structure technique that speeds up data retrieval. Because it lets us locate and
access data in a database quickly, it is essential to database optimization. Refer to Section 15.7
File Organization and Indexing.

15.12 POST-UNIT READING MATERIAL

• https://limbd.org/objectives-factors-to-be-consider-of-file-organization/
• https://www.indeed.com/career-advice/career-development/types-of-files

15.13 TOPICS FOR DISCUSSION FORUMS

• Discuss the concept of files and file organization with your friends. Also discuss how the concept
of data hierarchy applies in real life.
