Loading data is one of the simplest stages of our projects and tasks in R, yet it is usually the most problematic.
![](https://i.ibb.co/68N8Mmw/preo.gif)
This is because we often do not know the nature of our data, and even though R is a great tool, it only reads data (*not intentions*) and cannot do everything for us.
Therefore, in this post we are going to learn about importing and exporting data in the formats that we handle the most, since there are many others that we are not going to use at the moment. This data may be stored in a file on our computer or available as an online file.
Before we start loading our data, we must take the following aspects into account when working with our data and files:
Attention, attention!!!
- We must avoid names, values or fields with blank spaces. This error is very common: when whitespace is the column separator (the default in `read.table`), R interprets each blank as the start of a new field, resulting in errors related to the number of elements in our data set.
- Choose short names instead of long ones; this will help a lot when working with the data in R. It is not the same to type a place called “yellow gate near the guest house of La Primorosa farm located in the municipality of … blah blah blah” as to just remember that the site is called “site1”.
- Avoid using the following symbols in names: ?, $, %, ^, &, *, (, ), -, #, ,, <, >, /, |, \, [, ], { and }.
- Delete any comments that we have inserted in our files; otherwise they will be read as extra values in our data.
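If the names are already in our data and we cannot fix them at the source, base R can tidy them for us. The following is only a small sketch: the names in `raw_names` are made up for illustration, and `make.names()` is just one possible way to clean them.

```r
# Hypothetical column names that break the rules above
raw_names <- c("site 1", "temp(C)", "pH-value", "site 1")

# make.names() replaces characters that are not valid in R names with "."
# and, with unique = TRUE, also makes repeated names distinct
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)
```

`read.table()` and `read.csv()` apply the same cleaning to column headers by default through their `check.names` argument.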
Now, on to what we came for!
![](https://i.ibb.co/Hn4rYJn/feliz.gif)
Reading .txt files
This is the most popular file type on our computers. It is a plain text file, so it is a bit easier to handle and rarely requires more arguments than the ones listed here. To import it into our R session we only need the `read.table` function with the following arguments:
- file: file location.
- header: whether or not the file has a first row with column names.
- col.names: column names for our data frame, given manually if the file does not have them.
- stringsAsFactors: whether text fields are converted to factors. Before R 4.0.0 the default was TRUE; since R 4.0.0 it is FALSE, so text fields stay as strings unless we set this argument to TRUE.
- sep: the symbol used to delimit columns.
- dec: the symbol used as the decimal mark.
Thus, the basic syntax for reading is as follows; we save our data in a variable so that we can work with it easily.
You can download this .txt file to see what the data looks like and to practice reading it in R.
![descarga](https://i.ibb.co/dfgPD3F/txt-icon.png)
a <- read.table(file = "txt_example.txt", # File name
                                          # (a full file path also works)
                header = FALSE,           # Whether the first row holds column names (TRUE) or not (FALSE)
                sep = "",                 # Column separator; "" means any whitespace
                dec = ".")                # Decimal mark: "." or ","
Keep in mind that this syntax works only if our file is in R's working directory, as we saw in our post; otherwise it will generate an error.
If that is not the case and our file is in another folder, we can use `choose.files()` (available only on Windows; `file.choose()` works on every platform):
read.table(choose.files(), # This will open a window to find our file
header = TRUE,
sep = ",",
dec = ".")
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
As we can see, our file has been loaded successfully: 5 variables with 6 observations each!
![](https://i.ibb.co/S6zDB90/happy.gif)
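If you prefer not to click through a window every time, you can also check where R is looking and pass the full path yourself. This is just a sketch: it writes a tiny made-up file to a temporary folder so that it runs anywhere, and the names `example_path` and `a` are only for illustration.

```r
# Write a small comma-separated example (made-up content) to a temporary folder
example_path <- file.path(tempdir(), "txt_example.txt")
writeLines(c("Sepal.Length,Sepal.Width,Species",
             "5.1,3.5,setosa",
             "4.9,3.0,setosa"),
           example_path)

getwd()                    # folder where R looks for relative file names
file.exists(example_path)  # check the path before trying to read it

# Read the file using its full path, so the working directory does not matter
a <- read.table(file = example_path,
                header = TRUE,
                sep = ",",
                dec = ".",
                stringsAsFactors = FALSE)

str(a)                     # confirm the number of rows, columns and their types
```

Looking at `str()` right after loading is a quick way to confirm that numbers were read as numbers and text as text.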
But what happens if we enter the arguments wrong? Let’s see some examples.
In this example, we do not specify the separator or the decimal mark, and we tell R that there is no header.
read.table(choose.files(),
header = FALSE)
## V1
## 1 Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
## 2 5.1,3.5,1.4,0.2,setosa
## 3 4.9,3,1.4,0.2,setosa
## 4 4.7,3.2,1.3,0.2,setosa
## 5 4.6,3.1,1.5,0.2,setosa
## 6 5,3.6,1.4,0.2,setosa
## 7 5.4,3.9,1.7,0.4,setosa
As you can see, our data came out disorganized because we did not tell R how to separate the columns, so the program reads each row as a single value. Also, if you look at the names of our variables, they have become one more row of data!
![](https://i.ibb.co/N6GSxfD/no.gif)
Now let’s see another example, where we give the wrong arguments for the header, the separator and the decimal mark.
read.table(choose.files(),
header=FALSE,
sep = ".",
dec = ",")
## Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 3 did not have 5 elements
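This time the result is not even shown; R stops immediately with an error. R warns us that something is wrong, and the problem may not be in our data, as in this example, but in how we specified the reading function. Here, splitting on "." produces a different number of fields on different lines, hence the message about line 3.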
![](https://i.ibb.co/SVY9tZT/help.gif)
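When we are not sure which separator a file uses, we can look before we leap. The sketch below uses a small made-up file written to a temporary location; `readLines()` shows the raw text and `count.fields()` counts how many fields each line would have for a given separator.

```r
# Small made-up file, written to a temporary location so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Sepal.Length,Sepal.Width,Species",
             "5.1,3.5,setosa",
             "4.9,3,setosa"),
           csv_path)

# Look at the first raw lines to spot the separator and the decimal mark
readLines(csv_path, n = 3)

# Count the fields per line for two candidate separators
fields_comma <- count.fields(csv_path, sep = ",")
fields_dot   <- count.fields(csv_path, sep = ".")

print(fields_comma)  # the same count on every line suggests the right separator
print(fields_dot)    # differing counts suggest the wrong one
```

A separator that gives the same number of fields on every line is usually the right one.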
Reading .csv files
The CSV (Comma Separated Values) format is one of the most common when exchanging data between applications. A file in this format is a text file in which each row is a sample or occurrence and each column is a variable. The values in each row are separated from each other by commas, hence the format name. We will come across this type of file quite often in our particular area.
We can easily create this type of file in Excel, as seen in the following images. We just have to make sure not to leave empty rows or add colors, borders or any other formatting.
![final-606273fa32e29f0085e699da-364467](https://i.ibb.co/VwZDWND/final-606273fa32e29f0085e699da-364467.gif)
Save .csv files from Excel
![Drive-csv](https://i.ibb.co/Pggh11f/Drive-csv.gif)
Save .csv files from Google Sheets
In order to read this type of file we will use the `read.csv` function.
You can download this .csv file to see what the data looks like and to practice reading it in R.
![descarga](https://i.ibb.co/6gx1V1Y/csv-icon.png)
read.csv(file = "csv_example.csv",
header = TRUE,
sep = ",",
dec = ".")
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
And so we have loaded our file. We are learning more and more!
![](https://i.ibb.co/hKs2mc4/ride.gif)
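One thing that often trips us up: spreadsheets configured for a language that uses a decimal comma usually save "CSV" files with semicolons as separators. Base R has `read.csv2()` and `write.csv2()` for that variant. Here is a small round-trip sketch with a made-up data frame; the names `df`, `comma_path` and `semicolon_path` are only for illustration.

```r
# Made-up data frame for the example
df <- data.frame(site = c("site1", "site2"),
                 temp = c(20.5, 18.2))

# Comma-separated, "." as decimal mark
comma_path <- tempfile(fileext = ".csv")
write.csv(df, comma_path, row.names = FALSE)
from_comma <- read.csv(comma_path)

# Semicolon-separated, "," as decimal mark
semicolon_path <- tempfile(fileext = ".csv")
write.csv2(df, semicolon_path, row.names = FALSE)
from_semicolon <- read.csv2(semicolon_path)

# Both versions should give back the same numbers
print(from_comma)
print(from_semicolon)
```

If a .csv file loads as a single column, trying `read.csv2()` is often the quickest fix.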
Download data online
There is an unimaginable amount of data on the web that we can download and use for our studies. Since we love to use R, there is also a way to read it directly into R without taking up disk space on our PC with one more file (to be clear, the data will still use memory, but only while it is loaded in our R session).
To begin, we are going to use the same functions that we used before; the only difference is that, instead of the file path, we provide the internet address, stored in a variable, as we see in this example:
url<-"https://drive.google.com/uc?id=14drpgXNmwwy-vqjGTUvyhx2c8oBZ9Wqr&export=download"
In this step we only save the address of the online file in a variable, which you can name however you like. Now we carry out the same steps we have already learned, except that in place of the file name we put the variable we just created.
read.table(url, # We specify our "url" variable
header = TRUE,
sep = ",",
dec = ".")
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
![voil](https://i.ibb.co/h7x8p9f/voil.gif)
And… voilà!!!
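Online files can move or the connection can fail, so it may be worth wrapping the reading step so that the script does not stop abruptly. The helper `read_online_csv()` below is hypothetical (it is not part of base R), and the demonstration uses a local temporary file so that it runs without internet; `read.csv()` takes a local path or a web address in the same argument.

```r
# Hypothetical helper: returns a data frame, or NULL if the file cannot be read
read_online_csv <- function(location) {
  tryCatch(
    read.csv(location, header = TRUE, sep = ",", dec = "."),
    # Treating warnings as failures is a strict choice made for this sketch
    warning = function(w) {
      message("Problem while reading: ", conditionMessage(w))
      NULL
    },
    error = function(e) {
      message("Could not read the file: ", conditionMessage(e))
      NULL
    }
  )
}

# Offline demonstration with a local temporary file
local_copy <- tempfile(fileext = ".csv")
write.csv(head(iris), local_copy, row.names = FALSE)

ok <- read_online_csv(local_copy)
not_found <- read_online_csv(file.path(tempdir(), "does_not_exist.csv"))

is.null(not_found)  # TRUE when the file could not be read
```

With a real address we would call the helper with the variable that holds it and check for `NULL` before continuing.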
Reading .xlsx files
Now we will learn about reading .xlsx files. To do so, we first install and load the `readxl` package:
install.packages("readxl")
library(readxl)
Thus, the arguments for our `read_excel` function are:
sheet_excel <- read_excel(path = '',       # File address
                          sheet = "iris",  # Sheet to read (name or position)
                          range = "C1:E4", # Optional: read only this cell range of the sheet
                          n_max = 8)       # Optional: maximum number of data rows to read
You can download this .xlsx file to see what the data looks like and to practice reading it in R.
![xls](https://i.ibb.co/XzfKvzx/xls.png)
read_excel(choose.files(),
sheet = "iris",
n_max = 6)
## # A tibble: 6 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
Thus, we can see the first 6 rows of the `iris` sheet.
![](https://i.ibb.co/SNryVtq/hap.gif)
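If we do not remember what the sheets in a workbook are called, `readxl` can list them before we read anything. The sketch below assumes the `readxl` package is installed and uses the example workbook that ships with it (`datasets.xlsx`), so no download is needed; the variable names are only for illustration.

```r
# Run only if readxl is available, so the sketch does not fail without it
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook bundled with the readxl package
  example_xlsx <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names before choosing one
  sheet_names <- readxl::excel_sheets(example_xlsx)
  print(sheet_names)

  # Read only the first 6 data rows of the "iris" sheet
  iris_head <- readxl::read_excel(example_xlsx, sheet = "iris", n_max = 6)
  print(dim(iris_head))
}
```

Checking the sheet names first avoids errors from a misspelled `sheet` argument.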
Import multiple files
Finally, we sometimes find ourselves with multiple files for our projects, which means reading each one of them, and if there are many, the task becomes tedious. Therefore, here we show you a way to read all the files in our working folder (keep in mind that they must have the same extension).
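First we get the list of files in the folder in question. We ask for full paths so that the files can be read from any working directory
files_project <- list.files(path = 'C:/Users/David/Desktop/Proyect',
                            pattern = '\\.csv$',  # Keep only .csv files
                            full.names = TRUE)    # Return full paths, not just file names
Then we read all the files into a new list with `lapply`
file_list <- lapply(files_project, read.csv)  # Pass the function itself, without parentheses
# In this example we only read .csv files
# You can also read other file types by changing the pattern and the reading function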
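To try the idea without touching your own folders, here is a self-contained sketch. It creates two small .csv files in a temporary folder (made up from the built-in `iris` data), reads them all, and stacks them into one data frame; the folder and file names are only for illustration.

```r
# Create a temporary project folder with two small .csv files
project_dir <- file.path(tempdir(), "project_example")
dir.create(project_dir, showWarnings = FALSE)
write.csv(head(iris, 3), file.path(project_dir, "part1.csv"), row.names = FALSE)
write.csv(tail(iris, 3), file.path(project_dir, "part2.csv"), row.names = FALSE)

# List only the .csv files, with their full paths
csv_paths <- list.files(project_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list and name each element after its file
file_list <- lapply(csv_paths, read.csv)
names(file_list) <- basename(csv_paths)

# If all files share the same columns, stack them into a single data frame
all_data <- do.call(rbind, file_list)
dim(all_data)
```

Naming the list elements after the files makes it easy to see later where each piece of data came from.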
Post summary
Well, today we faced one of the trickiest steps in our studies: reading data. We were able to learn how to import .txt, .csv and .xlsx files, data hosted online, and multiple files at once.
![](https://i.ibb.co/yFZpsHC/Kmta.gif)
After importing the data, how do I analyze it?
After learning how to import data into R, we need to analyze our data, know what we are dealing with and how we should treat it… but wait, this is a spoiler! That will come in our next post, which is loaded with many more things to remember and learn!
We will wait for you!!!
![](https://i.ibb.co/GMc8BTf/final.gif)