So... how do I load my data?

Loading data is one of the simplest stages of our projects and tasks in R, yet it is usually the most problematic.


This is because we often do not know the nature of our data, and even though R is a great tool, it only reads data (*not intentions*) and cannot do everything for us.

Therefore, in this post we are going to learn about importing and exporting data in the formats we handle the most; there are many others that we will not need at the moment. This data may be stored in a file on our computer or available as an online file.

Attention, attention!!!

Before loading our data, we must take the following aspects into account when preparing our data and files:
  1. We must avoid names, values or fields with blank spaces. This error is very common: with the default separator, R treats each blank as a column break, which results in errors related to the number of elements in our data set.
  2. Choose short names instead of long ones; this will help you a lot when working with the data in R. It is not the same to type a place called “yellow gate near the guest house of La Primorosa farm located in the municipality of …. blah blah blah” as to simply remember that the site is called “site1”.
  3. Avoid using the following symbols in names: ?, $, %, ^, &, *, (, ), -, #, comma, <, >, /, |, [, ], { and }.
  4. Delete any comments that we have inserted in our files; otherwise they will be read as extra values in our data.
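
If a file already arrives with awkward column names, one possible fix is to clean them inside R. The sketch below applies base R's make.names() to a few made-up names (raw_names is invented for this illustration); it only shows the idea and is not the only way to do it.

```r
# Hypothetical column names that break the rules above
raw_names <- c("Sepal Length", "petal-width (cm)", "site#1")

# make.names() replaces every character that is not valid in an R name with a dot
clean_names <- make.names(raw_names)
print(clean_names)
```

The cleaned names can then be assigned back with names(my_data) <- clean_names, where my_data stands for whatever data frame you are working with.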

Now YES, on to what we came for!

Reading .txt files

It is the most popular file type on our computers. This is a plain text file, so it is a bit easier to handle and rarely requires more arguments than the ones listed here. To import it into our R session we only need the read.table() function. Remember that if you need information about this function, you can type ?read.table.

The read.table() function has several arguments for reading files. The most important are:

  • file: the file location.
  • header: whether or not the file has a first row with column names.
  • col.names: the column names of our data frame, given manually if the file does not have them.
  • stringsAsFactors: whether text fields are converted to factors. Before R 4.0.0 the default was TRUE; since R 4.0.0 it is FALSE, so text fields stay as strings unless we set this argument to TRUE.
  • sep: the symbol used to delimit columns.
  • dec: the symbol used as the decimal point.
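
To see these arguments working together, here is a minimal sketch. It assumes a small semicolon-separated file with decimal commas and no header row, which we create ourselves in a temporary file (the file and the object name flowers are made up for the example).

```r
# Build a small semicolon-separated file with decimal commas and no header
# (the values are two iris rows typed by hand for this example)
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("5,1;3,5;setosa",
             "4,9;3,0;setosa"), tmp_txt)

flowers <- read.table(file = tmp_txt,
                      header = FALSE,             # the file has no row of names
                      col.names = c("Sepal.Length", "Sepal.Width", "Species"),
                      stringsAsFactors = FALSE,   # keep text as character strings
                      sep = ";",                  # columns are split by semicolons
                      dec = ",")                  # decimals are written with a comma
str(flowers)
```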

Thus, the basic syntax for reading is as follows. We save our data in a variable so that we can work with it easily

Attention, attention!!!

You can download this .txt file to see the shape of the data and to do the reading exercise in R


a <- read.table(file = "txt_example.txt", # File name
                # You can also give the full path to the file
                header = FALSE, # Whether the first row holds column names (TRUE) or not (FALSE)
                sep = "", # Column separator; "" means any whitespace
                dec = ".") # Character used as the decimal point, e.g. "." or ","

Keep in mind that this syntax works if and only if our file is stored in R's working directory, as we saw in our earlier post; otherwise it will generate an ERROR.
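
A quick way to avoid that error is to check the working directory before reading. This is only a sketch built on base R's getwd() and file.exists(); it assumes the file is called txt_example.txt, as in the example above.

```r
# Where is R looking for files right now?
print(getwd())

# "txt_example.txt" is the example file name used in this post
data_file <- "txt_example.txt"

if (file.exists(data_file)) {
  a <- read.table(file = data_file, header = FALSE, sep = "", dec = ".")
} else {
  message("'", data_file, "' was not found in ", getwd())
}
```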

If that is not the case and our file is in another folder, we can pass the choose.files() function as the file argument to browse to the folder where it is stored (choose.files() is only available on Windows; file.choose() works on every platform). It looks as follows

read.table(choose.files(), # This will open a window to find our file
           header = TRUE,
           sep = ",",
           dec = ".")
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

As we can see, our file has been loaded with total success: 5 variables with 6 observations each!


But what would happen if we enter the arguments wrong? Let’s see some examples

In this example, we set header = FALSE and don’t specify the separator or the decimal symbol of our data

read.table(choose.files(),
           header = FALSE)
##                                                          V1
## 1 Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
## 2                                    5.1,3.5,1.4,0.2,setosa
## 3                                      4.9,3,1.4,0.2,setosa
## 4                                    4.7,3.2,1.3,0.2,setosa
## 5                                    4.6,3.1,1.5,0.2,setosa
## 6                                      5,3.6,1.4,0.2,setosa
## 7                                    5.4,3.9,1.7,0.4,setosa

As you can see, our data was disorganized because we didn’t tell R how it should separate the columns, so the program reads each line as a single value. Also, if you look at the names of our variables, they have become one more row of data!


Now let’s see this other example, where we specify the wrong arguments for the header, the separator and the decimal symbol

read.table(choose.files(),
           header=FALSE,
           sep = ".",
           dec = ",")
## Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 3 did not have 5 elements

This time no result is shown at all; R stops immediately with an error. R is telling us that something is wrong, and as in this example, the problem may not be our data but the way we specified our reading function.
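
When we are not sure which separator a file uses, it can help to look at the raw lines before calling read.table(). The sketch below recreates the first lines of the example data in a temporary file (an assumption, so that it runs without the download) and uses base R's readLines() and count.fields() to try candidate separators.

```r
# Recreate the first lines of the example data in a temporary file
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species",
             "5.1,3.5,1.4,0.2,setosa",
             "4.9,3,1.4,0.2,setosa"), tmp_txt)

# Look at the raw text first
print(readLines(tmp_txt, n = 2))

# Count the fields on each line for every candidate separator
fields_comma <- count.fields(tmp_txt, sep = ",")
fields_dot   <- count.fields(tmp_txt, sep = ".")
print(fields_comma) # the same count on every line: a good candidate
print(fields_dot)   # the count changes between lines: the wrong separator
```

A separator that gives the same count on every line is usually the right one; uneven counts are what trigger the "did not have 5 elements" error.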

Reading .csv files

The CSV (Comma Separated Values) format is one of the most common when exchanging data between applications. A file in this format is a text file in which each row is a sample or occurrence and each column a variable. The values in each row are separated from each other by commas, hence the format name. We will come across this type of file a bit more often in our particular area.

We can easily create this type of file in Excel, as seen in the following images. We just have to bear in mind that there should be no empty rows, cell colors, borders or any other formatting.

Save .csv files from Excel


Save .csv files from Google Sheets


To read this type of file we will use the read.csv() function. There are many other functions for reading this format, but this is the most widely used. Its arguments are very similar to those of the read.table() function

Attention, attention!!!

You can download this .csv file to see the shape of the data and to do the reading exercise in R


read.csv(file = "csv_example.csv",
         header = TRUE,
         sep = ",",
         dec = ".")
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

And so, we have loaded our file. We are learning more and more!
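
To make the similarity with read.table() concrete, here is a small sketch. It writes the first rows of R's built-in iris data to temporary files (our own stand-ins for csv_example.csv) and reads them back three ways, including read.csv2(), the base R variant for files that use semicolons as separators and commas as decimals.

```r
# Write the first rows of the built-in iris data to a temporary .csv file
tmp_csv <- tempfile(fileext = ".csv")
write.csv(head(iris), tmp_csv, row.names = FALSE)

# read.csv() already assumes header = TRUE, sep = "," and dec = "."
with_csv   <- read.csv(tmp_csv)
with_table <- read.table(tmp_csv, header = TRUE, sep = ",", dec = ".")
print(all.equal(with_csv, with_table))

# Same data written with ";" as separator and "," as decimal symbol
tmp_csv2 <- tempfile(fileext = ".csv")
write.csv2(head(iris), tmp_csv2, row.names = FALSE)
with_csv2 <- read.csv2(tmp_csv2)
print(all.equal(with_csv, with_csv2))
```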

Download data online

There is an unimaginable amount of data on the web that we can download and use for our studies, and since we love to use R, there is also a way to read it directly into R without taking up space on our PC with one more file (to be clear, the data will still use memory, but only while we use it in our R session).

To begin, we are going to use the same functions that we used before; the only difference is that, instead of providing the file path, we provide the internet address, stored in a variable, as we see in this example

url<-"https://drive.google.com/uc?id=14drpgXNmwwy-vqjGTUvyhx2c8oBZ9Wqr&export=download"

In this step, we only save the address of the online file in a variable; you can give it any name. Now we carry out the same steps we have already learned, except that in place of the file name we put the variable we just created

read.table(url, # We specify our "url" variable
           header = TRUE,
           sep = ",",
           dec = ".")
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
And… voilà!!!
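
Online files can move or the connection can fail, so it may be worth wrapping the call so that a failed read does not stop the whole script. The helper below, read_remote(), is a hypothetical name invented for this sketch; it simply wraps read.table() in base R's tryCatch() and returns NULL when the address cannot be read.

```r
# read_remote() is a hypothetical helper written for this example
read_remote <- function(address, ...) {
  tryCatch(
    read.table(address, ...),
    error = function(e) {
      message("Could not read '", address, "': ", conditionMessage(e))
      NULL
    }
  )
}

# Demonstration with an address that does not exist
missing_data <- suppressWarnings(
  read_remote("file_that_does_not_exist.csv", header = TRUE, sep = ",")
)
print(is.null(missing_data))

# With a real address it is used just like read.table():
# iris_online <- read_remote(url, header = TRUE, sep = ",", dec = ".")
```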


Reading .xlsx files

Now we will learn about reading Excel files. To do this, we need to have the readxl package installed in our R library, and remember that to use it we must load it into our session with the library() function

install.packages("readxl")
library(readxl)

The main arguments of the read_excel() function are the following

sheet_excel <- read_excel(path = '', # File path
                          sheet = "iris", # Sheet to read
                          range = "C1:E4", # Read only this cell range of the sheet
                          n_max = 8) # Maximum number of rows to read (ignored when range is given)

Attention, attention!!!

You can download this .xlsx file to see the shape of the data and to do the reading exercise in R


read_excel(choose.files(),
           sheet = "iris",
           n_max = 6)
## # A tibble: 6 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
## 1          5.1         3.5          1.4         0.2 setosa 
## 2          4.9         3            1.4         0.2 setosa 
## 3          4.7         3.2          1.3         0.2 setosa 
## 4          4.6         3.1          1.5         0.2 setosa 
## 5          5           3.6          1.4         0.2 setosa 
## 6          5.4         3.9          1.7         0.4 setosa

Thus, we can see that the first 6 rows from the iris sheet of the xls_example workbook have been loaded successfully!
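
If you do not have the example workbook at hand, the sketch below uses datasets.xlsx, a demo workbook that ships with readxl (it is not the file from this post, although it also contains an iris sheet). It lists the sheets first and then reads by n_max and by range; the object names are our own.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook that ships with readxl (not the file from this post)
  demo_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets before choosing one
  sheets <- readxl::excel_sheets(demo_path)
  print(sheets)

  # First 6 rows of the "iris" sheet
  iris_head <- readxl::read_excel(demo_path, sheet = "iris", n_max = 6)

  # Only the cells C1:E4: one header row plus three data rows
  iris_range <- readxl::read_excel(demo_path, sheet = "iris", range = "C1:E4")
  print(dim(iris_range))
} else {
  message("Install readxl first: install.packages(\"readxl\")")
}
```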

Import multiple files

Finally, we sometimes find ourselves with multiple files for our projects, which means reading each one of them, and if there are many, the task becomes a bit tedious. Therefore, here we show you a way to read all the files in our working folder (keep in mind that they must have the same extension).

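First we get the list of files within the folder in question

files_project <- list.files(path = 'C:/Users/David/Desktop/Proyect',
                            pattern = "\\.csv$", # Keep only .csv files
                            full.names = TRUE) # Return full paths so the files can be found

Next, we read all the files into a new list with the lapply() function

file_list <- lapply(files_project, read.csv) # Pass the function itself, without parentheses
# In this example we only read .csv files
# You can also use other reading functions for other file types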
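
To try this without touching your own folders, here is a self-contained sketch. It creates a temporary folder with two small .csv files taken from the built-in iris data (the folder and file names are invented), reads them all, and, as one optional extra step, stacks them into a single data frame with do.call(rbind, ...).

```r
# Create a temporary project folder with two .csv files and one other file
project_dir <- tempfile("project_")
dir.create(project_dir)
write.csv(head(iris, 3), file.path(project_dir, "part1.csv"), row.names = FALSE)
write.csv(tail(iris, 3), file.path(project_dir, "part2.csv"), row.names = FALSE)
writeLines("notes, not data", file.path(project_dir, "readme.txt"))

# List only the .csv files, with their full paths
files_project <- list.files(path = project_dir,
                            pattern = "\\.csv$",
                            full.names = TRUE)

# Read every file into one element of a list, named after its file
file_list <- lapply(files_project, read.csv)
names(file_list) <- basename(files_project)

# Optional: stack all the files into one data frame (they share the same columns)
all_data <- do.call(rbind, file_list)
print(dim(all_data))
```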

Post summary

Well, today we faced one of the most complex steps in our studies: reading data. We learned how to import .txt, .csv and .xlsx files, the most common file types we use in science. Keep in mind that there are many others, and if you face one of them, you can write to us and we will explain it, or you can simply search on Google and find many references to solve your problem. Cheer up!!!

After importing the data, how do I analyze it?

After learning how to import data into R, we need to analyze our data, know what we are dealing with and how we should treat it … but wait, this is a spoiler… that will come in our next post, which is loaded with many more things to remember and learn!!

We will wait for you!!!



