R (programming language)
BASIC INFORMATION
- R (open-source language) is the successor of S (proprietary language)
- Like Python, use # for comments
- Use Ctrl + Enter to run the current line
- OR select multiple lines and press Ctrl + Enter
- Ctrl + L clears the console in RStudio
- How to print a variable?
- use print(variable) → can print only 1 object at a time
- OR select the variable and press Ctrl + Enter
- Wrap an assignment in () to print the value while assigning
- (x <- 5) → sets x to 5 and prints x
- x <- 5 → only sets x to 5
- Use View() for a better visualisation (of a matrix or data frame)
- A variable name can contain {letters, numbers, . , _ }
- But it can start only with a letter or . (and a leading . must not be followed by a digit)
- To write multiple statements on the same line
- USE ; (semicolon) to separate them
- Eg.- a <- 10; b <- 20
- Write TRUE and FALSE (or T and F) → True and true are both wrong
- Strings can be stored in " " or ' '
- A single-quoted string can't contain an unescaped single quote
- Similarly, a double-quoted string can't contain an unescaped double quote
- 'I am 'T' the don' → WRONG
- "I am 'T' the don" → Correct
- But single quotes can be written inside a double-quoted string and vice versa
- Built-in functions ==> can be applied to both a vector and a single number
- ceiling, abs, sqrt, floor, sin, log, log2, log10, exp, round, sum, prod, max
- s <- "HE28llo" => assigns the string to variable 's'
- substr(s, 2, 5) → "E28l" [NOTE: indexing starts from 1]
- nchar(s) → 7, the number of characters in the string
- Type strength reported by class(....) when types are mixed in c() ==> list > character > numeric > logical
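The string functions and the type-strength order above can be checked with a minimal sketch like the following. The variable names (`part`, `len`, `cls_*`, `roots`) are made up for illustration.

```r
s <- "HE28llo"
part <- substr(s, 2, 5)    # characters 2 to 5, counting from 1
len  <- nchar(s)           # number of characters in the string

# Mixing types in c() coerces every element to the strongest type present
cls_num  <- class(c(TRUE, 1))        # logical + numeric   -> "numeric"
cls_chr  <- class(c(1, "a"))         # numeric + character -> "character"
cls_list <- class(c(list(1), "a"))   # anything + list     -> "list"

# Built-in math functions work element-wise on a vector
roots <- sqrt(c(4, 9, 16))

print(part)
print(len)
print(c(cls_num, cls_chr, cls_list))
print(roots)
```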
- Integer Division (%/%) → c(2,3,5,7) %/% 2 ==> 1 1 2 3
- Modulo Division (%%) → c(2,3,5,7) %% c(2,3) ==> 0 0 1 1
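- equivalent to (2 %% 2) (3 %% 3) (5 %% 2) (7 %% 3) → the shorter vector is recycled
- Also, the length of (2,3,5,7) = 4 should be a multiple of the length of (2,3) = 2; otherwise R still recycles but gives a warning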
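A short sketch of the two division operators and of recycling, using the same numbers as above. The names `v`, `int_div`, `mod_rec` and `uneven` are only illustrative.

```r
v <- c(2, 3, 5, 7)

int_div <- v %/% 2         # integer division by a single number
mod_rec <- v %% c(2, 3)    # c(2, 3) is recycled to c(2, 3, 2, 3)

# Quotient and remainder together rebuild the original values
rebuilt <- (v %/% 2) * 2 + (v %% 2)

# Lengths 3 and 2: R still recycles, but normally warns about it
uneven <- suppressWarnings(c(2, 3, 5) %% c(2, 3))

print(int_div)
print(mod_rec)
print(uneven)
```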
- Logical Operators -
- xor(a, b) ✅ a ^ b ❌ (^ is exponentiation in R)
- &, | → element-wise comparisons (if vector) → return a vector of results
- &&, || → evaluate only a single condition (length-1 operands), not vectors like & or |
- Let x = 1:6
- (x > 2) & (x < 5) → F F T T F F
- x[(x > 2) & (x < 5)] → 3 4 → logical indexing => keeps the values where the condition is T
- (x > 2) && (x < 5) ❌ because && does not work on vectors (an error in R 4.3 and later; older versions used only the first element)
- (x[1] > 2) && (x[1] < 5) ✅
- Vectorized if statement
- Let x = 1:6
- ifelse(x < 3, x^2, x + 1) → for each element: x^2 if x < 3, else x + 1 (returns a new vector; x itself is unchanged)
- Functions → name <- function(x, y) { x + y }
- name(3, 4) → 7
- Loops → for (i in 1:5) { print(i + 1) }
- repeat { .... } is the same as while (TRUE) { ... } → use break to exit
- Factors in R - help to categorize data and store it as levels
- factor(V) → creates a factor where V is a vector + levels are the unique values from V
- Labels - Human-readable names associated with each level
- Ordering - Factors can be ordered (ordinal) or unordered (nominal)
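The logical operators, ifelse(), functions, loops and factors described above can be tried together in one small sketch. The data (`sizes`, `answers`) and the function name `add` are assumptions made up for this example.

```r
x <- 1:6

mask   <- (x > 2) & (x < 5)          # element-wise: F F T T F F
picked <- x[mask]                    # logical indexing keeps the TRUE positions
single <- (x[1] > 2) && (x[1] < 5)   # && on length-1 values only
either <- xor(TRUE, FALSE)           # exclusive or

y <- ifelse(x < 3, x^2, x + 1)       # vectorized if/else, x is not modified

add   <- function(a, b) { a + b }    # hypothetical function name
total <- add(3, 4)

i <- 0
repeat {                             # same as while (TRUE)
  i <- i + 1
  if (i >= 3) break                  # break is the only way out
}

sizes <- c("small", "large", "medium", "small")   # made-up data
f  <- factor(sizes)                               # unordered (nominal)
fo <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
smaller <- fo[1] < fo[2]                          # comparison works on ordered factors

answers <- factor(c(0, 1, 1), levels = c(0, 1), labels = c("no", "yes"))

print(picked)
print(y)
print(levels(f))
print(as.character(answers))
```

Ordered factors allow comparisons such as `<`, which unordered factors do not support.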
- How to import dataset
- read.csv('Full path of dataset')
- read.csv(file.choose()) → pop-up appears to select file
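A minimal sketch of importing a CSV file. Because no real dataset is named in these notes, it first writes a small made-up file to a temporary path and then reads it back; the path and the column values are assumptions.

```r
# Create a tiny CSV file in a temporary location (stand-in for a real dataset)
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,sal", "1,A,523.4", "2,B,98"), path)

# Read it using the full path, keeping text columns as character
emp_csv <- read.csv(path, stringsAsFactors = FALSE)

str(emp_csv)     # structure: column names, types and first values
unlink(path)     # remove the temporary file again
```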
Data Structures
- Vector → 1-D array that can hold elements of the same data type
- created using functions like c() or seq() [c means combine]
- c(1,2,5.3,6,-2,4.78)
- 0:7 ===> [0,1,2,3,4,5,6,7]
- 7:0 ===> [7,6,5,4,3,2,1,0]
- seq(1, 10, length.out = 5) → 5 equally spaced values from 1 to 10
- seq(from=0, to=6, by=0.2) → 0 to 6 in steps of 0.2
- rep(1:5, times=2) → 1 2 3 4 5 1 2 3 4 5
- rep(1:5, each=2) → 1 1 2 2 3 3 4 4 5 5
- scan() - after running it, go to the console and enter numbers
- => press 'Enter' twice to stop taking input
- To access
- a <- c("ram" = 12, "by" = -2)
- a["ram"] => Correct {because it is a name & value pair => can access by name}
- a[12] => NA {12 is treated as a position, not as a value}
- cnt <- c("one","two","three","four")
- cnt[3] = "three"
- cnt[9] = NA
- cnt[-2] = "one" "three" "four" → all except index 2
- cnt[2:4] = "two" "three" "four" → all from 2 till 4
- cnt[2, 3, 2] => Wrong
- cnt[c(2, 3, 2)] => "two" "three" "two" [Correct]
- Built-in functions -> length(a), sort(a), sort(a, decreasing = T)
- To delete -> assign NULL to the variable, or remove it with rm()
- If 2 vectors are of the same size => can do (a+b) or (a/b) or other operations element by element
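The vector creation and indexing rules above can be sketched as follows. The result names (`nums`, `steps`, `total`, and so on) are only for illustration.

```r
nums  <- seq(1, 10, length.out = 5)       # 5 equally spaced values
steps <- seq(from = 0, to = 6, by = 0.2)  # fixed step size
twice <- rep(1:5, times = 2)              # whole vector repeated
each2 <- rep(1:5, each = 2)               # every element repeated

a <- c("ram" = 12, "by" = -2)
by_name <- a["ram"]      # access by name
by_pos  <- a[12]         # position 12 does not exist -> NA

cnt <- c("one", "two", "three", "four")
without_2 <- cnt[-2]           # drop the 2nd element
repeated  <- cnt[c(2, 3, 2)]   # several positions need c()

sorted_desc <- sort(c(3, 1, 2), decreasing = TRUE)
total <- c(1, 2, 3) + c(10, 20, 30)   # element-wise arithmetic

print(nums)
print(without_2)
print(repeated)
print(total)
```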
- List → 1-D ordered collection that can hold elements of different data types
- a <- list(1,5.3,-2,c("one","two","three"))
- To convert a list to a vector
- a <- list(1,5.3,-2)
- v <- unlist(a)
- Matrix → 2-D array with all elements of the same data type
- x <- matrix(nrow=3, ncol=2, data=c(1,2,3,4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
- to access => x[3,2] = 6
- to enter data row-wise => set byrow = T
- if data = 1:2 => same as {1, 2, 1, 2, 1, 2} (the data is recycled)
- NOTE - (r*c) should be a multiple of the length of data
- x[3,] = {3 6} → row
- x[,2] = {4 5 6} → column
- x[2:3, 2] → for a submatrix
- cbind - combines vectors, matrices or data frames by column
- t(matrix) → transpose of the matrix
- dim(matrix) → 3 2 {rows, columns of the matrix => dimension}
- nrow(matrix) or ncol(matrix) → number of rows or columns of the matrix
- diag(1:10, nrow=5, ncol=7) → creates a 5 x 7 matrix with diagonal elements taken from 1 to 10 {NOTE - the diagonal goes only up to 5 because no more rows are available}
- matrix * 5, or any arithmetic operation with a constant or with another matrix of the same dimension → the operation is done element by element
- For matrix multiplication → use %*%
- crossprod(x) is equivalent to t(x) %*% x
- solve(matrix) → inverse of a (square, invertible) matrix
- eigen(matrix) → eigenvalues and eigenvectors of a (square) matrix
- Dataframe → 2-D table-like structure in which different columns can have different data types, but all values within one column have the same data type
- emp <- data.frame(
id = c(1:3), name = c("A","B","C"), sal = c(523.4,98,452.89),
stringsAsFactors = FALSE)
- stringsAsFactors = FALSE → before R 4.0.0, R converted character vectors into factors by default => this tells R to keep those columns as characters and not convert them to factors automatically
- This gives us control over the data, as we can convert specific columns to factors when needed.
- str(emp) => gives the structure of the whole data frame
- To access
- column → f1<- data.frame(emp["name"])
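- row →
- f1 <- emp[2,] => for only the 2nd row
- f1 <- emp[2:5,] => for the 2nd, 3rd, 4th and 5th rows
- emp[2,3] => for the element at position (2, 3)
- To Add
- list(4, "D", 61) to emp as a new row
- x <- list(4, "D", 61)
- emp <- rbind(emp, x) → rbind returns a new data frame, so assign the result back
- the vector c(12, 4, 5, 9) to emp as a column
- y <- c(12, 4, 5, 9) → use c(), not vector(); the length must equal the number of rows (4 once the new row is added)
- emp <- cbind(emp, Age = y) → the NAME of the column is Age, with the values of y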
- To Remove
- Row => emp<-emp[-2,]
- Column => emp$id<- NULL
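The matrix and data frame operations from this section can be sketched end to end as below. The invertible matrix `m` is made-up data, and the new row is added as a one-row data frame (a variant of the list form in the notes) so that the column names are explicit.

```r
# --- Matrix ---
x  <- matrix(data = 1:6, nrow = 3, ncol = 2)        # filled column by column
xr <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE) # filled row by row

elem   <- x[3, 2]        # single element
row3   <- x[3, ]         # whole row
col2   <- x[, 2]         # whole column
scaled <- x * 5          # element-wise arithmetic
cp     <- crossprod(x)   # same result as t(x) %*% x

m   <- matrix(c(2, 0, 0, 4), nrow = 2)   # hypothetical invertible matrix
inv <- solve(m)                           # inverse, so m %*% inv is the identity

# --- Data frame ---
emp <- data.frame(id = 1:3, name = c("A", "B", "C"),
                  sal = c(523.4, 98, 452.89), stringsAsFactors = FALSE)

emp <- rbind(emp, data.frame(id = 4, name = "D", sal = 61))  # add a row
emp <- cbind(emp, Age = c(12, 4, 5, 9))                      # add a column
emp <- emp[-2, ]                                             # remove the 2nd row
emp$id <- NULL                                               # remove a column

print(dim(x))
print(cp)
print(emp)
```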
Detailed Information
- paste - Concatenates vectors after converting them all to character and returns a character vector => can be assigned to a variable
- paste(10, 20) → "10 20" (uses a space as the default separator)
- paste(10, 20, sep="-") → "10-20"
- x <- c("a", "b", "c")
- paste(x, 1:5) → "a 1" "b 2" "c 3" "a 4" "b 5" (the shorter vector is recycled)
- collapse - all elements of the resulting vector are combined into a single string, with the separator given by collapse between them
- x <- c("a", "b", "c")
- paste(x, 1:5, collapse = '-') → "a 1-b 2-c 3-a 4-b 5"
- 'sep' separates elements within each position of the concatenation
- 'collapse' joins the final results into a single string
- paste0 - Concatenates strings without any separator (the other properties are the same)
- slightly more efficient than paste
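A small sketch contrasting sep, collapse and paste0, using the same inputs as above. The result names `p1` to `p6` are illustrative only.

```r
x <- c("a", "b", "c")

p1 <- paste(10, 20)               # default separator is a space
p2 <- paste(10, 20, sep = "-")    # custom separator
p3 <- paste(x, 1:5)               # x is recycled to length 5
p4 <- paste(x, 1:5, collapse = "-")           # one single string
p5 <- paste0(x, 1:3)                          # no separator at all
p6 <- paste(x, 1:3, sep = "", collapse = "+") # sep and collapse combined

print(p3)
print(p4)
print(p5)
print(p6)
```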
- cat - Concatenates and prints directly to the console => used for debugging
- its output cannot be stored in a variable (cat returns NULL)
- '\n' and '\t' are rendered as a newline and a tab only by cat → print shows them as the escape sequences (NOTE - '\n' or '\t' can still be used as sep with paste)
- letters → built-in constant vector that contains the 26 lowercase English letters
- LETTERS → contains the 26 uppercase English letters
- letters[ c(2,4,6) ] → "b" "d" "f" and LETTERS[ c(2,4,6) ] → "B" "D" "F"
- Use readline for user input (like cin>>x)
- Use prompt for a better user experience
- Eg.=> a <- readline(prompt = "Enter your name: ")
- Naming of Vector or List
- cnt <- c("one","two","three","four")
- names(cnt) = c("a","b","c","d")
- Now, b represents two
- cnt["b"] → gives "two" (cnt$b works only if cnt is a list, not an atomic vector)
- ls() - stands for "list objects" & lists the names of all objects
- rm() - stands for "remove" & removes objects from the current workspace
- summary() - gives key statistics like min, max, mean, median, quartiles, ...
- = VS <- → use <- as it is always correct, but = can fail in rare cases
- x <- y <- 5 is correct, and so is x = y = 5
- x <- y = 5 is wrong because = has lower precedence than <- → it is parsed as (x <- y) = 5, which is an error
- Difference in scope when they are used to set an argument value in a function call
- '=' → scope of the function => the name doesn't exist in the user workspace
- '<-' → scope of the user workspace => the variable is also valid after the function call completes
- Press Alt and '-' together to type '<-' (in RStudio)
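The naming and assignment rules above can be illustrated with a short sketch. The function `square` and the variable names `val`, `val2`, `p` and `q` are hypothetical and chosen only for this example.

```r
cnt <- c("one", "two", "three", "four")
names(cnt) <- c("a", "b", "c", "d")
by_name <- cnt["b"]        # works on a named atomic vector

lst <- as.list(cnt)        # $ needs a list
by_dollar <- lst$b

p <- q <- 5                # chained assignment with <-

square <- function(val) val^2   # hypothetical helper

r1 <- square(val = 4)      # '=' only names the argument inside the call
eq_created <- exists("val", inherits = FALSE)     # no 'val' in the workspace

r2 <- square(val2 <- 4)    # '<-' also creates val2 in the calling workspace
arrow_created <- exists("val2", inherits = FALSE)

print(unname(by_name))
print(by_dollar)
print(c(eq_created, arrow_created))
```

Using `=` for arguments and `<-` for assignments keeps function calls from leaving stray variables behind.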
Graphics
- plot(a, b, main="TASK-7", xlab="Salary", ylab="Age", pch=19, col="red", cex=1.5)
- main creates the heading => TASK-7
- xlab is the x-axis label => Salary
- ylab is the y-axis label => Age
- col is for colours
- pch = 19 for solid circles
- cex = 1.5 → makes the plotted symbols 150% of the default size
- The plot function is generic: it picks a suitable type of plot for the given data, depending on the class and number of the variables passed
- Categorical variables - use bar chart (use barplot)
- Don't use barplot(..) directly on raw data → it draws one bar per raw value → Messy
- First make table(..), then use barplot → table summarizes the data => easier to visualize
- table - used for creating frequency tables and cross-tabulations => summarizing categorical data
- Quantitative variables - use histogram (use hist)
- Observe these aspects
- Shape - whether skewed, symmetrical
- Gaps in histogram and Outliers
- Histogram in groups -
- par(mfrow = c(3,1)) → puts the graphs in 3 rows and 1 column
- make a histogram for each of the 3 varieties
- par(mfrow = c(1,1)) → restores the graphics parameter
- breaks parameter in hist - determines the number and width of bins (intervals)
- Fewer bins - oversimplify the data and hide important details
- More bins - overcomplicate the data and show noise instead of trends
- breaks = 5 → suggests about 5 bins of equal width (R may adjust the number to get round break points)
- breaks = c(0, 10, 20, 30) → specifies where each bin starts and ends (the breaks must cover the whole range of the data)
- xlim parameter in hist - limits the range of the x-axis
- Data outside the range is still included in the calculations but not shown on the plot
- freq parameter in hist - if set to FALSE => plots density (not frequency)
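A minimal sketch of the bar chart and histogram workflow above. The data (`variety`, `height`) is made up, and `pdf(NULL)` is used here only so the sketch draws to a null device and runs without opening a window; in RStudio that line and `dev.off()` can be dropped.

```r
# Hypothetical sample data
variety <- c("A", "B", "A", "C", "B", "A")
height  <- c(12, 15, 11, 22, 17, 13, 25, 19, 14, 16)

pdf(NULL)                      # null graphics device (assumption, see above)
par(mfrow = c(1, 2))           # two plots side by side

counts <- table(variety)       # summarize first, then plot
barplot(counts, main = "Varieties", col = "grey")

# Explicit break points: bins 10-15, 15-20, 20-25
h <- hist(height, breaks = c(10, 15, 20, 25),
          main = "Heights", xlab = "Height")

par(mfrow = c(1, 1))           # restore the layout

# freq = FALSE plots densities, whose bar areas add up to 1
hd <- hist(height, breaks = c(10, 15, 20, 25), freq = FALSE)
invisible(dev.off())

print(counts)
print(h$counts)
print(hd$density)
```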
- For two quantitative variables - use scatter plot (use plot)
- Observe these aspects
- Linear - whether they follow linearity or not
- Consistent spread across the plane
- Outliers and correlation
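A short scatter plot sketch for two quantitative variables, reusing the plot arguments listed at the start of this section. The `salary` and `age` values are made up, and `pdf(NULL)` again only keeps the example from opening a window.

```r
# Hypothetical data for two quantitative variables
salary <- c(20, 35, 50, 65, 80)
age    <- c(22, 30, 41, 48, 60)

pdf(NULL)                      # null graphics device (assumption)
plot(salary, age, main = "TASK-7", xlab = "Salary", ylab = "Age",
     pch = 19, col = "red", cex = 1.5)
invisible(dev.off())

r <- cor(salary, age)          # strength of the linear relationship
print(r)
```

A value of `r` close to 1 or -1 suggests a strong linear relationship, which should be confirmed by looking at the plot for outliers.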