R (programming language)

- September 11, 2025

BASIC INFORMATION

R (open-source language) is the successor of S (proprietary language)
Like Python, use # for comments
Use Ctrl + Enter to run the current line

OR select multiple lines and press Ctrl + Enter

Ctrl + L to clear the console in RStudio
How to print variable?

use print(variable) → CAN only print 1 object at a time
OR select variable and hit Ctrl + Enter
Wrap an assignment in () to print the value while assigning

(x <- 5) → makes x=5 and prints x
x <- 5 → only makes x=5

Use View() for better visualisation (for a matrix or data frame)

Variable names can contain {letters, numbers, . , _ }

But they can start only with a letter or . (and a leading . must not be followed by a number)

To write multiple statements on the same line

Separate them with ; (semicolon)
Eg.- a <- 10; b <- 20

Write TRUE / FALSE (or the shorthands T / F) → but True or true are both wrong
Can store string in " " or ' '
Single-quoted strings can't contain unescaped single quotes
Similarly, double-quoted strings can't contain unescaped double quotes

'I am 'T' the don' → WRONG
"I am 'T' the don" → Correct
But we can write single quotes in double-quoted string and vice-versa
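A minimal sketch of the quoting rules above. The backslash escape in s3 is not mentioned in the notes; it is standard R syntax, shown here only as an alternative to switching quote styles.

```r
# Mixing quote styles avoids the problem entirely
s1 <- "I am 'T' the don"     # single quotes inside a double-quoted string
s2 <- 'I am "T" the don'     # double quotes inside a single-quoted string

# A backslash escapes a quote of the same kind as the delimiter
s3 <- 'I am \'T\' the don'

identical(s1, s3)            # TRUE: both hold exactly the same characters
nchar(s1)                    # 16
```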

In-built functions ==> can be applied on both vector and single number

ceiling, abs, sqrt, floor, sin, log, log2, log10, exp, round, sum, prod, max

s <- "HE28llo" => assign variable 's' with string
substr(s, 2, 5) → E28l [NOTE: - indexing start from 1]
nchar(s) → 7 which is count of number of characters in string
class(....) coercion strength ==> list > character > numeric > logical (mixing types in c() converts everything to the strongest type present)
Integer Division (%/%) → c(2,3,5,7) %/% 2 ==> 1 1 2 3

Modulo Division (%%) → c(2,3,5,7) %% c(2,3) ==> 0 0 1 1

equivalent to (2 %% 2) (3 %% 3) (5 %% 2) (7 %% 3) → the shorter vector is recycled
Also, length of (2,3,5,7) = 4 should be a multiple of length of (2,3) = 2; otherwise R still recycles but gives a warning
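The integer-division and modulo examples above can be checked with a short sketch; the variable names (v, quotient, remainder, recycled) are illustrative only.

```r
v <- c(2, 3, 5, 7)

quotient  <- v %/% 2      # 1 1 2 3
remainder <- v %% 2       # 0 1 1 1

# The two operators are consistent: quotient * divisor + remainder gives v back
all(quotient * 2 + remainder == v)   # TRUE

# With a vector divisor, c(2, 3) is recycled to c(2, 3, 2, 3)
recycled <- v %% c(2, 3)  # 0 0 1 1
```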

Logical Operators -

xor(a, b) ✅ a ^ b ❌
&, | → element-wise comparisons (if vector) → returns vector of results
&&, || → evaluate a single (length-1) condition and not a vector like & or |; they also short-circuit

Let x = 1:6
(x > 2) & (x < 5) → F F T T F F
x[(x > 2) & (x < 5)] → [3, 4] → logical indexing => get values that are T
(x > 2) && (x < 5) ❌ because && needs length-1 operands (an error since R 4.3.0; older versions used only the first element)
(x[1] > 2) && (x[1] < 5) ✅
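A small sketch of the difference between & and && described above. any() and all() are not in the notes; they are base R functions that reduce a logical vector to the single value that && and if expect.

```r
x <- 1:6

# Element-wise: one TRUE/FALSE per element of x
in_range <- (x > 2) & (x < 5)     # FALSE FALSE TRUE TRUE FALSE FALSE
x[in_range]                       # 3 4  (logical indexing)

# Single condition: both sides must have length 1
first_ok <- (x[1] > 2) && (x[1] < 5)   # FALSE, because x[1] is 1

# Reduce a logical vector to one value before using && or if
any_ok <- any(in_range)           # TRUE: at least one element is in range
all_ok <- all(in_range)           # FALSE: not every element is in range

xor(TRUE, FALSE)                  # TRUE: exactly one side is TRUE
```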

Vectorized if statement
Let x = 1:6
ifelse(x < 3, x^2, x + 1) → returns a new vector: x^2 where x < 3, else x + 1 → 1 4 4 5 6 7 (x itself is unchanged)
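A minimal sketch of the vectorized if, storing the result in a separate variable (y is an illustrative name) to show that x is not modified.

```r
x <- 1:6

# Test each element: square it if it is below 3, otherwise add 1
y <- ifelse(x < 3, x^2, x + 1)

y   # 1 4 4 5 6 7
x   # 1 2 3 4 5 6  (unchanged)
```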

Functions → name <- function(x, y) { x+y }
name(3, 4)
for ( i in 1:5 ) { print( i+1) }
repeat { .... } is same as while(TRUE) { ... }
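The function and loop forms above can be combined in one runnable sketch. The names add, total and n are illustrative; note that repeat needs an explicit break, otherwise it never ends.

```r
# A function with two arguments; the last evaluated expression is returned
add <- function(x, y) {
  x + y
}
add(3, 4)            # 7

# for loop: accumulate 1 + 2 + ... + 5
total <- 0
for (i in 1:5) {
  total <- add(total, i)
}
total                # 15

# repeat behaves like while (TRUE), so it must be stopped with break
n <- 0
repeat {
  n <- n + 1
  if (n >= 3) break
}
n                    # 3
```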
Factors in R - helps to categorize data and store it as levels

factor(V) → creates a factor from vector V; levels are the sorted unique values of V
Labels - Human readable names associated with each level
Ordering - Factors can be ordered (ordinal) or unordered (nominal)
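A short sketch of levels, labels and ordering, using made-up data (the vectors sizes and the 0/1 codes are hypothetical examples, not from the notes).

```r
sizes <- c("small", "large", "medium", "small", "large")   # hypothetical data

# Unordered (nominal) factor: levels are the sorted unique values
f <- factor(sizes)
levels(f)                 # "large" "medium" "small"

# Ordered (ordinal) factor: give the levels in their meaningful order
f_ord <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
f_ord[1] < f_ord[2]       # TRUE: "small" comes before "large"
table(f_ord)              # counts per level: small 2, medium 1, large 2

# Labels: human-readable names attached to the underlying values
answer <- factor(c(0, 1, 1), levels = c(0, 1), labels = c("No", "Yes"))
as.character(answer)      # "No" "Yes" "Yes"
```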

How to import dataset

read.csv('Full path of dataset')
read.csv(file.choose()) → pop-up appears to select file

Data Structures

Vector → 1-D array that can hold elements of the same data type

created using functions like c() or seq() [c means combine]

c(1,2,5.3,6,-2,4.78)
0:7 ===> [0,1,2,3,4,5,6,7]
7:0 ===> [7,6,5,4,3,2,1,0]
seq(1, 10, length.out = 5) → 5 equally spaced values from 1 to 10 → 1.00 3.25 5.50 7.75 10.00
seq(from=0, to=6, by = 0.2) → 0 to 6 with spacing of 0.2
rep(1:5, times=2) → 1 2 3 4 5 1 2 3 4 5
rep(1:5, each=2) → 1 1 2 2 3 3 4 4 5 5
scan()- after running go to console and enter numbers
=> press 'enter' twice to stop taking input

To access

a<- c("ram" = 12, "by" = -2)
a["ram"] => Correct {bcz key & value pair => can access using key}
a[12] => NA
cnt <- c("one","two","three","four")
cnt[3] → "three"
cnt[9] → NA
cnt[-2] → "one" "three" "four" → all except index 2
cnt[2:4] → "two" "three" "four" → all from 2 till 4

cnt[2, 3, 2] => Wrong
cnt[c(2, 3, 2)] => "two" "three" "two" [Correct]

Built-in Functions -> length(a), sort(a), sort(a, decreasing = T)
To delete -> rm(a) removes the vector; assigning NULL (a <- NULL) only empties it
If 2 vectors are of same size => (a+b), (a/b) and other operations work element by element (a shorter vector is recycled)
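A compact sketch of the vector access rules and built-in functions above. The vectors p and q are hypothetical; a and cnt are the ones used in the notes.

```r
a <- c("ram" = 12, "by" = -2)
a["ram"]                     # element named "ram", value 12
a[12]                        # NA: there is no 12th element

cnt <- c("one", "two", "three", "four")
cnt[-2]                      # "one" "three" "four"
cnt[c(2, 3, 2)]              # "two" "three" "two"

length(cnt)                  # 4
sort(c(3, 1, 2), decreasing = TRUE)   # 3 2 1

# Arithmetic on two vectors of the same size works element by element
p <- c(1, 2, 3)              # hypothetical values
q <- c(10, 20, 30)
p + q                        # 11 22 33
q / p                        # 10 10 10
```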

List → 1-D collection that can hold elements of different data types (even vectors or other lists)

a<- list(1,5.3,-2,c("one","two","three"))
To convert list to vector
a<- list(1,5.3,-2)
v<- unlist(a)
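A minimal sketch of working with the list above. The [[ ]] operator is not covered in the notes; it is the base R way to pull one element out of a list, while [ ] returns a smaller list.

```r
a <- list(1, 5.3, -2, c("one", "two", "three"))

length(a)          # 4: the character vector counts as one element
a[[4]]             # the character vector itself
a[[4]][2]          # "two"
class(a[4])        # "list": single brackets keep the list wrapper

# unlist() flattens a list into a vector
v <- unlist(list(1, 5.3, -2))
v                  # 1.0 5.3 -2.0

# With mixed types, everything is coerced to the strongest type (character)
flat <- unlist(a)
class(flat)        # "character"
length(flat)       # 6
```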

Matrix → 2-D array with all same data type

x <- matrix(nrow=3, ncol=2, data=c(1,2,3,4,5,6))

     [,1]  [,2]          → to access => x[3,2] = 6
[1,]    1     4          → to enter data row-wise => make byrow = T
[2,]    2     5          → if data = 1:2 => same as {1, 2, 1, 2, 1, 2}
[3,]    3     6               NOTE - (r*c) should be a multiple of length(data), else R warns

x[3,] → 3 6  (row)
x[,2] → 4 5 6  (column)
x[2:3, 2] → 5 6  (part of a column; use x[2:3, 2, drop = FALSE] to keep it a matrix)

cbind - combines vector, matrix or data-frame by column
t(matrix) → transpose of matrix
dim(matrix)  → 3 2  {rows, columns of matrix => dimension}
nrow(matrix) or ncol(matrix) → tells number of rows or columns of matrix

diag(1:10, nrow=5, ncol=7) → creates a 5 x 7 matrix whose diagonal is filled from 1:10
{NOTE - the diagonal stops at 5 because no more rows are available}

matrix * 5 or any arithmetic operation with a constant or with another matrix of
same dimension → that arithmetic operation is done element by element
For matrix multiplication → use %*%
crossprod(x)   ≡   t(x) %*% x

solve(matrix)  → for inverse of matrix
eigen(matrix)  → find eigen values and eigen vectors of a matrix
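The matrix operations above can be tried on small examples. The 3 x 2 matrix x is the one from the notes; the 2 x 2 diagonal matrix m is a hypothetical example chosen so that the inverse and eigenvalues are easy to check by hand.

```r
x <- matrix(1:6, nrow = 3, ncol = 2)   # filled column by column
x[3, 2]                    # 6
dim(t(x))                  # 2 3: transpose swaps rows and columns

x * 5                      # element-wise: every entry multiplied by 5
xtx <- t(x) %*% x          # matrix product, a 2 x 2 result
all.equal(xtx, crossprod(x))   # TRUE: crossprod(x) computes the same product

m <- matrix(c(2, 0, 0, 4), nrow = 2)   # hypothetical invertible matrix
m_inv <- solve(m)          # inverse: diagonal entries 0.5 and 0.25
m %*% m_inv                # identity matrix
eigen(m)$values            # 4 2 (largest first)
```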

Data frame → 2-D table-like structure: different columns can have different data types, but all values within one column share the same type

emp <- data.frame(
id = c(1:3), name = c("A","B","C"), sal = c(523.4,98,452.89),
stringsAsFactors = FALSE)

stringsAsFactors = FALSE → before R 4.0.0, data.frame() converted character vectors into factors by default => this tells R to keep those columns as characters (since R 4.0.0, FALSE is already the default)
This gives us control over the data, as we can convert specific columns to factors when needed.

str(emp) => gives structure of whole data frame
To access

column → f1 <- emp["name"] => 1-column data frame; emp$name or emp[["name"]] => plain vector
row →
f1 <- emp[2,] => for only 2nd row
f1 <- emp[2:3,] => for 2nd and 3rd row (row numbers beyond nrow(emp) give rows of NA)
emp[2,3] => for element at (2, 3) position

To Add

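list(4, "D", 61) to emp as new row

x <- list(4, "D", 61)
emp <- rbind(emp, x) → rbind returns a new data frame, so assign it back

c(12, 4, 5, 9) to emp as column

y <- c(12, 4, 5, 9) → use c(), not vector(); needs one value per row of emp
emp <- cbind(emp, Age = y) → NAME of column is Age with values of y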

To Remove

Row => emp<-emp[-2,]
Column => emp$id<- NULL
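The add and remove steps above can be run end to end as a sketch. One assumption differs from the notes: the new row is built as a one-row data frame with matching column names rather than an unnamed list, which makes the column matching explicit.

```r
emp <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  sal = c(523.4, 98, 452.89),
  stringsAsFactors = FALSE
)

# Add a row: rbind returns a new data frame, so the result is assigned back
new_row <- data.frame(id = 4, name = "D", sal = 61, stringsAsFactors = FALSE)
emp <- rbind(emp, new_row)

# Add a column: one value per row (emp now has 4 rows)
emp <- cbind(emp, Age = c(12, 4, 5, 9))

# Remove the 2nd row, then the id column
emp <- emp[-2, ]
emp$id <- NULL

dim(emp)       # 3 3
names(emp)     # "name" "sal" "Age"
emp$name       # "A" "C" "D"
```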

Detailed Information

paste - Concatenates vectors after converting them to character and returns a character vector => can be assigned to a variable

paste(10, 20) → "10 20" (use space as the default separator)
paste(10, 20, sep="-") → "10-20"
x <- c("a", "b", "c")
paste(x, 1:5) → "a 1" "b 2" "c 3" "a 4" "b 5"
collapse - combines all elements of the resulting vector into a single string, separated by the value of collapse

x <- c("a", "b", "c")
paste(x, 1:5, collapse = '-') → "a 1-b 2-c 3-a 4-b 5"
'sep' separates elements within each position of the concatenation
'collapse' joins the final results into a single string

paste0 - Concatenates strings without any separator (all other properties are the same as paste)

slightly more efficient than paste

cat - Concatenates and prints directly to console => used for debugging

returns NULL, so its output cannot be stored in a variable
'\n' and '\t' appear as a real newline or tab only when printed with cat → print shows a string built with paste as the escaped text \n or \t (NOTE - can still use '\n' or '\t' as sep with paste)
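A short sketch contrasting paste, paste0 and cat as described above. The strings and the variable names (joined, msg, res) are illustrative only.

```r
x <- c("a", "b", "c")

paste(x, 1:3, sep = "_")                            # "a_1" "b_2" "c_3"
joined <- paste(x, 1:3, sep = "_", collapse = "+")  # "a_1+b_2+c_3"
paste0(x, 1:3)                                      # "a1" "b2" "c3"

# A newline used as sep is stored in the string as one character
msg <- paste("line1", "line2", sep = "\n")
nchar(msg)     # 11: 5 + 1 + 5
print(msg)     # shows the escape: "line1\nline2"
cat(msg, "\n") # prints line1 and line2 on separate lines

# cat only prints; the value it returns is NULL
res <- cat("done\n")
is.null(res)   # TRUE
```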

letters → built-in constant (character vector) that contains the 26 lowercase English letters
LETTERS → contains the 26 uppercase English letters
letters[ c(2,4,6) ] → "b" "d" "f" and with LETTERS[ c(2,4,6) ] → "B" "D" "F"
Use readline for user input (like cin>>x)

Use prompt for better user experience

Eg.=> a<- readline(prompt = "Enter your name: ")

Naming of Vector or List

cnt <- c("one","two","three","four")
names(cnt) <- c("a","b","c","d")
Now, b represents two
cnt["b"] → gives "two" (cnt$b works only for lists, not for atomic vectors like cnt)

ls() - stands for "list objects" & lists names of all objects in the workspace
rm() - stands for "remove" & removes objects from current workspace
summary() - summary statistics like min, max, mean, median, quartiles, ...
= VS <- → use <- as it will always be correct but = can fail in rare cases

x <- y <- 5 is correct and so is x = y = 5
x <- y = 5 is wrong bcz = has lower precedence → it is read as (x <- y) = 5, which is not a valid assignment
Difference in scope when they are used inside a function call

'=' → f(x = 5) only sets the argument x of the function => x doesn't exist in user workspace
'<-' → f(x <- 5) creates x in the user workspace and passes its value => x is also valid after function call completes
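A sketch of the scope difference inside a function call. The wrapper function demo is hypothetical and only keeps the experiment out of the workspace; at the top level, the "calling scope" in the comments would be the user workspace.

```r
demo <- function() {
  # '=' only names the argument of mean(); no variable x is created here
  m1 <- mean(x = c(1, 2, 3))
  x_after_equals <- exists("x", envir = environment(), inherits = FALSE)

  # '<-' first creates x in the calling scope, then passes its value to mean()
  m2 <- mean(x <- c(4, 5, 6))
  x_after_arrow <- exists("x", envir = environment(), inherits = FALSE)

  list(m1 = m1, m2 = m2,
       x_after_equals = x_after_equals,
       x_after_arrow = x_after_arrow)
}

result <- demo()
result$m1              # 2
result$m2              # 5
result$x_after_equals  # FALSE
result$x_after_arrow   # TRUE
```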

Press Alt and '-' together for '<-' (RStudio shortcut)

Graphics

plot(a, b, main="TASK-7", xlab="Salary", ylab="Age", pch=19, col="red", cex=1.5)

main creates heading => TASK-7
x-axis label => Salary
y-axis label => Age
col is for colours
pch = 19 for solid circles
cex = 1.5 → make plotting symbols 150% of the default size
plot is a generic function → it picks a suitable type of plot from the kind of data it is given (e.g. two numeric vectors give a scatter plot)

Categorical variables - use bar chart (use barplot)

Don't use barplot(..) directly on raw data → draws one bar per raw value → Messy
First make table(..) then use barplot → table summarizes data => better visualization

table - used for creating frequency tables and cross-tabulations
=> summarizing categorical data
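A minimal sketch of the table-then-barplot workflow, using a made-up categorical vector (variety is hypothetical data, and the title and axis labels are placeholders).

```r
variety <- c("A", "B", "A", "C", "B", "A")   # hypothetical categorical data

# Summarize first: one count per category
counts <- table(variety)
counts                     # A: 3, B: 2, C: 1

# Then plot the summary: one bar per category instead of one per raw value
barplot(counts, main = "Count per variety", xlab = "Variety", ylab = "Frequency")
```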

Quantitative variables - use histogram (use hist)

Observe these aspects

Shape - whether skewed, symmetrical
Gaps in histogram and Outliers

Histogram in groups -
par(mfrow = c(3,1)) → Puts graph in 3 row and 1 column
make a histogram for each of the 3 varieties
par(mfrow = c(1,1)) → Restore the graphic parameter
breaks parameter in hist - determines the number and width of bins(intervals)

Fewer bins - oversimplify data and hide important details
More bins - overcomplicate data and show noise instead of trends
breaks = 5 → suggests about 5 bins of equal width (R rounds to "pretty" break points, so the count may differ)
breaks = c(0, 10, 20, 30) → Specifies where each bin starts and ends (must cover all the data)

xlim parameter in hist - limits the range of x-axis

Data outside range will be included in calculations but not shown on plot

freq parameter in hist - if set to FALSE => plots density (bar areas sum to 1) instead of frequency
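The effect of breaks and freq can be inspected without drawing anything, because hist() also returns the computed bins. This is a sketch on made-up numbers (vals is hypothetical); plot = FALSE is a standard hist argument that skips the drawing step.

```r
vals <- c(1, 3, 4, 8, 12, 15, 18, 22, 25, 29)   # hypothetical data

# Explicit break points: bins (0,10], (10,20], (20,30]
h <- hist(vals, breaks = c(0, 10, 20, 30), plot = FALSE)
h$counts      # 4 3 3  (what freq = TRUE would plot)
h$density     # 0.04 0.03 0.03  (what freq = FALSE would plot)

# Density is scaled so that the bar areas add up to 1
sum(h$density * diff(h$breaks))   # 1

# A single number is only a suggestion; check how many bins R actually chose
h5 <- hist(vals, breaks = 5, plot = FALSE)
length(h5$counts)
```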

For two quantitative variables - use scatter plot (use plot)

Observe these aspects

Linear - whether they follow linearity or not
Consistent spread across the plane
Outliers and correlation
