I need to extract the last monthly values from column 2 of dataframe ntnb45
. Is there any direct way to do this in R?
library(GetTDData)
ntnb <- download.TD.data('NTN-B')
ntnb45 <- read.TD.files(dl.folder = 'TD Files',maturity = '150545')
I need to extract the last monthly values from column 2 of dataframe ntnb45
. Is there any direct way to do this in R?
library(GetTDData)
ntnb <- download.TD.data('NTN-B')
ntnb45 <- read.TD.files(dl.folder = 'TD Files',maturity = '150545')
A quick and simple solution with dplyr
:
library(dplyr)
ultimos <- ntnb45 %>% group_by(anomes = format(ref.date, "%Y%m")) %>% filter(ref.date == max(ref.date))
ultimos
Source: local data frame [151 x 6]
Groups: anomes [151]
ref.date yield.bid price.bid asset.code matur.date anomes
<date> <dbl> <dbl> <chr> <date> <chr>
1 2004-09-30 0.0906 1022.04 NTN-B 150545 2045-05-15 200409
2 2004-10-29 0.0905 1033.15 NTN-B 150545 2045-05-15 200410
3 2004-11-30 0.0905 1002.60 NTN-B 150545 2045-05-15 200411
4 2004-12-31 0.0905 1018.52 NTN-B 150545 2045-05-15 200412
5 2005-01-31 0.0903 1035.64 NTN-B 150545 2045-05-15 200501
6 2005-02-28 0.0905 1045.57 NTN-B 150545 2045-05-15 200502
7 2005-03-31 0.0905 1059.98 NTN-B 150545 2045-05-15 200503
8 2005-04-29 0.0904 1076.44 NTN-B 150545 2045-05-15 200504
9 2005-05-31 0.0910 1039.66 NTN-B 150545 2045-05-15 200505
10 2005-06-30 0.0910 1049.90 NTN-B 150545 2045-05-15 200506
# ... with 141 more rows
I do not know if there is a function that does this directly, but I was able to build an algorithm that apparently solved the problem.
First, I separated the ref.date
column, which has the dates in the form "yyyy-mm-dd", in three other columns. One column corresponds to the year, one column per month and one column per day. I called this result from datas
:
datas <- strsplit(as.character(ntnb45$ref.date), split="-")
datas <- matrix(as.numeric(unlist(datas)), ncol=3, byrow=TRUE)
I'm displaying the code in two lines to be didactic. It is perfectly possible to run this command on a line only.
With the data frame datas
built, all we have to do is look at a fact that occurs when the month ends. Whenever the month changes, the day "zera". It does not return to zero literally, but the value of the first business day of a month is a smaller numeric value than the last business day of the previous month. Thus, it is enough to find the places where, in the next position, the days were numerically lower than the current position. This can be achieved through a first difference:
index <- which(diff(datas[, 3]) < 0)
What this command does is calculate the differences of type x[n+1]-x[n]
. If n
is the numeric value of the day of the month, this value will always be equal to 1, except how much the month changes. That is, the index
vector has all the positions in which the current month has changed. Now just select my correspondents in the original data frame:
ntnb45.ultimo <- ntnb45[index, ]
head(ntnb45.ultimo)
ref.date yield.bid price.bid asset.code matur.date
190 2004-09-30 0.0906 1022.04 NTN-B 150545 2045-05-15
210 2004-10-29 0.0905 1033.15 NTN-B 150545 2045-05-15
230 2004-11-30 0.0905 1002.60 NTN-B 150545 2045-05-15
253 2004-12-31 0.0905 1018.52 NTN-B 150545 2045-05-15
22 2005-01-31 0.0903 1035.64 NTN-B 150545 2045-05-15
40 2005-02-28 0.0905 1045.57 NTN-B 150545 2045-05-15
To get the values in column 2 only from the last days of the month, just run
ntnb45.ultimo[, 2]