Catch only last month using R

5

I need to extract the last monthly values from column 2 of dataframe ntnb45 . Is there any direct way to do this in R?

library(GetTDData)
ntnb <- download.TD.data('NTN-B')
ntnb45 <- read.TD.files(dl.folder = 'TD Files',maturity = '150545')
    
asked by anonymous 24.03.2017 / 21:20

2 answers

3

A quick and simple solution with dplyr :

library(dplyr)
ultimos <- ntnb45 %>% group_by(anomes = format(ref.date, "%Y%m")) %>% filter(ref.date == max(ref.date))
ultimos

Source: local data frame [151 x 6]
Groups: anomes [151]

     ref.date yield.bid price.bid   asset.code matur.date anomes
       <date>     <dbl>     <dbl>        <chr>     <date>  <chr>
1  2004-09-30    0.0906   1022.04 NTN-B 150545 2045-05-15 200409
2  2004-10-29    0.0905   1033.15 NTN-B 150545 2045-05-15 200410
3  2004-11-30    0.0905   1002.60 NTN-B 150545 2045-05-15 200411
4  2004-12-31    0.0905   1018.52 NTN-B 150545 2045-05-15 200412
5  2005-01-31    0.0903   1035.64 NTN-B 150545 2045-05-15 200501
6  2005-02-28    0.0905   1045.57 NTN-B 150545 2045-05-15 200502
7  2005-03-31    0.0905   1059.98 NTN-B 150545 2045-05-15 200503
8  2005-04-29    0.0904   1076.44 NTN-B 150545 2045-05-15 200504
9  2005-05-31    0.0910   1039.66 NTN-B 150545 2045-05-15 200505
10 2005-06-30    0.0910   1049.90 NTN-B 150545 2045-05-15 200506
# ... with 141 more rows
    
25.03.2017 / 20:07
3

I do not know if there is a function that does this directly, but I was able to build an algorithm that apparently solved the problem.

First, I separated the ref.date column, which has the dates in the form "yyyy-mm-dd", in three other columns. One column corresponds to the year, one column per month and one column per day. I called this result from datas :

datas <- strsplit(as.character(ntnb45$ref.date), split="-")
datas <- matrix(as.numeric(unlist(datas)), ncol=3, byrow=TRUE)

I'm displaying the code in two lines to be didactic. It is perfectly possible to run this command on a line only.

With the data frame datas built, all we have to do is look at a fact that occurs when the month ends. Whenever the month changes, the day "zera". It does not return to zero literally, but the value of the first business day of a month is a smaller numeric value than the last business day of the previous month. Thus, it is enough to find the places where, in the next position, the days were numerically lower than the current position. This can be achieved through a first difference:

index <- which(diff(datas[, 3]) < 0)

What this command does is calculate the differences of type x[n+1]-x[n] . If n is the numeric value of the day of the month, this value will always be equal to 1, except how much the month changes. That is, the index vector has all the positions in which the current month has changed. Now just select my correspondents in the original data frame:

ntnb45.ultimo <- ntnb45[index, ]
head(ntnb45.ultimo)
  ref.date yield.bid price.bid   asset.code matur.date
190 2004-09-30    0.0906   1022.04 NTN-B 150545 2045-05-15
210 2004-10-29    0.0905   1033.15 NTN-B 150545 2045-05-15
230 2004-11-30    0.0905   1002.60 NTN-B 150545 2045-05-15
253 2004-12-31    0.0905   1018.52 NTN-B 150545 2045-05-15
22  2005-01-31    0.0903   1035.64 NTN-B 150545 2045-05-15
40  2005-02-28    0.0905   1045.57 NTN-B 150545 2045-05-15

To get the values in column 2 only from the last days of the month, just run

ntnb45.ultimo[, 2]
    
24.03.2017 / 22:13