As stated in the link in the question, a minimal reproducible example should contain:
A small dataset;
The smallest possible runnable code that reproduces the error on that dataset;
Information about the version of R and the system on which you are running the code, as well as the packages used;
If you use random data, a seed, so that the results are the same on every run.
In this answer I will list some of the main functions in R for accomplishing these tasks.
It is worth remembering that the examples in the help pages of R functions are a good guide to the structure of a minimal reproducible example. In general, the example code in R's help pages satisfies these requirements.
Producing the dataset
To share your own dataset, the dput() function together with head() can be very useful. For example, the code below prints the first 10 observations of the iris dataset, already in the structure needed to "reassemble" it. Anyone trying to answer your question then only has to copy and paste the structure() call.
dput(head(iris, 10))
#> structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6,
#> 5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4,
#> 3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7,
#> 1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2,
#> 0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L,
#> 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
#> ), class = "factor")), .Names = c("Sepal.Length", "Sepal.Width",
#> "Petal.Length", "Petal.Width", "Species"), row.names = c(NA,
#> 10L), class = "data.frame")
Reproducing the data:
dados <- structure(list(Sepal.Length = c(
5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6,
5, 4.4, 4.9
), Sepal.Width = c(
3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4,
3.4, 2.9, 3.1
), Petal.Length = c(
1.4, 1.4, 1.3, 1.5, 1.4, 1.7,
1.4, 1.5, 1.4, 1.5
), Petal.Width = c(
0.2, 0.2, 0.2, 0.2, 0.2,
0.4, 0.3, 0.2, 0.2, 0.1
), Species = structure(c(
1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), .Names = c(
"Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width", "Species"
), row.names = c(
NA,
10L
), class = "data.frame")
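As a quick sanity check, the round trip can be verified before posting. The sketch below is only illustrative: the names amostra, texto_dput and reconstruido are made up for this example, and capture.output() is used here merely to stand in for the copy-and-paste step.

```r
# Take a small slice of the data, as in the example above
amostra <- head(iris, 10)

# Capture what dput() would print (this stands in for copy and paste)
texto_dput <- paste(capture.output(dput(amostra)), collapse = "\n")

# Rebuild the object from the text, as someone answering would do
reconstruido <- eval(parse(text = texto_dput))

# The two objects should be equal, including the column classes
all.equal(amostra, reconstruido)
sapply(reconstruido, class)
```

The point of the check is that dput() keeps the column classes (for instance, Species stays a factor), which is what makes it preferable to pasting printed output.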
A less ideal alternative is to provide the data in text format, as in the case below:
texto <- "Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa"
In this case, whoever answers your question can rebuild the dataset using the read.table() function:
dados <- read.table(text=texto)
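One reason the text format is less ideal is that column types have to be guessed when the data are read back. A minimal sketch, using a shortened table (texto_curto and dados_texto are illustrative names, not part of the original example) and setting header and stringsAsFactors explicitly so that the result does not depend on the R version:

```r
texto_curto <- "Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa"

# header = TRUE: the first line holds the column names; because it has one
# field fewer than the data lines, the first column becomes the row names
dados_texto <- read.table(text = texto_curto, header = TRUE,
                          stringsAsFactors = FALSE)

str(dados_texto)

# Species comes back as character, not as the factor it was in iris
class(dados_texto$Species)
```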
Another way to produce a dataset is to generate random values, for example with the rnorm() function (you can also draw from other, non-normal distributions if applicable), or with the sample() function to sample values from a vector. The built-in letters vector is useful for generating characters or factors. In this case, be sure to set a seed with set.seed() so that the example is reproducible.
Example:
set.seed(1) # ensure reproducibility
dados <- data.frame(x = rnorm(10), y = sample(letters, 10))
dados
#> x y
#> 1 -0.6264538 y
#> 2 0.1836433 f
#> 3 -0.8356286 p
#> 4 1.5952808 c
#> 5 0.3295078 z
#> 6 -0.8204684 i
#> 7 0.4874291 a
#> 8 0.7383247 h
#> 9 0.5757814 x
#> 10 -0.3053884 v
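The effect of the seed can be checked directly. In this sketch (the variable names are illustrative), resetting the same seed gives the same draws, while drawing again without resetting does not:

```r
set.seed(1)
primeira <- rnorm(5)

set.seed(1)            # same seed again
segunda <- rnorm(5)

terceira <- rnorm(5)   # no reset: continues the random stream

identical(primeira, segunda)    # TRUE
identical(primeira, terceira)   # FALSE
```

Note that the exact values drawn by sample() can still differ between R versions (the default sampling method changed in R 3.6.0), which is one more reason to state the R version in the question.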
Other useful functions here are those of the as.* family, such as as.factor(), as.Date(), etc., which convert the data to the required format.
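As a small sketch of these conversions, assume a made-up data frame (bruto, with hypothetical columns grupo, data and valor) in which every column was read as text:

```r
# Hypothetical raw data where everything arrived as character
bruto <- data.frame(
  grupo = c("a", "b", "a"),
  data  = c("2013-05-16", "2013-06-01", "2013-07-15"),
  valor = c("1.5", "2", "3.25"),
  stringsAsFactors = FALSE
)

bruto$grupo <- as.factor(bruto$grupo)    # categorical variable
bruto$data  <- as.Date(bruto$data)       # yyyy-mm-dd is parsed by default
bruto$valor <- as.numeric(bruto$valor)   # text to numbers

str(bruto)
```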
Producing the minimum code
Try to identify the smallest part of your code that generates the error or question you have. Before posting the code, make sure you have listed the packages required to reproduce it. To check this, it is a good idea to test your code after restarting the R session, to make sure nothing is missing.
Example:
library(lattice) # the package used
set.seed(1) # the seed
dados <- data.frame(x = as.character(rnorm(10)), y = sample(letters, 10)) # the dataset
densityplot(as.numeric(dados$x))
as.numeric(dados$x)
#> [1] 2 5 4 10 6 3 7 9 8 1
This example would correspond to a question like: "I'm trying to make a density plot with lattice as in the code above. Why, when I convert the data to numeric, do they become 2, 5, 4 ... instead of remaining as the original data from rnorm?"
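A likely explanation for the behaviour in this hypothetical question, assuming dados$x ended up as a factor (data.frame() converted strings to factors by default in R versions before 4.0.0), is that as.numeric() on a factor returns the internal level codes rather than the original values. A small sketch with a made-up vector:

```r
# A factor whose labels look like numbers
x <- factor(c("7", "10.5", "2.25"))

levels(x)                     # levels are sorted as text: "10.5" "2.25" "7"
as.numeric(x)                 # level codes: 3 1 2
as.numeric(as.character(x))   # original values: 7, 10.5, 2.25
```

Because the example is this small, anyone reading the question can run it and see the cause immediately, which is the purpose of reducing the code to a minimum.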
System Information
Finally, when necessary, you can provide information about your system with sessionInfo(), which gives detailed information about your session. In my case, this information was:
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lattice_0.20-15
loaded via a namespace (and not attached):
[1] grid_3.0.1 tools_3.0.1
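When the full sessionInfo() output is too long for a question, individual pieces can be printed instead. A short sketch using only base R and utils (lattice is just an example here and is queried only if it is installed; info is an illustrative name):

```r
R.version.string       # the "R version ..." line
R.version$platform     # platform string, as in the "Platform:" line
.Platform$OS.type      # "windows" or "unix"

# Version of one specific package, if it is installed
if (requireNamespace("lattice", quietly = TRUE)) {
  print(packageVersion("lattice"))
}

# The complete record can also be stored and printed later
info <- sessionInfo()
class(info)
```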
The reprex package
The reprex package can be very useful for creating a reproducible example; the previous examples in this answer were generated with it. It is a package made specifically to create and run reproducible examples (the name reprex is short for reproducible example), with output already formatted for sites like GitHub and Stack Overflow.
A simple way to create a reproducible example with the package is to copy your R code to the clipboard. Then just load the package with library(reprex) and run the command reprex(venue = "so"): the code, with its results already formatted as comments, will be ready to paste into the chosen venue (in this example, "so" is the venue for Stack Overflow). Any generated images are uploaded to imgur and the link is created automatically, so you only need to paste the result.
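The same workflow can also be run without the clipboard by passing the code through the input argument. This is only a sketch: it assumes the reprex package and pandoc are installed (the guard skips the call otherwise), and codigo and resultado are illustrative names.

```r
# The code to be turned into a reproducible example, one line per element
codigo <- c(
  "set.seed(1)",
  "x <- rnorm(5)",
  "mean(x)"
)

resultado <- NULL
if (requireNamespace("reprex", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # venue = "so" formats the output for Stack Overflow
  resultado <- reprex::reprex(input = codigo, venue = "so")
}

# The formatted lines, or NULL if reprex could not be run
resultado
```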
The package has other useful features. For example, you can automatically include system information with the argument si = TRUE, and automatically format your code in the style suggested by Hadley with the argument style = TRUE. For more information, see the package page.