What are metadata?

6

I was researching how to "sanitize" data and found this answer from @maniero, which says:

  

"Delete text snippets in a data entry that have characteristics of metadata, and therefore may cause some security issue."

I would like to understand what metadata is and what uses or applications it has.

    
asked by anonymous 08.03.2018 / 00:17

1 answer

14

Meta, from the Greek (μετά), means "behind" or "beyond." Metadata is information about a piece of data.

Think of a photo taken with a camera:

The data itself is the image. It's what you see above. The metadata could be:

  • What camera took this photo?
  • What roll of film was it developed from?
  • Where was this photo taken?
  • Who took this photo?

This is information about the data. It is the "meta", the behind and the beyond, of the image itself.

Clearly, this is not a term specific to technology; it applies to any kind of data, object, photograph, software or thing you can imagine. This is a term that ...

Contextualizing

Files on disk

To put this in the context of information technology, let's look at the metadata of a file:

  • File size
  • File extension
  • Filename
  • Size on disk

Every file in the operating system has metadata. It is saved to disk and takes up space, so a 1 KB file actually occupies 1 KB plus the size of its metadata.
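As a minimal sketch of reading this kind of metadata, the Python snippet below asks the file system about a file without opening its contents. The helper name `describe_file` and the temporary file are assumptions made only for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of metadata the operating system keeps about a file."""
    info = os.stat(path)  # queries the file system; the contents are not read
    name = os.path.basename(path)
    return {
        "name": name,
        "extension": os.path.splitext(name)[1],
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"hello")
    sample_path = handle.name

sample_metadata = describe_file(sample_path)
print(sample_metadata)
os.remove(sample_path)
```

The five bytes written are the data; everything in the returned dictionary describes the file rather than its contents.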

JSON APIs

One of the specifications for building Web APIs is JSON API. Along the lines of this specification, suppose you have a paginated product listing API:

GET /api/produtos

{
  "produtos": [ ... ],
  "meta": {
    "paginaAtual": 1,
    "proximaPagina": "api/produtos?pagina=2",
    "paginaAnterior": null
  }
}

Note that there is a meta node that has nothing to do with a product, which is what this endpoint provides. It is about pagination. It is information beyond and about the data.
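A client consuming such a response would typically treat the two parts differently. The sketch below, in Python, assumes a response body shaped like the one above; the two product entries are made up for illustration.

```python
import json

# Hypothetical response body, mirroring the listing above with two made-up products.
response_body = """
{
  "produtos": [{"nome": "Caneta"}, {"nome": "Caderno"}],
  "meta": {
    "paginaAtual": 1,
    "proximaPagina": "api/produtos?pagina=2",
    "paginaAnterior": null
  }
}
"""

payload = json.loads(response_body)
products = payload["produtos"]   # the data itself
pagination = payload["meta"]     # information about the data

# The metadata drives navigation; it never describes a product.
has_next_page = pagination["proximaPagina"] is not None
print(len(products), "products on page", pagination["paginaAtual"])
```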

HTML

On web pages the scheme is no different. A page like your Facebook profile should contain your timeline, your name, your photo and the like. In addition to all this data, there are some elements dedicated to metadata.

That's what the HTML meta tag is for. It was made to be read by machines and not by an end user. It holds information about the page.

<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="John Doe">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>

Here, in this W3Schools example, the page metadata includes a description, keywords, the page author, and so on.
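To illustrate that these tags are meant for machines, here is a small Python sketch that collects the meta tags of a page using the standard library's `html.parser`. The class name `MetaCollector` and the sample page are assumptions for illustration only.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta> tags into a dict, ignoring the visible content."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "charset" in attributes:
            self.meta["charset"] = attributes["charset"]
        elif "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content")


page = """
<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="author" content="John Doe">
</head>
<body><p>Visible content</p></body>
"""

collector = MetaCollector()
collector.feed(page)
print(collector.meta)
```

A search engine or link previewer does something similar in spirit: it reads the description and author without caring about the paragraph in the body.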

Databases

Relational DBMSs also have metadata. A Pessoa (person) table can contain entities with name, date of birth and CPF. The content is the entities. But if I ask "what are the columns of the Pessoa table?", that is a question about the metadata of this table.

Some metadata examples from a database table:

  • The table name
  • The size of the table
  • The number of rows in the table
  • The table columns and their types

This metadata is, in fact, stored somewhere. In SQL Server you can see it in the sys.columns catalog view. An example:

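-- Lists the columns of the Pessoa table.
-- OBJECT_ID expects a table name; the dbo schema here is an assumption.
SELECT name, column_id
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.Pessoa');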

If you want to read more about this, see Querying the SQL Server Catalog.
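The same distinction can be tried without a SQL Server instance. The Python sketch below uses SQLite (a different DBMS, swapped in only because it ships with Python) and its `PRAGMA table_info` statement; the table layout and the sample row are assumptions for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Pessoa (Nome TEXT, DataNascimento TEXT, Cpf TEXT)"
)
connection.execute(
    "INSERT INTO Pessoa VALUES ('Maria', '1990-01-01', '000.000.000-00')"
)

# Data: the rows stored in the table.
rows = connection.execute("SELECT * FROM Pessoa").fetchall()

# Metadata: PRAGMA table_info describes the columns, not the rows.
# Each result row is (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, column_type)
    for _cid, name, column_type, _notnull, _default, _pk in connection.execute(
        "PRAGMA table_info(Pessoa)"
    )
]
print(columns)
connection.close()
```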

Programming languages

Metaprogramming is how this idea shows up in programming languages. Maniero answered a question of mine about the difference between metaprogramming and reflection, which is worth reading. There he briefly defined metaprogramming:

  

A paradigm that allows the manipulation of the [...] More generally, you program how the code should be programmed.

An example of code inspecting itself in Ruby:

class Developer 
  def self.backend
    "I am backend developer"
  end

  def frontend
    "I am frontend developer"
  end
end

p Developer.class   # => Class
p Class.superclass  # => Module
p Module.superclass # => Object
p Object.superclass # => BasicObject
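For comparison, the same kind of self-inspection can be sketched in Python. This is not a translation of the Ruby object model, only an analogous illustration; the variable names are assumptions.

```python
class Developer:
    @staticmethod
    def backend():
        return "I am backend developer"

    def frontend(self):
        return "I am frontend developer"


# The class is itself an object, so the program can ask questions about it.
class_of_developer = type(Developer)  # the metaclass, `type`
ancestors = [cls.__name__ for cls in Developer.__mro__]
method_names = sorted(
    name for name in vars(Developer) if not name.startswith("__")
)

print(class_of_developer, ancestors, method_names)
```

Here the methods are the program's behavior, while the class hierarchy and the list of method names are information about the program.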

BMP image format

An interesting example is the .bmp (bitmap image file) image format. If you look at the raw contents of such a file, you will see that it follows this pattern:

  

You can see that much of this information is not the pixel matrix of the image itself, but rather metadata about that image. As you mentioned, in many cases there is no problem in sanitizing some of this information.
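As a rough sketch of how that header can be read, the Python snippet below builds a tiny 1x1 bitmap in memory and unpacks its metadata with `struct`. It assumes the common layout of a 14-byte file header followed by a 40-byte BITMAPINFOHEADER; other header variants exist and are not handled. The function name `read_bmp_metadata` is hypothetical.

```python
import struct

# Build a 1x1, 24-bit BMP in memory so the sketch needs no external file.
pixel_row = bytes([255, 0, 0]) + b"\x00"  # one blue pixel (BGR order) + padding to 4 bytes
file_header = struct.pack("<2sIHHI", b"BM", 14 + 40 + len(pixel_row), 0, 0, 14 + 40)
info_header = struct.pack(
    "<IiiHHIIiiII", 40, 1, 1, 1, 24, 0, len(pixel_row), 2835, 2835, 0, 0
)
bmp_bytes = file_header + info_header + pixel_row


def read_bmp_metadata(data):
    """Unpack the header fields that describe the image (not the pixels)."""
    signature, file_size, _reserved1, _reserved2, pixel_offset = struct.unpack_from(
        "<2sIHHI", data, 0
    )
    _header_size, width, height, _planes, bits_per_pixel = struct.unpack_from(
        "<IiiHH", data, 14
    )
    return {
        "signature": signature.decode("ascii"),
        "file_size": file_size,
        "pixel_data_offset": pixel_offset,
        "width": width,
        "height": height,
        "bits_per_pixel": bits_per_pixel,
    }


metadata = read_bmp_metadata(bmp_bytes)
print(metadata)
```

In this tiny file, the pixel data starts at byte 54 of 58, so almost the whole file is metadata.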

    
08.03.2018 / 00:46