Insert documents in MongoDB via pymongo

0

Good evening. I'm using Python for the first time to run a crawler. I was able to run it and extract the data, and now I want to save that data to MongoDB via pymongo. I tried to follow the official documentation, but for some reason I can't get it to work. Does anyone know how to do the insert, or has anyone done something like this? Thanks.

import scrapy
import pymongo
from pymongo import MongoClient


class NameSpider(scrapy.Spider):
    name = 'SpiderName'
    allowed_domains = ['randomDomain']
    start_urls = ['randomDomain Url']

    def parse(self, response):
        data = []
        for selector in response.css("span.style_data"):
            data.append(selector.css("::text").extract())

        print(data)

# The data appears as desired; now I want to save its contents to MongoDB.
    
asked by anonymous 17.07.2018 / 23:53

1 answer

2

Hello, your question is missing some details, but I'll try to help as best I can.

First of all, it helps to understand the basic concepts of a non-relational database. In MongoDB we basically have collections and documents; in summary:

  • collections: groups of stored documents (loosely comparable to a table in a relational database),
  • documents: the way the data itself is stored; MongoDB stores documents in a JSON-like format (in pymongo, dictionaries are used to represent documents).
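
To make the dictionary/document correspondence concrete, here is a minimal sketch that uses only the standard library. The field names and values are invented for illustration and are not taken from the question.

```python
import json

# A hypothetical document: a plain Python dict with string keys.
document = {
    "source": "randomDomain",
    "values": ["first text", "second text"],
    "count": 2,
}

# Serializing the dict shows the JSON-like shape the stored document would have.
as_json = json.dumps(document, sort_keys=True)
print(as_json)
```

Any dict shaped like this (string keys, values that are numbers, strings, lists or nested dicts) can be handed to pymongo as a document.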
Basic pymongo operations:
  • Create a client and connect to a database server:

    from pymongo import MongoClient

    client = MongoClient()  # connect to a local database

    or, giving the host and port explicitly (8000 here; MongoDB's default port is 27017):

    client = MongoClient('localhost', 8000)

    client = MongoClient('mongodb://localhost:8000')

  • Accessing a database:

    banco = client.crawler_db

    or

    banco = client['crawler_db']

  • Accessing a specific collection:

    colecao = banco.dados_crawler

    or

    colecao = banco['dados_crawler']

    Note: collections and databases are only created when the first document is inserted!

  • Manipulating documents:

    doc_exemplo = { "dado1" : 123, "dado2" : "teste_bd" }

    This is the expected format of a MongoDB document (JSON format).

  • Inserting a document:

    dados_crawler = banco.dados_crawler

    resultado = dados_crawler.insert_one(doc_exemplo)

  • Inserting multiple documents:

    resultado = dados_crawler.insert_many([{ "dado1" : 1 }, { "dado1" : 2 }])
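
Applied to your spider, the steps above could look roughly like the sketch below. The helper names (build_documents, save_documents, get_collection) and the "values" field are my own assumptions, not part of pymongo or Scrapy. I'm also assuming a MongoDB server running locally on the default port. Note that each entry in your data list is itself a list of strings, so it has to be wrapped in a dict before it can be stored as a document.

```python
def build_documents(data):
    """Wrap each scraped entry in a dict, since a MongoDB document must be dict-like."""
    # "values" is a hypothetical field name chosen for this sketch.
    return [{"values": entry} for entry in data]


def save_documents(collection, data):
    """Insert all entries into the given collection and return how many were inserted."""
    documents = build_documents(data)
    if not documents:
        # insert_many raises an error when given an empty list, so skip the call.
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


def get_collection():
    """Open the collection used above (assumes a MongoDB server on localhost)."""
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient()  # local server, default port
    return client["crawler_db"]["dados_crawler"]
```

In parse, you would then replace print(data) with something like save_documents(get_collection(), data). Opening the client once and reusing it is probably better than reconnecting on every response, but that depends on how your crawler is organized.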

To keep this short, I believe this is enough to solve your problem.

Here are some documentation and tutorial references with examples:

  • Basic Tutorial English
  • Introduction to MongoDB - English
answered 18.07.2018 / 02:38