I'm trying to index some wiki pages using Solr 7.0, but in the last step the DataImportHandler (DIH) is apparently not extracting the data. I do not know what is happening, because no error is thrown.
When I call link, I notice two different behaviors.
The response to my first request is:
{
"responseHeader":{
"status":0,
"QTime":75
},
"initArgs":[
"defaults",[
"config","data-config.xml"
]
],
"command":"full-import",
"status":"idle",
"importResponse":"",
"statusMessages":{}
}
The second response, when I just press Enter again, is:
{
"responseHeader":{
"status":0,
"QTime":26
},
"initArgs":[
"defaults",[
"config","data-config.xml"
]
],
"command":"full-import",
"status":"idle",
"importResponse":"",
"statusMessages":{
"Total Requests made to DataSource":"0",
"Total Rows Fetched":"2",
"Total Documents Processed":"0",
"Total Documents Skipped":"0",
"Full Dump Started":"2017-10-28 07:05:31",
"":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
"Committed":"2017-10-28 07:05:31",
"Time taken":"0:0:0.449"
}
}
As you can see in the second response, DIH finds 2 documents (rows), which is exactly the number of documents in my test file wiki.xml. The problem is that DIH does not index them, as the message "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents." shows.
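To make the mismatch explicit, here is a minimal sketch that reads the two counters out of a DIH JSON response. The function name summarize_dih_status is hypothetical, and the sample string is a trimmed copy of the second response above; it only parses text and does not call Solr.

```python
import json


def summarize_dih_status(response_text):
    """Return (rows_fetched, docs_processed) from a DIH JSON response body."""
    messages = json.loads(response_text).get("statusMessages", {})
    # DIH reports its counters as strings; missing keys are treated as 0.
    rows = int(messages.get("Total Rows Fetched", 0))
    docs = int(messages.get("Total Documents Processed", 0))
    return rows, docs


# Trimmed copy of the second response shown above.
sample = """{"status": "idle", "statusMessages": {
  "Total Rows Fetched": "2",
  "Total Documents Processed": "0"}}"""

rows, docs = summarize_dih_status(sample)
if rows > 0 and docs == 0:
    print("rows were fetched but no documents were built from them")
```

With the first response, where statusMessages is empty, both counters come back as 0.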
Here's my Solr configuration: git gist. I'm using Windows 10, Solr 7.0 and Lucene 7.0.
What I've tried so far:
- One of the fields I'm trying to extract is the "user", but it is irregular. When the user has an account, the <contributor> XML tag has the subtags <username> (the user nickname) and <id> (the user id); when the user does not have an account, <contributor> appears with only one subtag, <ip>. So I tried importing the data without the "user" field.
- I tried extracting only the id and title. To do that, I commented out the other fields in data-config.xml.
None of these attempts worked.
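For reference, the two <contributor> shapes described above can be illustrated with a minimal sketch. The sample XML is hypothetical (it is not my wiki.xml), it assumes a page/revision/contributor nesting and no XML namespace, and extract_pages is an illustrative helper, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the two <contributor> shapes described above.
SAMPLE = """<mediawiki>
  <page>
    <title>Page A</title>
    <id>1</id>
    <revision>
      <contributor><username>alice</username><id>42</id></contributor>
    </revision>
  </page>
  <page>
    <title>Page B</title>
    <id>2</id>
    <revision>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""


def extract_pages(xml_text):
    """Return one dict per <page> with its id, title and user (username or ip)."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        contributor = page.find("revision/contributor")
        user = None
        if contributor is not None:
            # Registered users carry <username>; anonymous edits carry only <ip>.
            user = contributor.findtext("username") or contributor.findtext("ip")
        pages.append({
            "id": page.findtext("id"),        # direct child <id> of <page>
            "title": page.findtext("title"),
            "user": user,
        })
    return pages


for row in extract_pages(SAMPLE):
    print(row)
```

This is only meant to show the data shape I'm dealing with: either field may be absent, so whatever maps "user" has to accept both forms.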