Indexing a standard Google Shopping XML feed in Sphinx Search

How do I index a standard Google Shopping XML feed in Sphinx Search?

XML:

<?xml version="1.0"?>
<rss version="2.0" 
xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title>The name of your data feed</title>
<link>http://www.example.com</link>
<description>A description of your content</description>
<item>
<title>Red wool sweater</title>
<link>http://www.example.com/item1-info-page.html</link>
<description>Comfortable and soft, this sweater will keep you warm on cold winter nights.</description>
<g:image_link>http://www.example.com/imagem1.jpg</g:image_link>
<g:price>25</g:price>
<g:condition>new</g:condition>
<g:id>1a</g:id>
</item>
<!-- ... -->

The Sphinx index will use the xmlpipe2 data source.

Do I need to convert the XML to the xmlpipe2 document format before indexing it?

The xmlpipe2 document format:

<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>

<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>

<sphinx:document id="1234">
<content>this is the main content <![CDATA[[and this <cdata> entry
must be handled properly by xml parser lib]]></content>
<published>1012325463</published>
<subject>note how field/attr tags can be
in <b class="red">randomized</b> order</subject>
<misc>some undeclared element</misc>
</sphinx:document>

<!-- ... -->
    
asked by anonymous 31.07.2014 / 20:30

1 answer

Use Pipe2 to convert the feed:

./pipe2.phar convert:google data/google-shopping-sample.xml

Then use it in the Sphinx configuration:

source xmlSource
{
    # the converter writes an xmlpipe2 stream to stdout, so the source type is xmlpipe2
    type = xmlpipe2
    xmlpipe_command = /usr/local/bin/pipe2 convert:google /tmp/google-shopping-sample.xml
}
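
If Pipe2 is not available, the same kind of conversion can be sketched in a few lines of Python. This is only a minimal sketch and not how Pipe2 itself works: the function name `feed_to_xmlpipe2`, the choice of `title` and `description` as full-text fields, and the attribute names `item_condition` and `feed_id` are all assumptions made for illustration. It also assumes a Sphinx version that accepts `string` attributes in the xmlpipe2 schema. Because Sphinx requires a unique non-zero integer document id and the feed's `g:id` values (such as `1a`) are not numeric, the sketch numbers the documents with a running counter and keeps the original `g:id` as an attribute.

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace bound to the g: prefix in the feed's <rss> element.
G_NS = "{http://base.google.com/ns/1.0}"

# (tag in the feed, element name in the xmlpipe2 output).
# The output names are assumptions chosen for this sketch.
FIELDS = [("title", "title"), ("description", "description")]
STRING_ATTRS = [
    ("link", "link"),
    (G_NS + "image_link", "image_link"),
    (G_NS + "condition", "item_condition"),
    (G_NS + "id", "feed_id"),
]


def text_of(item, tag):
    """Return the stripped text of a child element, or '' if it is absent."""
    node = item.find(tag)
    return (node.text or "").strip() if node is not None else ""


def feed_to_xmlpipe2(feed_xml):
    """Convert a Google Shopping RSS feed into an xmlpipe2 docset string."""
    root = ET.fromstring(feed_xml)
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    out += [f'<sphinx:field name="{name}"/>' for _, name in FIELDS]
    out += [f'<sphinx:attr name="{name}" type="string"/>' for _, name in STRING_ATTRS]
    out += ['<sphinx:attr name="price" type="float"/>', "</sphinx:schema>"]

    # g:id values such as "1a" are not numeric, so a running counter supplies
    # the document id and the original g:id is kept as the feed_id attribute.
    for doc_id, item in enumerate(root.iter("item"), start=1):
        out.append(f'<sphinx:document id="{doc_id}">')
        for tag, name in FIELDS + STRING_ATTRS:
            out.append(f"<{name}>{escape(text_of(item, tag))}</{name}>")
        # Keep only the leading token so a value like "25.00 BRL" yields "25.00".
        price = text_of(item, G_NS + "price").split()
        out.append(f"<price>{escape(price[0]) if price else '0'}</price>")
        out.append("</sphinx:document>")

    out.append("</sphinx:docset>")
    return "\n".join(out) + "\n"


# When a feed path is given on the command line, print the docset to stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        sys.stdout.buffer.write(feed_to_xmlpipe2(handle.read()).encode("utf-8"))
```

Saved under a hypothetical name such as `feed_to_xmlpipe2.py`, the script could be used as the `xmlpipe_command` in place of the Pipe2 call, with the feed path as its only argument. Note that counter-based ids change whenever the order of items in the feed changes, so a stable numeric id would be preferable if the feed provides one.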
    
07.08.2014 / 14:01