The hardest part of any SEO’s job is extracting information about a product/page from the database or that product page itself. That’s why I covered parsing HTML/XML with Python in my last article. You can get a lot of stuff from scraping a page, but sometimes that’s not enough. Sometimes you need more information that isn’t directly available.
PMG has several current clients on Magento. And while Magento has a lot of stuff wrong with it, it does have a decent web services API that can help you get more info on just about any model in the Magento database. This is handy if you don’t have access to the database directly, or you want a faster way to retrieve data than web scraping. The Magento web services API comes in two flavors: SOAP and XML-RPC. We’ll be using XML-RPC here.
Python provides an easy-to-use library for interacting with XML-RPC servers: xmlrpclib (renamed xmlrpc.client in Python 3).
Getting Started and Connecting
You can test all of this from the Python interpreter, which is how the code in this tutorial is written. First we need to import xmlrpclib, and then we’ll set up an XML-RPC client by creating a new instance of xmlrpclib.ServerProxy.
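As a minimal sketch of that setup, the snippet below imports the library and builds the client. The host name is a placeholder, and the /api/xmlrpc path is an assumption based on the usual Magento 1 endpoint, so adjust both for your own store. The import falls back to xmlrpc.client so the same lines also run on Python 3.

```python
try:
    import xmlrpclib  # Python 2, as used in this tutorial
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3 name for the same module

# Placeholder URL (an assumption): swap in your own store's host and path.
MAGENTO_URL = 'http://www.example.com/api/xmlrpc'

# Creating the proxy does not open a connection yet; that only
# happens on the first remote method call.
server = xmlrpclib.ServerProxy(MAGENTO_URL)
```

Because nothing is sent over the wire at this point, a typo in the URL will only surface once you make your first call.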
If you need to see what methods are available to you, call server.system.listMethods(). If you need help with a method, call server.system.methodHelp('method_name').
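Those two introspection calls can be wrapped in one small helper, roughly like this. The function name describe_api is made up for illustration, and it assumes server is the xmlrpclib.ServerProxy instance created earlier.

```python
def describe_api(server, method_name=None):
    """Hypothetical helper: list the server's methods, or show help for one.

    `server` is assumed to be an xmlrpclib.ServerProxy instance.
    """
    if method_name is None:
        # No method given: ask the server for everything it exposes.
        return server.system.listMethods()
    # Otherwise fetch the help text for that single method.
    return server.system.methodHelp(method_name)
```

Calling describe_api(server) returns the method list, while describe_api(server, 'login') returns the help text for the login method.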
Authenticating
Before we can go any further, Magento will want us to log in. Be sure to enable web services by visiting your admin area, then choosing web services from the system menu. First, create a role that has all the privileges you need. Then create a new user with that role. Be sure to remember your username and password!
Next we need to call the server.login method. The first argument will be a username, and the second your password. This method returns a session key (a string of random letters and numbers). We need this later, so we’ll store it in a variable called session.
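Here is one way the login step might look once wrapped in a function. The name open_session is hypothetical, and turning a failed login into a RuntimeError is my own choice for the sketch, not something Magento requires. The only library detail it relies on is that xmlrpclib raises xmlrpclib.Fault when the server reports an error.

```python
try:
    import xmlrpclib  # Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3

def open_session(server, username, password):
    """Hypothetical helper: log in and return the session key.

    `server` is assumed to be an xmlrpclib.ServerProxy instance.
    """
    try:
        # login returns the session key as a plain string.
        return server.login(username, password)
    except xmlrpclib.Fault as err:
        # The server rejected the request, e.g. bad credentials.
        raise RuntimeError('Magento login failed: %s' % err.faultString)
```

From the interpreter you would then write session = open_session(server, 'your_username', 'your_password') and hang on to session for the calls that follow.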
Getting Some Data!
The server.call
method is the main interface to interact with magento. It will take three arguments: your session key, an API call, and the arguments you wish to send to that API call.
So, let’s query a category. We’ll use the API call catalog_category.info, which requires one argument be sent along: a category ID. You’ll get back a dictionary. Also worth noting is that the arguments being sent need to be an iterable, such as a list, even when there is only one of them.
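The category lookup described above could be sketched like this. The function name get_category is invented for the example; it assumes server is your ServerProxy and session is the key returned by server.login.

```python
def get_category(server, session, category_id):
    """Hypothetical helper: fetch one category via catalog_category.info.

    `server` is assumed to be an xmlrpclib.ServerProxy instance and
    `session` the key returned by server.login.
    """
    # The arguments for the API call travel as a list, even though
    # catalog_category.info only needs a single category ID.
    return server.call(session, 'catalog_category.info', [category_id])
```

The result comes back as an ordinary dictionary, so you can inspect it with category.keys() to see which fields your store returns.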
Finishing Up
When you’re done, you need to invalidate your session key by calling server.endSession, which takes one argument: a session key.
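If you move from the interpreter to a script, it is easy to forget that last step when something fails halfway through. One possible pattern, sketched here with a hypothetical magento_session helper, is a context manager that logs in and always ends the session afterwards. It only uses the login and endSession calls covered in this article.

```python
from contextlib import contextmanager

@contextmanager
def magento_session(server, username, password):
    """Hypothetical helper: log in, yield the session key, always log out.

    `server` is assumed to be an xmlrpclib.ServerProxy instance.
    """
    session = server.login(username, password)
    try:
        yield session
    finally:
        # Runs even if the body raised, so the key is always invalidated.
        server.endSession(session)
```

You would use it as with magento_session(server, 'your_username', 'your_password') as session: and make your server.call requests inside the block.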