Saturday, July 11, 2015

SugarCRM/SuiteCRM Programming Examples

This is a long post about programming against the SugarCRM/SuiteCRM SOAP server. It's all technical so if you came by for some light, witty banter I'm sorry to disappoint you.

The Problem

So a buddy of mine approached me with a problem. He's migrating data from a custom CRM to SuiteCRM. The old CRM has ten years worth of data, mostly consisting of 147,000+ files. My buddy can handle the custom modules, importing the data and all that but he's a point-and-click kinda' guy and he's not going to point-and-click his way through 147,000+ uploads. That's where I came into the picture; I fix technical problems for businesses (I do more than that, but that's a post for a later time).

So I start digging through the SugarCRM/SuiteCRM docs. They've got quite a bit of documentation but it's not very useful. Much like Javadocs or man pages, they're reference documentation, which is great if you already know the topic and want to look something up, but lousy if you need to learn the topic. Examples of programming this CRM I found around the net helped some but left out some important points that tripped me up for a while, hence this blog post.

So here's the setup: one SuiteCRM server running in a VM; two external drives, one holding 147,000+ files (about 3/4 of a terabyte), the other holding the files for the SuiteCRM installation, a custom module called Assignments, and an SQLite database holding information about which file(s) goes with which Assignment.

My job was to write a program which will do the following:
  • read the database to determine which Assignment a file belongs to
  • create a Document (set_entry())
  • upload the document (set_document_revision())
  • get the assignment id (get_entry_list())
  • set a relationship between the Assignment and the new uploaded Document (set_relationship())
  • update the database with a status
I have some illustrative code up on GitHub. It's not runnable code. It's just two files: the main program and the SuiteCRM code. I left out things like the logging code and the database code. Those are unique to my situation and would just muddy the message I'm trying to get across.
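For the curious, the database bookkeeping (read what's pending, record a status) could look something like this. This is a minimal Python sketch, not the code from the GitHub repo; the table name, column names, paths, and the stubbed-out upload are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: the post does not show the real table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    " full_path TEXT PRIMARY KEY,"
    " assignment_no TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)
conn.executemany(
    "INSERT INTO files (full_path, assignment_no) VALUES (?, ?)",
    [("/mnt/files/a.pdf", "A-100"),
     ("/mnt/files/b.jpg", "A-100"),
     ("/mnt/files/c.doc", "A-200")],
)

def pending_files(conn):
    """Return (full_path, assignment_no) for every file not yet processed."""
    return conn.execute(
        "SELECT full_path, assignment_no FROM files"
        " WHERE status = 'pending' ORDER BY full_path"
    ).fetchall()

def mark(conn, full_path, status):
    """Record the outcome so an interrupted run can pick up where it stopped."""
    conn.execute("UPDATE files SET status = ? WHERE full_path = ?",
                 (status, full_path))
    conn.commit()

for path, assignment_no in pending_files(conn):
    # The real program would make the SOAP calls here. This stub just
    # pretends every upload except c.doc succeeds.
    ok = not path.endswith("c.doc")
    mark(conn, path, "done" if ok else "failed")

remaining = pending_files(conn)
statuses = dict(conn.execute("SELECT full_path, status FROM files"))
```

Keeping a status per file means a crash halfway through 147,000+ uploads doesn't force a restart from zero.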

So, let's walk through this:

Initialize SuiteCRM SOAP Object

Our constructor looks like this:

class SugarCrmSoap{
      var $sess;
      var $sess_id;
      var $soapclient;
      var $log;
      var $soap_url;
      var $login_params;

      // __construct() replaces the old class-named constructor, which newer PHP no longer calls
      function __construct(){
        // the version in the URL (v4_1 here) decides which parameters each call expects
        $this->soap_url = 'http://192.168.1.10/sugar/service/v4_1/soap.php?wsdl';
        $this->login_params = array(
            'user_name' => 'admin',
            'password'  => md5('admin'),
            'version'   => '.01'
        );
        // the trace option allows for better debugging
        $this->soapclient = new SoapClient($this->soap_url, array('trace' => 1));
      }

(Yeah, I'm not too keen on how Blogger formats code. If anyone knows how to do better, let me know)

The soap_url is really important; it determines which version of the interface you use. While that may be obvious, most of the examples I saw didn't say which version of the SOAP interface they were using. This meant that when I copy-pasted their code, I ended up passing the wrong parameters in the wrong positions. :-/

The trace option is very useful. We'll see what that does at the end.

Login

We need to create a Session and get its ID to pass around for authenticating. That's easy enough to do:
      
function login(){
  $result = $this->soapclient->login($this->login_params);
  // keep the whole login result, plus its id for the later calls
  $this->sess = $result;
  $this->sess_id = $result->id;
  return $this->sess;
}

Main Loop

Now we're ready to do the work. We create a new document by passing in the name of the file to set_entry() like this:

 function processFile($filename) {
  try {
  //Creating new Document '$filename'
  $result = $this->soapclient
                 ->set_entry( $this->sess_id,
                                 'Documents',
                                  array(
                                        array ( 'name'  => 'new_with_id',
                                                'value' => true
                                              ),
                                        array ( 'name'  => 'document_name',
                                                'value' => $filename
                                              )
                                        )
                        );
          return $result->id;
    } catch (Exception $e) {
      $this->catchError("processFile", $e);
    }
 }

Then we upload the file by setting a new Document Revision. The $docID is from the previous function and $rec is just an array of information about the file.

function uploadFile($docID, $revision=1, $rec) {
  try {
    // file_get_contents() spits out a warning about there being no file
    // then successfully gets the content. :-? Hence the
    // warning suppression
    $file_contents = @file_get_contents($rec['full_path']);

    $docArray = array( 'id'       => $docID,
                       'file'     => base64_encode($file_contents),
                       'document_name' => basename($rec['file_title']),
                       'filename' => $rec['file_title'],
                       'revision' => $revision,
                       'assignment_no' => $rec['assignment_no']
        );

    $result = $this->soapclient->set_document_revision ( $this->sess_id, $docArray);

    // New document_revision_id is $result->id
    return $result->id;

  } catch (Exception $e) {
    $this->catchError("uploadFile", $e);
    return (-1);
  }

}

When you upload a document, say example.jpg, SuiteCRM does not store a jpeg file called example.jpg. It stores a base 64 encoded file with a filename that looks like a UUID. I assume this lets them track revisions better.

Setting the Relationship

We finally get to the most important part of this program: setting a relationship between a (custom module) Assignment and the Document we just uploaded. As you might guess that means getting the assignment_id:

function getAssignments($query='', $offset=0, $maxnum=0, $orderby=''){
  try {
  $result = $this->soapclient->get_entry_list(
      $this->sess_id,
      'Assignments',
      $query,
      $orderby,
      $offset,
      array(
      ),
      array(),
      $maxnum,
      0,
      false
  );
  return $result;
  } catch (Exception $e) {
    $this->catchError("getAssignments", $e);
  }
}

This essentially does an SQL query on the back end. The $query parameter is the WHERE clause of that query minus the 'WHERE'.
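To make that concrete, here's a small Python sketch of building such a condition string. The table name assignments and the field assignment_no are assumptions (a custom module's table may well carry a package prefix), and doubling single quotes is only basic escaping, not a full defense against injection.

```python
def where_clause(table, field, value):
    """Build the condition passed as $query: a WHERE clause minus the 'WHERE'."""
    # Double any single quotes so the value cannot end the SQL string early.
    escaped = str(value).replace("'", "''")
    return "{0}.{1} = '{2}'".format(table, field, escaped)

# Hypothetical table and field names, for illustration only.
query = where_clause("assignments", "assignment_no", "A-100")
print(query)
```

The resulting string is what would be handed to getAssignments() as its first argument.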

This function returns the entire Assignment entry. The id is found thusly:

$assignment = $sugar->getAssignments($assignStr, 0, 1, '');

$assignId = $assignment->entry_list[0]->id;

And finally, we are ready for the final step: setting the relationship! SuiteCRM does some weird internal stuff to set relationships; it's not as simple as a 1-to-many database relationship. Oh no, that would be too easy! This is the whole reason we have to go through their SOAP server.

Our function for that is fairly straightforward:

function setAssignmentDocumentRelationship($assignId, $docId) {
try {
  //setting relationship for assignment id $assignId and document id $docId
  $result = $this->soapclient->set_relationship(
                    $this->sess_id,
                    "Assignments",
                    $assignId,
                    "assignments_documents_1",
                    array($docId),
                    array(),
                    0
          );
  return $result;
} catch (Exception $e) {
      $this->catchError("setAssignmentDocumentRelationship", $e);
  }
}

Since an Assignment can have several Documents associated with it, I could have passed in several $docIds in the fifth parameter but that would have added complexity to the main program that I didn't feel was justified. 

Capturing errors

Remember up above when we initialized our SOAP object and we set 'trace' => 1? With that set, we can capture the headers of the last request which lets us write error functions like this one:

function catchError($function, $e) {

 $this->log->error( "====== REQUEST HEADERS =====");
 $this->log->error($this->soapclient->__getLastRequestHeaders());

 if ($function != 'uploadFile') {
   $this->log->error( "========= REQUEST ==========");
   $this->log->error($this->soapclient->__getLastRequest());
 }
 $this->log->error( "====== RESPONSE HEADERS =====");
 $this->log->error($this->soapclient->__getLastResponseHeaders());

 $this->log->error( "========= RESPONSE ==========");
 $this->log->error($this->soapclient->__getLastResponse());

 $this->log->error("$function error: " . $e->getMessage());
 // re-throw so the caller can decide whether to continue
 throw $e;

}

The reason for the if ($function != 'uploadFile') is if an error occurs while we're uploading, the entire contents of the base-64 encoded file will appear in the log and you don't want that, believe me!
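To put a number on that: base64 turns every 3 bytes into 4 characters, so the request body is about a third bigger than the file itself. A quick Python sketch, using stand-in bytes rather than one of the real files:

```python
import base64

def base64_len(n_bytes):
    """Length of the padded base64 text produced for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)

# 1,024 bytes of stand-in "file" content.
payload = bytes(range(256)) * 4
encoded = base64.b64encode(payload)

print(len(payload), len(encoded), base64_len(len(payload)))
```

Scale that up to a multi-megabyte scan and a single failed upload would bury everything else in the log.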

Conclusion

That's it; how to upload files and set relationships via SuiteCRM's SOAP server. Easy once someone shows you how, eh? ;-)

In my opinion the SuiteCRM documentation is not very helpful, the other examples I've seen were either out of date or missing some important information (I hope I'm not!) and I've heard grumblings that information about programming this stuff is very hard to come by. I hope this helps someone.



Saturday, September 6, 2014

From http://techcrunch.com/2014/09/05/monetization-automation-enforcement/:


Technically I am also a long-time Facebook user, joining the service back in April 2007. But I don’t think of myself in those terms — and certainly wouldn’t identify as one. I’m still on Facebook, largely for work purposes or to use Facebook as an identity layer for other apps and services, but I am not actively engaged with Facebook. I rarely post anything, log in only sporadically, and — despite Facebook’s position as the undisputed leader of social media services with more than 1 billion active monthly users — feel zero personal attachment to it.
If Facebook disappeared tomorrow I would barely notice its absence (except perhaps in a welcome sense via the sudden lack of spam emails)


Tuesday, April 22, 2014

Vignette: Paris Metro

I get off the train and start making my way through the labyrinth to the next train.

I enter a long hallway. The only people in it are me and four people near the turn at the far end. They're all wearing the same type of vest: oversized, non-descript, grey. They are conversing with one another and taking up the entire space of the hallway. If you want to pass you must go through them.

I don't find this threatening. I've seen this type of behavior before in the Metro as well as in the Tube and the New York Subway. You've seen it too; a group of younguns hanging out and conversing, oblivious to the fact they're impeding traffic. The only difference here was they were white, older, and included a woman in the group.

As I passed through the group, the eldest male held out a card reader and said something in French. My initial thought was He wants my credit card to donate to his charity. Well that ain't gonna happen so I fell back to my default phrase when someone hits me up for money in this town.

"Pardon, je ne comprende Francaise. Je suis..." (Yes, I used the Spanish word "comprende" and not the French word "comprend".)

"Tickets!" said the woman, a little _too_ loudly.

"Sure, no problem." I reached into my jacket pocket, pulled out my ticket and handed it to the gang leader with the card reader. He swiped it and said what I imagined was "Thank you, citizen. Enjoy your evening."

The woman said, again, a little too loudly, "Thank you. Enjoy your time here."

"Merci."

Saturday, April 5, 2014

Startup Weekend New Jersey

Last weekend I attended Startup Weekend New Jersey. For those of you who don't know, which is most of you, Startup Weekend is a worldwide organization which brings people together to work on startup ideas.

Here's how it works: developers, designers, marketers, idea people, etcetera, get together at a venue (ours was Juicetank, a coworking space in Somerset, New Jersey). People who have an idea for a startup have five minutes to pitch their ideas to the group. When all the pitches have been pitched, the group votes on the ideas they liked most. Depending on the size of the group, the top n voted ideas are chosen to be worked on. We had about 110 people in attendance, so that meant we had about 15 or so projects selected. Then, depending on your interests and capabilities, you break into teams and work on these ideas. At the end of the weekend, presentations are voted on by a panel of judges with the criteria being how well you did, how well you followed the lean startup techniques, etc.

Now, keep in mind the idea here is not to build a software project. It's to validate a business idea using Lean Startup techniques and maybe build a demo for a startup business. The business may have to do with anything, from transferring money between people in real-time, to wiping out unemployment, to creating a dating site for cats. By the way, all three of those examples were actual projects that were pitched and worked on during the weekend.

The most-voted project, and the one I joined, was called Waddle. The idea behind Waddle is to aggregate a traveler's posts from various social media and present the data on a map. The founders, Suma and Vishnal, had done a lot of upfront work and had a very well thought out business idea, which is why it was the most upvoted project.

The Waddle team was made up of a bunch of pretty cool people who knew their stuff. We put together a cool demo and presentation; unfortunately I wasn't able to be there for the presentation on the last day. Apparently, it was such a good presentation that we won along with another team. The winners of the competition won acceptance into a startup accelerator called Techlaunch and a purse of $25,000 to further the idea. The team and I are meeting later today to discuss where we go from here. :-)

The thing that impressed me the most was the power of the lean startup techniques, specifically the customer value proposition. Although Suma and Vishnal had done a lot of research, we still got a lot of value out of applying the techniques. Not only did applying the customer value proposition validate the business idea, it also showed us several new markets that no one had ever thought of! These techniques actually work!

So, if you are entrepreneurial, or just like to work on new business ideas, check out a local Startup Weekend. It's a lot of fun and you will learn quite a bit.

Thursday, April 3, 2014

Changes

I can not believe the changes in my life this past month. I still can't believe it's been a little more than a month since I quit my day job.

I'm sitting in a stylish little flat in Limassol, Cyprus watching a cruise ship move slowly into port. It's 5:00 AM.  Last weekend my team won the Startup New Jersey competition. Before that, I spent two weeks working in one of the richest neighborhoods in the country. I made a fool of myself on the dance floor Sunday night and I didn't care.  In a couple of weeks, I'm swinging through London, Paris and Amsterdam where I'm going to hang with some cyber-friends. Then...?

Thirty-odd days ago, I was living in a cubicle in corporate America as a code monkey: fixing other people's bugs, guessing what this ticket was referring to, and laughing along with my fellow co-workers' gallows humor.  It really was one of those positions you read about and can't quite believe is true; the protagonist has a well-paying, easy, corporate job, a comfortable life with good friends, and yet is miserable. But true it was.

New management came in about two years ago and, as new management is wont to do, implemented many changes. I was going to continue going along for the ride (easy corporate job, comfortable life, yada yada yada) but then my manager announced he was quitting after 18 years with the company. Not to greener pastures, just leaving. That was the wake-up call I needed. While I liked my co-workers and being in the office, if things were bad enough for him to leave, it was bad enough for me as well.

Serendipity is a real thing. Just as I started to get my resume together, update my LinkedIn profile, and network, a colleague I hadn't talked to in years got a hold of me; he just happened to need my expertise on a contract he was working on and wanted to know if I was available. And here I am.

The sun is coming up. Time for a walk on the beach.


Wednesday, December 28, 2011

DADS: Massaging Data

I've decided that "Diary of an Anomaly Detection System" is too wordy to keep writing in the title of the posts in this series, so I'm shortening it to "DADS" hence the title of this post "DADS: Massaging Data".

Anywho, as I said in the previous post, I'm going to talk a bit about what I needed to do to get my data ready for the anomaly detection algorithm. This post has nothing to do with machine learning, per se, but is an important part of designing an ML algorithm.

I'm going to use seven metrics ("features" in ML parlance) to start with: short-, medium-, and long-term load averages; memory use; number of processes; and the number of zombies. You can argue whether or not these are useful metrics but I'm not interested in that argument at this point. I'm currently building the framework for the ML algo; I'll be adding, subtracting, and inventing metrics once I have something to manipulate them with.

I'm using Python since that is one of the scripting languages of choice at my day job; Perl, unfortunately, is frowned upon and the consensus is Ruby can't do scientific programming just yet. Don't even get me started with Java.

Let's read some data

The data originally resides in RRDtool and needs to be put into a standard matrix form. Shouldn't be that difficult, right? RRDtool has a Python interface, so it's just a matter of reading the data in, right?  I wish! The RRDtool Python API is essentially a wrapper around the command-line tool but the output is "Python-esque".  For example, the CLI output for the load average looks like this:

[faber@fabers-desktop data] rrdtool fetch load/load.rrd  AVERAGE -s 1321160400

                      shortterm midterm longterm

1321164000: 5.3888888889e-03 1.2805555556e-02 5.0000000000e-02
1321167600: 3.0555555556e-03 1.1388888889e-02 5.0000000000e-02
1321171200: 3.7500000000e-03 1.1861111111e-02 5.0000000000e-02
...


where the first column is the number of seconds from the epoch and the three remaining columns are short-, medium- and long-term load averages; a very handy format. Unfortunately the Python output looks like this:

>>> mydata = rrdtool.fetch('load/load.rrd', 'AVERAGE', '--start=1321160400')
>>> mydata
((1321160400, 1325098800, 3600), ('shortterm', 'midterm', 'longterm'), [(0.005388888888888891, 0.012805555555555468, 0.05000000000000019), (0.0030555555555555557, 0.011388888888888818, 0.05000000000000019), (0.0037500000000000016, 0.011861111111111041, 0.0500000000000002), ...])


which is not a very handy format. For reasons which I'll get into later, I want the format to be this:

shortterm = ((1321164000, 5.3888888889e-03),
             (1321167600, 3.0555555556e-03),
             (1321171200, 3.7500000000e-03),
             ...
            )

midterm = ((1321164000, 1.2805555556e-02), (1321167600, 1.1388888889e-02), (1321171200, 1.1861111111e-02),...)

longterm = ((1321164000, 5.0000000000e-02), (1321167600, 5.0000000000e-02), (1321171200, 5.0000000000e-02),...)


So the next step is to format the data.

List Comprehensions to the Rescue

I've always thought Python was just an okay language but its list comprehensions are kinda cute.  It wasn't until this project that I found out just how useful they are.  Here's the blow-by-blow action:

# mydata[0] = timestamp begin, end, and interval
# mydata[1] = labels
# mydata[2] = list of 3-tuples
mydata = rrdtool.fetch('load/load.rrd', 'AVERAGE', '--start=1321160400')

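# create a list of timestamps at the appropriate intervals; judging by the
# CLI output above, the first row belongs to start + step, not to start
tses = [ i for i in range(mydata[0][0] + mydata[0][2], mydata[0][1] + mydata[0][2], mydata[0][2]) ]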

# create three lists from the 3-tuple list
st, mt, lt = zip(*mydata[2])

mydict = {}
mydict['shortterm'] = zip(tses, st)
mydict['midterm'] = zip(tses, mt)
mydict['longterm'] = zip(tses, lt)


Seven lines of code. I don't know about you, but I'm impressed when a language allows me to do that with native functions.
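If you want to try the reshaping without an RRD file handy, here is a self-contained sketch that runs on Python 3. The sample tuple is hand-typed to mimic the rrdtool.fetch() result shown above (values rounded, end time shortened), and the timestamp alignment is an assumption read off the CLI output, where the first row sits at start + step.

```python
# Sample shaped like the rrdtool.fetch() result shown above (values rounded).
mydata = (
    (1321160400, 1321171200, 3600),
    ("shortterm", "midterm", "longterm"),
    [
        (0.0053888889, 0.0128055556, 0.05),
        (0.0030555556, 0.0113888889, 0.05),
        (0.0037500000, 0.0118611111, 0.05),
    ],
)

start, end, step = mydata[0]
labels = mydata[1]
rows = mydata[2]

# Assumption: row i belongs to start + (i + 1) * step, which is what the
# CLI output suggests (its first row is stamped 1321164000).
timestamps = [start + (i + 1) * step for i in range(len(rows))]

# zip(*rows) transposes the list of 3-tuples into one tuple per column.
columns = zip(*rows)

# list() is needed on Python 3, where zip() returns an iterator.
mydict = {label: list(zip(timestamps, col))
          for label, col in zip(labels, columns)}

print(mydict["shortterm"])
```

Using the labels from the fetch result as dictionary keys means the same few lines work for RRD files with a different number of columns.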

So what's with the key/value format?

There's a subtle problem with the raw data that's not obvious until you start reading in other RRDtool files and try munging them together: you don't always have data for all the same timestamps. memory.rrd might have data for timestamps t1 and t2 while load.rrd might have data for t2 and t3. How do you manage your lists so that you don't duplicate timestamps (two t2s in the above case) AND fill in values for data you don't have and don't know you don't have? Easy:

SQL.

I'm going to store my data into an SQLite3 database then generate a matrix from the database table. If I do my SQL correctly (and I will :-), SQLite3 will fill in missing data, order by timestamp and I don't have to keep track of values or timestamps across rrd files! This is why I break every metric.rrd file into a (timestamp, value) data structure and put it into a dictionary called mydict['metric']; so I can easily insert and update the metric column in the database!
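One way this could work, as a minimal sketch rather than the actual implementation (that's for the next post): the table name, column names, and sample values below are invented, and the metrics deliberately overlap on only one timestamp.

```python
import sqlite3

# Stand-in data: two metrics whose timestamps only partly overlap.
mydict = {
    "memory": [(100, 512.0), (200, 520.0)],
    "shortterm": [(200, 0.01), (300, 0.02)],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (ts INTEGER PRIMARY KEY, memory REAL, shortterm REAL)"
)

for metric, pairs in mydict.items():
    for ts, value in pairs:
        # Create the row for this timestamp unless another metric already did.
        conn.execute("INSERT OR IGNORE INTO metrics (ts) VALUES (?)", (ts,))
        # Column names cannot be bound as parameters; metric comes from our
        # own dictionary keys, never from outside input.
        conn.execute(
            "UPDATE metrics SET {0} = ? WHERE ts = ?".format(metric),
            (value, ts),
        )

matrix = conn.execute(
    "SELECT ts, memory, shortterm FROM metrics ORDER BY ts"
).fetchall()
print(matrix)
```

Each timestamp ends up as exactly one row, and any metric with no reading for that timestamp comes back as None (SQL NULL), which the matrix-building step can then decide how to handle.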

How that is actually done, I'll talk about in the next post since it's late.


Monday, December 26, 2011

Diary of an Anomaly Detection System

I recently finished Stanford's Machine Learning course offered as part of SEE. It was one of the best courses I've ever taken. Not only did I learn a lot about ML algorithms, but I learned a lot about the applications thereof as well as some math applications.

One of the algorithms that grabbed my attention was "anomaly detection"; this is the algorithm credit card companies use to flag possibly fraudulent activity. It can also be used for monitoring computers and, I believe, web pages but more about that later.

Which brings us to this series of blog posts. Since I'm on winter break, I decided to spend part of my time designing, coding, and blogging about building an anomaly detection system.

The Motivation


My current and on-going project at work is called Sentry, a system that processes URLs. The system involves 23 (and counting) virtual machines and four physical machines so obviously, system administration is a not-insubstantial part of the project.

Currently, the overall health of the system can be measured by throughput; I use collectd and RRDtool to monitor the system. If the throughput is too low, I know something is wrong with one or more of the machines, but that doesn't tell me which machine(s) is having a problem. So I look at the CPU load graph for each machine to see if anything looks odd. If not, I look at the network transfers graph for each machine to see if anything looks off. If not, then I... Since there are 27 machines, each with over a dozen metrics, this is a chore, as you can imagine.

Anomaly detection will reduce the dozen-plus metrics for each computer down to a range of numbers that we consider "normal behavior" and, by extension, the health of all 27-plus machines down to a range of numbers. So how do we define the range of numbers?


All of my metrics measure different things: load average, memory use, etc. If I plot the range of numbers over time I'll get different looking graphs, but for now, assume all the graphs are Gaussian. What I'm going to do is take each metric's numbers and figure out which Gaussian curve (in other words, the values for μ and σ) best fits the data. I'll then take the latest reading for, say, memory use, and see where on the curve it sits (since I know the equation for the Gaussian distribution function); call that value p(memory). For any value of p(memory) less than, say, ε, I'll tell the computer to flag that reading as an anomaly, since a low p means the reading sits far out in the tails of the curve.
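Here's a minimal sketch of that single-metric check in plain Python. The history values and the ε of 0.01 are made up for illustration; a real ε would have to be tuned against known-good and known-bad readings.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood estimates of mu and sigma for one metric."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(variance)

def gaussian_pdf(x, mu, sigma):
    """Height of the Gaussian curve at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Made-up readings for one metric; they give mu = 5 and sigma = 2.
history = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_gaussian(history)

epsilon = 0.01  # assumed threshold

def is_anomaly(x):
    """Flag a reading whose probability density falls below epsilon."""
    return gaussian_pdf(x, mu, sigma) < epsilon

print(is_anomaly(6.0), is_anomaly(12.0))
```

A reading near the mean gets a high p and passes; one several σ away gets a tiny p and is flagged.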

I'll then take all the latest metric readings (the x-s), calculate the p(x)s and multiply them together; that gives me a single number, super-ε, that tells me if my computer is acting "weird".  If I multiply all the p(x)s for all 27+ computers, I get another super-ε number telling me if my cluster of computers is acting "weird". Neat, huh?
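A sketch of the combined score, with one practical twist that isn't in the description above: multiplying many small p(x)s can underflow to zero, so this version adds logarithms instead, which ranks readings the same way. The metric names and the μ and σ values are invented.

```python
import math

def log_gaussian_pdf(x, mu, sigma):
    """Natural log of the Gaussian density at x, computed without underflow."""
    return (-((x - mu) ** 2) / (2 * sigma ** 2)
            - math.log(sigma * math.sqrt(2 * math.pi)))

def log_score(readings, params):
    """log of the product of p(x) over all metrics of one machine."""
    return sum(log_gaussian_pdf(x, *params[metric])
               for metric, x in readings.items())

# Invented (mu, sigma) pairs, as if already fitted per metric.
params = {"load": (0.5, 0.1), "memory": (512.0, 20.0)}

normal = {"load": 0.52, "memory": 515.0}
weird = {"load": 1.5, "memory": 700.0}

print(log_score(normal, params), log_score(weird, params))
```

A machine would be flagged when its log score drops below the log of the chosen threshold, and a cluster-wide score is just the sum of the per-machine log scores.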

First Things First


But first, I need to gather the data and put it into a matrix form that the algorithm can handle. The raw data resides in RRDtool tables. I'll be using Python as my main language. The vectorization libraries will be numpy and scipy. Even though my main machine at home is a Macbook Pro, installing numpy and scipy on a MBP is a PITA, so I'll be doing all of my work on a (Ubuntu) Linux box.

Massaging the data will be the topic of my next blog post.