How can I find open source inside my software?

There has been a lot of press lately about the legal risks of embedding open source software in commercial software. Those risks are amplified if you do not know exactly what is in your source code. As a software development manager with more than 20 years of experience, I have personally faced exactly this situation more than once.

The first time, I had been running an engineering group for more than three years when I was suddenly faced with a barrage of questions about whether our software contained embedded open source. I was woefully unprepared for these questions: we did not have a well-specified open source governance policy in place, our engineering team was widely distributed geographically, and finding out what was in our code was challenging.

Later in my career, I faced a similar situation but was better prepared. Three months after I took on the challenge of turning around an engineering team, I learned that a very large software sale depended on a complete disclosure of all embedded open source software.

Fortunately, I knew the one thing I could not do: assume we had no open source. My engineers were connected to the internet, and I had not personally reviewed every check-in, so I knew there was a good chance some surprises were waiting for me.

So, what can you do in this situation?  Here are some options:

  • Query the engineers. You'll certainly get some responses pointing to open source code. Of course, you can only ask the engineers currently working for you; you can't ask outsourced contractors or employees who have left. Another challenge with this approach is that your engineers will not remember everything up front. As I recall, it was very embarrassing to keep adding to the initial disclosure as my engineers remembered additional packages they had downloaded.

  • Examine the source repository structure. If you have good source configuration management policies in place and all third-party libraries and source are kept in a known location, this approach can be fairly straightforward. Of course, it depends on every software engineer following the policy. It is good practice to keep a tracking spreadsheet of all included packages, both open source and commercial. If you don't have one in place, now is a good time to start.

  • Look for common license strings in the source code. On a Linux box, grep for "GPL", "Apache", "BSD", "LGPL", and so on. You'll find many of the packages this way, but you'll miss any downloaded binary libraries and any downloaded source whose license text was removed. Be aware that the Open Source Initiative maintains a list of more than 75 distinct approved open source licenses. If you want to cover every possible copyright and license notice, you will have to build a fairly complex search expression. If you do, I am fairly certain you will find software you never knew you had.

  • Once you find all of the copyrights in the source code, track down the original source on the internet. This is fairly time-consuming, because the same original open source code is often reused in many later open source packages, so identifying the true origin can be tricky.

  • For non-source files, examine the binary libraries and jar files. Developers usually do not rename a package when they download it, so internet search engines can help you find the package that matches a binary file's name.

  • Use a commercial open source scanning package. This will provide a more complete and accurate audit and avoid incremental surprises. Several companies offer commercial open source scanning, such as Black Duck, Palamida, and Source Auditor. I am, of course, biased and view Source Auditor as the most cost-effective choice, but any of these scanning packages will provide a complete and accurate audit.
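The license-string search described in the list above can be sketched as a small script. This is a minimal, hypothetical first pass, not a substitute for a full audit: the `LICENSE_PATTERN` list is an assumed starter set of names (a real search expression would need to cover far more licenses), the matching is case-sensitive to keep noise down, and files that look binary are skipped because they need a separate check.

```python
import os
import re
from typing import Dict, Set

# Hypothetical starter pattern: a few common license names plus "Copyright".
# A real audit needs a much larger list of license names and notice formats.
LICENSE_PATTERN = re.compile(
    r"\b(GPL|LGPL|AGPL|Apache License|BSD|MIT License|"
    r"Mozilla Public License|Eclipse Public License|Copyright)\b"
)


def scan_for_license_strings(root: str) -> Dict[str, Set[str]]:
    """Map each text file under `root` (by relative path) to the license strings found in it."""
    hits: Dict[str, Set[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the whole scan
            if b"\0" in data[:4096]:
                continue  # a NUL byte near the start suggests a binary file
            text = data.decode("utf-8", errors="ignore")
            found = set(LICENSE_PATTERN.findall(text))
            if found:
                hits[os.path.relpath(path, root)] = found
    return hits
```

Calling `scan_for_license_strings("path/to/source")` returns a dictionary you can sort and review by hand. Expect false positives (a variable named `BSD`, for instance) and false negatives (source whose license header was stripped), which is exactly why this approach finds many, but not all, of the packages.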
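For the binary files mentioned in the list above, a quick inventory gives you the file names to feed into a search engine. The sketch below is hypothetical: the `BINARY_EXTENSIONS` set is an assumption you should adjust for your platform, and for jar and war files it also reads the standard `META-INF/MANIFEST.MF` main attributes (`Implementation-Title`/`Implementation-Version`, falling back to the OSGi `Bundle-Name`/`Bundle-Version`), which often identify the original package even when the file was renamed. Many jars carry no such attributes, in which case those columns stay empty.

```python
import os
import zipfile
from typing import Dict, List, Optional

# Assumed set of binary extensions worth inventorying; extend for your platform.
BINARY_EXTENSIONS = {".jar", ".war", ".so", ".dll", ".dylib", ".a", ".lib"}


def read_jar_manifest(path: str) -> Dict[str, str]:
    """Return the main-section attributes of a jar's manifest (empty if absent or unreadable)."""
    try:
        with zipfile.ZipFile(path) as jar:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8", errors="replace")
    except (KeyError, zipfile.BadZipFile, OSError):
        return {}
    attrs: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in raw.splitlines():
        if line.startswith(" ") and last_key:
            attrs[last_key] += line[1:]  # a leading space marks a continuation line
        elif ": " in line:
            key, value = line.split(": ", 1)
            attrs[key] = value
            last_key = key
        elif not line.strip():
            break  # the first blank line ends the main section
    return attrs


def inventory_binaries(root: str) -> List[Dict[str, str]]:
    """List binary files under `root` with any name/version found in jar manifests."""
    rows: List[Dict[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            if ext not in BINARY_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            row = {"file": os.path.relpath(path, root), "title": "", "version": ""}
            if ext in (".jar", ".war"):
                manifest = read_jar_manifest(path)
                row["title"] = manifest.get("Implementation-Title") or manifest.get("Bundle-Name", "")
                row["version"] = manifest.get("Implementation-Version") or manifest.get("Bundle-Version", "")
            rows.append(row)
    return rows
```

Each row gives you a file name, and sometimes a title and version, to research; the manifest data is only a hint and still needs to be confirmed against the original download.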

Here are some final pieces of advice for those of you going through this for the first time. First, avoid the credibility damage that can occur when you go back to the business or legal team multiple times with new open source discoveries. After the third or fourth time, they may wonder whether you really have things under control.

Second, allocate enough time and dig deeply enough into your source code before you deliver your first report.

Finally, stay organized. Keep a detailed list of the open source you find, tracking the directory where it is located, the package name, the version number, the URL where it can be downloaded, the correct license, and the resulting license obligations. You don't want to have to re-research the code because you forgot to record something.
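One way to keep that list consistent is to give each entry a fixed set of fields and export it to a spreadsheet-friendly CSV file. The sketch below is a hypothetical illustration: the `OpenSourceRecord` class and its field names simply mirror the six items listed above, and `missing_fields` is an assumed helper that flags incomplete entries before they reach a report.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class OpenSourceRecord:
    """One row of a hypothetical open source inventory."""
    directory: str     # where the package lives in your source tree
    package: str       # package name
    version: str       # version number
    download_url: str  # where the original can be downloaded
    license: str       # the license that actually applies
    obligations: str   # resulting obligations (attribution, source offer, ...)


def missing_fields(record: OpenSourceRecord) -> List[str]:
    """Return the names of fields left blank, so gaps are caught before reporting."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def to_csv(records: List[OpenSourceRecord]) -> str:
    """Serialize records as CSV text with a header row, ready to open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(OpenSourceRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Running `missing_fields` over every entry before each report is a cheap way to avoid going back to the business or legal team with an incomplete disclosure.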

Once you determine what open source is inside your software, you can determine whether you have any unfulfilled license obligations and whether any of it should be removed. That is the subject of another article.

When does it make sense to get a professional open source audit from a firm like Source Auditor?

Professional open source audit firms provide specialized expertise, a well-developed analysis methodology, and open source search engine technology. Source Auditor, for example, has an open source discovery tool that compares your source code line by line against a database of more than 500,000 open source packages. This provides assurance that the discovery process, and therefore the audit, will be accurate and complete.

Most companies that contact us for assistance need a complete, accurate, and independent view of their source code.

For many technology companies, this is because their customers want a certified full disclosure of the open source inside their software. For large Fortune 500 technology companies, open source software audits are often part of corporate policy established by the legal department.

Other companies need some form of software due diligence, for example when they want to acquire a company or purchase a large enterprise software license. Companies on the other side of the transaction also need an open source software audit to ensure they are fully prepared for the acquiring company's audit process.

Copyright Source Auditor 2008.  All rights reserved.