How can I find open source inside my software?
There has
been a lot of press lately about the legal risks of using open source
software embedded inside commercial software. Those risks are
amplified if you do not know exactly what is in your source code.
As a software development manager with 20+ years of experience, I personally
have faced exactly this situation, more than once.
The first
time, I had been running an engineering group for 3+ years when I was suddenly
faced with a barrage of questions about whether our software had open source
embedded inside. I was woefully unprepared for these questions: we did not have
a well-specified open source governance policy in place, our engineering team
was geographically distributed, and finding out what was in my code was
challenging.
Later in my
career, I faced a similar situation, but was better prepared. Three
months after I took on the challenge of turning around an engineering team, I
found that a very large software sale was dependent on a complete disclosure of
all embedded open source software.
Fortunately,
I knew the one thing I could not do, which was to assume we had no open
source. My engineers were connected to the internet, and I had not personally
reviewed all the check-ins, so I knew there was a good chance that some
surprises were waiting for me.
So, what can
you do in this situation? Here are some options:
-
Query the engineers. You’ll certainly get some
responses back pointing to some open source code. Of course, you can
only ask the engineers currently working for you. You can’t ask the
outsourced contractors or employees who have left. One of the
challenges with this approach is that your engineers will not remember
everything up front. As I recall, it was very embarrassing to keep
adding to the initial disclosure as my engineers remembered additional
packages they had downloaded.
-
Examine the source repository structure. If
you have good source configuration management policies in place and any
third-party libraries and source are kept in a known location, this approach
can be pretty straightforward. Of course, this does depend on all
software engineers following the policy. It is a good practice to
keep a tracking spreadsheet for all of the included packages (both open
source and commercial). If you don’t have this in place, now’s
probably a good time to start.
-
Look for common license strings in the source
code. On a Linux box, grep for “GPL”, “Apache”, “BSD”, “LGPL”,
etc. You’ll find many of the packages this way, but you’ll miss any
binary libraries downloaded and any source downloaded where the licenses
were removed. Be aware that there are over 75 unique valid open source
licenses maintained by the Open Source Initiative. If you want to cover all
possible license and copyright notices, you will have to create a fairly
complex search expression. If you do this, I am pretty certain you will find
software you never knew you had.
-
Once you find all of the copyright notices in the source
code, track down the original source on the internet. This is fairly
time-consuming because the same original open source software is often
reused in many subsequent open source packages, so finding the original
source can be tricky.
-
For the non-source files, examine the binary library
or jar files. Developers usually do not change the file name when they
download a package, so internet search engines can help you find the
package that matches each binary file name.
-
Use a commercial open source scanning package.
This will provide a more complete and accurate audit, avoiding incremental
surprises. There are several companies providing commercial
scanning for open source, such as BlackDuck, Palamida, and Source Auditor.
I am, of course, biased, and view Source Auditor as your most
cost-effective choice, but any of these scanning packages will provide a
complete and accurate audit.
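
For readers who want to try the license-string and binary-file steps before bringing in a scanning package, here is a minimal Python sketch of both. It is an illustration under stated assumptions, not a complete audit: the pattern list only covers the four licenses named above, the list of binary extensions is my own guess at common cases, and the function name scan_tree is hypothetical. It will still miss source files whose license text was removed.

```python
import os
import re
import sys
from collections import defaultdict

# Assumption: a deliberately short pattern list covering only the licenses
# named in the article. A real audit needs many more patterns.
LICENSE_PATTERNS = {
    "GPL": re.compile(r"\bGPL|GNU General Public License"),
    "LGPL": re.compile(r"\bLGPL|GNU Lesser General Public License"),
    "Apache": re.compile(r"Apache License|Apache Software Foundation"),
    "BSD": re.compile(r"\bBSD\b"),
}

# Assumption: extensions that usually indicate a binary library or archive.
BINARY_EXTENSIONS = {".jar", ".so", ".dll", ".a", ".lib"}


def scan_tree(root):
    """Walk root and return (license_hits, binaries).

    license_hits maps a file path to the set of license labels whose
    pattern appears in that file. binaries is a sorted list of files
    that should be looked up by name instead of being searched as text.
    """
    license_hits = defaultdict(set)
    binaries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.splitext(name)[1].lower() in BINARY_EXTENSIONS:
                binaries.append(path)
                continue
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it, do not abort the scan
            for label, pattern in LICENSE_PATTERNS.items():
                if pattern.search(text):
                    license_hits[path].add(label)
    return dict(license_hits), sorted(binaries)


if __name__ == "__main__":
    # Scan the directory given on the command line, or the current one.
    hits, libs = scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path, labels in sorted(hits.items()):
        print(path, ", ".join(sorted(labels)))
    for path in libs:
        print("binary, look up by name:", path)
```

Every file reported here still needs to be traced back to its original package by hand, and the binary list is only a starting point for the search engine lookup described above.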
Here are
some final pieces of advice for those of you going through this for the first
time. First, avoid the credibility damage that can occur when you go back
to the business/legal team multiple times with new open source
discoveries. After the third or fourth time, they may wonder if you
really have things under control.
Second,
allocate enough time and do enough digging into your source code before your
first report.
Finally, stay
organized. Keep a detailed list of the open source found, tracking:
the directory where it is located, the package name, the version
number, the URL where it can be downloaded, the correct license, and the
resulting license obligations. You don’t want to have to go back and
re-research the code because you forgot to record something.
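
One way to keep that list organized is a small script that writes it out as a CSV file your business and legal teams can open in a spreadsheet. The sketch below is only an illustration: the class name OpenSourceEntry, the function write_inventory, the column names, and the sample package are all hypothetical, chosen to mirror the six items listed above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class OpenSourceEntry:
    """One row of the inventory, with one field per item tracked above."""
    directory: str
    package_name: str
    version: str
    download_url: str
    license: str
    obligations: str


def write_inventory(entries, stream):
    """Write the entries to stream as CSV with a header row."""
    columns = [field.name for field in fields(OpenSourceEntry)]
    writer = csv.DictWriter(stream, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


if __name__ == "__main__":
    # Placeholder entry for illustration only; not a real package.
    sample = OpenSourceEntry(
        directory="third_party/examplelib",
        package_name="examplelib",
        version="1.2.3",
        download_url="https://example.org/examplelib",
        license="BSD",
        obligations="Reproduce the copyright notice in documentation",
    )
    buffer = io.StringIO()
    write_inventory([sample], buffer)
    print(buffer.getvalue())
```

Keeping the inventory in a plain file like this also means it can be checked into the source repository next to the code it describes.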
Once you
determine what open source is inside your software, you can determine whether
you have any unfulfilled license obligations and whether any of it should be
removed. That’s the subject of another article.
When does it make sense to get a professional open source audit from a firm like Source Auditor?
Professional
open source audit firms provide specialized expertise, a well-developed
analysis methodology, and open source search engine technology. Source
Auditor, for example, has an open source discovery tool that compares
your source code line by line against a database of over 500,000 open
source packages. This provides assurance that the open source discovery
process will be accurate and complete.
Most
companies that contact us for assistance need a complete, accurate, and
independent view of their source code.
For many
technology companies, this is because their customers want a certified full
disclosure of open source inside their software. For larger F500 technology companies,
software open source audits are often part of corporate policy, as established
by the legal department.
Other
companies are looking for some form of software due diligence, for example
because they want to purchase a company or a large enterprise software
license. Companies on the other side of that transaction also need an open
source software audit to ensure they are fully prepared for the acquiring
company’s audit process.
Copyright Source Auditor 2008. All rights reserved.