In our second guest post on Patinformatics.com I would like to introduce Matt Fornwalt. Matt was previously a patent searcher at Johnson and Johnson for 12 years where he explored many patent analytics products. He was also recently a business analyst involved with the development of the new STN platform. Besides having patent search experience, he admits his other coding work is limited to some macros throughout the past decade, a FORTRAN course a couple of decades ago, and some BASIC before that. And now, here’s Matt:
While patent searchers may not think of themselves as coders, and hard-core software coders may not think so either, there are aspects of patent searching that overlap with data science, bordering on coding.
As seen during the recent ICIC, there is considerable buzz on big data and data science as it relates to certain aspects of patent searching. Dealing with increasing amounts of data, structured and unstructured across multiple databases, and trying to make sense of it reminds me of skill-sets seen in patent searching for quite some time now, but that are also currently being explored by data scientists in other fields.
Bill Howe researches and instructs on data science out of the University of Washington’s Department of Computer Science & Engineering. When Howe discusses data science, he alludes to thinking of hacker versus analyst, where the line blurs between the hacker/coder and those that analyze data. Much of this hacker versus analyst thinking can be applied to various levels of patent analysts, as these folks often come up with complex, code-level queries and then analyze the subsequent results.
Taking a look at some code examples, let’s start with R. R is a programming language with a focus on user control and on making quality graphical output easy to produce. Jeff Leek researches and instructs on data analysis out of the Johns Hopkins Biostatistics Department, and runs the blog Simply Statistics. As seen below, in setting up his example in analysis with R, there is a “for” loop over i in the second line, and i is then referenced in the third line. (Figure 1)
Programmers often add sense to their complex code in sections with indents and various styles. The example below shows the “else” indented from the “if” it corresponds to, which is then indented from the “while” loop that contains the “if” and “else”. (Figure 2)
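As a rough illustration of the same idea in Python (this is not the code from Figure 2, and the function name classify_numbers is made up for this sketch), the nesting of a "while" loop, an "if", and its "else" can be read straight off the indentation. One difference worth noting: in Python the "else" lines up with the "if" it belongs to, and the indentation is part of the language rather than just a style choice.

```python
def classify_numbers(limit):
    """Label each integer from 0 up to (but not including) limit as even or odd."""
    labels = []
    n = 0
    while n < limit:              # outer level: the loop
        if n % 2 == 0:            # indented once: the test inside the loop
            labels.append("even")
        else:                     # lines up with the "if" it corresponds to
            labels.append("odd")
        n += 1                    # still inside the loop, outside the if/else
    return labels


print(classify_numbers(4))
```

Reading from the left margin inward shows which block each line belongs to, which is the same visual cue a searcher gets from a stacked and indented query.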
Making the parallel to patent analysts, and since we are talking data science, let’s start with an example query proposed in Jurafsky’s Natural Language Processing:
In children with acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
It should not be a stretch for patent analysts to break down the concepts into a query matrix table, with A representing acute, B representing ibuprofen, and C representing reducing (Figure 3), while the actual query terms are left general for illustration purposes.
Once completed, these thoughts could be collected into a stacked and indented layout (Figure 4).
Getting back to coding, once a programmer gets some code they like, various programming languages allow it to be called upon later to build a query or application. Patent searchers often have the equivalent in a hedge. Hedges are compiled in association with terms, topics, or themes and can be strung together using various Boolean and relational operators.
For the patent analyst, let’s assume that some time after this ABC search was run to the searcher’s liking, a new query comes in:
What is the efficacy of single-medication therapy with ibuprofen in reducing fever in arthritis patients?
B and C are the same as before, and something for a=arthritis can be worked up. B and C can be re-used, and a new stacked and indented layout would be seen blending something new and something used. (Figure 5)
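The re-use of hedges can be sketched in Python as well. Everything here is an assumption made for illustration: the helper names any_of and all_of are hypothetical, the terms are placeholders rather than a real search strategy, and the plain OR/AND syntax is generic rather than that of any particular search platform.

```python
def any_of(terms):
    """Build one hedge: synonyms joined with OR and wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(hedges, indent="  "):
    """Stack hedges one per line, joined with AND, in an indented layout."""
    joined = ("\n" + indent + "AND ").join(hedges)
    return "(\n" + indent + joined + "\n)"


# Placeholder hedges for the first query (terms are illustrative only).
HEDGE_A = any_of(["acute", "acute_synonym"])
HEDGE_B = any_of(["ibuprofen", "ibuprofen_synonym"])
HEDGE_C = any_of(["reducing", "reducing_synonym"])

first_query = all_of([HEDGE_A, HEDGE_B, HEDGE_C])

# Later query: only the arthritis hedge is new; B and C are re-used unchanged.
hedge_arthritis = any_of(["arthritis", "arthritis_synonym"])
second_query = all_of([hedge_arthritis, HEDGE_B, HEDGE_C])

print(first_query)
print(second_query)
```

Storing each hedge once and composing it into later queries is the searcher's counterpart to calling a saved function: a fix to the ibuprofen hedge would carry through to every query built from it.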
It should be noted that these examples are fairly straightforward ABC types, and there are far more complex examples that could be considered, with dozens or even hundreds of query strings. While seemingly old-fashioned, the skills associated with command-line driven patent searching are not dissimilar to the process of writing code in one of the hottest fields going today.