Update: Since publishing this post, it has been brought to my attention that PatSeer and Relecura provide family information that was not covered originally. Please see the comments for details on the offerings from these organizations.
Thinking back over the course of the past nine months, an argument can be made that the single most important topic in patent analytics is which method should be used for conducting a patent family reduction. Previous posts have looked at the issues associated with using extended families, provided some background on vendor-specific patent families, and proposed a hybrid domestic and extended family method, referred to as One Document Per Invention (ODPI). The method chosen for performing a patent family reduction has far-reaching impacts on the statistics generated during an analysis project, so all possible options associated with this decision should be explored fully. This post will look at the implications of using simple families as a means of conducting a reduction.
Recently, during the WIPO Regional Workshop on Patent Analytics, it was suggested that the ODPI method was essentially the same as using simple families for patent family reductions. By way of definition, the European Patent Office defines a simple patent family as:
All documents having exactly the same priority or combination of priorities belong to one patent family.
A representative example is provided:
So, in this case Documents D2 and D3 are together in the same simple family, but otherwise the remaining documents reside in other simple families. As discussed previously, all the documents would be encompassed by a single extended family, such as an INPADOC family.
For a function such as “also published as”, the use of simple families works very well for identifying equivalents, and it can be calculated fairly easily behind the scenes, assuming a database has all the relevant data and it has been standardized. Unfortunately, when conducting patent analytics from a practitioner's point of view, simple families can be difficult to calculate or determine for a variety of reasons.
One of the biggest impediments to working with simple families is the apparent lack of a unique key to identify them. When working with INPADOC families, most database providers have a field for the INPADOC Patent Family Number. When sorting a spreadsheet full of patents, this key can be used to quickly organize a worldwide collection by extended family. The European Patent Office and Espacenet are the primary users of simple families, but when exporting data from this system there does not appear to be a unique identifier associated with this property. It appears that the only way to identify the discrete simple families within an extended family is by looking at the individual members one by one, or by using the Common Citation Document database (CCD), which will provide the number and members of the individual simple families within an extended family if the user inputs a single patent number. Theoretically, information about the simple families can be copied out of CCD and pasted into a table, but this can be quite cumbersome and time consuming.
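One possible workaround, assuming an export that lists the priority numbers for each document, is to derive a key from the priority set itself. The sketch below is only an illustration of the simple family definition quoted above: the document labels (D1 to D5) and priority labels (P1 to P3) are hypothetical and were chosen so that D2 and D3 land in the same simple family, and the function names are my own. It also assumes that a first filing lists its own application number as its priority, which may not hold for every data source.

```python
from collections import defaultdict


def simple_family_key(priorities):
    """Return a hashable key built from a document's priority numbers."""
    # Strip whitespace and upper-case each entry; a frozenset ignores
    # ordering and duplicates, so "P1, P2" and "P2, P1" give the same key.
    return frozenset(p.strip().upper() for p in priorities if p.strip())


def group_by_simple_family(documents):
    """Map each distinct priority set to the documents that share it."""
    families = defaultdict(list)
    for pub_number, priorities in documents.items():
        families[simple_family_key(priorities)].append(pub_number)
    return dict(families)


# Hypothetical documents and priorities, for illustration only.
documents = {
    "D1": ["P1"],
    "D2": ["P1", "P2"],
    "D3": ["P2", "P1"],
    "D4": ["P2", "P3"],
    "D5": ["P3"],
}

families = group_by_simple_family(documents)
for key, members in families.items():
    print(sorted(key), members)
```

In a spreadsheet, the same idea amounts to sorting the priority numbers within each row, joining them into one string, and sorting the collection on that string. How well this works depends entirely on how consistently the priority numbers are formatted in the export.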
Simple families are also difficult to work with, from a practical perspective, since most commercial databases don’t support the use of them. In Thomson Innovation, for instance, patent document collections can be collapsed by INPADOC and DWPI Families, or by application number. The ability to collapse by simple family is not supported. Some might think that collapsing by application number might address this, but this function only works with domestic application numbers that match exactly, so it doesn't collapse the equivalents from different countries.
So again, from a practical perspective, simple patent families seem difficult to work with from within commercial tools, or when data is exported into a spreadsheet, and the analyst attempts to identify and sort on them. Beyond the practical ability to reasonably conduct a reduction using simple families, when working with a large collection of patent documents, they can also be problematic based on how they organize the content. Using the Aliphcom Up portfolio, as introduced in the post on extended families, these issues can be exemplified.
When this portfolio was last studied, in April 2013, there were 80 worldwide patent documents associated with it. As of September 27th, 2013, there are 97 worldwide documents within the portfolio, all of which are covered by a single extended INPADOC family. Looking at the major patent families associated with this collection, the following is found:
INPADOC Families - 1
DWPI Families - 36
FamPat Families - 29
Simple Families - 35
One Document per Invention - 27
Before going any further with the analysis, it is important to point out that while it looks as if the DWPI families are aligning with the simple families, and the FamPat families are aligning with the ODPI results, this is simply a coincidence. When the families are compared individually across the systems, there is not much overlap in which documents fall into which families from case to case. The table below illustrates this with one of the simple families whose members actually overlap more across the other systems than most do:
To summarize, looking at a single simple family covering some of the protective overmolding documents found in the Aliphcom portfolio, we find the same documents distributed over five DWPI families, and two families each for FamPat and ODPI. In the case of these twelve documents, but not all the documents within the extended family, the FamPat One Document Per Family (ODPF) result would be the same as the ODPI one.
In the ODPI case, there is an N/A next to the documents from the other countries because, under this reduction method, all of the granted patents and pending applications from the primary country are included, and documents from additional countries are only included if they are part of an additional extended family without a member from the primary country. Since all of these documents are part of the same extended family, the domestic family reduction provides the same result as the ODPI set. An immediate concern with the ODPI method might be that there are three WO documents in the collection while there are only two US documents. In this case, though, a look in US Public PAIR shows that the three WO documents all claim priority to four US applications. The ‘811 grant and the ‘382 app are accounted for in simple family number 2, but priority is also being claimed to US2012313296 and US2012313272, which appear in other simple families. ‘296 and ‘272 are pending applications from the primary country, so they will show up as additional members within the ODPI reduction. For the sake of completeness, ‘296 and ‘272 showed up with the ‘382 application in FamPat Family number 28, so in an ODPF analysis using FamPat they would have been thrown out of the final count.
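The inclusion rule described above can be sketched in a few lines. This is a rough illustration rather than a definitive implementation of ODPI: the field names ('pub', 'country', 'extended_family') and the sample records are hypothetical, and keeping a single representative for an extended family that has no primary-country member is my assumption for the sketch. It also does not collapse a grant with its own pre-grant publication, which would be a separate step.

```python
def odpi_reduce(documents, primary_country="US"):
    """Sketch of a One Document Per Invention style reduction.

    documents: list of dicts with hypothetical keys
    'pub', 'country', and 'extended_family'.
    """
    # Extended families that contain at least one primary-country member.
    covered = {
        d["extended_family"]
        for d in documents
        if d["country"] == primary_country
    }

    kept = []
    represented = set()  # families without a primary-country member already kept
    for d in documents:
        family = d["extended_family"]
        if d["country"] == primary_country:
            # Every primary-country document is retained.
            kept.append(d["pub"])
        elif family not in covered and family not in represented:
            # Assumption: the first document seen stands in for a family
            # that has no member from the primary country.
            represented.add(family)
            kept.append(d["pub"])
    return kept


# Hypothetical records, for illustration only.
docs = [
    {"pub": "US-A", "country": "US", "extended_family": "F1"},
    {"pub": "US-B", "country": "US", "extended_family": "F1"},
    {"pub": "WO-A", "country": "WO", "extended_family": "F1"},
    {"pub": "CN-A", "country": "CN", "extended_family": "F1"},
    {"pub": "EP-X", "country": "EP", "extended_family": "F2"},
    {"pub": "JP-X", "country": "JP", "extended_family": "F2"},
]

result = odpi_reduce(docs)
print(result)
```

In this made-up collection, the WO and CN documents in family F1 are dropped because the family already has US members, while family F2, which has no US member, is represented by one of its documents.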
Another way of looking at this would be to examine all the worldwide documents within this extended family on a particular invention, and see how many documents would still be represented after reduction. The table below provides this analysis for the 19 documents in the Aliphcom portfolio on coating or overmolding, associated with the Up product:
So, to summarize, the following was found when looking at just the coating/overmolding documents:
INPADOC Families - 1
DWPI Families - 8, four US and four WOs (assuming WO and not AU or CA are primary)
FamPat Families - 3, but only two from the US, the third would be one of the CN docs
Simple Families - 3, all US
One Document per Invention - 4, all US
In the case of the INPADOC, simple, and FamPat families, the inventive output, based on the priority documents present, would have been underrepresented. The DWPI families would have overrepresented the output, and only the domestic method, as incorporated by ODPI, would have produced an accurate inventive output for this one example.
Having said that, the purpose of this exercise is not to say that one family reduction method is better than another, but it can be clearly seen that in this example each one of these methods would have given a separate collection of patents, and in almost all cases a different statistical value. In this case, using all 97 worldwide documents would not be appropriate, and neither would using an extended family approach, where all of them would be represented by a single document. Using any one of the other methods will produce a more accurate result. Pre-processing patent collections for statistical analysis always requires a family reduction step, but it is critical that analysts consider the type of method they will use, and the impact this decision will have on the values generated downstream.
Personally, I still prefer the ODPI method, where the primary country is either the US or is determined by looking at which priority country is most frequently seen within the collection being analyzed. When the primary country is the US, I wondered if there might be an issue with WO documents where the US was the priority country but the WO might be the first filing. Under these circumstances, a US publication might not appear until much later, and the PCT application might be the only published document on the pending invention. The posts on the relationship between WO and US filings, and the last one on WO representation in INPADOC families, were written to generate evidence on whether modifications to the ODPI methodology were needed based on this.
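The second way of choosing the primary country, by finding the most frequent priority country in the collection, can be sketched as follows. This is a minimal sketch that assumes every priority number in the export begins with a two-letter country code; the function name and the sample priority numbers are hypothetical placeholders.

```python
from collections import Counter


def most_frequent_priority_country(priority_numbers):
    """Return the country code seen most often among the priority numbers."""
    # Assumption: the first two characters of each priority number
    # are its country code, e.g. "US" or "JP".
    counts = Counter(
        p.strip()[:2].upper() for p in priority_numbers if p.strip()
    )
    # most_common(1) returns a list holding the single top (code, count) pair.
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical priority numbers, for illustration only.
priorities = ["US-0001", "US-0002", "JP-0001", "us-0003", "DE-0001"]
primary_country = most_frequent_priority_country(priorities)
print(primary_country)
```

When two countries are tied, this sketch simply returns whichever one was encountered first, so a tie is worth resolving by hand.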
Referring to the last two posts we can likely conclude the following:
When the US is not the priority country, WO documents will not over- or underrepresent inventive output using an extended family reduction method or one of the narrower definitions, including simple, DWPI, and FamPat families. The ODPI method will also not be impacted.
When the US is the priority country, the WO document will be the only publication ~5-7% of the time. This happens when a PCT application is filed, but a National Stage application is never filed in the US.
When the US is the priority country, 20% of the time there will be multiple WO documents found in the same extended family. These can also be situations where additional inventive output is not captured if WO documents are not considered beyond a domestic family reduction.
At most 25% of the potential cases might be underrepresenting the inventive output if WO documents are not considered when the US is the primary country used in an ODPI family reduction. The actual percentage is likely much smaller, but based on these observations it is worth taking a few moments to look at the major applicants in a collection, and check to see if they use PCT applications as the first filing. Experience has shown that in the majority of cases the US publications will still represent an accurate measure of inventive output, but this should be confirmed before generating statistics.
This post has looked at the use of simple patent families when performing a family reduction pre-processing step. While simple patent families usually give statistical values similar to those seen with the ODPI method, and with DWPI and FamPat families, the use of simple families is not recommended, primarily because they are not supported on major commercial platforms, and it is difficult to generate them with data that can be easily exported from most patent search systems. Based on input from the previous two posts, the relationship between US and WO documents and its impact on the ODPI method were also explored. While the impact is not likely to be significant in most cases, it is still prudent to look at major applicants within a collection to ensure they are not using PCT applications as the first filing, and waiting until the last moment to file a National Stage application in the US.