Relecura’s New Bucketing Module Provides Flexibility without Sacrificing Control
On several occasions I have talked about the importance of categorization, or bucketing, when conducting a patent landscape report (PLR). Buckets provide context and depth to the study of a technical area, and are critical to adding pivot points to a data collection. This added detail is crucial for demonstrating how a technology has changed, or for highlighting how different organizations approach technology implementation within a field.
While essential to a well-researched PLR, creating buckets for sub-categories can be very time-consuming, and often accounts for a significant percentage of the time spent on any particular project. In a new release, Relecura has taken this task head on with functionality designed to address the challenges of bucketing.
Bucketing is a ubiquitous and essential, though often time-consuming, workflow in patent analysis, including patent landscaping, competitive analysis, and patent commercialization activities. When done manually, the quality of the bucketing exercise depends heavily on the subject matter expertise of the analyst, the effort expended, and the time available to complete the exercise.
Relecura’s new Bucketing module provides three modes for bucketing a patent portfolio: a manual approach that specifies buckets using queries, a supervised (semi-automated) method that uses training sets of representative documents, and a fully automatic mode requiring minimal user intervention. These modes can be mixed and matched as the bucketing exercise requires. At each stage, Bucketing Statistics are available to monitor the exercise and modify the bucketing strategy if needed. In addition, all of Relecura’s analytics for individual sets and portfolio comparison may be used to analyze and course-correct the strategy. Once validated, these bucketing strategies can be reused in future analyses.
Query-based bucketing – The buckets are specified through Relecura queries. Refine options are available to fine-tune these queries, which are then saved as rules and applied to the portfolio being bucketed.
Training-based bucketing – In this mode, the user selects training sets of exemplary patents, which are saved as representative rules. The system processes these training rules to create buckets of patents similar to those in each training set. The user may also set an acceptable overlap between the buckets.
Auto bucketing – The only inputs required in this mode are the number of buckets desired and the acceptable overlap. The auto-bucketing algorithm uses these inputs to create the buckets.
The following table compares the characteristics of the three modes:
Table 1. Comparative characteristics of the three bucketing modes.
|Features||Auto Mode||Training Mode||Query Mode|
|User input required||Select the number of buckets and amount of overlap||Specify the buckets using representative documents||Specify buckets through queries|
|Subject matter expertise required||None||Medium||High|
|Control over selection of categories||None||Moderate||Complete|
|Reusability of bucketing strategy||None||Fully reusable||Fully reusable|
|Factors impacting accuracy of buckets||Provides consistent results independent of user input||Dependent on representative documents chosen||Depends on subject matter expertise of analyst|
To better understand the nuances of the three modes, we walk through the following example, in which a portfolio of virtual reality (VR) and augmented reality (AR) patents is bucketed.
Bucketing a portfolio of Virtual Reality (VR) and Augmented Reality (AR) Patents
We obtain a set of patent documents related to this domain and published in the last 10 years by searching for “virtual reality” or “augmented reality” in the title, abstract and claims (tac). The query employed is tac:(“virtual reality” OR “augmented reality”) AND pd_year:[2006 TO *]. As of June 3rd, 2016, Relecura returns 16,689 documents for this query, grouped into 9,953 equivalents. We will now bucket this set using each of the three modes described above.
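To make the query's logic concrete, here is a minimal Python sketch of what it asks for, evaluated over an in-memory list of records. The `PatentDoc` type and its fields are hypothetical stand-ins, not Relecura's schema; the phrase test is a plain substring check, which is simpler than real phrase search; and we assume that documents sharing a family identifier count as one equivalent, which is how the documents-versus-equivalents figures are read here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentDoc:
    # Hypothetical record type; field names are illustrative only.
    pub_number: str
    family_id: str  # documents sharing a family_id count as one "equivalent"
    title: str
    abstract: str
    claims: str
    pub_year: int


def tac_text(doc: PatentDoc) -> str:
    """Title, abstract and claims combined, mirroring the tac: field."""
    return " ".join([doc.title, doc.abstract, doc.claims]).lower()


def matches(doc: PatentDoc) -> bool:
    """tac:("virtual reality" OR "augmented reality") AND pd_year:[2006 TO *]"""
    text = tac_text(doc)
    phrase_hit = "virtual reality" in text or "augmented reality" in text
    return phrase_hit and doc.pub_year >= 2006  # [2006 TO *] is an open-ended range


def search(docs):
    """Return the matching documents and the set of distinct families among them."""
    hits = [d for d in docs if matches(d)]
    equivalents = {d.family_id for d in hits}
    return hits, equivalents


docs = [
    PatentDoc("US1", "F1", "Augmented reality headset", "", "", 2015),
    PatentDoc("EP1", "F1", "Augmented reality headset", "", "", 2016),
    PatentDoc("US2", "F2", "Virtual reality glove", "", "", 2004),  # too old
    PatentDoc("US3", "F3", "Camera lens", "A virtual reality lens.", "", 2012),
]
hits, equivalents = search(docs)
print(len(hits), len(equivalents))  # 3 documents in 2 equivalents
```

The two EP/US documents in family F1 count once at the equivalent level, which is why a document count is always at least as large as its equivalent count.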
The Query-based bucketing mode gives the user complete control over the buckets created. Search queries are crafted to specify each bucket precisely and are saved as Query Rules, which can employ all the parameters and operators that Relecura provides. We decided to create buckets for six topics in this example. The queries used for each bucket, and the results obtained, are given in the following table.
Table 2. Specifications of the buckets and results using the Query-based bucketing mode.
|Bucket name||Bucket specification (Relecura queries)||Patent documents in bucket|
|Query_Advertising||taco:(Advertis* OR marketing OR ads OR ad OR promotion*) AND (icc:G06Q0030020000 OR cpcc:G06Q0030020000)||https://p.relecura.com/Query_Advertising|
|Query_e-commerce||tac:(ecommerce OR “e commerce” OR shopping OR purchase) AND (cpcc:G06Q0030060000 OR icc:G06Q0030060000)||https://p.relecura.com/Query_e-commerce|
|Query_Education||(tt:(simulators OR simulation OR training) OR ab:(simulators OR simulation OR training)) AND ((tl:”Educational Aids & Equipment”) OR (tl:”Sports Apparatus – Training Equipment”))||https://p.relecura.com/Query_Education|
|Query_Gaming||((ft:”video games” OR gaming OR games) AND ((icc:A63F0013000000) OR (cpcc:A63F0013000000)))||https://p.relecura.com/Query_Gaming|
|Query_Medical||((ft:Medical OR medicine OR diagnosis OR surgery OR surgical OR healthcare) AND (icc:A6100000000000 OR cpcc:A6100000000000))||https://p.relecura.com/Query_Medical|
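Conceptually, each saved Query Rule pairs a bucket name with a condition, and a document lands in every bucket whose condition it satisfies. The sketch below illustrates that idea in plain Python; the `Doc` and `QueryRule` types, the whole-word term matching, and the two simplified rules are assumptions for illustration and do not reproduce Relecura's query engine (for example, wildcards such as `Advertis*` are not handled).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Doc:
    # Hypothetical record: full text plus IPC/CPC codes in the zero-padded style of Table 2.
    pub_number: str
    text: str
    codes: frozenset


@dataclass
class QueryRule:
    # A saved rule: a bucket name and a predicate standing in for a Relecura query.
    bucket: str
    predicate: Callable[[Doc], bool]


def has_any_word(text: str, words: set) -> bool:
    """Whole-word OR match; wildcards such as Advertis* are not modelled."""
    return not words.isdisjoint(text.lower().split())


def apply_rules(rules, docs):
    """Map each bucket name to the publication numbers whose document satisfies its rule.

    A document may satisfy several rules and so appear in several buckets.
    """
    buckets = {rule.bucket: set() for rule in rules}
    for doc in docs:
        for rule in rules:
            if rule.predicate(doc):
                buckets[rule.bucket].add(doc.pub_number)
    return buckets


rules = [
    QueryRule("Query_Gaming",
              lambda d: (has_any_word(d.text, {"gaming", "games"})
                         and "A63F0013000000" in d.codes)),
    QueryRule("Query_Advertising",
              lambda d: (has_any_word(d.text, {"advertising", "marketing", "ads"})
                         and "G06Q0030020000" in d.codes)),
]
docs = [
    Doc("D1", "VR gaming controller for video games", frozenset({"A63F0013000000"})),
    Doc("D2", "Targeted ads in virtual reality games",
        frozenset({"A63F0013000000", "G06Q0030020000"})),
    Doc("D3", "Optical waveguide display", frozenset()),
]
buckets = apply_rules(rules, docs)
```

Here D2 ends up in both buckets because it satisfies both rules, while D3 matches neither.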
Using the buckets created in the query-based mode, we pick the top five documents from each as representative documents. These five-document sets are specified as training rules for a training-based bucketing exercise on the same set of VR and AR patent documents, which lets us compare the two bucketing modes.
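The article does not say how Relecura measures similarity to the training sets. As one possible stand-in, the sketch below represents each training set by the summed bag-of-words vector of its exemplars and assigns a document to every bucket whose cosine similarity clears a threshold. The similarity model, the `threshold` parameter and all names are hypothetical; a lower threshold lets more documents fall into more than one bucket, loosely mirroring an overlap setting.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def profile(exemplars) -> Counter:
    """Summed vector of a training set; cosine ignores scale, so this acts like the mean."""
    total = Counter()
    for text in exemplars:
        total.update(vectorize(text))
    return total


def training_bucket(training_rules, docs, threshold=0.3):
    """training_rules: bucket -> exemplar texts; docs: pub_number -> text.

    A document joins every bucket whose profile it resembles closely enough,
    so buckets may overlap.
    """
    profiles = {name: profile(texts) for name, texts in training_rules.items()}
    buckets = {name: set() for name in training_rules}
    for pub, text in docs.items():
        vec = vectorize(text)
        for name, prof in profiles.items():
            if cosine(vec, prof) >= threshold:
                buckets[name].add(pub)
    return buckets


training_rules = {
    "Train_Gaming": ["video game controller", "game console headset"],
    "Train_Medical": ["surgical navigation display", "medical imaging headset"],
}
docs = {
    "P1": "handheld game controller",
    "P2": "surgical imaging display",
    "P3": "solar panel mount",
    "P4": "game headset for surgical training",
}
buckets = training_bucket(training_rules, docs)
# P4 resembles both training sets, so it lands in both buckets; P3 matches neither.
```

Production systems would typically weight terms (for example with TF-IDF) or use learned embeddings; plain counts keep the sketch short.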
Table 3. Specifications of the buckets and results using the Training-based bucketing mode.
|Bucket name||Bucket specification (Representative documents)||Patent documents in bucket|
Table 4. Overlap between buckets created using the Query-based and Training-based modes (Equivalents with Total documents shown in brackets).
|Bucket Names||No. of unique documents in Query-based buckets||No. of common documents||No. of unique documents in Training-based buckets|
|Navigation||272(496)||105 (226)||449 (827)|
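The three columns of Table 4 are two set differences and an intersection. A minimal sketch, assuming the equivalents figure comes from the same set operations applied to family identifiers while the bracketed figure uses individual documents (the `family_of` mapping and the toy data are hypothetical):

```python
def overlap_counts(a: set, b: set):
    """Sizes of (only in a, in both, only in b): the three columns of Table 4."""
    return len(a - b), len(a & b), len(b - a)


def table4_row(query_docs: set, training_docs: set, family_of: dict):
    """Format each cell as 'equivalents (documents)', as in Table 4."""
    doc_counts = overlap_counts(query_docs, training_docs)
    fam_counts = overlap_counts({family_of[d] for d in query_docs},
                                {family_of[d] for d in training_docs})
    return [f"{f} ({d})" for f, d in zip(fam_counts, doc_counts)]


# Hypothetical toy data: five documents in four families.
family_of = {"US1": "F1", "EP1": "F1", "US2": "F2", "US3": "F3", "US4": "F4"}
query_set = {"US1", "EP1", "US2"}
train_set = {"US1", "US3", "US4"}
row = table4_row(query_set, train_set, family_of)
print(row)  # ['1 (2)', '1 (1)', '2 (2)']
```

Note that EP1 is unique to the query bucket at the document level while its family F1 is common at the equivalent level, so the two numbers in a cell need not move together.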
In contrast to the previous two bucketing modes, Auto bucketing is initiated by specifying the number of required buckets (6) and the acceptable overlap (50%). Of the 16,689 documents in the (VR+AR) portfolio, Relecura buckets 11,382, placing the remaining 5,307 documents in the “Others” category. The auto buckets and the overlap between them are detailed in the following tables.
Table 5. Bucketing Statistics of the auto buckets showing unique and overlapping documents (Equivalents with Total documents shown in brackets).
|Bucket Name||Diagnosis & Surgery Identification||Digital Data Processing||Educational Aids & Equipment||Image Data Processing||Optical Elements||Sports Apparatus – Indoor Games|
|Diagnosis & Surgery Identification||326(617)||0||0||0||0||0|
|Digital Data Processing||0||4,194(8,611)||0||3,895(8,026)||1,863(3,799)||51(88)|
|Educational Aids & Equipment||0||0||513(932)||0||0||0|
|Image Data Processing||0||3,895(8,026)||0||4,065(8,330)||1,890(3,845)||51(88)|
|Sports Apparatus – Indoor Games||0||51(88)||0||51(88)||0||448(878)|
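A table like Table 5 can be built by intersecting every pair of buckets. In this sketch the diagonal holds each bucket's total size, which is one plausible reading of Table 5; the helper also counts documents that fall into no bucket, the analogue of the “Others” category (for the article's totals, 16,689 − 11,382 = 5,307). All names and data here are hypothetical.

```python
def overlap_matrix(buckets: dict) -> dict:
    """Pairwise intersection sizes; the diagonal is each bucket's own size."""
    names = list(buckets)
    return {a: {b: len(buckets[a] & buckets[b]) for b in names} for a in names}


def bucketing_stats(buckets: dict, portfolio: set):
    """Return (number of bucketed documents, number left for an 'Others' category)."""
    bucketed = set().union(*buckets.values())
    return len(bucketed), len(portfolio - bucketed)


buckets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}
portfolio = set(range(1, 8))  # documents 1..7
matrix = overlap_matrix(buckets)
print(matrix["A"])                          # {'A': 3, 'B': 1, 'C': 0}
print(bucketing_stats(buckets, portfolio))  # (5, 2)
```

Because overlapping documents are counted once per bucket, the diagonal entries can sum to more than the number of bucketed documents.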
Table 6. Bucket labels and bucketed documents in the Auto bucketing mode.
|Bucket Labels (Relecura-generated)||List of publications in buckets|
|Diagnosis & Surgery Identification||https://p.relecura.com/Auto_DSI|
|Digital Data Processing||https://p.relecura.com/Auto_DDP|
|Educational Aids & Equipment||https://p.relecura.com/Auto_EA|
|Image Data Processing||https://p.relecura.com/Auto_IDP|
|Sports Apparatus – Indoor Games||https://p.relecura.com/Auto_SA_IG|
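Relecura does not describe its auto-bucketing algorithm. To illustrate what “number of buckets plus acceptable overlap” could mean, the sketch below swaps in plain k-means clustering over toy 2-D document vectors. It then lets a document join every bucket whose centroid lies within `(1 + overlap)` times its nearest-centroid distance, and sends documents farther than `others_radius` from every centroid to an “Others” set. The overlap interpretation, `others_radius`, and the vectors are all hypothetical. Real systems would cluster high-dimensional text features and generate bucket labels, which this sketch leaves as numeric indices.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are the first k points, for reproducibility."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # New centroid = mean of its group; an empty group keeps its old centroid.
        centroids = [
            tuple(sum(coords) / len(g) for coords in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def auto_bucket(docs, k, overlap, others_radius):
    """docs: name -> vector. Returns (bucket index -> names, names sent to 'Others').

    Hypothetical overlap rule: a document joins every bucket whose centroid is
    within (1 + overlap) times its distance to the nearest centroid.
    """
    names = list(docs)
    centroids = kmeans([docs[n] for n in names], k)
    buckets = {j: set() for j in range(k)}
    others = set()
    for n in names:
        dists = [math.dist(docs[n], c) for c in centroids]
        best = min(dists)
        if best > others_radius:  # too far from every bucket
            others.add(n)
            continue
        for j, d in enumerate(dists):
            if d <= best * (1 + overlap):
                buckets[j].add(n)
    return buckets, others


docs = {"a1": (0, 0), "b1": (10, 0), "a2": (2, 0), "b2": (12, 0),
        "mid": (6, 0), "far": (6, 9)}
buckets, others = auto_bucket(docs, k=2, overlap=0.5, others_radius=5.0)
# "mid" lies between the two groups and joins both; "far" goes to Others.
```

With `overlap=0.0` each document would join only its nearest bucket (barring exact ties), so the parameter directly controls how much the buckets can share.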
Relecura analytics such as the Comparative Topic Map shown below may be employed to better understand the buckets created and their relationships, and to tweak the bucketing strategy during the course of the workflow.
Figure 1. Bucketing Analytics – Topic Comparison of the assignees of the auto bucketed documents.
The example given in this article compares and contrasts the three bucketing modes provided by Relecura’s new Bucketing module.
Of the three, Auto-bucketing requires the least intervention and subject matter knowledge to bucket the portfolio. The auto bucketing results are consistent, but the analyst has no control over the bucket labels, which are machine-generated based on the technology content of the portfolio. In contrast, the Training-based and Query-based modes give the analyst increasing levels of control over the bucketing exercise, but demand a corresponding increase in subject matter expertise.
Once created and applied, bucketing rules may be reused in future bucketing analyses. Rules may be grouped and organized using labels to create the desired taxonomy, and labels may be applied to a portfolio to create multiple buckets simultaneously. Relecura analytics such as “Portfolio comparison” and other custom graphing features may be used to visualize and compare the created buckets.
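One way to picture labels as a reusable taxonomy is a store that keeps named rules, groups them under labels, and applies every rule under a label to a portfolio in one call, yielding one bucket per rule. The `RuleLibrary` class below is a hypothetical sketch with rules reduced to text predicates, not Relecura's API.

```python
from typing import Callable, Dict, List, Set

Rule = Callable[[str], bool]  # hypothetical: a saved rule reduced to a text predicate


class RuleLibrary:
    """Hypothetical store of saved bucketing rules, grouped under taxonomy labels."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.labels: Dict[str, List[str]] = {}

    def save(self, name: str, rule: Rule, label: str) -> None:
        self.rules[name] = rule
        self.labels.setdefault(label, []).append(name)

    def apply_label(self, label: str, portfolio: Dict[str, str]) -> Dict[str, Set[str]]:
        """Apply every rule filed under `label`, producing one bucket per rule."""
        return {
            name: {pub for pub, text in portfolio.items() if self.rules[name](text)}
            for name in self.labels.get(label, [])
        }


lib = RuleLibrary()
lib.save("Gaming", lambda text: "game" in text, label="VR applications")
lib.save("Medical", lambda text: "surgical" in text, label="VR applications")

portfolio_a = {"P1": "vr game pad", "P2": "surgical vr trainer"}
portfolio_b = {"Q1": "ar game glasses"}
first_run = lib.apply_label("VR applications", portfolio_a)
reused = lib.apply_label("VR applications", portfolio_b)  # same rules, new portfolio
```

Keeping rules separate from the portfolios they are applied to is what makes a validated strategy reusable: the second call needs no new rule definitions.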
The Bucketing Module is part of Relecura’s ongoing effort to introduce features that address the needs of its user base. The modes may be employed separately or in tandem to deliver accurate results while saving time and effort. The module gives users of different skill levels the flexibility and support to execute bucketing workflows efficiently, without sacrificing control where specific cases require it.
Relecura plans to release further features of this kind, bringing appropriate levels of automation to important IP analysis workflows and providing options for the different types of creators and consumers of IP intelligence and analytics in the enterprise.