
Relecura's New Bucketing Module Provides Flexibility without Sacrificing Control

On several occasions I have talked about the importance of categorization, or bucketing, when conducting a patent landscape report (PLR). Buckets provide context and depth to the study of a technical area, and are critical to adding pivot points to a data collection. This added detail is crucial to demonstrating how a technology has changed, or to highlighting how different organizations approach technology implementation within a field.

While essential to a well-researched PLR, the task of creating buckets for sub-categories can be very time-consuming, and often represents a significant percentage of the time spent on a project. In a new release, Relecura has taken this task head-on with new functionality to address the challenges of bucketing.

Introduction

Bucketing is a ubiquitous and essential, though often time-consuming, workflow in patent analysis, including patent landscaping, competitive analysis, and patent commercialization activities. If done manually, the quality of the bucketing exercise depends a great deal on the subject matter expertise of the analyst, the effort expended, and the time available to complete the exercise.

Relecura’s new Bucketing module provides three modes to bucket a patent portfolio: a manual mode, in which buckets are specified using queries; a supervised (semi-automated) mode, which uses training sets of representative documents; and a fully automatic mode requiring minimal user intervention. These modes can be mixed and matched as appropriate to the bucketing exercise. At each stage, Bucketing Statistics are available to monitor the exercise. In addition to the Bucketing Statistics, all of Relecura’s analytics for individual sets and portfolio comparison may be used to analyze and course-correct the bucketing strategy if required. Once validated, these bucketing strategies can be reused in future analyses.

Bucketing modes

Query-based bucketing – The buckets are specified through Relecura queries. The refine options are available to fine-tune these queries. These queries are saved as rules, to be applied to the portfolio being bucketed.

Training-based bucketing – In this bucketing mode, the user selects training sets of exemplary patents. These training sets are saved as representative rules. The system processes these training rules to create buckets of patents similar to those contained in each of the training sets. An acceptable overlap between the buckets may also be set by the user.

Auto bucketing – The only inputs required in this bucketing mode are the number of buckets desired and the acceptable overlap. The auto-bucketing algorithm uses these inputs to create the buckets.

The following table compares the characteristics of the three modes:

Table 1. Comparative characteristics of the three bucketing modes.

Features	Auto Mode	Training Mode	Query Mode
User input required	Select the number of buckets and amount of overlap	Specify the buckets using representative documents	Specify buckets through queries
Time required	Minimal	Medium	High
Subject matter expertise required	None	Medium	High
Control over selection of categories	None	Moderate	Complete
Reusability of bucketing strategy	None	Fully reusable	Fully reusable
Factors impacting accuracy of buckets	Provides consistent results independent of user input	Dependent on representative documents chosen	Depends on subject matter expertise of analyst

To understand and appreciate the nuances of the three modes better, we walk through the following example, in which a portfolio of virtual reality (VR) and augmented reality (AR) patents is bucketed.

Bucketing a portfolio of Virtual Reality (VR) and Augmented Reality (AR) Patents

We obtain a set of patent documents related to this domain, published in the last 10 years, by searching for "virtual reality" or "augmented reality" in the title, abstract and claims (tac). The query employed is tac:("virtual reality" OR "augmented reality") AND pd_year:[2006 TO *]. As of June 3rd, 2016, Relecura returns 16,689 documents, grouped into 9,953 equivalents, for this query. We will now attempt to bucket this set using each of the three modes mentioned above.
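As a minimal sketch, the portfolio query quoted above can be assembled from its parts. The helper name `build_portfolio_query` and its parameters are hypothetical and only format a string; they are not part of any Relecura API. Straight quotes are used, as in the bucket queries of Table 2.

```python
def build_portfolio_query(phrases, fields="tac", start_year=2006):
    """Compose a query string of the form used in this article.

    Hypothetical helper: it only formats text and does not call Relecura.
    """
    # Quote each phrase and join the alternatives with OR.
    quoted = " OR ".join(f'"{phrase}"' for phrase in phrases)
    # Restrict to publication years from start_year onward (open-ended range).
    return f"{fields}:({quoted}) AND pd_year:[{start_year} TO *]"


query = build_portfolio_query(["virtual reality", "augmented reality"])
print(query)
```

Keeping the phrases and the start year as parameters makes it easy to rerun the same portfolio definition for a different technology area or time window.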

Query-based Bucketing

The Query-based bucketing mode provides the user with complete control over the buckets created. Search queries are crafted to specify the buckets accurately and are saved as Query Rules. The query rules can employ all the parameters and operators that Relecura provides. For this example, we decided to create buckets for six topics. The queries used for each of the buckets, and the results obtained, are given in the following table.

Table 2. Specifications of the buckets and results using the Query-based bucketing mode.

Bucket name	Bucket specification (Relecura queries)	Patent documents in bucket
Query_Advertising	taco:(Advertis* OR marketing OR ads OR ad OR promotion*) AND (icc:G06Q0030020000 OR cpcc:G06Q0030020000)	https://p.relecura.com/Query_Advertising
Query_e-commerce	tac:(ecommerce OR "e commerce" OR shopping OR purchase) AND (cpcc:G06Q0030060000 OR icc:G06Q0030060000)	https://p.relecura.com/Query_e-commerce
Query_Education	(tt:(simulators OR simulation OR training) OR ab:(simulators OR simulation OR training)) AND ((tl:"Educational Aids & Equipment") OR (tl:"Sports Apparatus - Training Equipment"))	https://p.relecura.com/Query_Education
Query_Gaming	((ft:"video games" OR gaming OR games) AND ((icc:A63F0013000000) OR (cpcc:A63F0013000000)))	https://p.relecura.com/Query_Gaming
Query_Medical	((ft:Medical OR medicine OR diagnosis OR surgery OR surgical OR healthcare) AND (icc:A6100000000000 OR cpcc:A6100000000000))	https://p.relecura.com/Query_Medical
Query_Navigation	(stl:"Navigation devices")	https://p.relecura.com/Query_Navigation
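Because each query rule is plain text, a quick sanity check before saving a rule can catch an unclosed parenthesis or quotation mark. The sketch below stores the six queries from Table 2 in a dictionary and checks them. The names `QUERY_RULES` and `is_well_formed` are illustrative assumptions, and the check is a generic one, not Relecura's own query validation.

```python
# Bucket name -> query text, copied from Table 2.
QUERY_RULES = {
    "Query_Advertising": 'taco:(Advertis* OR marketing OR ads OR ad OR promotion*) AND (icc:G06Q0030020000 OR cpcc:G06Q0030020000)',
    "Query_e-commerce": 'tac:(ecommerce OR "e commerce" OR shopping OR purchase) AND (cpcc:G06Q0030060000 OR icc:G06Q0030060000)',
    "Query_Education": '(tt:(simulators OR simulation OR training) OR ab:(simulators OR simulation OR training)) AND ((tl:"Educational Aids & Equipment") OR (tl:"Sports Apparatus - Training Equipment"))',
    "Query_Gaming": '((ft:"video games" OR gaming OR games) AND ((icc:A63F0013000000) OR (cpcc:A63F0013000000)))',
    "Query_Medical": '((ft:Medical OR medicine OR diagnosis OR surgery OR surgical OR healthcare) AND (icc:A6100000000000 OR cpcc:A6100000000000))',
    "Query_Navigation": '(stl:"Navigation devices")',
}


def is_well_formed(query: str) -> bool:
    """Return True if parentheses are balanced and every quote is closed."""
    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # a ")" appeared before its matching "("
                    return False
    return depth == 0 and not in_quotes


malformed = [name for name, text in QUERY_RULES.items() if not is_well_formed(text)]
print(malformed)
```

An empty list means every rule passed this structural check. It says nothing about whether the field names or classification codes are valid.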

Training-based Bucketing

Using the buckets created by the query-based mode, we pick the top five documents from each of them as representative documents. Each set of five documents is specified as a training rule and employed in a training-based bucketing exercise on the same set of VR and AR patent documents. This provides a comparison of the two bucketing modes.

Table 3. Specifications of the buckets and results using the Training-based bucketing mode.

Bucket name	Bucket specification (Representative documents)	Patent documents in bucket
Train_Advertising	https://p.relecura.com/Train_Rep+Docs1	https://p.relecura.com/Train_Advertising
Train_e-commerce	https://p.relecura.com/Train_Rep+Docs2	https://p.relecura.com/Train_e-commerce
Train_Education	https://p.relecura.com/Train_Rep+Docs3	https://p.relecura.com/Train_Education
Train_Gaming	https://p.relecura.com/Train_Rep+Docs4	https://p.relecura.com/Train_Gaming
Train_Medical	https://p.relecura.com/Train_Rep+Docs5	https://p.relecura.com/Train_Medical
Train_Navigation	https://p.relecura.com/Train_Rep+Docs6	https://p.relecura.com/Train_Navigation

Table 4. Overlap between buckets created using the Query-based and Training-based modes (Equivalents with Total documents shown in brackets).

Bucket name	No. of unique patents in Query-based buckets	No. of common documents	No. of unique documents in Training-based buckets
Advertising	202 (322)	79 (142)	224 (362)
e-commerce	207 (308)	98 (155)	258 (419)
Education	499 (743)	115 (213)	201 (348)
Gaming	627 (1,114)	362 (717)	593 (1,054)
Medical	494 (840)	109 (205)	295 (462)
Navigation	272 (496)	105 (226)	449 (827)
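To make the agreement between the two modes easier to compare across topics, the counts in Table 4 can be turned into shares. The sketch below uses the equivalents counts and assumes that the two "unique" columns give the size of each bucket, so the common documents are a subset of both. The article does not state this explicitly. If those columns instead count documents found in only one bucket, the denominators would need to be the column value plus the common count. The names `TABLE_4` and `overlap_shares` are illustrative.

```python
# Equivalents counts copied from Table 4:
# (Query-based bucket, common, Training-based bucket)
TABLE_4 = {
    "Advertising": (202, 79, 224),
    "e-commerce": (207, 98, 258),
    "Education": (499, 115, 201),
    "Gaming": (627, 362, 593),
    "Medical": (494, 109, 295),
    "Navigation": (272, 105, 449),
}


def overlap_shares(query_count, common, training_count):
    """Share of each bucket made up of common documents.

    Assumes the counts are bucket sizes (see the note above the code).
    """
    return common / query_count, common / training_count


for name, counts in TABLE_4.items():
    query_share, training_share = overlap_shares(*counts)
    print(f"{name:12s} {query_share:6.1%} of query bucket, "
          f"{training_share:6.1%} of training bucket")

# Topic where the two modes agree most, measured against the query bucket.
best_match = max(TABLE_4, key=lambda name: overlap_shares(*TABLE_4[name])[0])
```

Under this reading, Gaming shows the closest agreement between the Query-based and Training-based buckets.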

Auto Bucketing

In contrast to the approach used in the previous two bucketing modes, Auto bucketing is initiated by specifying only the number of required buckets (6) and the acceptable overlap (50%). Out of the 16,689 documents in the (VR+AR) portfolio, 11,382 are bucketed by Relecura, with the remaining 5,307 documents placed in the “Others” category. The auto buckets and the overlap between them are detailed in the following tables.

Table 5. Bucketing Statistics of the auto buckets showing unique and overlapping documents (Equivalents with Total documents shown in brackets).

Bucket Name	Diagnosis & Surgery Identification	Digital Data Processing	Educational Aids & Equipment	Image Data Processing	Optical Elements	Sports Apparatus - Indoor Games
Diagnosis & Surgery Identification	326(617)	0	0	0	0	0
Digital Data Processing	0	4,194(8,611)	0	3,895(8,026)	1,863(3,799)	51(88)
Educational Aids & Equipment	0	0	513(932)	0	0	0
Image Data Processing	0	3,895(8,026)	0	4,065(8,330)	1,890(3,845)	51(88)
Optical Elements	0	1,863(3,799)	0	1,890(3,845)	1,964(3,973)	0
Sports Apparatus - Indoor Games	0	51(88)	0	51(88)	0	448(878)
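The Bucketing Statistics in Table 5 form a symmetric matrix: each diagonal cell is the size of a bucket and each off-diagonal cell is the number of equivalents shared by two buckets. The sketch below copies the equivalents counts from the table, checks the symmetry, and expresses each overlap as a share of the row bucket. It also confirms that the bucketed and "Others" document counts quoted earlier add up to the portfolio total. The article does not say how the 50% acceptable-overlap setting is measured, so the sketch does not test against it. All names are illustrative.

```python
LABELS = [
    "Diagnosis & Surgery Identification",
    "Digital Data Processing",
    "Educational Aids & Equipment",
    "Image Data Processing",
    "Optical Elements",
    "Sports Apparatus - Indoor Games",
]

# Equivalents counts from Table 5 (rows and columns follow LABELS).
OVERLAP = [
    [326, 0, 0, 0, 0, 0],
    [0, 4194, 0, 3895, 1863, 51],
    [0, 0, 513, 0, 0, 0],
    [0, 3895, 0, 4065, 1890, 51],
    [0, 1863, 0, 1890, 1964, 0],
    [0, 51, 0, 51, 0, 448],
]


def is_symmetric(matrix):
    """True if cell (i, j) equals cell (j, i) for every pair."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def overlap_fraction(matrix, i, j):
    """Share of bucket i's equivalents that also appear in bucket j."""
    return matrix[i][j] / matrix[i][i]


# Total document counts quoted in the text (not equivalents).
TOTAL_DOCS, BUCKETED, OTHERS = 16_689, 11_382, 5_307
bucketed_share = BUCKETED / TOTAL_DOCS
print(f"Bucketed share of portfolio: {bucketed_share:.1%}")

for i, row in enumerate(OVERLAP):
    for j, shared in enumerate(row):
        if i != j and shared:
            print(f"{LABELS[i]} -> {LABELS[j]}: "
                  f"{overlap_fraction(OVERLAP, i, j):.1%}")
```

Reading the output row by row shows which auto-generated buckets are nearly nested inside one another and which stand alone, which is useful when deciding whether to change the number of buckets or the overlap setting.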

Table 6. Bucket labels and bucketed documents in the Auto bucketing mode.

Bucket Labels (Relecura-generated)	List of publications in buckets
Diagnosis & Surgery Identification	https://p.relecura.com/Auto_DSI
Digital Data Processing	https://p.relecura.com/Auto_DDP
Educational Aids & Equipment	https://p.relecura.com/Auto_EA
Image Data Processing	https://p.relecura.com/Auto_IDP
Optical Elements	https://p.relecura.com/Auto_OE
Sports Apparatus - Indoor Games	https://p.relecura.com/Auto_SA_IG

Relecura analytics, such as the Comparative Topic Map shown below, may be employed to better understand the buckets created and their relationships, and to tweak the bucketing strategy during the course of the workflow.

*Figure 1. Bucketing Analytics – Topic Comparison of the assignees of the auto bucketed documents.*

Summary

The example given in this article compares and contrasts the three bucketing modes provided by Relecura’s new Bucketing module.

Of the three, Auto bucketing requires the least intervention and subject matter knowledge to bucket the portfolio. The Auto bucketing results are consistent, but the analyst has no control over the bucket labels, which are machine-generated based on the technology content of the portfolio. In contrast, the Training-based and Query-based modes offer the analyst increasing levels of control over the bucketing exercise, but demand a corresponding increase in subject matter expertise.

Bucketing rules, once created and applied, may be reused in future bucketing analyses. Rules may be grouped and organized using labels to create the desired taxonomy. Labels may be applied to the portfolio to create multiple buckets simultaneously. Relecura analytics such as “Portfolio comparison” and other custom graphing features may be used to visualize and compare the created buckets.

The release of the Bucketing Module is another of Relecura’s ongoing efforts to introduce features that address the needs of its user base. The various modes may be employed separately or in tandem, and will deliver accurate results along with savings in time and effort. The Bucketing Module provides flexibility and support to users of different skill levels to execute bucketing workflows efficiently, without sacrificing control where specific cases require it.

Relecura plans to release other such features in the future, incorporating appropriate levels of automation into important IP analysis workflows and providing options to different types of creators and consumers of IP intelligence and analytics in the enterprise.

