Indrajit Bhattacharya
Downloadable Resume
Contact
firstname AT cs DOT umd DOT edu
firstname b AT gmail DOT com
Research Interest
My interest lies in reasoning under uncertainty for analysis and prediction over heterogeneous data, using probabilistic graphical models and other machine learning techniques. My research includes developing probabilistic and cut-based formulations of collective relational clustering for applications such as entity resolution in structured databases and document collections, and word sense disambiguation from multilingual corpora. More recently, I have been exploring applications of relational clustering in cross-domain transfer of learning.
Education
Refereed Conference and Workshop Publications
- "Cross-Guided Clustering: Transfer of Relevant Supervision across Domains for Improved Clustering", Indrajit Bhattacharya, Shantanu Godbole, Sachindra Joshi and Ashish Verma, IEEE International Conference on Data Mining (ICDM), Miami, December 2009.
- "Enabling Analysts in Managed Services for CRM Analytics", Indrajit Bhattacharya, Shantanu Godbole, Ajay Gupta, Ashish Verma, Jeff Achtermann and Kevin English, ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Paris, July, 2009.
- "Structured Entity Identification and Document Categorization: Two Tasks with One Joint Model", Indrajit Bhattacharya, Shantanu Godbole, and Sachindra Joshi, ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Las Vegas, August, 2008.
(6 citations)
- "Online Collective Entity Resolution", Indrajit Bhattacharya and Lise Getoor, NECTAR Track, Conference on Artificial Intelligence (AAAI), 2007.
- "Query-Time Entity Resolution", Indrajit Bhattacharya, Louis Licamele and Lise Getoor,
The 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, USA, August 2006.
(4 citations)
- "Relational Clustering for Entity Resolution Queries", Indrajit Bhattacharya, Louis Licamele and Lise Getoor, ICML 2006 Workshop on Statistical Relational Learning (SRL), Pittsburgh, USA, June 2006.
(1 citation)
- "A Latent Dirichlet Model for Unsupervised Entity Resolution",
Indrajit Bhattacharya and Lise Getoor, The 6th SIAM Conference on Data Mining (SIAM SDM), Bethesda, Maryland, April 2006
(Best Research Paper Award).
(67 citations)
- "Relational Clustering for Multi-type Entity Resolution", Indrajit Bhattacharya and Lise Getoor, The 11th ACM SIGKDD Workshop on Multi Relational Data Mining (MRDM), Chicago, August 2005.
(24 citations)
- "Similarity Searching in Peer-to-Peer Databases", Indrajit Bhattacharya, Srinivas Kashyap and Srinivas Parthasarathy, The 25th International Conference on Distributed Computing Systems (ICDCS), Columbus, Ohio, June 2005.
(25 citations)
- "Deduplication and Group Detection using Links", Indrajit Bhattacharya and Lise Getoor, The 10th ACM SIGKDD Workshop on Link Analysis and Group Detection (LinkKDD), Seattle, August 2004.
(42 citations)
- "The University of Maryland Senseval-3 system descriptions", Clara Cabezas, Indrajit Bhattacharya and Philip Resnik, The 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), Barcelona, July 2004.
(4 citations)
- "Unsupervised Sense Disambiguation using Bilingual Probabilistic Models", Indrajit Bhattacharya, Lise Getoor and Yoshua Bengio, The 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), Barcelona, July 2004.
(22 citations)
- "Iterative Record Linkage for Cleaning and Integration", Indrajit Bhattacharya and Lise Getoor, The 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), Paris, June 2004.
(97 citations)
Journal Publications and Book Chapters
- "Collective Relational Clustering", Indrajit Bhattacharya and Lise Getoor, Chapter in: Constrained Clustering: Advances in Algorithms, Theory, and Applications (eds. Basu, Davidson, Wagstaff), Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, June 2008.
- "Query-time Entity Resolution",
Indrajit Bhattacharya and Lise Getoor, Journal of Artificial Intelligence Research (JAIR), Volume 30, pages 621-657, 2007.
(19 citations)
- "Collective Entity Resolution in Relational Data", Indrajit Bhattacharya and Lise Getoor, ACM Transactions on Knowledge Discovery from Data (ACM-TKDD), Volume 1, Issue 1, March 2007.
(49 citations)
- "Collective Entity Resolution in Relational Data", Indrajit Bhattacharya and Lise Getoor, IEEE Data Engineering Bulletin, Special Issue on Data Quality, June 2006.
(2 citations)
- "Entity Resolution in Graphs", Indrajit Bhattacharya and Lise Getoor, Chapter in
Mining Graph Data, Lawrence B. Holder and Diane J. Cook, Editors, Wiley, 2006.
(11 citations)
- "An Object Oriented Fuzzy Data Model for Similarity Detection in Image Databases", Arun K. Majumdar, Indrajit Bhattacharya and Amit K. Saha, IEEE Transactions on Knowledge and Data Engineering, Vol. 14(5), September/October 2002.
(10 citations)
- "Quantified Computation Tree Logic", Anindya Patthak, Indrajit Bhattacharya, Anirban Dasgupta, Pallab Dasgupta and Partha Pratim Chakrabarti, Information Processing Letters, Vol. 82(3), May 2002.
(6 citations)
PhD Thesis
- "Collective Entity Resolution In Relational Data", Indrajit Bhattacharya, PhD thesis from University of Maryland, College Park, Dec 2006.
Technical Reports and Unpublished Manuscripts
- "Entity Resolution in Graph Data", Indrajit Bhattacharya and Lise Getoor, University of Maryland Technical Report CS-TR-4758, October 2005.
(9 citations)
- "Latent Dirichlet Allocation Model for Entity Resolution", Indrajit Bhattacharya and Lise Getoor, University of Maryland Technical Report CS-TR-4740, August 2005.
- "Similarity Searching in Peer-to-Peer Databases", Indrajit Bhattacharya, Srinivas Kashyap and Srinivas Parthasarathy, University of Maryland Technical Report CS-TR-4558, January 2004.
Experience
Professional and Research
4/2007 - current: Research Scientist at the Information Management group at IBM India Research Lab, New Delhi, working on pattern mining over large heterogeneous and noisy information sources.
I am investigating research challenges around information integration and data cleansing, clustering of heterogeneous data and cross-domain learning.
1/2003 - 2/2007: Research assistant at the Department of Computer Science, University of Maryland under Lise Getoor, working on models and algorithms for collectively resolving references to real-world entities in structured and semi-structured domains, such as bibliographic and natural language data. I have designed a relational clustering algorithm that takes domain relationships into account for iteratively clustering references into entities. In addition, I have proposed a probabilistic generative model that looks for hidden group structures among domain entities as evidence for resolving references. I have developed efficient unsupervised inference algorithms for this model using Gibbs sampling techniques. I have shown that both of these approaches improve performance over attribute-based baselines on multiple real-world and synthetic datasets. In addition to collective resolution over an entire database, I have investigated the problem of query-centric entity resolution. For the related problem of word sense disambiguation using multiple languages, I have developed generative models for bilingual corpora and have shown that they outperform existing sense disambiguation approaches on real datasets.
6/2002 - 12/2002: Research assistant at the Graphics Lab,
University of Maryland, working on faster rendering techniques and
compact representations for 3D models making use of local similarity
in datasets.
6/2001 - 8/2001: Research intern at Virtio Corporation,
Campbell, California, working on the design and implementation of a translator
for Virtio's virtual prototyping language to SystemC.
6/1999 - 5/2000: Project officer at the Department
of Computer Science and Engineering, IIT Kharagpur in the National
Semiconductor Corporation funded "Virtual Silicon" project, working on
verification techniques for the SDL-C prototyping language.
8/1998 - 5/1999: Undergraduate researcher at the Department of Computer Science and Engineering, IIT Kharagpur, working on "Similarity Retrieval from Image Databases" by rank-ordering images in a database with respect to the spatial and topological relations existing between objects. I focussed on developing fuzzy similarity measures to deal with uncertainty/vagueness in images.
Teaching
8/2000 - 12/2002: Teaching Assistant, University of Maryland. Duties included teaching lab sections and occasional classes, holding office hours, and designing and grading projects and homework.
Object Oriented Programming in C++ (CMSC214), Fall 2000 and Spring 2001.
Data Structures (CMSC420) with Prof. Hanan Samet, Fall 2001.
Data Structures (CMSC420) with Prof. V.S. Subrahmanian, Spring 2002.
Data Structures (CMSC420) with Prof. Leila De Floriani, Fall 2002.
Professional Activities
- Program Committee Memberships
- Reviewer for journals including IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Database Systems, IEEE Transactions on Neural Networks, ACM Journal on Data and Information Quality, and Pattern Analysis and Applications