
Indrajit Bhattacharya


Contact

firstname AT cs DOT umd DOT edu
firstname b AT gmail DOT com

Research Interest

My interest lies in reasoning under uncertainty for analysis and prediction over heterogeneous data, using probabilistic graphical models and other machine learning techniques. My research includes developing probabilistic and cut-based formulations of collective relational clustering for applications such as entity resolution in structured databases and document collections, and word sense disambiguation from multilingual corpora. More recently, I have been exploring applications of relational clustering to cross-domain transfer of learning.

Education

PhD in Computer Science University of Maryland, College Park 12/2006
MS in Computer Science University of Maryland, College Park 6/2004
BTech in Computer Science Indian Institute of Technology, Kharagpur 6/1999

Refereed Conference and Workshop Publications

  1. "Cross-Guided Clustering: Transfer of Relevant Supervision across Domains for Improved Clustering", Indrajit Bhattacharya, Shantanu Godbole, Sachindra Joshi and Ashish Verma, IEEE International Conference on Data Mining (ICDM), Miami, December 2009.

  2. "Enabling Analysts in Managed Services for CRM Analytics", Indrajit Bhattacharya, Shantanu Godbole, Ajay Gupta, Ashish Verma, Jeff Achtermann and Kevin English, ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Paris, July, 2009.

  3. "Structured Entity Identification and Document Categorization: Two Tasks with One Joint Model", Indrajit Bhattacharya, Shantanu Godbole, and Sachindra Joshi, ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Las Vegas, August, 2008.
    (6 citations)

  4. "Online Collective Entity Resolution", Indrajit Bhattacharya and Lise Getoor, NECTAR Track, Conference on Artificial Intelligence (AAAI), 2007.

  5. "Query-Time Entity Resolution", Indrajit Bhattacharya, Louis Licamele and Lise Getoor, The 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, USA, August 2006.
    (4 citations)
  6. "Relational Clustering for Entity Resolution Queries", Indrajit Bhattacharya, Louis Licamele and Lise Getoor, ICML 2006 Workshop on Statistical Relational Learning (SRL), Pittsburgh, USA, June 2006.
    (1 citation)
  7. "A Latent Dirichlet Model for Unsupervised Entity Resolution", Indrajit Bhattacharya and Lise Getoor, The 6th SIAM Conference on Data Mining (SIAM SDM), Bethesda, Maryland, April 2006.
    (Best Research Paper Award). (67 citations)
  8. "Relational Clustering for Multi-type Entity Resolution", Indrajit Bhattacharya and Lise Getoor, The 11th ACM SIGKDD Workshop on Multi Relational Data Mining (MRDM), Chicago, August 2005.
    (24 citations)
  9. "Similarity Searching in Peer-to-Peer Databases", Indrajit Bhattacharya, Srinivas Kashyap and Srinivas Parthasarathy, The 25th International Conference on Distributed Computing Systems (ICDCS), Columbus, Ohio, June 2005.
    (25 citations)
  10. "Deduplication and Group Detection using Links", Indrajit Bhattacharya and Lise Getoor, The 10th ACM SIGKDD Workshop on Link Analysis and Group Detection (LinkKDD), Seattle, August 2004.
    (42 citations)
  11. "The University of Maryland Senseval-3 system descriptions", Clara Cabezas, Indrajit Bhattacharya and Philip Resnik, The 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), Barcelona, July 2004.
    (4 citations)
  12. "Unsupervised Sense Disambiguation using Bilingual Probabilistic Models", Indrajit Bhattacharya, Lise Getoor and Yoshua Bengio, The 42nd Annual Meeting of the Association for Computational Linguistics, (ACL-04), Barcelona, July 2004.
    (22 citations)
  13. "Iterative Record Linkage for Cleaning and Integration", Indrajit Bhattacharya and Lise Getoor, The 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), Paris, June 2004.
    (97 citations)

Journal Publications and Book Chapters

  1. "Collective Relational Clustering", Indrajit Bhattacharya and Lise Getoor, Chapter in: Constrained Clustering: Advances in Algorithms, Theory, and Applications (eds. Basu, Davidson, Wagstaff), Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, June 2008.
  2. "Query-time Entity Resolution", Indrajit Bhattacharya and Lise Getoor, Journal of Artificial Intelligence Research (JAIR), Volume 30, pages 621-657, 2007.
    (19 citations)
  3. "Collective Entity Resolution in Relational Data", Indrajit Bhattacharya and Lise Getoor, ACM Transactions on Knowledge Discovery from Data (ACM-TKDD), Volume 1, Issue 1, March 2007.
    (49 citations)
  4. "Collective Entity Resolution in Relational Data", Indrajit Bhattacharya and Lise Getoor, IEEE Data Engineering Bulletin, Special Issue on Data Quality, June 2006.
    (2 citations)
  5. "Entity Resolution in Graphs", Indrajit Bhattacharya and Lise Getoor, Chapter in Mining Graph Data, Lawrence B. Holder and Diane J. Cook, Editors, Wiley, 2006.
    (11 citations)
  6. "An Object Oriented Fuzzy Data Model for Similarity Detection in Image Databases", Arun K. Majumdar, Indrajit Bhattacharya and Amit K. Saha, IEEE Transactions on Knowledge and Data Engineering, Vol. 14(5), September/October 2002.
    (10 citations)
  7. "Quantified Computation Tree Logic", Anindya Patthak, Indrajit Bhattacharya, Anirban Dasgupta, Pallab Dasgupta and Partha Pratim Chakrabarti, Information Processing Letters, Vol. 82(3), May 2002.
    (6 citations)

PhD Thesis

  • "Collective Entity Resolution In Relational Data", Indrajit Bhattacharya, PhD thesis from University of Maryland, College Park, Dec 2006.

Technical Reports and Unpublished Manuscripts

  1. "Entity Resolution in Graph Data", Indrajit Bhattacharya and Lise Getoor, University of Maryland Technical Report CS-TR-4758, October 2005.
    (9 citations)
  2. "Latent Dirichlet Allocation Model for Entity Resolution", Indrajit Bhattacharya and Lise Getoor, University of Maryland Technical Report CS-TR-4740, August 2005.
  3. "Similarity Searching in Peer-to-Peer Databases", Indrajit Bhattacharya, Srinivas Kashyap and Srinivas Parthasarathy, University of Maryland Technical Report CS-TR-4558, January 2004.

Experience

Professional and Research

4/2007 - current: Research Scientist at the Information Management group at IBM India Research Lab, New Delhi, working on pattern mining over large heterogeneous and noisy information sources. I am investigating research challenges around information integration and data cleansing, clustering of heterogeneous data and cross-domain learning.

1/2003 - 02/2007: Research assistant at the Department of Computer Science, University of Maryland under Lise Getoor, working on models and algorithms for collectively resolving references to real-world entities in structured and semi-structured domains, such as bibliographic and natural language data. I have designed a relational clustering algorithm that takes domain relationships into account for iteratively clustering references into entities. In addition, I have proposed a probabilistic generative model that looks for hidden group structures among domain entities as evidence for resolving references. I have developed efficient unsupervised inference algorithms for this model using Gibbs sampling techniques. I have shown that both of these approaches improve performance over attribute baselines on multiple real-world and synthetic datasets. In addition to collective resolution over an entire database, I have investigated the problem of query-centric entity resolution. For the related problem of word sense disambiguation using multiple languages, I have developed generative models for bilingual corpora and have shown that they outperform existing sense disambiguation approaches on real datasets.

6/2002 - 12/2002: Research assistant at the Graphics Lab, University of Maryland, working on faster rendering techniques and compact representations for 3D models making use of local similarity in datasets.

6/2001 - 8/2001: Research intern at Virtio Corporation, Campbell, California, working on the design and implementation of a translator for Virtio's virtual prototyping language to SystemC.

6/1999 - 5/2000: Project officer at the Department of Computer Science and Engineering, IIT Kharagpur in the National Semiconductor Corporation funded "Virtual Silicon" project, working on verification techniques for the SDL-C prototyping language.

8/1998 - 5/1999: Undergraduate researcher at the Department of Computer Science and Engineering, IIT Kharagpur, working on "Similarity Retrieval from Image Databases" by rank-ordering images in a database with respect to the spatial and topological relations existing between objects. I focussed on developing fuzzy similarity measures to deal with uncertainty/vagueness in images.

Teaching

8/2000 - 12/2002: Teaching Assistant, University of Maryland. Duties included teaching lab sections and occasional classes, holding office hours, and designing and grading projects and homework.

Object Oriented Programming in C++ (CMSC214), Fall 2000 and Spring 2001.

Data Structures (CMSC420) with Prof. Hanan Samet, Fall 2001.

Data Structures (CMSC420) with Prof. V.S. Subrahmanian, Spring 2002.

Data Structures (CMSC420) with Prof. Leila De Floriani, Fall 2002.

Professional Activities