Shahzad Khan, PhD

Ottawa, Ontario, Canada
9K followers 500+ connections

About

I have 30 years of info-tech industry experience across the entire software development life-cycle. I am an experienced Artificial Intelligence researcher and product manager with polished client-facing skills, who can lead an engineering team to deliver projects within the triple constraints (cost, time, and scope).

My PhD from Cambridge University (UK) is in computational linguistics. I have a decade of experience in information retrieval (search technologies), machine learning, data mining, and analytics.

I am open to consulting opportunities.

Specialties: Intelligent systems development, with special interests in computational linguistics (search technologies, machine translation, summarization)

Experienced with machine learning, in-simulation competency assessment, natural language generation, decision support systems, analytics, event identification, and data visualization.

Experience

  • Gnowit Inc

    Ottawa, Canada Area

  • -

    Ottawa, Ontario, Canada

  • -

    Ottawa, Ontario, Canada

  • -

    Greater New York City Area

  • -

    Ottawa, Ontario, Canada

  • -

    Ottawa, Canada Area

  • -

    Ottawa, Canada Area

  • -

    Ottawa, Canada Area

  • -

    Ottawa, Canada Area

  • -

    Ottawa, ON, Canada

Education

  • -

    My research area was sentiment classification using linguistic analysis and machine learning. My thesis is titled 'Negation and Antonymy in Sentiment Classification'. I carried out this research as part of the Natural Language and Information Processing group (http://www.cl.cam.ac.uk/research/) under the supervision of Dr. Ted Briscoe.

  • -

    Activities and Societies: Phi Beta Delta International Honor Society

    The Telecommunications and Network Management graduate degree offered at the School of Information Studies combines technology, policy, and management. The programme gives IT professionals a broader view of how to apply networking technologies to solve business problems, provide strategic planning and consultancy, and design winning products driven by customer and business needs.

Volunteer Experience

  • Member

    Member of the Task Force on Applications of AI in Pakistan

    - Present 1 year 8 months

    Advising on the development of modern R&D capabilities, formation of practical regulations and establishment of commercialization efforts through international collaborations

  • Advisor, CET-CS/CP/WDIA Program Advisory Committee

    Algonquin College of Applied Arts and Technology

    - Present 3 years 4 months

    Helping Algonquin College keep its AI Certificate curriculum relevant to industry needs as an advisory member of the curriculum oversight committee

  • Advisor, Computer Science, Electrical Engineering Curriculum Advisory Board

    University of Ottawa

    - Present 1 year 7 months

    Helping the University of Ottawa Computer Science and Electrical Engineering programs stay relevant to the needs of industry

Publications

  • Relevance feedback using semantic association between indexing terms in large free text corpuses

    IADIS International Conference WWW/Internet - ICWI

    Relevance feedback has long been considered a means of incorporating learning into information retrieval systems. This paper discusses the research results of two methodologies for automatic query expansion that incorporate relevance feedback into our probabilistic information retrieval system. The first methodology analyses the topmost-ranked documents retrieved by the user’s initial query to form new, keyword-enriched expanded queries. These queries help the user focus the search on a smaller subset of the document collection. The second, a novel technique developed during the course of this research, resorts to a process similar to text mining, employing ‘index term association’ as the underlying mechanism to expand the query with a semantic structure. We also compare the two methodologies using a human judge, measuring the extent of subject deviation of the expanded query from the original query. The results show that significant improvement can be expected from using semantic association between indexing terms in query expansion.

  • Statistical Word Sense Disambiguation through Unsupervised Sense Marker Enrichment on Large Free Text Corpuses

    IADIS International Conference WWW/INTERNET

    Word sense ambiguity is known to have a destructive effect on the performance of information retrieval and linguistic systems. The problem arises from the inherently polysemous nature of natural languages, where one word can have multiple meanings or senses. This is not a problem for humans, but mapping a word to its correct sense is a daunting task for a retrieval system. This paper describes two disambiguation methodologies based on contemporary techniques that seek to enrich text with sense meta-information by identifying the correct sense for an ambiguous noun in a document. The research draws on contemporary statistical disambiguation methodologies and attempts to make them more effective through a novel weighting scheme, which is simpler than the complex schemes used by other disambiguation algorithms. It builds on two recent groundbreaking research results: that words tend to have one sense per document and one sense per collocation. In the experiments, the set of senses for each polysemous word is the same as in the WordNet 1.7 repository. However, the methodologies are generalized and applicable to any concept repository that is built on a generalization/specialization framework. The two methodologies are compared with each other, and the results establish that this approach leads to an improvement in the disambiguation process. The paper also proposes a strategy to use the disambiguation methodology to enhance relevance feedback and information retrieval performance.


Patents

  • Method and system relating to re-labelling multi-document clusters

    Issued CA CA2865184C

    Individuals receive an overwhelming barrage of information which must be filtered, processed, analysed, reviewed, consolidated and distributed or acted upon. However, prior art tools for automatically processing content, for example returning search results from an Internet or database search, are ineffective. Prior art search techniques merely provide large numbers of "hits" with at most removal of multiple occurrences of identical items. It would be beneficial to present searches as a series of multi-document clusters wherein occurrences of commonly themed content are clustered, allowing the user to rapidly see the number of different themes and review a selected theme. Further, it would be beneficial, in repeated searches, for new clusters to be identified automatically and for new items of content relating to existing clusters to be associated with those clusters.

  • Method and system relating to re-labelling multi-document clusters

    Issued US US9600470B2

    Individuals receive an overwhelming barrage of information which must be filtered, processed, analyzed, reviewed, consolidated and distributed or acted upon. However, prior art tools for automatically processing content, for example returning search results from an Internet or database search, are ineffective. Prior art search techniques merely provide large numbers of “hits” with at most removal of multiple occurrences of identical items. It would be beneficial to present searches as a series of multi-document clusters wherein occurrences of commonly themed content are clustered, allowing the user to rapidly see the number of different themes and review a selected theme. Further, it would be beneficial, in repeated searches, for new clusters to be identified automatically and for new items of content relating to existing clusters to be associated with those clusters.

  • Method and system relating to salient content extraction for electronic content

    Issued US US9336202B2

    Individuals receive an overwhelming barrage of information which must be filtered, processed, analyzed, reviewed, consolidated and distributed or acted upon. Automatic approaches to “scraping” salient content from sources of content are provided, allowing the salient content to be provided to the user or subjected to further processing such as clustering or sentiment analysis.
    Embodiments of the invention provide for:
    automated scraper induction based on document and/or contextual semantic cues and document structure analysis;
    identifying salient text, removing boiler-plate text, off-topic content and other non-salient content;
    deriving reusable descriptive extraction patterns for subsequent documents;
    applying descriptive extraction patterns for extraction from subsequent documents from the same source;
    intelligent identification of extraction success confidence score, using historical success scores; and
    employing confidence scores to automatically trigger new extraction pattern identification if extracted confidence is below an acceptable confidence threshold.

  • Method and system relating to sentiment analysis of electronic content

    Issued CA CA2865186C

    Embodiments of the invention provide automatic context-based sentiment classification of content in terms of both the sentiments expressed and their intensity. Further, a content set is analysed to rapidly establish an "at-a-glance" assessment of the key topics/themes present within the content set and annotate each with its sentiment. Importantly, embodiments of the invention also allow a user to establish the basis for the sentiment associated with an item or set of content, i.e. make it explainable. Further embodiments of the invention provide for the establishment of psychological tone to sentiments where the sentiments and psychological tones to be tuned are from the context or domain of the content.

  • Method and system relating to salient content extraction for electronic content

    Issued CA CA2865187C

    Automatic approaches to scraping salient content from sources of content are provided that allow the salient content to be provided to the user or subjected to further processing such as clustering or sentiment analysis. Embodiments of the invention provide for: automated scraper induction based on document and/or contextual semantic cues and document structure analysis; identifying salient text, removing boiler-plate text, off-topic content and other non-salient content; deriving reusable descriptive extraction patterns for subsequent documents; applying descriptive extraction patterns for extraction from subsequent documents from the same source; intelligent identification of extraction success confidence score, using historical success scores; and employing confidence scores to automatically trigger new extraction pattern identification if extracted confidence is below an acceptable confidence threshold.


Languages

  • Turkish

    Native or bilingual proficiency

  • Urdu

    Native or bilingual proficiency

  • Pashto

    Professional working proficiency

  • English

    Native or bilingual proficiency
