A Brief Introduction to Stata

By Zhenxing Cheng


What is 'Stata'?

  • Stata is a statistical package for managing, analyzing, and graphing data.
  • Stata is available for a variety of platforms. Stata may be used either as a point-and-click application or as a command-driven package.
  • Stata’s GUI provides an easy interface for those new to Stata and for experienced Stata users who wish to execute a command that they seldom use.
  • The command language provides a fast way to communicate with Stata and to communicate more complex ideas.

Why 'Stata'?

Stata is fast, accurate, and easy to use. It is a complete, integrated software package that covers the main data science tasks: data manipulation, visualization, statistics, and reproducible reporting.

  • Master your data;
  • Broad suite of statistical features;
  • Publication-quality graphics;
  • Truly reproducible reporting;
  • Truly reproducible research...
  • Data Management

Get Started

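clear all               // start from an empty session
cd ~/Desktop            // set the working directory (adjust the path as needed)
sysuse auto, clear      // load the example dataset shipped with Stata
save auto, replace      // save a copy as auto.dta in the working directory
use auto, clear         // reload the saved copy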

Some Basic Stata Commands

  1. use/save
  2. copy
  3. import delimited
  4. import excel
  5. infix
  6. describe
  7. list
  8. gsort/sort
  9. codebook
  10. generate
  11. replace
  12. rename
  13. drop
  14. summarize
  15. display
  16. egen
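
The commands listed above can be tried together on the built-in auto dataset. The following is a minimal sketch, not a complete tour; the variable names price_k, price_thousand, and mean_mpg are made up for illustration.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Inspect the data
describe
codebook foreign
list make price mpg in 1/5

* Sort by descending price
gsort -price

* Create, modify, and rename a variable (names are illustrative)
generate price_k = price / 1000          // price in thousands of dollars
replace price_k = round(price_k, 0.1)
rename price_k price_thousand

* Group-level statistic with egen
egen mean_mpg = mean(mpg), by(foreign)

* Summarize, then display a stored result
summarize price
display "Mean price: " r(mean)

* Drop a variable that is no longer needed
drop mean_mpg
```

Commands such as import delimited, import excel, and infix follow the same pattern but need an external file, so they are left out of this sketch.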


Make Publication-quality Graphics

Default Schemes

help scheme
graph query, schemes
set scheme s2color
  Scheme name   Description
  s2color       factory setting
  sj            Stata Journal
  economist     The Economist magazine

Blind Schemes

ssc install blindschemes, replace
set scheme plotplain, permanently
help blindschemes
  Scheme names
  plotplain     plotplainblind
  plottig       plottigblind

Some Plot Commands

  • histogram / twoway bar / graph bar

  • line / tsline / connected / lfit

  • graph pie

  • scatter / dotplot / graph matrix

  • area / rarea

  • twoway function

  • spmap

  • graph box
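
As an illustration of how the plot commands above combine, the sketch below overlays a scatterplot and a linear fit on the auto dataset and exports the result. The titles, the graph name price_mpg, and the output file name are placeholders, and scheme(s2color) can be swapped for any installed scheme.

```stata
sysuse auto, clear

* Overlay a scatterplot and a linear fit in one twoway graph
twoway (scatter price mpg) (lfit price mpg), ///
    title("Price versus mileage") ///
    ytitle("Price (USD)") xtitle("Mileage (mpg)") ///
    legend(order(1 "Observed" 2 "Linear fit")) ///
    scheme(s2color) name(price_mpg, replace)

* Export the graph for use in a paper; the file name is a placeholder
graph export price_mpg.png, replace
```

Passing scheme() to a single graph leaves the session default untouched, which is convenient when comparing schemes side by side.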


Stata for Data Science

Data Science Workflows

A Real World Example

nycflights13.dta: This dataset contains all 336,776 flights that departed from New York City in 2013.

Stata in Econometrics

Linear Regression

  • regress, noconstant
  • vce
  • if / in
  • predict, residuals
  • test / testnl
  • cnsreg
  • return list
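
The items above fit together roughly as follows. This is a sketch on the auto dataset; the model, the variable name uhat, and the equality constraint are chosen only for illustration and have no economic meaning.

```stata
sysuse auto, clear

* OLS with heteroskedasticity-robust standard errors on a subsample
regress price mpg weight if foreign == 0, vce(robust)

* Wald test of a joint linear hypothesis; results are left in r()
test mpg weight
return list

* Residuals for the estimation sample (uhat is an illustrative name)
predict double uhat if e(sample), residuals

* Estimation results are stored in e()
ereturn list
display "N = " e(N) ", R-squared = " e(r2)

* Regression through the origin
regress price mpg weight, noconstant

* Constrained least squares: force the two slopes to be equal
constraint 1 mpg = weight
cnsreg price mpg weight, constraints(1)
```

Note that regress is an e-class command, so its results appear under ereturn list, while return list shows the r() results left by commands such as test and summarize.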

Loops in Stata

  • help forvalues
  • help foreach
  • help while
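sysuse auto, clear

* forvalues: loop over a numeric range, printing the first 10 prices
forvalues i = 1/100 {
	if `i' <= 10 {
		di "price[`i'] = `=price[`i']'"
	}
}

* foreach ... of varlist: loop over every variable in the dataset
foreach j of varlist _all {
	di as text "`j'[1] = `=`j'[1]'"
}

* foreach ... in: loop over an arbitrary list of words
foreach m in "make" "price" "mpg" {
	di as result "`m'[1] = `=`m'[1]'"
}

* while: loop until the condition fails
local k = 1
while `k' <= 10 {
	di as result "price[`k'] = `=price[`k']'"
	local ++k
}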

🕷️ Spider

Example: Obtain data from http://www.eastmoney.com/

Graph Retouch

Reproducible reporting

Example: Generate an Office Open XML (.docx) file.

putdocx / sum2docx / reg2docx / corr2docx / t2docx
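
A minimal sketch of the built-in putdocx workflow (Stata 15 or later) is shown below. The regression, the table name results, and the file name report.docx are placeholders; the community-contributed commands sum2docx, reg2docx, corr2docx, and t2docx are not used here.

```stata
sysuse auto, clear
regress price mpg weight

* Start a new document in memory
putdocx begin

* Add a heading and a sentence of text
putdocx paragraph, style(Heading1)
putdocx text ("Regression of price on mpg and weight")
putdocx paragraph
putdocx text ("Number of observations: `e(N)'")

* Export the most recent estimation table
putdocx table results = etable

* Write the file; report.docx is a placeholder name
putdocx save report.docx, replace
```

Because the document is rebuilt from the do-file each time, rerunning the script after a data update regenerates the report with the new numbers.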


"Many Stata users would describe Mata as a matrix language. StataCorp itself markets Mata that way. Mata would be more accurately described, however, as an across-platform portable-code compiled programming language that happens to have matrix capabilities. Just as important as its matrix capabilities are Mata’s structures, classes, and pointers."
William W. Gould

A Simple Application: Maximization

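mata:
mata clear
// d0-type evaluator: returns the objective value in y
void myeval(todo, x, y, g, H)
{
	y = exp(-x^2 + x - 3)
}
S = optimize_init()
optimize_init_evaluator(S, &myeval())
optimize_init_params(S, 0)    // starting value
x = optimize(S)               // optimize() maximizes by default
x
end
* Plot the objective function to check the maximum visually
tw function y = exp(-x^2 + x - 3)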