ursyathi
Newcomer I

Show distinct column values in pyspark dataframe

With a PySpark DataFrame, how do you do the equivalent of pandas df['col'].unique()?

I want to list out all the unique values in a pyspark dataframe column.

Not the SQL way (registerTempTable, then a SQL query for distinct values).

Also, I don't need groupBy followed by countDistinct; I want to see the distinct VALUES in that column.

 

 

1 Reply
Caute_cautim
Community Champion

@ursyathi Have you checked Google?

 

https://stackoverflow.com/questions/39383557/show-distinct-column-values-in-pyspark-dataframe

 

I'm not sure whether this helps, as I have never used this framework myself.

 

Regards

 

Caute_Cautim
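
For reference, here is a minimal sketch of one way to get the distinct values of a column through the DataFrame API, without registering a temp table or using groupBy. The helper name distinct_values, the demo function, and the sample data are made up for illustration. It assumes the number of distinct values is small enough to collect to the driver.

```python
def distinct_values(df, column):
    """Return the distinct values of `column` as a plain Python list.

    Rough equivalent of pandas df[column].unique(). The values are pulled
    to the driver with collect(), so this only suits columns whose set of
    distinct values fits comfortably in driver memory.
    """
    rows = df.select(column).distinct().collect()
    # Each collected Row can be indexed by field name.
    return [row[column] for row in rows]


def demo():
    """Hypothetical usage with a tiny local DataFrame (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("distinct-values-demo")
        .getOrCreate()
    )
    # Sample data invented for this sketch.
    df = spark.createDataFrame([("a",), ("b",), ("a",)], ["col"])

    # Print the distinct values as a table, without collecting them.
    df.select("col").distinct().show()

    # Or get them as a Python list; the order is not guaranteed.
    print(distinct_values(df, "col"))

    spark.stop()
```

Calling demo() runs the example on a local Spark session. If you only want to look at the values, df.select("col").distinct().show() is enough; the collect() version is for when you need them as a Python list.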