Any Data Engineers or Web Designers Interested in Working on UT Football Projects?

#1

DiderotsGhost (Well-Known Member; joined Nov 28, 2011; 4,493 messages; 22,674 likes)
In the real world, I'm a Data Scientist. I have a few Tennessee related projects I want to work on. Curious if anyone out there might be interested in helping?

(*) Data viz website. The first one is a data visualization website for Tennessee. I've already built it out somewhat, but I'm still messing around with D3 to try to make it work well. My web design could probably use some work, though. I'm more of a Python data guy than a web designer. I also admit that, on a basic level, I hate web design.

(*) Predicting Best QBs. The second project is a machine learning algorithm that would use high school stats to try to project how good a college QB would be. I've started working on this, but I'm struggling with ideas on how to automate some of the data collection; it's been very cumbersome thus far. My secret reason for building this is to see how a machine learning model would project our inexperienced QBs next season (Shrout, Maurer, Bailey, or even Hill); also to test out the idea that Bailey is underrated by the ratings services (an opinion many here have; I'm agnostic on it right now).

This is a passion project / hobby; don't really expect us to generate boatloads of money or anything, but maybe if we got things going, we could make a little ad revenue.

Let me know if you want to know more / have other ideas / have interest. I already have some samples of the D3 that can be shared for those that want to see; just not up on website yet.
 
#2
No, but if you need to size any bridges or culverts for hydraulic crossings, I’m your guy. I can give you a 2D model and scour analysis for less than what most people would charge for 1D ;)
 
#3
I can't help you, but I'd be interested to see this when it's done.

Also, if you're ever looking for a job and would like to move to Charlotte, NC I have some options for you!!
 
#4
A lot of stats sites have an API that you could plug into for data. Not sure how many track high school sports though. Maybe check with sites like maxpreps.com and see if you can hook into it. They definitely have the data for whatever region or position you like, but no idea if they expose it via an API.

Good luck with it, sounds interesting.
 
#8
Unfortunately, I couldn't find an API for MaxPreps, and the data on the site is pretty terribly formatted. I tried copying and pasting a couple of years' worth of data on top 20 QBs and it was a pain in the butt. Not sure whether they'd give me access to data in exchange for publicity if I reached out to them; my guess is "no", but maybe worth a shot.

The other thing is that I've found it surprisingly difficult to find an API for college football stat data. Feels like there should be some out there; ESPN used to have one but not any more. I haven't looked as hard at that side yet, tho, as I was mostly focused on getting high school data for now.
 
#9
You can scrape maxpreps for data using Selenium or an automated HTML parser. It’s actually fairly trivial using a little regex and some knowledge of the HTML DOM.
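
To make the parsing step concrete, here's a rough sketch using only Python's standard-library html.parser. The table layout and the column names (Season, Comp, Att, Yds) are made up for illustration; I don't know what the actual MaxPreps markup looks like, so treat SAMPLE_HTML as a stand-in for whatever page you fetch.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_stat_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup and numbers, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Season</th><th>Comp</th><th>Att</th><th>Yds</th></tr>
  <tr><td>2017</td><td>150</td><td>240</td><td>2100</td></tr>
  <tr><td>2018</td><td>180</td><td>270</td><td>2600</td></tr>
</table>
"""

records = parse_stat_table(SAMPLE_HTML)
```

If the stats only show up after JavaScript runs, you'd need something like Selenium to render the page first and then hand the resulting HTML to the same parser.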
 
#10
The problem is a bit more complex. I don't think there's any real pattern to the URLs and about a quarter of the players don't have stats at all. Some players have multiple pages (e.g. if they played at multiple high schools, it's not all on the same page). And then you have several cases where you have players with the same names; for instance, here's a search for Brian Maurer:

Brian Maurer - High School Athletes | MaxPreps

The data itself, once you find it, is not particularly difficult, but I've struggled to come up with a real strategy to scrape it efficiently. I've scraped 247 before and that was pretty easy, but maxpreps is kind of an organizational mess.
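
One way the three problems above (same-name players, stats split across schools, players with no stats) could be handled once the pages are collected is sketched below. Every field name here (name, state, grad_year, seasons, att, comp, yds) is an assumption about how you'd store the scraped results, not anything MaxPreps provides, and the sample players in the usage note are invented.

```python
from collections import defaultdict


def pick_candidate(recruit, candidates):
    """Return the single search hit matching name, state and class year.

    Returns None when zero or several hits match, so ambiguous cases
    can be set aside for a manual look instead of being guessed.
    """
    matches = [
        c for c in candidates
        if c["name"].lower() == recruit["name"].lower()
        and c["state"] == recruit["state"]
        and c["grad_year"] == recruit["grad_year"]
    ]
    return matches[0] if len(matches) == 1 else None


def combine_pages(pages):
    """Sum counting stats per season across a player's school pages.

    Returns None when no page has any stat rows (the "all nulls" case).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for page in pages:
        for row in page.get("seasons", []):
            for stat in ("att", "comp", "yds"):
                totals[row["season"]][stat] += row.get(stat, 0)
    if not totals:
        return None
    return {season: dict(stats) for season, stats in sorted(totals.items())}
```

For example, two search hits named "Pat Example" in different states would resolve to the one whose state and class year match the recruiting list, and a transfer's two school pages would be folded into a single per-season total.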
 
#11
For college stats try this site. They have tons of data but not sure on the API issue.

College Football Statistics and History | College Football at Sports-Reference.com
 
#12
If you were doing this for soccer it would be easy; there are tons of API sites. Maxpreps uses data reported by coaches / assistants, so it will never be complete. Surely some site out there has more complete data and an open API.
 
#13
Sports stats sites, from my limited experience, are so all over the place in terms of ease of use. NBA data is really easy to get, for instance, but I've struggled more with this.

If I could at least find a good API on the college stats side, may just deal with slow copy-and-paste from the high school side. Think I'd be fine if I could get maybe 8-10 years of data. Already got 2 from copy-and-paste.
 
#14
Sounds like some good ideas. Same line of work here but more of an R guy. 😎

I’m with you. I hate web design. But not as much as I despise web scraping.

It looks like the MaxPreps URL structure is
www.maxpreps.com/athlete/[name]/[id]/default.htm

See the two pages below for the URL structure.

One strategy might be to find a page with a list of all QBs on MaxPreps, inspect the HTML manually, collect all the ids, and then scrape with a regex pattern that matches the above with the ids you collected. With the right regex, the format or content of the athlete name won’t matter.

Brian Maurer's High School Timeline

Brianna Maurer | Ellsworth HS, Ellsworth, WI | MaxPreps
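
Here's a minimal sketch of the regex idea in Python, built only from the URL pattern quoted above. The optional http/https prefix, the helper names, and the sample names and ids are my own placeholders, not real MaxPreps values.

```python
import re

# Matches .../athlete/[name]/[id]/default.htm and captures both parts.
# The scheme and "www." prefix are optional (an assumption about how
# links appear in the page source).
ATHLETE_URL = re.compile(
    r"(?:https?://)?(?:www\.)?maxpreps\.com/athlete/"
    r"(?P<name>[^/\s\"']+)/(?P<id>[^/\s\"']+)/default\.htm",
    re.IGNORECASE,
)


def extract_athletes(html):
    """Map each athlete id found in the HTML to the name slug in its URL."""
    seen = {}
    for match in ATHLETE_URL.finditer(html):
        # Keying on the id means two players with the same name stay distinct.
        seen.setdefault(match.group("id"), match.group("name"))
    return seen


def athlete_url(name, athlete_id):
    """Rebuild a profile URL from a collected name slug and id."""
    return f"https://www.maxpreps.com/athlete/{name}/{athlete_id}/default.htm"


# Made-up links standing in for a listing page's source.
sample = (
    '<a href="https://www.maxpreps.com/athlete/pat-example/ABC123/default.htm">Pat</a>'
    '<a href="http://www.maxpreps.com/athlete/sam-sample/XYZ789/default.htm">Sam</a>'
    '<a href="https://www.maxpreps.com/athlete/pat-example/ABC123/default.htm">Pat again</a>'
)
athletes = extract_athletes(sample)
```

Keying on the id rather than the name is what gets around the duplicate-name problem; the name slug is only carried along so the URL can be rebuilt.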
 
#15
Have you already chosen a framework? And are you married to using D3?
 
#16
You could still run a decent model with 2-4 years of data. That's usually all I have to work with at my job.
 
#17
My gut says that would not yield a terribly good model with this data. It really depends more on the number of datapoints: I'm probably only taking about 20-25 QBs from every class, and then 1/4 of those end up basically being all nulls, unfortunately. So I think 8-10 years is probably needed at minimum; try to get to 200+ QBs overall.
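
The back-of-envelope math behind that, as a quick sketch. The 20-25 QBs per class and the 1/4 null share are the rough figures above; the function name is just for illustration.

```python
def usable_qbs(years, per_class, null_share=0.25):
    """Expected number of QBs with real stats after dropping the all-null ones."""
    return years * per_class * (1 - null_share)


for years in (2, 4, 8, 10):
    low = usable_qbs(years, 20)
    high = usable_qbs(years, 25)
    print(f"{years} years: {low:.0f} to {high:.0f} usable QBs")
```

Under those assumptions, 2-4 years works out to roughly 30-75 usable QBs, while 8-10 years gets to about 120-190 usable out of 160-250 collected.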
 
#18
#19
One of the first questions I silently wondered about was whether a QB's high school performance can actually predict success at the college level. My instinct says the tools around him in college are more important than his success in high school.

But that’s why we build the models. My hypotheses are proven wrong often 😂
 
#20
I worked in investment before data science, so I've been working with data for like a decade and the truth of the matter is that most of the time, you don't find anything ... but you never really know until you try.

I think in this case, there's reason to believe that it's somewhat predictive, but not strongly. I suspect that some indicators such as accuracy and "pass attempts per interception" are more predictive than others (yards). But I have no idea; I could be totally wrong; or the connections could be too weak to draw any meaningful info.
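
For what it's worth, here's roughly how those indicators could be derived from raw season totals. This is just a sketch: I'm treating "accuracy" as completion percentage, which is an assumption, and the function and field names are placeholders.

```python
def qb_features(comp, att, yds, ints):
    """Turn raw season totals into rate stats for one QB.

    Returns None when there are no attempts, and leaves att_per_int as
    None when the QB threw no interceptions (the ratio is undefined).
    """
    if att <= 0:
        return None
    return {
        "completion_pct": comp / att,                  # proxy for accuracy
        "att_per_int": att / ints if ints > 0 else None,
        "yds_per_att": yds / att,                      # volume-adjusted yards
    }


# Invented season line, for illustration only.
example = qb_features(comp=180, att=270, yds=2600, ints=9)
```

Rate stats like these should at least be less sensitive than raw yards to how many games or attempts a kid got, though the zero-interception case needs some handling before it goes into a model.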
 
#21
Truth!
 
#22
That's something you could take into account though. What if you looked at incoming receivers and OL also?
 
#23
I feel like you’re going to have trouble with the high school stats. There might be a proprietary API out there that’s used by recruiting services or the schools themselves, but unless there are some data collection standards that are commonly adopted you will have a lot of trouble with the data, especially given that you’ll need historical data to get any predictive power. Beyond getting variable quality of data you’re going to be working with a lot of data provided by coaches about their own players, so there’s a conflict of interest as well.

I’m a web developer but I work on the server side mostly and I don’t do any design at all.
 
#24
For some reason, I can't start a private conversation. Ugh

I'm a Data Engineer myself and would love to work with you. I've been looking for a project to work on during my weekly "innovation hours" at work.
 
#25
I always thought you could PM people here with the redesign, but maybe I was wrong. I'm not seeing a way to do it either.
 
